CN108804536B

CN108804536B - Man-machine conversation and strategy generation method, equipment, system and storage medium

Info

Publication number: CN108804536B
Application number: CN201810421418.8A
Authority: CN
Inventors: Xie Tao (谢韬)
Original assignee: Ecovacs Commercial Robotics Co Ltd
Current assignee: Ecovacs Commercial Robotics Co Ltd
Priority date: 2018-05-04
Filing date: 2018-05-04
Publication date: 2022-10-04
Anticipated expiration: 2038-05-04
Also published as: CN108804536A

Abstract

Embodiments of the present application provide a human-machine dialog method, a dialog management policy generation method, and corresponding devices, a system, and a storage medium. In these embodiments, slot filling is combined with a finite-state machine: slot filling is used to generate multiple sets of slot-value pairs that are meaningful in a dialog scene, together with the dialog state corresponding to each set, and a finite-state machine model is then constructed from these sets of slot-value pairs and their dialog states.

Description

Man-machine conversation and strategy generation method, device, system and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular to a human-machine dialog method, a dialog management policy generation method, and corresponding devices, a system, and a storage medium.

Background

With the development of artificial intelligence, human-machine dialog systems, that is, computer systems capable of holding a continuous conversation with a person, have emerged. A human-machine dialog system mainly comprises five functional parts: speech recognition, language understanding, dialog management, language generation, and speech synthesis. Dialog management is the core function: it controls the whole dialog process between the user and the system and determines all of the system's actions, so the quality of its design largely determines the performance of the whole human-machine dialog system.

In the prior art, relatively simple human-machine dialog systems generally implement dialog management with a finite state machine, that is, a finite state machine represents the dialog states in a dialog scene and the transitions and actions between them. A finite state machine makes it easy to extend the dialog states flexibly, but it becomes harder to construct as the complexity of the dialog task increases, so finite state machines are rarely applied to complex dialog tasks.

Disclosure of Invention

Aspects of the present application provide a human-machine dialog method, a dialog management policy generation method, and corresponding devices, a system, and a storage medium, so as to reduce the difficulty of implementing a finite-state machine in a dialog scene and make finite-state machines more widely usable in dialog scenes.

The embodiment of the application provides a dialog management policy generation method, which comprises the following steps:

determining a plurality of semantic slots applicable to the dialogue scene and candidate slot values corresponding to the semantic slots based on semantic understanding of the dialogue scene;

combining the candidate slot values corresponding to the semantic slots to obtain a plurality of sets of slot-value pairs with conversational significance, wherein each set comprises the slot-value pairs corresponding to the semantic slots;

generating a plurality of dialog states corresponding to the plurality of sets of slot-value pairs according to the semantics represented by the sets of slot-value pairs respectively;

and constructing a finite-state machine model according to the plurality of dialog states and the plurality of sets of slot-value pairs, so as to perform dialog management on the man-machine conversation process in the dialog scene in the form of a finite-state machine.
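The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the application: the slot names and candidate values are borrowed from the withdrawal example given later in this description, the "conversational significance" filter is a hypothetical caller-supplied predicate (no concrete rule is given here), each dialog state is labelled mechanically by its slot values, and the finite-state machine is reduced to a list of directed transitions between every two states.

```python
from itertools import permutations, product

# Step 1 (assumed result): semantic slots and their candidate slot values.
SLOTS = {
    "withdraw": ["null", "confirm"],
    "amount": ["less than 20,000", "more than 20,000"],
    "medium": ["bank card", "passbook"],
}


def generate_policy(slots, is_meaningful=lambda group: True):
    """Hypothetical sketch of steps 2-4 of the policy generation method."""
    names = list(slots)
    # Step 2: combine candidate values; each set holds one pair per slot.
    groups = [dict(zip(names, values))
              for values in product(*(slots[n] for n in names))]
    groups = [g for g in groups if is_meaningful(g)]  # hypothetical filter
    # Step 3: one dialog state per set, labelled here by its slot values.
    states = {" | ".join(f"{k}={v}" for k, v in g.items()): g for g in groups}
    # Step 4: a minimal FSM model with a directed transition between every two states.
    transitions = list(permutations(states, 2))
    return states, transitions


states, transitions = generate_policy(SLOTS)
print(len(states), len(transitions))  # 8 56
```

With all eight combinations kept, the sketch yields 8 dialog states and 8 × 7 = 56 directed transitions; passing a filter such as `lambda g: g["withdraw"] == "confirm"` leaves 4 states and 12 transitions.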

The embodiment of the present application further provides a man-machine conversation method, including:

acquiring man-machine conversation data in a conversation scene;

acquiring, from the man-machine conversation data, input information that can trigger a finite state machine to perform a dialog state transition, according to the slot filling corpora and slot canceling corpora corresponding to each slot-value pair in the conversation scene;

controlling the finite state machine to jump to the next conversation state from the current conversation state according to the input information;

and outputting response data of the man-machine conversation data according to the related data of the next conversation state.
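One turn of the dialog method above might look like the following sketch. Everything here is an illustrative assumption: the corpora are tiny hypothetical phrase lists matched by substring, a dialog state is identified by its tuple of slot values (None meaning unfilled), and the "related data" of each state is just a canned response string.

```python
# Hypothetical slot filling / slot canceling corpora keyed by (slot, value).
FILL = {
    ("amount", "<20k"): ["less than 20,000", "under 20,000"],
    ("amount", ">20k"): ["more than 20,000", "over 20,000"],
    ("medium", "bank card"): ["bank card"],
    ("medium", "passbook"): ["passbook"],
}
CANCEL = {
    ("medium", "bank card"): ["not the bank card", "without the bank card"],
    ("medium", "passbook"): ["not the passbook"],
}

SLOT_ORDER = ("amount", "medium")  # a dialog state is a tuple of slot values
RESPONSES = {  # hypothetical related data of each dialog state
    (None, None): "How much would you like to withdraw?",
    ("<20k", None): "Bank card or passbook?",
    (">20k", None): "Bank card or passbook?",
    ("<20k", "bank card"): "Please go to the ATM for self-service withdrawal.",
    ("<20k", "passbook"): "Please go to the counter.",
    (">20k", "bank card"): "Please go to the counter.",
    (">20k", "passbook"): "Please go to the counter.",
}


def extract_input(text):
    """Step 2: map an utterance to slot updates; None means 'cancel the slot'."""
    text = text.lower()
    updates = {}
    for (slot, _value), phrases in CANCEL.items():
        if any(p in text for p in phrases):
            updates[slot] = None
    for (slot, value), phrases in FILL.items():
        # A cancel match wins, because cancel phrases contain fill phrases.
        if slot not in updates and any(p in text for p in phrases):
            updates[slot] = value
    return updates


def dialog_turn(current, text):
    """Steps 3-4: jump to the next dialog state and return its response."""
    slots = dict(zip(SLOT_ORDER, current))
    slots.update(extract_input(text))
    nxt = tuple(slots[s] for s in SLOT_ORDER)
    if nxt not in RESPONSES:  # e.g. (None, "bank card"): no such state, stay put
        return current, "Sorry, could you rephrase that?"
    return nxt, RESPONSES[nxt]
```

Because a cancel phrase resets its slot to None, the user can move back to an earlier state, for example from `("<20k", "bank card")` to `("<20k", None)` by saying "not the bank card".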

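An embodiment of the present application further provides a human-computer interaction device, including: a memory and a processor;

the memory is configured to store a computer program;

the processor is configured to execute the computer program to:

determine, based on semantic understanding of a dialog scene, a plurality of semantic slots applicable to the dialog scene and candidate slot values corresponding to the plurality of semantic slots;

combine the candidate slot values corresponding to the plurality of semantic slots to obtain a plurality of sets of slot-value pairs with conversational significance;

generate a plurality of dialog states corresponding to the plurality of sets of slot-value pairs according to the semantics expressed by each set;

and construct a finite-state machine model according to the plurality of dialog states and the plurality of sets of slot-value pairs.

Embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:

determining, based on semantic understanding of a dialog scene, a plurality of semantic slots applicable to the dialog scene and candidate slot values corresponding to the plurality of semantic slots;

combining the candidate slot values corresponding to the plurality of semantic slots to obtain a plurality of sets of slot-value pairs with conversational significance;

generating a plurality of dialog states corresponding to the plurality of sets of slot-value pairs according to the semantics expressed by each set;

and constructing a finite-state machine model according to the plurality of dialog states and the plurality of sets of slot-value pairs.

An embodiment of the present application further provides another human-computer interaction device, including: a memory and a processor;

the memory is configured to store a computer program;

the processor is configured to execute the computer program to:

acquire man-machine conversation data in a conversation scene;

acquire, from the man-machine conversation data, input information that can trigger a finite-state machine to perform a dialog state transition, according to the slot filling corpora and slot canceling corpora corresponding to each slot-value pair in the conversation scene;

control the finite-state machine to jump from the current conversation state to the next conversation state according to the input information;

and output response data of the man-machine conversation data according to the related data of the next conversation state.

Embodiments of the present application also provide a computer-readable storage medium storing computer instructions, which when executed by one or more processors, cause the one or more processors to perform actions comprising:

acquiring man-machine conversation data in a conversation scene;

acquiring, from the man-machine conversation data, input information that can trigger a finite-state machine to perform a dialog state transition, according to the slot filling corpora and slot canceling corpora corresponding to each slot-value pair in the conversation scene;

controlling the finite-state machine to jump from the current conversation state to the next conversation state according to the input information;

and outputting response data of the man-machine conversation data according to the related data of the next conversation state.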

An embodiment of the present application further provides a human-computer interaction system, including: a server and a terminal device;

the terminal device is used for receiving man-machine conversation data input by a user in a conversation scene, sending the man-machine conversation data to the server, receiving response data corresponding to the man-machine conversation data returned by the server and outputting the response data to the user;

the server is used for receiving the man-machine conversation data sent by the terminal device; acquiring, from the man-machine conversation data, input information that can trigger the finite-state machine to perform a dialog state transition, according to the slot filling corpora and slot canceling corpora corresponding to each slot-value pair in the conversation scene; controlling the finite-state machine to jump from the current conversation state to the next conversation state according to the input information; and returning response data of the man-machine conversation data to the terminal device according to the related data of the next conversation state.

In the embodiments of the present application, slot filling is combined with a finite-state machine. First, slot filling is used to generate a plurality of sets of slot-value pairs with conversational significance in a dialog scene, together with the dialog states corresponding to those sets; then a finite-state machine model is constructed from the sets of slot-value pairs and their corresponding dialog states. Because slot filling is flexible and simple, it simplifies the definition of dialog states in the dialog scene and thereby reduces the difficulty of constructing the finite-state machine model. Dialog management is ultimately still carried out in the form of a finite-state machine, so the advantages of finite-state machines can be fully exploited in a variety of dialog scenes, making dialog management simpler and more flexible.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

Fig. 1 is a schematic structural diagram of a human-machine interaction system according to an exemplary embodiment of the present application;

Fig. 2 is a state diagram of a finite state machine according to an exemplary embodiment of the present application;

Fig. 3 is a schematic flowchart of a dialog management policy generation method according to another exemplary embodiment of the present application;

Fig. 4 is a flowchart of a human-machine conversation method according to another exemplary embodiment of the present application;

Fig. 5a is a schematic structural diagram of a family accompanying robot chat system for application scenario 1 according to yet another exemplary embodiment of the present application;

Fig. 5b is a simplified schematic diagram of a human-machine dialog process applicable to various application scenarios according to yet another exemplary embodiment of the present application;

Fig. 5c is a schematic structural diagram of another family accompanying robot chat system for application scenario 1 according to yet another exemplary embodiment of the present application;

Fig. 5d is a schematic structural diagram of a banking self-service system for application scenario 2 according to another exemplary embodiment of the present application;

Fig. 5e is a schematic structural diagram of an online ticket booking system for application scenario 3 according to yet another exemplary embodiment of the present application;

Fig. 6a is a schematic structural diagram of a dialog management policy generation apparatus according to yet another exemplary embodiment of the present application;

Fig. 6b is a schematic structural diagram of a human-machine interaction device according to another exemplary embodiment of the present application;

Fig. 7a is a schematic structural diagram of a human-machine interaction device according to yet another exemplary embodiment of the present application;

Fig. 7b is a schematic structural diagram of another human-machine interaction device according to still another exemplary embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In some embodiments of the application, slot filling is combined with a finite-state machine: first, slot filling is used to generate a plurality of sets of slot-value pairs with conversational significance in the dialog scene, together with the dialog states corresponding to those sets, and then a finite-state machine model is constructed from the sets of slot-value pairs and their corresponding dialog states.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic structural diagram of a man-machine interaction system according to an exemplary embodiment of the present application. As shown in fig. 1, the man-machine interaction system 10 includes: a server 10a and a terminal device 10b. The server 10a and the terminal device 10b presented in fig. 1 are only exemplary and do not limit the implementation form of the two.

In this embodiment, the server 10a and the terminal device 10b may be connected through a wired or wireless network. Optionally, the server 10a may communicate with the terminal device 10b through a mobile network, whose network standard may be any of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMAX, and the like. Alternatively, the server 10a may communicate with the terminal device 10b via Bluetooth, Wi-Fi, infrared, the Internet, etc.

In the present embodiment, the server 10a is mainly responsible for speech recognition, language understanding, dialog management, language generation, speech synthesis, and other functions in a human-machine dialog, and cooperates with the terminal device 10b to carry out the dialog. There may be one or more servers 10a, and this embodiment does not limit their implementation form. For example, in some optional embodiments, the server 10a may be a conventional server, a cloud host, a virtual center, or the like. The server 10a mainly comprises a processor, a hard disk, memory, a system bus, and so on, similar to a general-purpose computer architecture.

In this embodiment, the terminal device 10b is a user-facing electronic device capable of voice interaction with the user. In some optional embodiments, the terminal device 10b may be a smartphone, tablet computer, personal computer, wearable device, smart speaker, or the like on which voice interaction application software is installed. In other optional embodiments, the terminal device 10b may be a voice-interactive self-service terminal or kiosk, such as a self-service registration/payment machine in a hospital, a self-service cash dispenser in a bank, or an automatic ticket machine in a subway station, railway station, airport, and the like. In still other application scenarios, the terminal device 10b may be an intelligent machine supporting voice interaction, for example a family accompanying robot, a chat robot, a sweeping robot, a navigation/following robot, or a robot providing ordering service.

Whatever its physical form, the terminal device 10b typically includes at least one processing unit and at least one memory, the number of which depends on the configuration and type of the terminal device 10b. The memory may include volatile memory such as RAM, non-volatile memory such as read-only memory (ROM) or flash memory, or both. The memory typically stores an operating system (OS), one or more application programs such as voice interaction software, and program data. Besides the processing unit and memory, some terminal devices 10b may also include basic components such as a network card chip, an IO bus, and audio/video components. Optionally, depending on its implementation, the terminal device 10b may also include peripheral devices such as a keyboard, a mouse, a stylus, or a printer. These peripheral devices are well known in the art and are not described in detail here.

In this embodiment, the server 10a and the terminal device 10b may be deployed in various dialog scenes and are responsible for completing the human-machine dialog process in the corresponding scene. For example, they may be deployed in a hospital and be responsible for the human-machine dialog in voice-based self-service registration. As another example, they may be deployed in a railway station, subway, or airport and be responsible for the human-machine dialog in voice-based self-service ticket booking. As yet another example, they may be deployed in a bank and be responsible for the human-machine dialog in voice-based self-service withdrawal.

In any conversation scenario, the general process of the human-computer conversation implemented by the server 10a and the terminal device 10b is as follows:

The user may interact with the terminal device 10b in natural language to express his or her needs or intentions. For example, the user may input human-machine dialog data such as "I want to withdraw 2,000", "I want to book a train ticket from Shanghai", or "I want to register for an appointment with a particular doctor". The human-machine dialog data may be voice data input by the user in natural language, or non-voice data such as text data. The terminal device 10b receives the human-machine dialog data input by the user in the dialog scene and sends it to the server 10a. The server 10a receives the data, recognizes the corresponding user intention, determines response data for that intention, and returns the response data to the terminal device 10b. The terminal device 10b receives the response data and outputs it to the user, completing one round of human-machine dialog.

Optionally, if the human-machine dialog data is voice data, the server 10a may perform speech recognition, language understanding, dialog management, language generation, and speech synthesis on it in sequence to finally obtain the corresponding response data. Speech recognition (ASR) is the process of converting the raw speech data input by the user into text data. Language understanding is the process of converting the recognized text data into a machine-understandable semantic representation. Dialog management is the process of deciding, based on the dialog state, what action to take and what response data to give; put simply, the server 10a determines, from the semantic representation produced by language understanding, what meaning it should express. Language generation is the process of converting the meaning the server 10a needs to express into text data. Speech synthesis is the process of converting text data into speech data.

It should be noted that if the human-machine dialog data input by the user is text data, speech recognition is unnecessary. Likewise, speech synthesis may be omitted. That is, speech recognition and speech synthesis are optional operations in the human-machine dialog process.
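The five functional parts, with speech recognition and speech synthesis optional, can be chained as in the following sketch. The `DialogPipeline` class and the toy lambda stages are hypothetical stand-ins for real components; the point is only the control flow: recognition runs only for voice (bytes) input, and synthesis runs only if a synthesizer is configured.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class DialogPipeline:
    understand: Callable[[str], dict]                    # language understanding
    manage: Callable[[dict], str]                        # dialog management
    generate: Callable[[str], str]                       # language generation
    recognize: Optional[Callable[[bytes], str]] = None   # speech recognition (optional)
    synthesize: Optional[Callable[[str], bytes]] = None  # speech synthesis (optional)

    def respond(self, data: Union[str, bytes]) -> Union[str, bytes]:
        if isinstance(data, bytes):  # voice input: recognize it first
            if self.recognize is None:
                raise ValueError("voice input requires a speech recognizer")
            data = self.recognize(data)
        text = self.generate(self.manage(self.understand(data)))
        # Without a synthesizer the response is returned as text.
        return self.synthesize(text) if self.synthesize is not None else text


# Toy stages standing in for real NLU / dialog management / NLG components.
text_only = DialogPipeline(
    understand=lambda t: {"intent": "withdraw"} if "withdraw" in t.lower() else {},
    manage=lambda sem: "ask_amount" if sem.get("intent") == "withdraw" else "ask_again",
    generate=lambda act: {"ask_amount": "How much would you like to withdraw?",
                          "ask_again": "Sorry, could you say that again?"}[act],
)
print(text_only.respond("I want to withdraw money"))
```

Passing `recognize` and `synthesize` callables turns the same pipeline into a voice-in, voice-out one without touching the three core stages.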

Among speech recognition, language understanding, dialog management, language generation, and speech synthesis, dialog management is the core function of the human-machine dialog system: it controls the whole dialog process between the user and the human-machine dialog system 10 and determines all actions of the system, so the quality of its design determines the performance of the whole human-machine dialog system 10. This embodiment therefore focuses on the implementation of dialog management. The server 10a may implement speech recognition, language understanding, language generation, and speech synthesis with various technologies, and this embodiment does not limit them.

In the present embodiment, the server 10a implements dialog management with a finite state machine, that is, the finite state machine represents the dialog states in a dialog scene and manages behaviors such as transitions and actions between the dialog states. The dialog scene in this embodiment may be relatively simple or relatively complex. In a complex dialog scene, for example one with many dialog turns or many dialog states, constructing the finite state machine with existing methods is difficult, which limits the use of finite state machines.

To address the difficulty of implementing a finite-state machine, this embodiment incorporates slot filling into the construction of the finite-state machine. Slot filling is used to generate multiple sets of slot-value pairs with conversational significance in a dialog scene, together with the dialog states corresponding to those sets. Because slot filling is flexible and simple, this simplifies the definition of dialog states in the dialog scene and in turn reduces the difficulty of constructing the finite-state machine model. The construction process of the finite-state machine combined with slot filling is as follows:

First, based on semantic understanding of the dialog scene, a plurality of semantic slots (slots) applicable to the dialog scene and the candidate slot values (values) of each semantic slot are determined in the manner of slot filling. A semantic slot is an attribute in the machine-understandable semantic representation into which text data is parsed. Candidate slot values are the possible values of a semantic slot, and each semantic slot may have several different candidate slot values.

The semantic slots and their candidate slot values differ from one dialog scene to another. For example, in a "flight booking" scene, the semantic slots may include "departure city", "departure time", "destination city", and so on; the candidate slot values of "departure city" may include "Beijing", "Shanghai", etc.; those of "departure time" may include "8 a.m.", "2 p.m.", etc.; and those of "destination city" may include "Harbin", "Wuhan", "sheng", etc. As another example, in a "withdrawal" scene, the semantic slots may include "withdraw", "amount", "medium", and so on; the candidate slot values of "withdraw" may include "null", "confirm", "cancel", etc.; those of "amount" may include "less than twenty thousand", "more than fifty thousand", etc.; and those of "medium" may include "bank card", "passbook", etc.

Because each semantic slot may have several candidate slot values, combining the candidate slot values of the semantic slots yields a plurality of sets of slot-value pairs with conversational significance. A semantic slot together with one of its candidate slot values forms a slot-value pair. Each set contains one slot-value pair for each of the semantic slots, and no two sets have exactly the same candidate slot values. To illustrate the notion of a "set" of slot-value pairs, consider the semantic slots and candidate slot values of the "withdrawal" scene. Assume that the semantic slots are "withdraw", "amount", and "medium"; that the candidate slot values of "withdraw" are "null" and "confirm"; that those of "amount" are "less than twenty thousand" and "more than twenty thousand"; and that those of "medium" are "bank card" and "passbook". Combining these candidate slot values yields the sets of slot-value pairs with conversational significance shown in Table 1 below.

TABLE 1

In Table 1 above, each row of the three columns "withdraw", "amount", and "medium" is one set of slot-value pairs. As Table 1 shows, the candidate slot values of different sets are not exactly the same. Moreover, the semantic slots and their candidate slot values in each set combine to express a definite meaning, and different sets express different meanings.

On this basis, the dialog state corresponding to each set of slot-value pairs can be generated according to the semantics that the set expresses, yielding a plurality of dialog states, each corresponding to one set of slot-value pairs. The last column of Table 1 gives the dialog state of each set; as can be seen, a dialog state is a representation of the semantics embodied by its set of slot-value pairs.
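One simple way to render the semantics of a set of slot-value pairs as a readable dialog state label is phrase templates, as in this sketch. The templates, the `"null"` convention for an unfilled slot, the amount codes `<20k`/`>20k`, and the fallback label `"idle"` are all hypothetical choices for illustration.

```python
# Hypothetical phrase template for each slot-value pair; a pair without a
# template (e.g. withdraw=null) contributes nothing to the label.
PHRASES = {
    ("withdraw", "confirm"): "withdraw",
    ("amount", "<20k"): "20,000 or less",
    ("amount", ">20k"): "more than 20,000",
    ("medium", "bank card"): "with a bank card",
    ("medium", "passbook"): "with a passbook",
}
SLOT_ORDER = ("withdraw", "amount", "medium")


def state_name(group):
    """Render the semantics of one set of slot-value pairs as a state label."""
    words = [PHRASES[(slot, group.get(slot))] for slot in SLOT_ORDER
             if (slot, group.get(slot)) in PHRASES]
    return " ".join(words) or "idle"  # "idle" if no pair carries meaning


print(state_name({"withdraw": "confirm", "amount": "<20k", "medium": "bank card"}))
# withdraw 20,000 or less with a bank card
```

Because each slot's phrases are distinct, sets that differ in a templated value receive different labels, mirroring the one-to-one correspondence between sets and dialog states.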

After the plurality of dialog states and the plurality of sets of slot-value pairs of the dialog scene are obtained, a finite-state machine model can be constructed from them, so that dialog management of the human-machine dialog process in the dialog scene is performed in the form of a finite-state machine. The finite-state machine model mainly describes the dialog states together with the transitions between them, the transition conditions, and the corresponding actions. Optionally, the finite-state machine model may be a static description document, such as, but not limited to, a configuration file of the finite state machine. Of course, the finite-state machine model may also be implemented in other forms.
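If the finite-state machine model is kept as a static configuration file, it could be serialized to JSON roughly as below. The schema (`states`, `transitions`, `from`, `to`, `when`) and the example states are hypothetical; no particular file format is specified here. Transition conditions are dictionaries of slot values, with `null` for an unfilled slot.

```python
import json


def to_config(states, transitions):
    """Serialize {name: slots} and [(from, to, condition)] to a JSON document."""
    doc = {
        "states": [{"name": n, "slots": s} for n, s in states.items()],
        "transitions": [{"from": a, "to": b, "when": c} for a, b, c in transitions],
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


def from_config(text):
    """Load the same structures back from the JSON document."""
    doc = json.loads(text)
    states = {s["name"]: s["slots"] for s in doc["states"]}
    transitions = [(t["from"], t["to"], t["when"]) for t in doc["transitions"]]
    return states, transitions


states = {
    "withdraw 20,000 or less": {"amount": "<20k", "medium": None},
    "withdraw 20,000 or less with a bank card": {"amount": "<20k", "medium": "bank card"},
}
transitions = [
    ("withdraw 20,000 or less", "withdraw 20,000 or less with a bank card", {"medium": "bank card"}),
    ("withdraw 20,000 or less with a bank card", "withdraw 20,000 or less", {"medium": None}),
]
config = to_config(states, transitions)
print(config)
```

Keeping the model as data rather than code is what lets the dialog states be extended later by editing the file alone.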

In this embodiment, the dialog states to be managed by the finite-state machine are generated by slot filling. Because slot filling is flexible and simple, this simplifies the definition of dialog states in the dialog scene and reduces the difficulty of constructing the finite-state machine model, so that the advantages of the finite-state machine can be fully exploited in various dialog scenes and dialog management becomes simpler and more flexible.

In some alternative embodiments, the server 10a maps the plurality of dialog states to a plurality of state nodes in the finite-state machine model, adds a bidirectional edge between every two of the state nodes, and, for each pair of state nodes, generates the transition condition between them from the difference between the two sets of slot-value pairs corresponding to those nodes, thereby constructing the finite-state machine model.

Optionally, the finite-state machine corresponding to the finite-state machine model can be visualized as a state diagram. Taking the "cash withdrawal" scenario of Table 1 as an example, the state diagram of the corresponding finite-state machine model is shown in Fig. 2. In Fig. 2, for the transition from the dialog state "withdraw 20,000 or less" to the dialog state "withdraw 20,000 or less with a bank card", the dialog state changes when the transition condition from "withdraw 20,000 or less" to "withdraw 20,000 or less with a bank card" is satisfied, that is, when the input supplies the medium "bank card".
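The construction described above (one node per dialog state, an edge in each direction between every two nodes, and a transition condition derived from the slot-value difference) can be sketched as follows. Taking the condition to be "the slot values of the target state that differ from the source state" is one plausible reading, assumed here; `None` marks an unfilled slot, the first two state names follow the Fig. 2 example, and the third state is hypothetical.

```python
from itertools import permutations


def transition_condition(src, dst):
    """Slot values that must change to move from state src to state dst."""
    slots = sorted(src.keys() | dst.keys())
    return {s: dst.get(s) for s in slots if src.get(s) != dst.get(s)}


def build_fsm(states):
    """states: {state name: {slot: value}} -> {(from, to): condition}.
    permutations() yields both (a, b) and (b, a), i.e. a bidirectional edge
    between every two state nodes."""
    return {(a, b): transition_condition(states[a], states[b])
            for a, b in permutations(states, 2)}


STATES = {
    "withdraw 20,000 or less": {"amount": "<20k", "medium": None},
    "withdraw 20,000 or less with a bank card": {"amount": "<20k", "medium": "bank card"},
    "withdraw more than 20,000 with a bank card": {"amount": ">20k", "medium": "bank card"},
}
edges = build_fsm(STATES)
print(edges[("withdraw 20,000 or less", "withdraw 20,000 or less with a bank card")])
# {'medium': 'bank card'}
```

The reverse edge carries `{'medium': None}`, so the same difference rule yields both directions of every bidirectional edge.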

Further, when the dialog states need to be extended, a new dialog state and its corresponding set of slot-value pairs can be generated. A new state node is then added to the finite-state machine model, a bidirectional edge is added between the new state node and each existing state node, and the transition conditions between the new state node and each existing state node are generated from the difference between their two sets of slot-value pairs, thereby extending the dialog states.

Referring to Fig. 2, suppose the dialog is in the state "withdraw 20,000 or less with a bank card" and the response data given by the server 10a in the previous turn was "please go to the ATM for self-service withdrawal". If the user then says "Can I do it at the counter?", this corresponds to a dialog state that does not exist in Fig. 2. If dialog management were performed by slot filling alone, it would be difficult to define a suitable semantic slot for this request. Because the embodiments of the present application perform dialog management with a finite-state machine, a "handle at the counter" dialog state can simply be added to the finite-state machine model, with its trigger condition and corresponding corpora defined as needed. The dialog states can therefore be extended flexibly on the basis of the finite-state machine, and only the finite-state machine model needs to be updated.
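Extending the dialog states as described, for example with the "handle at the counter" state, can be sketched as an in-place update of the node and edge tables. The `channel` slot used to distinguish the counter case is a hypothetical addition, and the transition condition is taken, as an assumption, to be the slot values of the target state that differ from the source state.

```python
def transition_condition(src, dst):
    """Assumed condition for src -> dst: slot values of dst that differ from src."""
    slots = sorted(src.keys() | dst.keys())
    return {s: dst.get(s) for s in slots if src.get(s) != dst.get(s)}


def add_state(states, edges, name, slots):
    """Add a dialog state and a bidirectional edge to every existing state (in place)."""
    for other, other_slots in states.items():
        edges[(other, name)] = transition_condition(other_slots, slots)
        edges[(name, other)] = transition_condition(slots, other_slots)
    states[name] = slots


states = {
    "withdraw 20,000 or less": {"amount": "<20k", "medium": None},
    "withdraw 20,000 or less with a bank card": {"amount": "<20k", "medium": "bank card"},
}
edges = {
    ("withdraw 20,000 or less", "withdraw 20,000 or less with a bank card"): {"medium": "bank card"},
    ("withdraw 20,000 or less with a bank card", "withdraw 20,000 or less"): {"medium": None},
}
# Hypothetical "channel" slot distinguishes counter handling from the ATM.
add_state(states, edges, "handle at the counter",
          {"amount": "<20k", "medium": "bank card", "channel": "counter"})
print(len(edges))  # 6
```

Existing states and their conditions are left untouched; only new entries are added, consistent with the point that extension requires updating only the finite-state machine model.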

Further, once the finite-state machine model is obtained, the server 10a can use it to perform dialog management, in the form of a finite-state machine, on each human-machine dialog process in the dialog scene. Before doing so, the server 10a needs to perform language understanding on the human-machine dialog data to convert it into a machine-recognizable semantic representation. Language understanding generally relies on a corpus of the dialog scene.

To help the server 10a correctly understand the human-machine dialog data, this embodiment also builds the corpus of the dialog scene in the manner of slot filling; the corpus is used to perform language understanding on the human-machine dialog data received during the dialog and to provide the required input information to the finite-state machine.

In this embodiment, the corpus is constructed from the corpora corresponding to each slot-value pair in the dialog scene. Not only is a slot filling corpus obtained for each slot-value pair, but a slot canceling corpus is also added for each slot-value pair. The slot filling corpora and slot canceling corpora of all slot-value pairs together form the corpus of the dialog scene. A slot filling corpus consists of utterances that satisfy the slot-value pair and carry a positive meaning; a slot canceling corpus consists of utterances that do not satisfy the slot-value pair and carry a negative meaning. Adding slot canceling corpora allows the dialog to move between any two dialog states in either direction, forming a fully connected finite-state machine model.

Taking the withdrawal scenario in Table 1 as an example, the slot-value pairs in the dialog scene and the slot-filling and slot-canceling corpora corresponding to each pair are shown in Table 2 below:

TABLE 2

The slot-filling and slot-canceling corpora in Table 2 are examples only; those skilled in the art will understand that they are not limited to those shown in Table 2.

In this embodiment, the corpora in the dialog scene are managed per slot-value pair rather than per transition between dialog states. Since this involves relatively few management dimensions, managing the corpora of the dialog scene is simpler and easier to implement, which further simplifies dialog management and reduces its implementation cost.

Based on the corpus and the finite-state machine model described above, the server 10a and the terminal device 10b can cooperate to manage the human-machine dialog in the form of a finite-state machine. The finite-state-machine-based human-machine dialog proceeds as follows:

The user inputs human-machine dialog data to the terminal device 10b. The terminal device 10b receives the data and transmits it to the server 10a.

The server 10a receives the human-machine dialog data transmitted by the terminal device 10b. If the data is speech, the server 10a first converts it to text using speech recognition, and then performs language understanding on the text based on the corpus of the dialog scene, that is, the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the dialog scene, to obtain input information that can trigger a dialog state transition in the finite-state machine. If the data is already text, the server 10a performs language understanding on it directly, based on the same corpus, to obtain such input information.

For example, in the dialog state "withdraw 20,000 or less", if the user says "three hundred/thousand/…", the server 10a, using the state diagram shown in Fig. 2 and the slot-filling and slot-canceling corpora shown in Table 2, recognizes that the "medium" is "bank card" and obtains input information that can trigger the finite-state machine to transition from the dialog state "withdraw 20,000 or less" to the dialog state "withdraw 20,000 or less with a bank card". If, in the dialog state "withdraw 20,000 or less", the user says "the amount is wrong", "that's wrong", or the like, the server 10a recognizes that the "amount" slot should be reset and obtains input information that can trigger the finite-state machine to transition from "withdraw 20,000 or less" back to "withdraw". Language understanding can be implemented with methods such as keyword matching, regular expressions, or classifiers.
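Of the methods just listed, regular expressions are the easiest to show in a few lines. The sketch below is one possible way to turn an utterance into slot updates, assuming hypothetical patterns that a designer might derive from the fill and cancel corpora; the patterns, slot names, and values are illustrative, not taken from Table 2. Cancel patterns are checked before fill patterns so that a negated mention (for example "no card") is not misread as a fill.

```python
import re
from typing import Dict, Optional

# Hypothetical regular expressions per slot-value pair, standing in for
# patterns one might derive from the Table 2 corpora.
PATTERNS = {
    ("medium", "bank card"): {
        "fill": [r"\bbank\s*card\b", r"\bcard\b"],
        "cancel": [r"\bnot?\b.*\bcard\b"],
    },
    ("amount", "<=20000"): {
        "fill": [r"\b(less than|under|below)\s+(twenty thousand|20,?000)\b"],
        "cancel": [r"\bwrong\b.*\bamount\b", r"\bamount\b.*\bwrong\b"],
    },
}

def understand(utterance: str) -> Dict[str, Optional[str]]:
    """Return slot updates: {slot: value} to fill, {slot: None} to cancel."""
    text = utterance.lower()
    updates: Dict[str, Optional[str]] = {}
    for (slot, value), pats in PATTERNS.items():
        # Check cancel patterns first so "no card" is not read as a fill.
        if any(re.search(p, text) for p in pats["cancel"]):
            updates[slot] = None
        elif any(re.search(p, text) for p in pats["fill"]):
            updates[slot] = value
    return updates
```

For instance, `understand("the amount is wrong")` yields `{"amount": None}`, which is the kind of input information that would move the machine from "withdraw 20,000 or less" back to "withdraw".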

In some optional embodiments, to make language understanding of human-machine dialog data faster and more convenient, a language understanding model can be trained in advance on the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the corpus. The model is used to obtain, from the human-machine dialog data during a dialog, the input information required by the finite-state machine.

In one embodiment, a first language understanding model may be trained on the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the corpus. The first model extracts the slot-value pairs contained in the human-machine dialog data, and the extracted pairs serve as the input information. Accordingly, the server 10a runs the first model on the human-machine dialog data to obtain the slot-value pairs it contains as the input information required by the finite-state machine. In this embodiment, the finite-state machine must convert this input information into transition conditions it can recognize.

In another embodiment, a second language understanding model may be trained on the slot-filling and slot-canceling corpora in the corpus together with the correspondence between the sets of slot-value pairs and the dialog states. The second model obtains a transition condition directly from the human-machine dialog data as the input information required by the finite-state machine. For example, using the correspondence between the sets of slot-value pairs and the dialog states, the slot-filling and slot-canceling corpora of each slot-value pair can be associated with transition conditions in the finite-state machine; training on these associations yields a second model that obtains the required transition conditions directly from the human-machine dialog data. Accordingly, the server 10a runs the second model on the human-machine dialog data and uses the resulting transition condition as the input information. In this optional embodiment, the finite-state machine can recognize the input information directly, without converting it.
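The text does not say what kind of model the second language understanding model is. As a deliberately simple stand-in, the sketch below uses a bag-of-words nearest-neighbour classifier whose labels are transition conditions rather than slot-value pairs. The training pairs and label strings (`"set medium=bank card"`, `"reset amount"`) are hypothetical; they only illustrate the idea of labelling each fill or cancel utterance with the transition it triggers.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical training pairs: each fill/cancel utterance is labelled with the
# transition condition it triggers, derived from the slot-value/state mapping.
LABELLED = [
    ("with my bank card", "set medium=bank card"),
    ("use the card", "set medium=bank card"),
    ("less than twenty thousand", "set amount<=20000"),
    ("wrong amount", "reset amount"),
    ("the amount is wrong", "reset amount"),
]

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())  # missing tokens count as 0
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def train(pairs: List[Tuple[str, str]]) -> List[Tuple[Counter, str]]:
    """'Training' here just stores one bag-of-words vector per labelled utterance."""
    return [(vectorize(u), label) for u, label in pairs]

def predict(model: List[Tuple[Counter, str]], utterance: str) -> Optional[str]:
    """Return the transition condition of the most similar training utterance."""
    vec = vectorize(utterance)
    best_label, best_score = None, 0.0
    for ref, label in model:
        score = cosine(vec, ref)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A production system would more likely use a trained statistical or neural classifier; the nearest-neighbour choice here only keeps the example short and dependency-free.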

After obtaining the input information, the server 10a controls the finite-state machine to jump from the current dialog state to the next dialog state according to the input information, determines the response data for the human-machine dialog data from the data related to the next dialog state, and sends the response data to the terminal device 10b.

Optionally, the data related to the next dialog state may include its state description, corresponding actions, and the like. This data indicates what kind of response the server 10a should make, and the response data for the human-machine dialog data can be determined from it. The server 10a may retrieve the response data from the corpus or generate it automatically.
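The jump-and-respond step can be sketched as follows, assuming that each dialog state is identified by its set of slot-value pairs and that the related data is simply a name and a response string. The state names echo the withdrawal example; the response texts for the first two states and the "stay in the current state" fallback for unknown combinations are assumptions made for this illustration.

```python
from typing import Dict, FrozenSet, Optional, Tuple

State = FrozenSet[Tuple[str, str]]

# Hypothetical dialog states for the withdrawal scenario, keyed by the set of
# slot-value pairs that define them; "related data" is a name and a response.
STATES: Dict[State, Dict[str, str]] = {
    frozenset(): {"name": "withdraw",
                  "response": "How much would you like to withdraw?"},
    frozenset({("amount", "<=20000")}): {"name": "withdraw 20,000 or less",
                                         "response": "Will you use a bank card?"},
    frozenset({("amount", "<=20000"), ("medium", "bank card")}): {
        "name": "withdraw 20,000 or less with a bank card",
        "response": "Please go to the ATM for self-service withdrawal."},
}

def step(current: State, updates: Dict[str, Optional[str]]) -> Tuple[State, str]:
    """Apply slot updates ({slot: value} fills, {slot: None} cancels) and jump.

    If the resulting slot combination is not a defined dialog state, the machine
    stays in the current state (a fallback policy assumed for this sketch).
    """
    slots = dict(current)
    for slot, value in updates.items():
        if value is None:
            slots.pop(slot, None)
        else:
            slots[slot] = value
    candidate = frozenset(slots.items())
    nxt = candidate if candidate in STATES else current
    return nxt, STATES[nxt]["response"]
```

Keying states by frozen sets of slot-value pairs makes the "next state" lookup a dictionary access once the updates have been applied.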

The terminal device 10b receives the response data returned by the server 10a and outputs it to the user, either by playing it as speech or by displaying it on a display screen.

In this embodiment, combining slot filling with the finite-state machine reduces the difficulty of constructing the finite-state machine model and allows dialog management to be performed in the form of a finite-state machine, so that the advantages of the finite-state machine can be exploited across various dialog scenes and dialog management becomes simpler and more flexible.

In addition to the human-machine dialog system, the embodiments of the present application also provide method embodiments, which describe, respectively, the generation of the finite-state machine model and the finite-state-machine-based human-machine dialog process.

Fig. 3 is a flowchart illustrating a dialog management policy generation method according to another exemplary embodiment of the present application. As shown in fig. 3, the method includes:

301. Based on semantic understanding of the dialog scene, determine a plurality of semantic slots applicable to the dialog scene and the candidate slot values corresponding to each semantic slot.

302. Combine the candidate slot values of the semantic slots to obtain multiple sets of slot-value pairs that are meaningful in the dialog, each set containing one slot-value pair for each semantic slot.

303. Generate a plurality of dialog states corresponding to the sets of slot-value pairs, according to the semantics expressed by each set.

304. Construct a finite-state machine model from the dialog states and the sets of slot-value pairs, so that human-machine dialogs in the dialog scene can be managed in the form of a finite-state machine.

For a detailed description of steps 301-304, reference may be made to the description of the above system embodiments.
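Steps 301 to 303 can be sketched in a few lines, assuming the slots and candidate values from step 301 are already known. The slot table, the `None` "not yet filled" value, and the `is_meaningful` rule below are all hypothetical; in the described method, deciding which combinations are meaningful rests on semantic understanding of the dialog scene, for which the rule is only a placeholder.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional

# Step 301 (assumed output): semantic slots and candidate values for a
# hypothetical withdrawal scenario; None means "slot not yet filled".
SLOTS: Dict[str, List[Optional[str]]] = {
    "amount": [None, "<=20000", ">20000"],
    "medium": [None, "bank card", "passbook"],
}

def is_meaningful(combo: Dict[str, Optional[str]]) -> bool:
    """Hypothetical stand-in for the designer's judgement in step 302:
    here, a medium only makes sense once an amount is known."""
    return not (combo["medium"] is not None and combo["amount"] is None)

def generate_states(slots: Dict[str, List[Optional[str]]]) -> Dict[str, FrozenSet]:
    names = list(slots)
    states: Dict[str, FrozenSet] = {}
    # Step 302: combine candidate values of all slots, keep meaningful sets.
    for values in product(*(slots[n] for n in names)):
        combo = dict(zip(names, values))
        if not is_meaningful(combo):
            continue
        filled = {k: v for k, v in combo.items() if v is not None}
        # Step 303: derive a dialog-state label from the pairs' semantics.
        label = "withdraw" + "".join(f" [{k}={v}]" for k, v in sorted(filled.items()))
        states[label] = frozenset(filled.items())
    return states
```

With the table above, the 9 raw combinations shrink to 7 dialog states, the empty one being the initial "withdraw" state; step 304 then turns these states into graph nodes.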

In this embodiment, slot filling is incorporated into the construction of the finite-state machine: it is used to generate the meaningful sets of slot-value pairs in a dialog scene and the dialog states corresponding to them. Leveraging the flexibility and simplicity of slot filling simplifies the definition of dialog states in the dialog scene and further reduces the difficulty of constructing the finite-state machine model, which makes it easier to exploit the advantages of the finite-state machine in various dialog scenes and to achieve simpler, more flexible dialog management.

In some optional embodiments, step 304 is implemented as follows: map the dialog states to state nodes in the finite-state machine model; add a bidirectional edge between every two state nodes; and, for any two state nodes, generate the transition condition between them from the difference between their two sets of slot-value pairs, thereby constructing the finite-state machine model.
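A minimal sketch of this construction, assuming that a transition condition can be represented as the slot updates needed to turn one slot-value set into the other (fill or overwrite a slot, or cancel it with `None`). That representation, and the state names in the usage below, are assumptions of this sketch. Creating an edge for every ordered pair of distinct nodes gives each pair of nodes an edge in both directions.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Optional, Tuple

Condition = Dict[str, Optional[str]]

def transition_condition(src: FrozenSet, dst: FrozenSet) -> Condition:
    """Difference between two slot-value sets, as the updates turning src into dst:
    {slot: value} to fill or overwrite, {slot: None} to cancel."""
    s, d = dict(src), dict(dst)
    cond = {slot: value for slot, value in d.items() if s.get(slot) != value}
    cond.update({slot: None for slot in s if slot not in d})
    return cond

def build_fsm(states: Dict[str, FrozenSet]) -> Dict[Tuple[str, str], Condition]:
    """states: {state_name: frozenset of (slot, value)}.
    Every ordered pair of distinct nodes gets an edge, so the model is fully
    connected with edges in both directions."""
    return {(a, b): transition_condition(states[a], states[b])
            for a, b in permutations(states, 2)}
```

For example, with states `withdraw` (no slots), `le20k` (`amount=<=20000`) and `le20k_card` (plus `medium=bank card`), `build_fsm` produces 6 edges, and the edge from `le20k_card` back to `withdraw` carries the condition `{"amount": None, "medium": None}`.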

In some optional embodiments, if dialog states need to be extended after the finite-state machine model has been constructed, a new dialog state and its corresponding set of slot-value pairs can be generated. A new state node is then added to the finite-state machine model, a bidirectional edge is added between the new node and each existing node, and the transition condition between the new node and each existing node is generated from the difference between their two sets of slot-value pairs, thereby extending the dialog states. Dialog states can thus be extended flexibly, and only the finite-state machine model needs to be updated.
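The extension step can be sketched as below. The block carries its own copy of a difference function so that it runs on its own, and it assumes transition conditions are represented as slot updates (`None` meaning cancel). The "counter" state, with a hypothetical `channel` slot, mirrors the "handle at the counter" example from the withdrawal scenario; the slot name is an illustrative assumption.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Condition = Dict[str, Optional[str]]

def transition_condition(src: FrozenSet, dst: FrozenSet) -> Condition:
    """Slot updates that turn src into dst (None = cancel the slot)."""
    s, d = dict(src), dict(dst)
    cond = {slot: value for slot, value in d.items() if s.get(slot) != value}
    cond.update({slot: None for slot in s if slot not in d})
    return cond

def add_state(states: Dict[str, FrozenSet],
              edges: Dict[Tuple[str, str], Condition],
              name: str, pairs: Iterable[Tuple[str, str]]) -> None:
    """Add a node and connect it in both directions to every existing node.
    Edges between existing nodes are left untouched."""
    new = frozenset(pairs)
    for other, other_pairs in states.items():
        edges[(name, other)] = transition_condition(new, other_pairs)
        edges[(other, name)] = transition_condition(other_pairs, new)
    states[name] = new
```

Only the new node's 2 × N edges are computed; nothing else in the model changes, which is the property the extension relies on.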

Further, once the finite-state machine model has been obtained, each human-machine dialog in the dialog scene can be managed in the form of a finite-state machine based on the model. Before that, language understanding must be performed on the human-machine dialog data to convert it into a machine-recognizable semantic representation, and language understanding generally relies on a corpus for the dialog scene. Accordingly, the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the dialog scene can be obtained to form a corpus, and a language understanding model can be trained on these corpora; the model is used to obtain the input information required by the finite-state machine from human-machine dialog data.

In one embodiment, a first language understanding model may be trained on the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the corpus. The first model extracts the slot-value pairs contained in the human-machine dialog data, which serve as the input information. In this embodiment, the finite-state machine must convert the input information into transition conditions it can recognize.

In another embodiment, a second language understanding model may be trained on the slot-filling and slot-canceling corpora in the corpus together with the correspondence between the sets of slot-value pairs and the dialog states. The second model obtains a transition condition directly from the human-machine dialog data as the input information required by the finite-state machine. For example, using the correspondence between the sets of slot-value pairs and the dialog states, the slot-filling and slot-canceling corpora of each slot-value pair can be associated with transition conditions in the finite-state machine; training on these associations yields a second model that obtains the required transition conditions directly from the human-machine dialog data.

Optionally, after the finite-state machine model has been constructed by the method shown in Fig. 3, human-machine dialogs can be conducted on the basis of the finite-state machine by the method shown in Fig. 4. Note that the finite-state machine used in the dialog process of Fig. 4 may be, but is not limited to being, constructed in the manner shown in Fig. 3.

Fig. 4 is a flowchart illustrating a human-machine dialog method according to another exemplary embodiment of the present application. As shown in Fig. 4, the method includes:

401. Acquire human-machine dialog data in the dialog scene.

402. Obtain, from the human-machine dialog data, input information that can trigger a dialog state transition in the finite-state machine, according to the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the dialog scene.

403. Control the finite-state machine to jump from the current dialog state to the next dialog state according to the input information.

404. Output response data for the human-machine dialog data according to the data related to the next dialog state.

In an optional embodiment, step 402 is implemented by:

running a first language understanding model on the human-machine dialog data to obtain the slot-value pairs it contains as the input information; or

running a second language understanding model on the human-machine dialog data to obtain a transition condition in the finite-state machine as the input information;

where the first or second language understanding model is pre-trained on the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the dialog scene.
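Steps 401 to 404 can be combined into one dialog turn, sketched below under two assumptions: transition conditions are dicts of slot updates (`None` meaning cancel), and the language understanding model is passed in as a plain function. In this simplified representation, the first model's output (a list of slot-value pairs) needs only a trivial conversion into a condition, while the second model's output is already a condition; the toy models and state names in the usage are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Condition = Dict[str, Optional[str]]          # slot -> value, None = cancel
Edges = Dict[Tuple[str, str], Condition]       # (src, dst) -> transition condition

def to_condition(raw: Union[Condition, List[Tuple[str, Optional[str]]]]) -> Condition:
    """First-model output (list of (slot, value) pairs) is converted; second-model
    output is already a transition condition and is simply copied."""
    return dict(raw)

def run_turn(edges: Edges, responses: Dict[str, str], current: str,
             utterance: str, understand: Callable) -> Tuple[str, str]:
    # Step 401: `utterance` is the acquired dialog data (already text here).
    # Step 402: obtain input information via either language understanding model.
    condition = to_condition(understand(utterance))
    # Step 403: follow the outgoing edge whose condition matches; stay otherwise (assumed).
    nxt = next((dst for (src, dst), cond in edges.items()
                if src == current and cond == condition), current)
    # Step 404: respond with the data tied to the next state.
    return nxt, responses[nxt]
```

Usage with two toy models: `run_turn(edges, responses, "withdraw", "under 20000", first_model)` moves to the state whose incoming edge condition is `{"amount": "<=20000"}` and returns that state's response.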

In this embodiment, dialog management is performed in the form of a finite-state machine, which helps exploit the advantages of the finite-state machine in various dialog scenes and achieve simpler, more flexible dialog management.

It should be noted that in some application scenarios the logic of the methods shown in Fig. 3 and Fig. 4 may be deployed on the server side of the human-machine dialog system shown in Fig. 1 and executed by the server, but this is not limiting. For example, as terminal technology develops and terminal devices become more powerful, the logic of these methods may instead be deployed on the terminal device without a server, which simplifies the architecture of the human-machine dialog system. The technical solutions of the embodiments of the present application are illustrated below with reference to these two deployment options and several specific application scenarios.

Application scenario 1:

In a home scenario, a home companion robot may be provided. The home companion robot can look after elderly people or children in place of adults, freeing the adults from this task. It can play games, read, and chat with elderly people or children, remind elderly people to take their medicine, and so on. Taking chat as an example, the robot can act as a chat partner and converse with the user according to a configured chat scene. In this embodiment, a finite-state machine model responsible for dialog state management in the chat scene is deployed on the home companion robot; the model is constructed using the method of the above embodiments.

When the user wants to chat, the home companion robot can be woken from a sleep or standby state by voice, touch, a physical button, or the like, after which it enters the chat process. As shown in Fig. 5a, the user may say something to the home companion robot 50a, for example, "Were any new movies released last week?" The robot 50a receives the voice data, processes the utterance according to the dialog processing flow shown in Fig. 5b, and finally outputs an answer. The dialog processing flow in Fig. 5b includes speech recognition, language understanding, dialog management, language generation, and speech synthesis. In the dialog management stage, transitions and actions between dialog states are managed in the form of a finite-state machine based on the pre-constructed finite-state machine model. As shown in Fig. 5b, the dialog states in the model may include dialog states defined by slot filling as well as dialog states added later by extension.

The answer given by the home companion robot 50a may be recent movie-related information, such as recently released movies, the synopsis and lead actors of popular recent releases, or recently released foreign action movies. If the user's question falls outside the configured chat scene, or the corpus is insufficient, the robot 50a may give answers such as "I don't know" or "I'm not sure".

Optionally, to simplify the home companion robot, the dialog processing function shown in Fig. 5b may be deployed on a cloud server. This yields another human-machine dialog system for application scenario 1, shown in Fig. 5c, which includes a home companion robot 50c and a cloud server 50d; the finite-state machine model constructed by the method of the above embodiments is deployed on the cloud server 50d.

In the system shown in Fig. 5c, when the user wants to chat, the home companion robot 50c can be woken from a sleep or standby state by voice, touch, or a physical button and then enters the chat process. The chat proceeds as follows: the user says something, for example, "Were any new movies released last week?" The robot 50c transmits the utterance to the cloud server 50d, which processes it according to the dialog processing flow shown in Fig. 5b and obtains an answer, for example, "A major foreign action movie was released recently", and returns the answer to the robot 50c, which plays it to the user.

Incorporating slot filling into the construction of the finite-state machine allows a finite-state-machine-based dialog management scheme to be applied to robot companionship scenarios. Managing dialogs in the form of a finite-state machine exploits its advantages in this scenario, makes dialog management simpler and more flexible, improves the companionship effect of the robot, and thereby improves the user experience.

Application scenario 2:

To make it convenient for users to deposit and withdraw money, make inquiries, and conduct similar business, banks now commonly use self-service systems. As shown in Fig. 5d, a bank self-service system includes a bank server 50e and self-service deposit/withdrawal machines 50f and self-service inquiry machines 50g deployed at many locations. The machines 50f and 50g have a human-machine dialog function: they interact with users and, together with the dialog processing service provided by the bank server 50e, meet users' needs for deposits, withdrawals, inquiries, and so on. A finite-state machine model responsible for dialog state management in the bank self-service scenario, constructed by the method of the above embodiments, is deployed on the bank server 50e.

In the bank self-service system shown in Fig. 5d, the user can state a service request to the self-service deposit/withdrawal machine 50f or the self-service inquiry machine 50g. For example, the user may say a withdrawal request such as "Withdraw money" to the automated teller machine 50f. The machine 50f sends the request to the bank server 50e, which processes it according to the dialog processing flow shown in Fig. 5b, obtains an answer such as "How much would you like to withdraw?", and returns it to the machine 50f, which plays it to the user.

The user then states the withdrawal amount, e.g., "Three thousand", to the automated teller machine 50f, which sends it to the bank server 50e. The bank server 50e continues processing according to the dialog processing flow shown in Fig. 5b, obtains an answer such as "Please enter your withdrawal password", and returns it to the machine 50f. The whole withdrawal process proceeds step by step according to the pre-constructed finite-state machine model until the withdrawal succeeds or fails. The subsequent steps are not shown in Fig. 5d.

Incorporating slot filling into the construction of the finite-state machine allows a finite-state-machine-based dialog management scheme to be applied to bank self-service scenarios. Managing dialogs in the form of a finite-state machine exploits its advantages in this scenario, makes dialog management simpler and more flexible, makes bank self-service more efficient and human-machine interaction smoother, and improves the user experience.

Application scenario 3:

With the development of Internet technology, users can enjoy a wide range of services without leaving home. Taking online ticket purchasing as an example, a user going on a business trip, traveling, or going home for the holidays to visit family can buy bus or train tickets, air tickets, and the like directly online, which saves a great deal of time.

The online ticket booking system shown in Fig. 5e includes a user terminal 50g and a booking server 50h of the passenger transport system; the user terminal 50g communicates with the booking server 50h over the Internet. A finite-state machine model responsible for dialog management in the online booking scenario, constructed by the method of the above embodiments, is deployed on the booking server 50h.

When the user wants to book a ticket, the user opens the booking software installed on the user terminal 50g and states a booking request to it, such as "Book an air ticket to Shanghai". The user may type the request or speak it. The booking software sends the request to the booking server 50h, which processes it according to the dialog processing flow shown in Fig. 5b, obtains an answer such as "What time would you like to fly?", and returns it to the booking software, which plays or displays the answer to the user.

The user then tells the booking software the desired time, e.g., "10 a.m. tomorrow"; the software sends it to the booking server 50h, which continues processing according to the dialog processing flow shown in Fig. 5b, obtains an answer such as "Where are you departing from?", and returns it to the booking software, which plays or displays the answer to the user. The whole booking process proceeds step by step according to the pre-constructed finite-state machine model until the booking succeeds or fails. The subsequent steps are not shown in Fig. 5e.

Incorporating slot filling into the construction of the finite-state machine allows a finite-state-machine-based dialog management scheme to be applied to online ticket purchasing. Managing dialogs in the form of a finite-state machine exploits its advantages in this scenario, makes dialog management simpler and more flexible, makes online ticket purchasing more efficient and human-machine interaction smoother, and improves the user experience.

It should be noted that the steps of the methods provided in the above embodiments may all be executed by the same device, or by different devices. For example, steps 401 to 403 may all be executed by device A; alternatively, steps 401 and 402 may be executed by device A and step 403 by device B; and so on.

In addition, some of the flows described in the above embodiments and drawings include operations that occur in a specific order, but these operations may be executed out of that order or in parallel. Operation numbers such as 401 and 402 merely distinguish the operations and do not themselves imply any execution order. The flows may also include more or fewer operations, which may be executed sequentially or in parallel. Note also that the terms "first", "second", and so on in this document distinguish different messages, devices, modules, and the like; they do not imply an order, nor do they require that the "first" and "second" items be of different types.

Fig. 6a is a schematic structural diagram of a dialog management policy generation apparatus according to yet another exemplary embodiment of the present application. As shown in Fig. 6a, the apparatus includes a determining module 61, an obtaining module 62, a generating module 63, and a building module 64.

the determining module 61, configured to determine, based on semantic understanding of the dialog scene, a plurality of semantic slots applicable to the dialog scene and the candidate slot values corresponding to them;

the obtaining module 62, configured to combine the candidate slot values of the semantic slots to obtain multiple sets of slot-value pairs that are meaningful in the dialog, each set containing one slot-value pair for each semantic slot;

the generating module 63, configured to generate a plurality of dialog states corresponding to the sets of slot-value pairs according to the semantics expressed by each set; and

the building module 64, configured to build a finite-state machine model from the dialog states and the sets of slot-value pairs, so that human-machine dialogs in the dialog scene can be managed in the form of a finite-state machine.

In some optional embodiments, when building the finite-state machine model, the building module 64 is specifically configured to: map the dialog states to state nodes in the finite-state machine model; add a bidirectional edge between every two state nodes; and generate the transition condition between any two state nodes from the difference between their two sets of slot-value pairs, thereby building the finite-state machine model.

In some optional embodiments, the obtaining module 62 is further configured to: obtain the slot-filling and slot-canceling corpora corresponding to each slot-value pair in the dialog scene to form a corpus; and train a language understanding model on the slot-filling and slot-canceling corpora of each slot-value pair in the corpus, the model being used to obtain the input information required by the finite-state machine from human-machine dialog data.

Further, the obtaining module 62 is specifically configured to, when training the language understanding model:

training a first language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, where the first language understanding model is used to extract the slot-value pairs contained in human-machine dialog data as the input information; or

training a second language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, together with the correspondence between the plurality of sets of slot-value pairs and the plurality of dialog states, where the second language understanding model is used to obtain a transfer condition from human-machine dialog data as the input information.
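The application trains a language understanding model on the slot-filling and slot-canceling corpora but does not fix the model type. The sketch below deliberately swaps in the simplest possible stand-in for the first language understanding model: it memorizes the corpus phrases and matches them as substrings, returning the filled value for a slot-filling phrase and `None` for a slot-canceling phrase. The corpus layout and the class name `KeywordSlotExtractor` are hypothetical; a real system would train a classifier or sequence tagger instead.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical corpus layout: (slot, value) -> (slot-filling phrases, slot-canceling phrases).
Corpus = Dict[Tuple[str, str], Tuple[List[str], List[str]]]


class KeywordSlotExtractor:
    """Stand-in for the first language understanding model: rather than training
    a statistical model, it memorizes corpus phrases and matches them as substrings."""

    def __init__(self, corpus: Corpus) -> None:
        self.rules: List[Tuple[str, str, Optional[str]]] = []
        for (slot, value), (fill_phrases, cancel_phrases) in corpus.items():
            for phrase in fill_phrases:
                self.rules.append((phrase.lower(), slot, value))  # fills the slot
            for phrase in cancel_phrases:
                self.rules.append((phrase.lower(), slot, None))  # cancels the slot
        # Try longer phrases first so "no noodles" wins over "noodles".
        self.rules.sort(key=lambda rule: len(rule[0]), reverse=True)

    def extract(self, utterance: str) -> Dict[str, Optional[str]]:
        """Return {slot: value} for filled slots and {slot: None} for canceled ones."""
        text = utterance.lower()
        result: Dict[str, Optional[str]] = {}
        for phrase, slot, value in self.rules:
            if phrase in text and slot not in result:
                result[slot] = value
        return result


corpus: Corpus = {
    ("dish", "noodles"): (["noodles"], ["no noodles", "cancel the noodles"]),
    ("size", "large"): (["large", "big one"], ["not large"]),
}
extractor = KeywordSlotExtractor(corpus)
```

Matching longer phrases first is what lets a slot-canceling phrase such as "no noodles" take precedence over the slot-filling phrase "noodles" that it contains.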

In some alternative embodiments, the building module 64 is further configured to: add a new state node to the finite-state machine model according to a new dialog state and the set of slot-value pairs corresponding to the new dialog state; and add a bidirectional edge between the new state node and each existing state node, generating the transfer conditions between the new state node and each existing state node according to the difference between their two sets of slot-value pairs.
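Incremental extension touches only the edges incident to the new node, so the existing part of the finite-state machine does not have to be rebuilt. A minimal sketch, assuming the same encoding as above (slot-value sets as dictionaries, `None` for an unfilled or canceled slot, one directed entry per direction of each bidirectional edge); the function name `add_state` is hypothetical.

```python
from typing import Dict, Optional, Tuple

SlotValues = Dict[str, Optional[str]]
Condition = Dict[str, Optional[str]]
Edges = Dict[Tuple[str, str], Condition]


def transfer_condition(src: SlotValues, dst: SlotValues) -> Condition:
    # Slots whose values differ, mapped to the value required in dst (None = cancel).
    return {slot: dst[slot] for slot in src if src[slot] != dst[slot]}


def add_state(states: Dict[str, SlotValues], edges: Edges,
              name: str, slot_values: SlotValues) -> None:
    """Add one state node and connect it to every existing node in both
    directions; edges between existing nodes are left untouched."""
    if name in states:
        raise ValueError(f"state {name!r} already exists")
    for other, other_values in states.items():
        edges[(name, other)] = transfer_condition(slot_values, other_values)
        edges[(other, name)] = transfer_condition(other_values, slot_values)
    states[name] = slot_values  # register the node only after its edges exist


states = {"START": {"dish": None}, "NOODLES": {"dish": "noodles"}}
edges: Edges = {
    ("START", "NOODLES"): {"dish": "noodles"},
    ("NOODLES", "START"): {"dish": None},
}
add_state(states, edges, "RICE", {"dish": "rice"})
```

Adding the k-th node therefore creates 2(k − 1) new directed conditions and leaves all earlier ones unchanged.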

In some optional embodiments, the obtaining module 62 is further configured to acquire human-machine dialog data in the dialog scene. Accordingly, the dialog management policy generation apparatus further includes a language understanding module, a dialog management module, and a language generation module.

The language understanding module is configured to obtain, from the human-machine dialog data, input information that can trigger the finite-state machine to perform a dialog state transition, according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the dialog scene.

The dialog management module is configured to control the finite-state machine to jump from the current dialog state to the next dialog state according to the input information.

The language generation module is configured to output response data for the human-machine dialog data according to the data related to the next dialog state.
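At run time the three modules form a pipeline: language understanding yields the input information, dialog management follows the matching transfer condition, and language generation produces the reply for the new state. The sketch below assumes the input information is already a transfer condition (the output of the second language understanding model) and that each state has one fixed reply string; the class name `DialogManager` and the reply table are hypothetical. Staying in the current state when no outgoing edge matches is a design choice of this sketch, not something the application specifies.

```python
from typing import Dict, Optional, Tuple

Condition = Dict[str, Optional[str]]
Edges = Dict[Tuple[str, str], Condition]


class DialogManager:
    """Minimal dialog management over an edge table of transfer conditions:
    the language understanding output is matched against the conditions on
    the edges leaving the current state."""

    def __init__(self, edges: Edges, responses: Dict[str, str], start: str) -> None:
        self.edges = edges
        self.responses = responses  # hypothetical per-state reply templates
        self.current = start

    def step(self, condition: Condition) -> str:
        """Jump along the outgoing edge whose transfer condition equals
        `condition`; if none matches, stay put. Returns the reply for the
        resulting state (the language generation step)."""
        for (src, dst), cond in self.edges.items():
            if src == self.current and cond == condition:
                self.current = dst
                break
        return self.responses[self.current]


edges: Edges = {
    ("START", "DISH"): {"dish": "noodles"},
    ("DISH", "START"): {"dish": None},
    ("DISH", "FULL"): {"size": "large"},
    ("FULL", "DISH"): {"size": None},
}
responses = {"START": "What would you like?", "DISH": "What size?", "FULL": "Order confirmed."}
dm = DialogManager(edges, responses, "START")
```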

Further, the dialog management policy generation apparatus may also include a speech recognition module and a speech synthesis module. The speech recognition module is configured to convert the human-machine dialog data into text data and provide the text data to the language understanding module. The speech synthesis module is configured to convert the response data generated by the language generation module into speech data.

Having described the internal functions and structure of the dialog management policy generation apparatus above, as shown in fig. 6b, in practice the dialog management policy generation apparatus may be implemented as a human-machine dialog device including a memory 601 and a processor 602.

The memory 601 is configured to store a computer program and may also be configured to store various other data to support operations on the human-machine dialog device. Examples of such data include instructions for any application or method operating on the human-machine dialog device, contact data, phonebook data, messages, pictures, videos, and so forth.

The memory 601 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

A processor 602, coupled to the memory 601, for executing the computer programs in the memory 601 to:

determining a plurality of semantic slots suitable for the dialogue scene and candidate slot values corresponding to the semantic slots based on semantic understanding of the dialogue scene;

combining the candidate slot values corresponding to the plurality of semantic slots to obtain a plurality of sets of slot-value pairs with conversational significance, wherein each set comprises the slot-value pairs corresponding to the plurality of semantic slots;

generating a plurality of dialogue states corresponding to the plurality of sets of slot-value pairs according to the semantics expressed by the plurality of sets of slot-value pairs;

and constructing a finite state machine model according to the plurality of conversation states and the plurality of sets of slot-value pairs so as to perform conversation management on the man-machine conversation process in the conversation scene in the form of a finite state machine.

In some alternative embodiments, when constructing the finite-state machine model, the processor 602 is specifically configured to: map the plurality of dialog states to a plurality of state nodes in the finite-state machine model; add a bidirectional edge between every two of the state nodes; and, for each pair of state nodes, generate the transfer conditions for transitions between them according to the difference between the two sets of slot-value pairs corresponding to the two nodes, thereby constructing the finite-state machine model.

In some optional embodiments, the processor 602 is further configured to: acquire a slot-filling corpus and a slot-canceling corpus corresponding to each slot-value pair in the dialog scene to form a corpus; and train a language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, where the language understanding model is used to obtain, from human-machine dialog data, the input information required by the finite-state machine.

Further, the processor 602, when training the language understanding model, is specifically configured to:

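training a first language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, wherein the first language understanding model is used for extracting the slot-value pairs contained in human-machine dialog data as the input information; or

training a second language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus and the correspondence between the plurality of sets of slot-value pairs and the plurality of dialog states, wherein the second language understanding model is used for obtaining a transfer condition from human-machine dialog data as the input information.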

In some optional embodiments, the processor 602 is further configured to: add a new state node to the finite-state machine model according to a new dialog state and the set of slot-value pairs corresponding to the new dialog state; and add a bidirectional edge between the new state node and each existing state node, generating the transfer conditions between the new state node and each existing state node according to the difference between their two sets of slot-value pairs.

In some optional embodiments, the processor 602 is further configured to: acquire human-machine dialog data in the dialog scene; obtain, from the human-machine dialog data, input information that can trigger the finite-state machine to perform a dialog state transition, according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the dialog scene; control the finite-state machine to jump from the current dialog state to the next dialog state according to the input information; and output response data for the human-machine dialog data according to the data related to the next dialog state.

Further, the processor 602 is also configured to convert the human-machine dialog data into text data and to convert the response data into speech data.

Further, as shown in fig. 6b, the human-machine dialog device also includes other components such as a communication component 603 and a power supply component 604. Fig. 6b shows only some of the components schematically; this does not mean that the human-machine dialog device includes only the components shown in fig. 6b.

In some application scenarios, the human-machine dialog device shown in fig. 6b may be a server, for example, a conventional server, a cloud host, a virtual center, or another server device.

In other application scenarios, the human-machine dialog device shown in fig. 6b may be a terminal device, such as a smartphone, tablet computer, personal computer, wearable device, or smart speaker on which voice interaction applications are installed; a voice-interactive self-service terminal or kiosk, such as a hospital self-service registration/payment machine, a bank's automated teller machine, or an automatic ticket-collection machine in a subway, railway station, airport, or similar setting; or a robot supporting voice interaction, such as a home companion robot, chatbot, sweeping robot, guide/following robot, or food-ordering robot.

Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by one or more processors, causes the one or more processors to perform actions comprising:

combining the candidate slot values corresponding to the semantic slots to obtain a plurality of sets of slot-value pairs with conversational significance, wherein each set comprises the respective slot-value pairs corresponding to the semantic slots;

In addition to the actions described above, the one or more processors may perform other actions that may be performed by the server in other embodiments described above.

Fig. 7a is a schematic structural diagram of a human-machine interaction device according to another exemplary embodiment of the present application. As shown in fig. 7a, the apparatus comprises: an acquisition module 71, a language understanding module 72, a dialogue management module 73 and a language synthesis module 74.

an acquisition module 71, configured to acquire human-machine dialog data in a dialog scene;

a language understanding module 72, configured to obtain, from the human-computer dialogue data, input information that can trigger a finite state machine to perform dialogue state transition according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the dialogue scene;

the dialog management module 73 is used for controlling the finite-state machine to jump from the current dialog state to the next dialog state according to the input information;

and a language synthesis module 74, configured to output response data of the man-machine conversation data according to the data related to the next conversation state.

In an alternative embodiment, language understanding module 72 is specifically configured to:

running a first language understanding model on the human-machine dialog data to obtain the slot-value pairs contained in the human-machine dialog data as the input information; or

running a second language understanding model on the human-machine dialog data to obtain a transfer condition in the finite-state machine as the input information;

wherein the first language understanding model or the second language understanding model is pre-trained on the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the dialog scene.
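When the first language understanding model is used, its output is a set of slot-value pairs rather than a transfer condition. One possible way to feed it to the same finite-state machine, assumed here rather than taken from the application, is to normalize it against the slot-value set of the current state: slots unknown to the scene, or whose value would not change, are dropped, and what remains is a condition in the same encoding as the second model's output (`None` meaning the slot is canceled). The function name `to_transfer_condition` is hypothetical.

```python
from typing import Dict, Optional

SlotValues = Dict[str, Optional[str]]
Condition = Dict[str, Optional[str]]


def to_transfer_condition(current: SlotValues,
                          extracted: Dict[str, Optional[str]]) -> Condition:
    """Turn slot-value pairs extracted by a first-type model into the transfer
    condition a second-type model would output directly: keep only slots that
    exist in the dialog scene and whose value would actually change."""
    return {
        slot: value
        for slot, value in extracted.items()
        if slot in current and current[slot] != value
    }


current = {"dish": "noodles", "size": None}
```

Dropping unchanged slots matters because a user often repeats information that is already filled, and the transfer conditions on the edges only list the slots that differ.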

In an alternative embodiment, the apparatus further comprises a building module, which is configured for:

determining a plurality of semantic slots applicable to the dialog scene and candidate slot values corresponding to the semantic slots based on semantic understanding of the dialog scene;

constructing a finite state machine model from the plurality of dialog states and the plurality of sets of slot-value pairs.

Having described the internal functions and structure of the human-machine interaction device, as shown in fig. 7b, in practice, the human-machine interaction device may be implemented as a human-machine interaction apparatus including: a memory 701 and a processor 702.

The memory 701 is configured to store a computer program and may also be configured to store various other data to support operations on the human-machine dialog device. Examples of such data include instructions for any application or method operating on the human-machine dialog device, contact data, phonebook data, messages, pictures, videos, and so forth.

The memory 701 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

A processor 702, coupled to the memory 701, for executing the computer program in the memory 701 for:

acquiring man-machine conversation data in a conversation scene;

In an alternative embodiment, the processor 702 is specifically configured to:

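running a first language understanding model on the human-machine dialog data to obtain the slot-value pairs contained in the human-machine dialog data as the input information; or

running a second language understanding model on the human-machine dialog data to obtain a transfer condition in the finite-state machine as the input information;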

In an alternative embodiment, the processor 702 is further configured to:

determining a plurality of semantic slots applicable to the dialog scene and candidate slot values corresponding to the plurality of semantic slots based on semantic understanding of the dialog scene;

Further, as shown in fig. 7b, the human-machine dialog device also includes other components such as a communication component 703, a display 704, a power supply component 705, and an audio component 706. Fig. 7b shows only some of the components schematically; this does not mean that the human-machine dialog device includes only the components shown in fig. 7b.

In some application scenarios, the human-machine dialog device shown in fig. 7b may be a server, for example, a conventional server, a cloud host, a virtual center, or another server device.

In other application scenarios, the human-machine dialog device shown in fig. 7b may be a terminal device, such as a smartphone, tablet computer, personal computer, wearable device, or smart speaker on which voice interaction applications are installed; a voice-interactive self-service terminal or kiosk, such as a hospital self-service registration/payment machine, a bank's automated teller machine, or an automatic ticket-collection machine in a subway, railway station, airport, or similar setting; or a robot supporting voice interaction, such as a home companion robot, chatbot, sweeping robot, guide/following robot, or food-ordering robot.

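Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by one or more processors, causes the one or more processors to perform actions comprising:

acquiring human-machine dialog data in a dialog scene;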

In addition to the actions described above, the one or more processors may also perform other actions that may be performed by the terminal device in other embodiments described above.

The communication components in fig. 6b and fig. 7b are configured to facilitate wired or wireless communication between the device in which they are located and other devices. That device may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

The display in fig. 7b described above includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

The power supply components of fig. 6b and 7b described above provide power to the various components of the device in which the power supply components are located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.

The audio component in fig. 7b may be configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may further be stored in the memory or transmitted via the communication component. In some embodiments, the audio component further includes a speaker for outputting audio signals.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technologies, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A dialog management policy generation method, comprising:

constructing a finite state machine model according to the plurality of conversation states and the plurality of sets of slot-value pairs so as to perform conversation management on a man-machine conversation process in the conversation scene in a finite state machine mode;

wherein said constructing a finite state machine model from said plurality of dialog states and said plurality of sets of slot-value pairs comprises: mapping the plurality of dialog states to a plurality of state nodes in the finite state machine model; adding a bidirectional edge between any two state nodes in the plurality of state nodes; and generating a transfer condition during transfer between any two state nodes according to the difference between the two sets of slot-value pairs corresponding to any two state nodes so as to construct the finite-state machine model.

2. The method of claim 1, further comprising:

acquiring a slot filling corpus and a slot canceling corpus corresponding to each slot-value pair in the conversation scene to form a corpus;

and training a language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, wherein the language understanding model is used for acquiring the input information required by the finite state machine from man-machine dialogue data.

3. The method according to claim 2, wherein the training of the language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus comprises:

training a first language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, wherein the first language understanding model is used for extracting the slot-value pairs contained in the man-machine conversation data as the input information; or

training a second language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus and the correspondence between the plurality of sets of slot-value pairs and the plurality of dialogue states, wherein the second language understanding model is used for acquiring a transfer condition from the man-machine dialogue data as the input information.

4. The method according to any one of claims 1-3, further comprising, after constructing the finite state machine model from the plurality of dialog states and the plurality of sets of slot-value pairs:

adding a new state node in the finite-state machine model according to a new dialogue state and a group of slot-value pairs corresponding to the new dialogue state;

and adding a bidirectional edge between the new state node and each existing state node, and generating the transfer conditions for transitions between the new state node and each existing state node according to the difference between the two sets of slot-value pairs corresponding to the new state node and that existing state node.

5. A method for human-computer interaction, comprising:

acquiring man-machine conversation data in a conversation scene;

outputting response data of the man-machine conversation data according to the related data of the next conversation state; wherein the finite state machine is constructed according to the steps in the method of claim 1.

6. The method according to claim 5, wherein the obtaining input information that can trigger a finite state machine to perform dialog state transition from the human-computer dialog data according to the slot filling corpus and the slot canceling corpus corresponding to each slot-value pair in the dialog scene comprises:

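running a first language understanding model according to the man-machine conversation data to obtain the slot-value pairs contained in the man-machine conversation data as the input information; or

running a second language understanding model according to the man-machine conversation data to obtain a transfer condition in the finite-state machine as the input information;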

and the first language understanding model or the second language understanding model is obtained by pre-training on the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the conversation scene.

7. The method according to claim 5 or 6, before controlling the finite state machine to jump from a current dialog state to a next dialog state according to the input information, further comprising:

8. A human-computer dialog device, comprising: a memory and a processor;

a memory for storing a computer program;

the processor to execute the computer program to:

constructing a finite-state machine model according to the plurality of conversation states and the plurality of sets of slot-value pairs so as to perform conversation management on the man-machine conversation process in the conversation scene in a finite-state machine form;

wherein the processor is specifically configured to: mapping the plurality of dialog states to a plurality of state nodes in the finite state machine model; adding a bidirectional edge between any two state nodes in the plurality of state nodes; and generating a transfer condition during transfer between any two state nodes according to the difference between the two sets of slot-value pairs corresponding to any two state nodes so as to construct the finite-state machine model.

9. The human-computer dialog device of claim 8 wherein the processor is further configured to:

and training a language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, wherein the language understanding model is used for acquiring the input information required by the finite-state machine from man-machine dialogue data.

10. The human-computer dialog device of claim 9, wherein the processor is specifically configured to:

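training a first language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus, wherein the first language understanding model is used for extracting the slot-value pairs contained in the man-machine conversation data to serve as the input information; or

training a second language understanding model according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the corpus and the correspondence between the plurality of sets of slot-value pairs and the plurality of dialogue states, wherein the second language understanding model is used for acquiring a transfer condition from the man-machine dialogue data to serve as the input information.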

11. The human-computer dialog device of any of claims 8-10 wherein the processor is further configured to:

adding a new state node in the finite-state machine model according to the new dialogue state and a group of slot-value pairs corresponding to the new dialogue state;

12. The human-computer dialog device of any of claims 8-10 wherein the processor is further configured to:

acquiring man-machine conversation data in the conversation scene;

13. A computer-readable storage medium storing computer instructions, which when executed by one or more processors, cause the one or more processors to perform acts comprising:

wherein said constructing a finite state machine model from said plurality of dialog states and said plurality of sets of slot-value pairs comprises: mapping the plurality of dialog states to a plurality of state nodes in the finite state machine model; adding a bidirectional edge between any two state nodes in the plurality of state nodes; and generating the transfer conditions for transitions between the two state nodes according to the difference between the two sets of slot-value pairs corresponding to the two state nodes so as to construct the finite-state machine model.

14. A human-machine dialog device, comprising: a memory and a processor;

the memory for storing a computer program;

the processor to execute the computer program to:

acquiring man-machine conversation data in a conversation scene;

15. The human-computer dialog device of claim 14, wherein the processor is specifically configured to:

16. The human-computer dialog device of claim 14 wherein the processor is further configured to:

and constructing a finite state machine model according to the plurality of conversation states and the plurality of sets of slot-value pairs so as to perform conversation management on the man-machine conversation process in the conversation scene in a finite state machine mode.

17. The human-computer dialog device according to any one of claims 14-16, characterized in that the human-computer dialog device comprises at least one of the following:

intelligent robot, self-service machine, self-service terminal, intelligent terminal and self-service vending machine.

18. A computer-readable storage medium storing computer instructions, which when executed by one or more processors, cause the one or more processors to perform acts comprising:

acquiring man-machine conversation data in a conversation scene;

19. A human-computer dialog system, comprising: a server and a terminal device;

the terminal device is used for receiving human-computer conversation data input by a user in a conversation scene, sending the human-computer conversation data to the server, receiving response data corresponding to the human-computer conversation data returned by the server and outputting the response data to the user;

the server is used for receiving the man-machine conversation data sent by the terminal device; acquiring, from the man-machine conversation data, input information capable of triggering the finite-state machine to perform a conversation state transition, according to the slot-filling corpus and the slot-canceling corpus corresponding to each slot-value pair in the conversation scene; controlling the finite-state machine to jump from the current conversation state to the next conversation state according to the input information; and returning response data of the man-machine conversation data to the terminal device according to the related data of the next conversation state; wherein the finite state machine is constructed according to the steps in the method of claim 1.

20. The system of claim 19, wherein the server is further configured to: