WO2023065629A1 - Dialogue management method, system, terminal and storage medium - Google Patents

Dialogue management method, system, terminal and storage medium

Info

Publication number
WO2023065629A1
WO2023065629A1 PCT/CN2022/089566 CN2022089566W WO2023065629A1 WO 2023065629 A1 WO2023065629 A1 WO 2023065629A1 CN 2022089566 W CN2022089566 W CN 2022089566W WO 2023065629 A1 WO2023065629 A1 WO 2023065629A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
taskflow
dialogue
flowchart
user
Prior art date
Application number
PCT/CN2022/089566
Other languages
English (en)
French (fr)
Inventor
梁必志
黄天来
叶怡周
吴星
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023065629A1 publication Critical patent/WO2023065629A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G06F40/35: Discourse or dialogue representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/547: Remote procedure calls [RPC]; Web services
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech

Definitions

  • This application relates to the technical field of artificial-intelligence speech processing, and in particular to a dialogue management method, system, terminal and storage medium.
  • In recent years, dialogue management systems have made great progress in terms of performance and user experience.
  • The inventor realizes that existing dialogue management systems are rule-based but lack a relatively general rule-programming framework and platform.
  • Typically, domain experts design the dialogue scenarios that the dialogue management system can express, and the management rules are implemented in code logic or hidden in the dialogue tree structure and dialogue frames.
  • A currently popular voice interaction approach is VoiceXML, a markup language for voice browsing.
  • A VoiceXML system mainly consists of a voice browser, speech recognition, speech synthesis and a VoiceXML gateway.
  • VoiceXML can be used to build web-based voice applications and services.
  • The inventor realizes that this voice interaction approach has poor portability and flexibility, makes actual system development difficult, and makes dialogue flow authoring and debugging relatively complex.
  • This application provides a dialogue management method, system, terminal and storage medium, aiming to solve the technical problems of existing voice interaction approaches, such as poor portability and flexibility, difficult actual system development, and relatively complex dialogue flow authoring and debugging.
  • One technical solution adopted by the embodiments of this application is a dialogue management method, comprising:
  • drawing a taskflow flowchart according to the dialogue logic and storing the taskflow flowchart in a database, wherein the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart includes at least one user intent of the dialogue flow;
  • during a man-machine dialogue, identifying the user intent from the user's voice stream data and obtaining the taskflow flowchart according to the user intent;
  • parsing the taskflow flowchart to obtain the dialogue flow;
  • executing the dialogue logic according to the dialogue flow and returning the reply utterance corresponding to the user intent.
  • Another technical solution adopted by the embodiments of this application is a dialogue management system, including:
  • a flow drawing module, used to draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in a database; the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart includes at least one user intent of the dialogue flow;
  • a flow acquisition module, used to identify the user intent from the user's voice stream data during a man-machine dialogue and obtain the taskflow flowchart according to the user intent;
  • a flow parsing module, used to parse the taskflow flowchart to obtain the dialogue flow;
  • a flow execution module, used to execute the dialogue logic according to the dialogue flow.
  • Yet another technical solution adopted by the embodiments of this application is a terminal including a processor and a memory coupled to the processor, wherein
  • the memory stores program instructions for implementing the above dialogue management method;
  • the processor is configured to execute the program instructions stored in the memory to perform the dialogue management operations.
  • Another technical solution adopted by the embodiment of the present application is: a storage medium storing program instructions executable by a processor, and the program instructions are used to execute the above-mentioned dialog management method.
  • The dialogue management method, system, terminal and storage medium of the embodiments of this application store pre-drawn TaskFlow flowcharts in the dialogue management module and, during the dialogue, load the corresponding TaskFlow flowchart or sub-flow according to the user intent and execute the dialogue flow, improving the flexibility of dialogue management.
  • Meanwhile, this application designs the nodes used to draw TaskFlow flowcharts; during drawing, each node is dragged to its designated position and the nodes at different positions are connected according to the dialogue logic, which greatly improves drawing efficiency.
  • The dialogue flow can be divided into multiple sub-flows according to different user intents, and a corresponding TaskFlow flowchart can be drawn for each sub-flow.
  • The TaskFlow flowcharts of all sub-flows can be shared and reused, which reduces their coupling with other modules, improves the application flexibility of the TaskFlow flowchart, greatly reduces the difficulty of development and maintenance, and makes it possible to handle more complex dialogue management scenarios.
  • FIG. 1 is a schematic flowchart of a dialog management method according to a first embodiment of the present application
  • FIG. 2 is a schematic flowchart of a dialog management method according to a second embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a dialogue management system according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a terminal structure according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a dialog management method according to a first embodiment of the present application.
  • the dialog management method of the first embodiment of the present application includes the following steps:
  • S10: Draw the taskflow flowchart according to the dialogue logic, and store the taskflow flowchart in a database. In this step, the taskflow flowchart is the pre-drawn dialogue logic.
  • In the embodiments of this application, to improve drawing efficiency, five node types realizing different functions are designed: the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node. The function of each node is as follows:
  • API interface node used to obtain business information through remote calls in the dialogue process.
  • An API interface node is created by configuring the remote service URL (Uniform Resource Locator), the interface input parameters (in key-value format) and the output parameters.
  • The API interface node hides the details of the remote call behind the API gateway service, so there is no need to care about differences between different remote callers, which greatly improves the user experience and the editing efficiency of the dialogue flow.
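  • For illustration only, an API interface node configured with a remote service URL and key-value input parameters could be invoked through the gateway roughly as sketched below; the class and parameter names are assumptions rather than part of this application, and the sketch presumes Java 11+ java.net.http.
    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Hypothetical sketch of executing one API interface node through the API gateway service.
    class ApiNodeExecutor {
        private final HttpClient client = HttpClient.newHttpClient();

        // gatewayUrl and inputs correspond to the node configuration (remote service URL + key-value input parameters).
        String call(String gatewayUrl, Map<String, String> inputs) throws Exception {
            String query = inputs.entrySet().stream()
                    .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                    .collect(Collectors.joining("&"));
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(gatewayUrl + "?" + query))
                    .GET()
                    .build();
            // The gateway hides the details of the remote call, so the node only sees its configured interface.
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }
    }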
  • SLOTS slot-filling node: used to collect slot information (i.e., key information that needs to be collected from the user) during execution of the dialogue flow, and to perform multi-slot filling with NLU support.
  • During execution of the dialogue flow, the SLOTS slot-filling node continuously traverses all slots; when an unfilled slot is found, the clarification utterance corresponding to that slot is output to TTS (Text To Speech).
  • The user inputs information under the guidance of the TTS voice, and after the NLU entity extracts the user intent from the user input, the unfilled slot is filled through the SLOTS slot-filling node.
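  • A minimal sketch of that traversal follows; the Slot and SlotsNode types (and the Java 17 record syntax) are assumptions for illustration, not the engine's real classes. The node looks for the first unfilled slot, hands its clarification prompt to TTS, and fills the slot once NLU has extracted a value from the user's reply.
    import java.util.List;
    import java.util.Optional;

    // Hypothetical slot: a name, the clarification utterance to read out via TTS, and the collected value.
    record Slot(String name, String clarificationPrompt, String value) {
        boolean filled() { return value != null; }
    }

    class SlotsNode {
        // Returns the clarification prompt for the first unfilled slot, or empty when all slots are filled.
        Optional<String> nextPrompt(List<Slot> slots) {
            return slots.stream().filter(s -> !s.filled()).findFirst().map(Slot::clarificationPrompt);
        }

        // Fills the named slot with the value extracted by NLU from the user's input.
        List<Slot> fill(List<Slot> slots, String slotName, String extractedValue) {
            return slots.stream()
                    .map(s -> s.name().equals(slotName)
                            ? new Slot(slotName, s.clarificationPrompt(), extractedValue)
                            : s)
                    .toList();
        }
    }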
  • SCRIPT script node: used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow, and to control and modify the dialogue state information so as to meet customization requirements of the dialogue process. That is, the execution logic of the general dialogue management engine is developed in static Java and edited with TaskFlow, while domain-specific business logic is handled by the groovy script embedded in the SCRIPT script node, thereby decoupling system operation from the business domain.
  • The groovy script can obtain dialogue state information through the session.get('key') statement and, after processing, set the updated dialogue state information through the session.set('key', value) statement. Take the identification of the user's gender state information as an example:
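  • The example script given in the description (the two values written back, '女' and '男', are 'female' and 'male') is:
    def g = session.get('gender')
    session.set('gender', g == 0 ? '女' : '男')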
  • NLG reply node: used to generate the reply utterance in a templated manner during execution of the dialogue flow by filling in dynamically changing data.
  • A sentence template consists of several short sentences containing variables; the variables are kept dynamically up to date by the data information and generated by the relevant business rules, and are finally concatenated into a well-structured, complete utterance.
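  • As a rough illustration of such templating (the ${...} placeholder syntax and class name are assumed here, not prescribed by the application), variables in a short sentence can be substituted from the current dialogue state before the sentences are concatenated into the final utterance:
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical NLG helper: replaces ${variable} placeholders with values from the dialogue state.
    class TemplateReply {
        private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)}");

        static String render(String template, Map<String, Object> state) {
            Matcher m = VAR.matcher(template);
            StringBuilder out = new StringBuilder();
            while (m.find()) {
                Object value = state.getOrDefault(m.group(1), "");
                m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
            }
            m.appendTail(out);
            return out.toString();
        }
    }

    // e.g. render("Hello ${name}, your order ${orderId} has been shipped.", state)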
  • JUDGE judgment node: used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
  • The JUDGE judgment node supports configuring multiple conditional expressions with priorities.
  • The variables available to the conditional expressions are provided by the current dialogue state information.
  • When the node runs, the conditional expressions are evaluated in priority order; if a conditional expression evaluates to true, the dialogue flow jumps to that branch and continues execution.
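  • One possible (assumed, not specified in the application) realisation of the prioritised conditional expressions is to keep the branches sorted by priority and jump to the first branch whose condition evaluates to true against the current dialogue state:
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.function.Predicate;

    // Hypothetical JUDGE branch: a priority, a condition over the dialogue state, and the id of the next node.
    record Branch(int priority, Predicate<Map<String, Object>> condition, String nextNodeId) {}

    class JudgeNode {
        // Evaluates the branches in priority order and returns the node the dialogue flow should jump to.
        Optional<String> route(List<Branch> branches, Map<String, Object> dialogueState) {
            return branches.stream()
                    .sorted(Comparator.comparingInt(Branch::priority))
                    .filter(b -> b.condition().test(dialogueState))
                    .map(Branch::nextNodeId)
                    .findFirst();
        }
    }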
  • The taskflow flowcharts are divided into two types: flowcharts containing a complete dialogue flow, and sub-flowcharts containing at least one user intent.
  • A flowchart containing a complete dialogue flow targets dialogue tasks whose business logic is relatively simple and whose flowcharts have few nodes.
  • In practice, task-oriented dialogues often require multiple rounds of interaction to collect more information.
  • The more complex the business logic, the more nodes the drawn TaskFlow flowchart will have, making its development more difficult.
  • Therefore, for dialogue tasks with relatively complex business logic, the dialogue flow is divided into multiple (at least two) sub-flows according to user intents, so that each sub-flow corresponds to at least one user intent, and the corresponding sub-flowcharts are drawn.
  • When a sub-flow is needed, dragging the sub-flowchart corresponding to the user intent onto the editing interface allows it to be shared and reused, which avoids meaningless duplicated work and greatly reduces the difficulty of development and maintenance.
  • The TaskFlow flowcharts (both flowcharts and sub-flowcharts) of the embodiments of this application are drawn as follows: drag the above five types of nodes to the designated positions on the editing interface and, after configuring each node, connect the nodes at different positions according to the dialogue logic to obtain the finished TaskFlow flowchart.
  • The finished TaskFlow flowchart is stored in JSON (JavaScript Object Notation, a lightweight data-interchange format) format in the database of the dialogue management module, from which it is read by the dialogue management module when the dialogue task is executed.
  • S11: During the man-machine dialogue, identify the user intent from the user's voice stream data, and obtain the taskflow flowchart according to the user intent. After the user's voice stream data is obtained, ASR (Automatic Speech Recognition) performs speech-to-text transcription on the voice stream data to obtain the corresponding text data, and an NLU (Natural Language Understanding) algorithm processes the text data to obtain the user intent.
  • S12: Parse the taskflow flowchart to obtain the dialogue flow; the parsing is performed with the Jackson tool.
  • S13: Execute the dialogue logic according to the dialogue flow. The execution proceeds as follows: first push the root node of the TaskFlow flowchart onto the stack and start execution; execute the node element at the top of the stack and judge whether the current node is a non-leaf node (control node); if it is, push one of its child nodes onto the stack; if the current node is a leaf node, execute the node's specific operation and return the dialogue state information. At the same time, judge whether the current node needs to wait for the user's reply: if not, return the execution state information; if so, create the state-information object for the user input and fill its slots once the user input is recognized. After the input is completed, push the next triggered node of the TaskFlow flowchart onto the task stack, repeat the above process for that node, and clear the already-executed nodes from the stack. This loops until the dialogue task is completed.
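  • A skeleton of that push/execute/pop loop might look as follows; the Node and DialogueState interfaces are placeholders mirroring the behaviour described above, since the engine's real classes are not disclosed in the application.
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Assumed interfaces standing in for the engine's real node and state types.
    interface DialogueState { void awaitUserInputAndFillSlots(); }

    interface Node {
        boolean isLeaf();
        Node nextChild();                    // next unvisited child of a non-leaf (control) node, or null
        DialogueState execute();             // run the leaf node's specific operation
        boolean waitsForUser();
        Node nextTriggered(DialogueState s); // next node triggered by the returned state, or null
    }

    class TaskFlowEngine {
        void run(Node root) {
            Deque<Node> stack = new ArrayDeque<>();
            stack.push(root);                              // push the root node of the TaskFlow flowchart
            while (!stack.isEmpty()) {
                Node current = stack.peek();               // execute the top-of-stack node element
                if (!current.isLeaf()) {
                    Node child = current.nextChild();
                    if (child != null) {
                        stack.push(child);                 // control node: descend into a child
                    } else {
                        stack.pop();                       // all children done: clear the control node
                    }
                    continue;
                }
                DialogueState state = current.execute();   // leaf node: execute and return dialogue state
                if (current.waitsForUser()) {
                    state.awaitUserInputAndFillSlots();    // wait for the reply, then fill the slots
                }
                stack.pop();                               // clear the executed node from the stack
                Node next = current.nextTriggered(state);
                if (next != null) {
                    stack.push(next);                      // push the next triggered node and repeat
                }
            }
        }
    }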
  • The dialogue management method of the first embodiment of this application stores pre-drawn TaskFlow flowcharts in the dialogue management module and, during the dialogue, loads the corresponding TaskFlow flowchart according to the user intent and executes the dialogue flow, which improves the flexibility of dialogue management and greatly improves drawing efficiency.
  • FIG. 2 is a schematic flowchart of a dialog management method according to a second embodiment of the present application.
  • The dialogue management method of the second embodiment of this application includes the following steps:
  • S20: Obtain the user's voice stream data through the dialogue engine.
  • S21: Perform speech-to-text transcription on the voice stream data through ASR to obtain the corresponding text data, process the text data with the NLU algorithm to obtain the user intent, and pass the user intent to the dialogue management module.
  • S22: The dialogue management module loads the taskflow flowchart according to the user intent and parses the taskflow flowchart with the Jackson tool to obtain the dialogue flow.
  • the taskflow flowchart is the pre-drawn dialogue logic.
  • In the embodiments of this application, to improve drawing efficiency, five node types realizing different functions are designed: the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node. The function of each node is as follows:
  • API interface node used to obtain business information through remote calls in the dialogue process.
  • An API interface node is created by configuring the remote service URL (Uniform Resource Locator), the interface input parameters (in key-value format) and the output parameters.
  • The API interface node hides the details of the remote call behind the API gateway service, so there is no need to care about differences between different remote callers, which greatly improves the user experience and the editing efficiency of the dialogue flow.
  • SLOTS slot-filling node: used to collect slot information (i.e., key information that needs to be collected from the user) during execution of the dialogue flow, and to perform multi-slot filling with NLU support.
  • During execution of the dialogue flow, the SLOTS slot-filling node continuously traverses all slots; when an unfilled slot is found, the clarification utterance corresponding to that slot is output to TTS (Text To Speech).
  • The user inputs information under the guidance of the TTS voice, and after the NLU entity extracts the user intent from the user input, the unfilled slot is filled through the SLOTS slot-filling node.
  • SCRIPT script node: used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow, and to control and modify the dialogue state information so as to meet customization requirements of the dialogue process. That is, the execution logic of the general dialogue management engine is developed in static Java and edited with TaskFlow, while domain-specific business logic is handled by the groovy script embedded in the SCRIPT script node, thereby decoupling system operation from the business domain.
  • The groovy script can obtain dialogue state information through the session.get('key') statement and, after processing, set the updated dialogue state information through the session.set('key', value) statement. Take the identification of the user's gender state information as an example:
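  • The example script from the description is the same as above:
    def g = session.get('gender')
    session.set('gender', g == 0 ? '女' : '男')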
  • NLG reply node: used to generate the reply utterance in a templated manner during execution of the dialogue flow by filling in dynamically changing data.
  • A sentence template consists of several short sentences containing variables; the variables are kept dynamically up to date by the data information and generated by the relevant business rules, and are finally concatenated into a well-structured, complete utterance.
  • JUDGE judgment node: used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
  • The JUDGE judgment node supports configuring multiple conditional expressions with priorities.
  • The variables available to the conditional expressions are provided by the current dialogue state information.
  • When the node runs, the conditional expressions are evaluated in priority order; if a conditional expression evaluates to true, the dialogue flow jumps to that branch and continues execution.
  • The taskflow flowcharts are divided into two types: flowcharts containing a complete dialogue flow, and sub-flowcharts containing at least one user intent.
  • A flowchart containing a complete dialogue flow targets dialogue tasks whose business logic is relatively simple and whose flowcharts have few nodes.
  • In practice, task-oriented dialogues often require multiple rounds of interaction to collect more information.
  • The more complex the business logic, the more nodes the drawn TaskFlow flowchart will have, making its development more difficult.
  • Therefore, for dialogue tasks with relatively complex business logic, the dialogue flow is divided into multiple (at least two) sub-flows according to user intents, so that each sub-flow corresponds to at least one user intent, and the corresponding sub-flowcharts are drawn.
  • When a sub-flow is needed, dragging the sub-flowchart corresponding to the user intent onto the editing interface allows it to be shared and reused, which avoids meaningless duplicated work and greatly reduces the difficulty of development and maintenance.
  • The TaskFlow flowcharts (both flowcharts and sub-flowcharts) of the embodiments of this application are drawn as follows: drag the above five types of nodes to the designated positions on the editing interface and, after configuring each node, connect the nodes at different positions according to the dialogue logic to obtain the finished TaskFlow flowchart.
  • The finished TaskFlow flowchart is stored in JSON (JavaScript Object Notation, a lightweight data-interchange format) format in the database of the dialogue management module, from which it is read by the dialogue management module when the dialogue task is executed.
  • After the dialogue management module obtains the taskflow flowchart from the database, it needs to parse it into an execution object that the dialogue management module can recognize.
  • The Jackson tool is used to parse the JSON-format taskflow flowchart and generate the Java object of each node, obtaining the dialogue flow for the dialogue management module; the dialogue flow is saved in a tree structure.
  • the node information in the tree structure is shown in Table 1:
  • Table 1 Tree structure node information
  • type: node type
  • nodeName: node name
  • subNodes: collection of child nodes (array type)
  • nextNode: id of the next node
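  • A minimal sketch of this parsing step, assuming the stored JSON mirrors the attributes in Table 1; the field names come from the table, while the class names and the sample flow snippet are purely illustrative.
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.util.List;

    // POJO matching the tree-structure attributes listed in Table 1.
    class TaskFlowNode {
        public String type;                 // node type (e.g. API, SLOTS, SCRIPT, NLG, JUDGE)
        public String nodeName;             // node name
        public List<TaskFlowNode> subNodes; // collection of child nodes (array type)
        public String nextNode;             // id of the next node
    }

    class TaskFlowParser {
        public static void main(String[] args) throws Exception {
            String json = """
                {"type":"SLOTS","nodeName":"collect_order_info",
                 "subNodes":[{"type":"NLG","nodeName":"ask_order_id","subNodes":[],"nextNode":null}],
                 "nextNode":"judge_1"}""";
            TaskFlowNode root = new ObjectMapper().readValue(json, TaskFlowNode.class);
            System.out.println(root.nodeName + " -> " + root.nextNode); // collect_order_info -> judge_1
        }
    }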
  • S23: Execute the dialogue logic according to the dialogue flow of the taskflow flowchart: starting from the first node in the taskflow flowchart, push each node onto the stack and run it in turn. The execution proceeds as follows: first push the root node of the TaskFlow flowchart onto the stack and start execution; execute the node element at the top of the stack and judge whether the current node is a non-leaf node (control node); if it is, push one of its child nodes onto the stack; if the current node is a leaf node, execute the node's specific operation and return the dialogue state information. At the same time, judge whether the current node needs to wait for the user's reply: if not, return the execution state information; if so, create the state-information object for the user input and fill its slots once the user input is recognized.
  • S24: Judge whether execution has reached a SLOTS slot-filling node; if so, execute S25; otherwise, re-execute S23.
  • S25: Suspend execution of the dialogue logic, and return through the dialogue engine the reply utterance corresponding to the user intent.
  • S26: Convert the reply utterance into voice stream data through TTS technology and output it to the user through the telephony platform.
  • S27: Judge whether the dialogue task is finished; if not, re-execute S20; otherwise, execute S280.
  • S280: The dialogue ends.
  • The dialogue management method of the second embodiment of this application stores pre-drawn TaskFlow flowcharts in the dialogue management module and, during the dialogue, loads the corresponding TaskFlow flowchart or sub-flowchart according to the user intent and executes the dialogue flow, improving the flexibility of dialogue management.
  • Meanwhile, this application designs the nodes used to draw TaskFlow flowcharts; during drawing, each node is dragged to its designated position and the nodes at different positions are connected according to the dialogue logic, which greatly improves drawing efficiency.
  • The dialogue flow can be divided into multiple sub-flows according to different user intents, and a corresponding sub-flowchart can be drawn for each sub-flow; all sub-flowcharts can be shared and reused, which reduces their coupling with other modules, improves the application flexibility of the TaskFlow flowchart, greatly reduces the difficulty of development and maintenance, and makes it possible to handle more complex dialogue management scenarios.
  • In an optional implementation, the result of the dialogue management method may also be uploaded to a blockchain. The corresponding digest information is obtained based on the result of the dialogue management method.
  • The digest information is obtained by hashing the result of the dialogue management method, for example by using the sha256s algorithm.
  • Uploading the digest information to the blockchain ensures its security and its fairness and transparency to the user. The user can download the digest information from the blockchain to verify whether the result of the dialogue management method has been tampered with.
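  • For illustration, the digest could be computed with a standard SHA-256 implementation before the upload; the application names a "sha256s" algorithm, so plain SHA-256 via java.security is an assumption here.
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    // Hypothetical sketch: hash the dialogue-management result before uploading the digest to the blockchain.
    class ResultDigest {
        static String sha256(String dialogueResult) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(dialogueResult.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash); // hex digest that a user can later re-check against the chain
        }
    }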
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • FIG. 3 is a schematic structural diagram of a dialog management system according to an embodiment of the present application.
  • the dialog management system 40 of the embodiment of the present application includes:
  • Flow drawing module 41: used to draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in the database;
  • the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart includes at least one user intent of the dialogue flow;
  • Flow acquisition module 42: used to identify the user intent from the user's voice stream data during the man-machine dialogue, and obtain the corresponding taskflow flowchart according to the user intent;
  • Flow parsing module 43: used to parse the taskflow flowchart to obtain the dialogue flow;
  • Flow execution module 44: used to execute the dialogue logic according to the dialogue flow.
  • FIG. 4 is a schematic diagram of a terminal structure in an embodiment of the present application.
  • the terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51 .
  • the memory 52 stores program instructions for realizing the above-mentioned dialog management method.
  • The processor 51 is configured to execute the program instructions stored in the memory 52 to perform the following steps: draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in a database, where the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node and includes at least one user intent of the dialogue flow; during a man-machine dialogue, identify the user intent from the user's voice stream data and obtain the corresponding taskflow flowchart according to the user intent; parse the taskflow flowchart to obtain the dialogue flow; and execute the dialogue logic according to the dialogue flow.
  • Drawing the taskflow flowchart according to the dialogue logic includes: configuring the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node at their designated positions respectively, and connecting the nodes at different positions according to the dialogue logic to obtain the finished taskflow flowchart. The API interface node is used to obtain business information through remote calls during the dialogue flow; the SLOTS slot-filling node is used to collect slot information and fill slots during execution of the dialogue flow; the SCRIPT script node is used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow and to control and modify the dialogue state information; the NLG reply node is used to generate the reply utterance in a templated manner during execution of the dialogue flow; the JUDGE judgment node is used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
  • drawing the taskflow flowchart according to the dialog logic further includes: the taskflow flowchart is divided into a flowchart including a complete dialog flow and a sub-flowchart including at least one user intention.
  • storing the taskflow flowchart in the database specifically includes: storing the taskflow flowchart in the database in JSON format.
  • Identifying the user intent from the user's voice stream data includes: obtaining the user's voice stream data; performing speech-to-text transcription on the voice stream data using automatic speech recognition technology to obtain the corresponding text data; and processing the text data with a natural language understanding algorithm to obtain the user intent.
  • Executing the dialogue logic according to the dialogue flow includes: starting from the first node in the taskflow flowchart, first pushing the root node of the TaskFlow flowchart onto the stack and starting execution from the node element at the top of the stack; judging whether the current node is a non-leaf node; if it is, pushing one of its child nodes onto the stack; if the current node is a leaf node, executing the current node's operation and returning the dialogue state information; judging whether the current node needs to wait for the user's reply; if not, returning the execution state information; if so, creating the user's input state information and filling its slots after the user's reply is recognized; and pushing the next triggered node of the TaskFlow flowchart onto the stack, executing the dialogue logic, and clearing the already-executed nodes from the stack.
  • the processor 51 may also be referred to as a CPU (Central Processing Unit, central processing unit).
  • the processor 51 may be an integrated circuit chip with signal processing capabilities.
  • The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • The terminal of the embodiment of this application uses the processor to execute the program instructions stored in the memory and to control the dialogue management method stored in the memory, so as to quantify how strongly each acoustic feature expresses each emotion label; then, when the emotion label changes, the sensitivity of each acoustic feature to the change of emotion label is calculated according to the quantified index, acoustic features whose sensitivity is below the sensitivity threshold are filtered out, and dialogue management is performed according to the filtered acoustic features.
  • the embodiment of the present application takes into account the flexibility of the application, can improve the accuracy of the dialogue management, and at the same time reduce the workload in the actual application scenario.
  • FIG. 5 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
  • The storage medium of the embodiment of this application stores a program file 61 capable of implementing the following steps: draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in a database, where the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node and includes at least one user intent of the dialogue flow; during a man-machine dialogue, identify the user intent from the user's voice stream data and obtain the corresponding taskflow flowchart according to the user intent; parse the taskflow flowchart to obtain the dialogue flow; and execute the dialogue logic according to the dialogue flow.
  • Drawing the taskflow flowchart according to the dialogue logic includes: configuring the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node at their designated positions respectively, and connecting the nodes at different positions according to the dialogue logic to obtain the finished taskflow flowchart. The API interface node is used to obtain business information through remote calls during the dialogue flow; the SLOTS slot-filling node is used to collect slot information and fill slots during execution of the dialogue flow; the SCRIPT script node is used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow and to control and modify the dialogue state information; the NLG reply node is used to generate the reply utterance in a templated manner during execution of the dialogue flow; the JUDGE judgment node is used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
  • drawing the taskflow flowchart according to the dialog logic further includes: the taskflow flowchart is divided into a flowchart including a complete dialog flow and a sub-flowchart including at least one user intention.
  • storing the taskflow flowchart in the database specifically includes: storing the taskflow flowchart in the database in JSON format.
  • Identifying the user intent from the user's voice stream data includes: obtaining the user's voice stream data; performing speech-to-text transcription on the voice stream data using automatic speech recognition technology to obtain the corresponding text data; and processing the text data with a natural language understanding algorithm to obtain the user intent.
  • Executing the dialogue logic according to the dialogue flow includes: starting from the first node in the taskflow flowchart, first pushing the root node of the TaskFlow flowchart onto the stack and starting execution from the node element at the top of the stack; judging whether the current node is a non-leaf node; if it is, pushing one of its child nodes onto the stack; if the current node is a leaf node, executing the current node's operation and returning the dialogue state information; judging whether the current node needs to wait for the user's reply; if not, returning the execution state information; if so, creating the user's input state information and filling its slots after the user's reply is recognized; and pushing the next triggered node of the TaskFlow flowchart onto the stack, executing the dialogue logic, and clearing the already-executed nodes from the stack.
  • The program file 61 may be stored in the above storage medium in the form of a software product; the computer-readable storage medium may be non-volatile or volatile and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods in the various embodiments of this application.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, as well as terminal devices such as computers, servers, mobile phones and tablets.
  • The storage medium of the embodiment of this application executes the dialogue management method through the stored program instructions in the processor to quantify how strongly each acoustic feature expresses each emotion label; then, when the emotion label changes, the sensitivity of each acoustic feature to the change of emotion label is calculated according to the quantified index, acoustic features whose sensitivity is below the sensitivity threshold are filtered out, and dialogue management is performed according to the filtered acoustic features.
  • the embodiment of the present application takes into account the flexibility of the application, can improve the accuracy of dialogue management, and at the same time reduce the workload in actual application scenarios.
  • the disclosed system, device and method can be implemented in other ways.
  • the system embodiments described above are only illustrative.
  • The division of units is only a division by logical function; in actual implementation there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • The above integrated units may be implemented in the form of hardware or in the form of software functional units. The above is only an implementation of this application and does not thereby limit the scope of the patent of this application; any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

This application discloses a dialogue management method, system, terminal and storage medium. The method comprises: drawing a taskflow flowchart according to dialogue logic, and storing the taskflow flowchart in a database, wherein the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart comprises at least one user intent of the dialogue flow; during a man-machine dialogue, identifying the user intent from the user's voice stream data, and obtaining the corresponding taskflow flowchart according to the user intent; parsing the taskflow flowchart to obtain the dialogue flow; and executing the dialogue logic according to the dialogue flow. The embodiments of this application improve the application flexibility of the TaskFlow flowchart, greatly reduce the difficulty of development and maintenance, and can handle more complex dialogue management scenarios.

Description

Dialogue management method, system, terminal and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 22, 2021, with application number 202111235000.6 and entitled "Dialogue management method, system, terminal and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of artificial-intelligence speech processing, and in particular to a dialogue management method, system, terminal and storage medium.
Background
In recent years, with the continuous progress of related technologies such as intelligent speech and natural language processing, dialogue management systems have made great progress in performance and user experience. The inventor realizes that existing dialogue management systems are all rule-based approaches, yet lack a relatively general rule-programming framework and platform. Usually, domain experts design the dialogue scenarios that the dialogue management system can express, and the management rules are implemented in code logic or hidden in the dialogue tree structure and dialogue frames.
A currently popular voice interaction approach is VoiceXML (a markup language for voice browsing), which mainly consists of a voice browser, speech recognition, speech synthesis and a VoiceXML gateway; VoiceXML can be used to build web-based voice applications and services. However, the inventor realizes that this voice interaction approach has poor portability and flexibility, makes actual system development difficult, and makes dialogue flow authoring and debugging relatively complex.
Summary
This application provides a dialogue management method, system, terminal and storage medium, aiming to solve the technical problems of existing voice interaction approaches, such as poor portability and flexibility, difficult actual system development, and relatively complex dialogue flow authoring and debugging.
To solve the above technical problems, the technical solution adopted by this application is:
A dialogue management method, comprising:
drawing a taskflow flowchart according to the dialogue logic, and storing the taskflow flowchart in a database; the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart includes at least one user intent of the dialogue flow;
during a man-machine dialogue, identifying the user intent from the user's voice stream data, and obtaining the taskflow flowchart according to the user intent;
parsing the taskflow flowchart to obtain the dialogue flow;
executing the dialogue logic according to the dialogue flow, and returning the reply utterance corresponding to the user intent.
Another technical solution adopted by the embodiments of this application is a dialogue management system, comprising:
a flow drawing module, configured to draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in a database; the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart includes at least one user intent of the dialogue flow;
a flow acquisition module, configured to identify the user intent from the user's voice stream data during a man-machine dialogue and obtain the taskflow flowchart according to the user intent;
a flow parsing module, configured to parse the taskflow flowchart to obtain the dialogue flow;
a flow execution module, configured to execute the dialogue logic according to the dialogue flow.
Yet another technical solution adopted by the embodiments of this application is a terminal, comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the above dialogue management method;
the processor is configured to execute the program instructions stored in the memory to perform the dialogue management operations.
Yet another technical solution adopted by the embodiments of this application is a storage medium storing program instructions executable by a processor, the program instructions being used to perform the above dialogue management method.
The beneficial effects of this application are as follows: the dialogue management method, system, terminal and storage medium of the embodiments of this application store pre-drawn TaskFlow flowcharts in the dialogue management module and, during the dialogue, load the corresponding TaskFlow flowchart or sub-flow according to the user intent and execute the dialogue flow, which improves the flexibility of dialogue management. Meanwhile, this application designs the nodes used to draw TaskFlow flowcharts; during drawing, each node is dragged to its designated position and the nodes at different positions are connected according to the dialogue logic, which greatly improves drawing efficiency. The embodiments of this application can divide the dialogue flow into multiple sub-flows according to different user intents and draw a corresponding TaskFlow flowchart for each sub-flow; the TaskFlow flowcharts of all sub-flows can be shared and reused, which reduces their coupling with other modules, improves the application flexibility of the TaskFlow flowchart, greatly reduces the difficulty of development and maintenance, and can handle more complex dialogue management scenarios.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the dialogue management method according to the first embodiment of this application;
FIG. 2 is a schematic flowchart of the dialogue management method according to the second embodiment of this application;
FIG. 3 is a schematic structural diagram of the dialogue management system according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of the terminal according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of the storage medium according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the scope of protection of this application.
Please refer to FIG. 1, which is a schematic flowchart of the dialogue management method according to the first embodiment of this application. The dialogue management method of the first embodiment of this application includes the following steps:
S10: Draw the taskflow flowchart according to the dialogue logic, and store the taskflow flowchart in a database.
In this step, the taskflow flowchart is the pre-drawn dialogue logic. In the embodiments of this application, to improve drawing efficiency, five node types realizing different functions are designed: the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node. The function of each node is as follows:
API interface node: used to obtain business information through remote calls during the dialogue flow. An API interface node is created by configuring the URL (Uniform Resource Locator) of the remote service, the interface input parameters (in key-value format) and the output parameters. In the embodiments of this application, when drawing the taskflow flowchart, the API interface node hides the details of the remote call behind the API gateway service, so there is no need to care about differences between different remote callers, which greatly improves the user experience and the editing efficiency of the dialogue flow.
SLOTS slot-filling node: used to collect slot information (i.e., key information that needs to be collected from the user) during execution of the dialogue flow, and to perform multi-slot filling with NLU support. During execution of the dialogue flow, the SLOTS slot-filling node continuously traverses all slots; when an unfilled slot is found, the clarification utterance corresponding to that slot is output to TTS (Text To Speech), the user inputs information under the guidance of the TTS voice, and after the NLU entity extracts the user intent from the user input, the unfilled slot is filled through the SLOTS slot-filling node.
SCRIPT script node: used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow, and to control and modify the dialogue state information so as to meet customization requirements of the dialogue process. That is, the execution logic of the general dialogue management engine is developed in static Java and edited with TaskFlow, while domain-specific business logic is handled by the groovy script embedded in the SCRIPT script node, thereby decoupling system operation from the business domain. Specifically, the groovy script can obtain dialogue state information through the session.get('key') statement and, after processing, set the updated dialogue state information through the session.set('key', value) statement. Take the identification of the user's gender state information as an example:
def g = session.get('gender')
session.set('gender', g == 0 ? '女' : '男')
NLG reply node: used to generate the reply utterance in a templated manner during execution of the dialogue flow by filling in dynamically changing data. A sentence template consists of several short sentences containing variables; the variables are kept dynamically up to date by the data information and generated by the relevant business rules, and are finally concatenated into a well-structured, complete utterance.
JUDGE judgment node: used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow. The JUDGE judgment node supports configuring multiple conditional expressions with priorities; the variables available to the conditional expressions are provided by the current dialogue state information. When the node runs, the conditional expressions are evaluated in priority order, and if a conditional expression evaluates to true, the dialogue flow jumps to that branch and continues execution.
In the embodiments of this application, taskflow flowcharts are divided into two types according to the complexity of the dialogue task: flowcharts containing a complete dialogue flow, and sub-flowcharts containing at least one user intent. A flowchart containing a complete dialogue flow targets dialogue tasks whose business logic is relatively simple and whose flowcharts have few nodes. In practice, task-oriented dialogues often require multiple rounds of interaction to collect more information; the more complex the business logic, the more nodes the drawn TaskFlow flowchart will have, making its development more difficult. Therefore, for dialogue tasks with relatively complex business logic, the embodiments of this application divide the dialogue flow into multiple (at least two) sub-flows according to user intents, so that each sub-flow corresponds to at least one user intent, and a sub-flowchart is drawn for each sub-flow. When a sub-flow is needed, dragging the sub-flowchart corresponding to the user intent onto the editing interface allows it to be shared and reused, which avoids meaningless duplicated work and greatly reduces the difficulty of development and maintenance.
Further, the TaskFlow flowcharts (both flowcharts and sub-flowcharts) of the embodiments of this application are drawn as follows: drag the above five types of nodes to the designated positions on the editing interface and, after configuring each node, connect the nodes at different positions according to the dialogue logic to obtain the finished TaskFlow flowchart; store the finished TaskFlow flowchart in JSON (JavaScript Object Notation, a lightweight data-interchange format) format in the database of the dialogue management module, from which it is read by the dialogue management module when the dialogue task is executed.
S11: During the man-machine dialogue, identify the user intent from the user's voice stream data, and obtain the taskflow flowchart according to the user intent.
In this step, after the user's voice stream data is obtained, ASR (Automatic Speech Recognition) performs speech-to-text transcription on the voice stream data to obtain the corresponding text data, and an NLU (Natural Language Understanding) algorithm processes the text data to obtain the user intent.
S12: Parse the taskflow flowchart to obtain the dialogue flow.
In this step, the taskflow flowchart is parsed with the Jackson tool to obtain the dialogue flow.
S13: Execute the dialogue logic according to the dialogue flow.
In this step, the dialogue logic is executed as follows: first push the root node of the TaskFlow flowchart onto the stack and start execution; execute the node element at the top of the stack and judge whether the current node is a non-leaf node (control node); if it is, push one of its child nodes onto the stack; if the current node is a leaf node, execute the node's specific operation and return the dialogue state information. At the same time, judge whether the current node needs to wait for the user's reply: if not, return the execution state information; if so, create the state-information object for the user input and fill its slots once the user input is recognized. After the input is completed, push the next triggered node of the TaskFlow flowchart onto the task stack, repeat the above process for that node, and clear the already-executed nodes from the stack. This process loops until the dialogue task is completed.
Based on the above, the dialogue management method of the first embodiment of this application stores pre-drawn TaskFlow flowcharts in the dialogue management module and, during the dialogue, loads the corresponding TaskFlow flowchart according to the user intent and executes the dialogue flow, which improves the flexibility of dialogue management and greatly improves drawing efficiency.
Please refer to FIG. 2, which is a schematic flowchart of the dialogue management method according to the second embodiment of this application. The dialogue management method of the second embodiment of this application includes the following steps:
S20: Obtain the user's voice stream data through the dialogue engine.
S21: Perform speech-to-text transcription on the voice stream data through ASR to obtain the corresponding text data, process the text data with the NLU algorithm to obtain the user intent, and pass the user intent to the dialogue management module.
S22: The dialogue management module loads the taskflow flowchart according to the user intent and parses the taskflow flowchart with the Jackson tool to obtain the dialogue flow.
In this step, the taskflow flowchart is the pre-drawn dialogue logic. In the embodiments of this application, to improve drawing efficiency, five node types realizing different functions are designed: the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node. The function of each node is as follows:
API interface node: used to obtain business information through remote calls during the dialogue flow. An API interface node is created by configuring the URL (Uniform Resource Locator) of the remote service, the interface input parameters (in key-value format) and the output parameters. In the embodiments of this application, when drawing the taskflow flowchart, the API interface node hides the details of the remote call behind the API gateway service, so there is no need to care about differences between different remote callers, which greatly improves the user experience and the editing efficiency of the dialogue flow.
SLOTS slot-filling node: used to collect slot information (i.e., key information that needs to be collected from the user) during execution of the dialogue flow, and to perform multi-slot filling with NLU support. During execution of the dialogue flow, the SLOTS slot-filling node continuously traverses all slots; when an unfilled slot is found, the clarification utterance corresponding to that slot is output to TTS (Text To Speech), the user inputs information under the guidance of the TTS voice, and after the NLU entity extracts the user intent from the user input, the unfilled slot is filled through the SLOTS slot-filling node.
SCRIPT script node: used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow, and to control and modify the dialogue state information so as to meet customization requirements of the dialogue process. That is, the execution logic of the general dialogue management engine is developed in static Java and edited with TaskFlow, while domain-specific business logic is handled by the groovy script embedded in the SCRIPT script node, thereby decoupling system operation from the business domain. Specifically, the groovy script can obtain dialogue state information through the session.get('key') statement and, after processing, set the updated dialogue state information through the session.set('key', value) statement. Take the identification of the user's gender state information as an example:
def g = session.get('gender')
session.set('gender', g == 0 ? '女' : '男')
NLG reply node: used to generate the reply utterance in a templated manner during execution of the dialogue flow by filling in dynamically changing data. A sentence template consists of several short sentences containing variables; the variables are kept dynamically up to date by the data information and generated by the relevant business rules, and are finally concatenated into a well-structured, complete utterance.
JUDGE judgment node: used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow. The JUDGE judgment node supports configuring multiple conditional expressions with priorities; the variables available to the conditional expressions are provided by the current dialogue state information. When the node runs, the conditional expressions are evaluated in priority order, and if a conditional expression evaluates to true, the dialogue flow jumps to that branch and continues execution.
In the embodiments of this application, taskflow flowcharts are divided into two types according to the complexity of the dialogue task: flowcharts containing a complete dialogue flow, and sub-flowcharts containing at least one user intent. A flowchart containing a complete dialogue flow targets dialogue tasks whose business logic is relatively simple and whose flowcharts have few nodes. In practice, task-oriented dialogues often require multiple rounds of interaction to collect more information; the more complex the business logic, the more nodes the drawn TaskFlow flowchart will have, making its development more difficult. Therefore, for dialogue tasks with relatively complex business logic, the embodiments of this application divide the dialogue flow into multiple (at least two) sub-flows according to user intents, so that each sub-flow corresponds to at least one user intent, and a sub-flowchart is drawn for each sub-flow. When a sub-flow is needed, dragging the sub-flowchart corresponding to the user intent onto the editing interface allows it to be shared and reused, which avoids meaningless duplicated work and greatly reduces the difficulty of development and maintenance.
Further, the TaskFlow flowcharts (both flowcharts and sub-flowcharts) of the embodiments of this application are drawn as follows: drag the above five types of nodes to the designated positions on the editing interface and, after configuring each node, connect the nodes at different positions according to the dialogue logic to obtain the finished TaskFlow flowchart; store the finished TaskFlow flowchart in JSON (JavaScript Object Notation, a lightweight data-interchange format) format in the database of the dialogue management module, from which it is read by the dialogue management module when the dialogue task is executed.
After the dialogue management module obtains the taskflow flowchart from the database, it needs to parse it into an execution object that the dialogue management module can recognize. The embodiments of this application use the Jackson tool to parse the JSON-format taskflow flowchart, generate the Java objects of each node, obtain the dialogue flow for use by the dialogue management module, and save the dialogue flow in a tree structure. The node information of the tree structure is shown in Table 1:
Table 1: Tree structure node information
type: node type
nodeName: node name
subNodes: collection of child nodes (array type)
nextNode: id of the next node
S23: Execute the dialogue logic according to the dialogue flow of the taskflow flowchart: starting from the first node in the taskflow flowchart, push each node onto the stack and run it in turn.
In this step, the dialogue logic is executed as follows: first push the root node of the TaskFlow flowchart onto the stack and start execution; execute the node element at the top of the stack and judge whether the current node is a non-leaf node (control node); if it is, push one of its child nodes onto the stack; if the current node is a leaf node, execute the node's specific operation and return the dialogue state information. At the same time, judge whether the current node needs to wait for the user's reply: if not, return the execution state information; if so, create the state-information object for the user input and fill its slots once the user input is recognized. After the input is completed, push the next triggered node of the TaskFlow flowchart onto the task stack, repeat the above process for that node, and clear the already-executed nodes from the stack. This process loops until the dialogue task is completed.
S24: Judge whether execution has reached a SLOTS slot-filling node; if so, execute S25; otherwise, re-execute S23.
S25: Suspend execution of the dialogue logic, and return through the dialogue engine the reply utterance corresponding to the user intent.
S26: Convert the reply utterance into voice stream data through TTS technology and output it to the user through the telephony platform.
S27: Judge whether the dialogue task is finished; if not, re-execute S20; otherwise, execute S280.
S280: The dialogue ends.
Based on the above, the dialogue management method of the second embodiment of this application stores pre-drawn TaskFlow flowcharts in the dialogue management module and, during the dialogue, loads the corresponding TaskFlow flowchart or sub-flowchart according to the user intent and executes the dialogue flow, which improves the flexibility of dialogue management. Meanwhile, this application designs the nodes used to draw TaskFlow flowcharts; during drawing, each node is dragged to its designated position and the nodes at different positions are connected according to the dialogue logic, which greatly improves drawing efficiency. The embodiments of this application can divide the dialogue flow into multiple sub-flows according to different user intents and draw a corresponding sub-flowchart for each sub-flow; all sub-flowcharts can be shared and reused, which reduces their coupling with other modules, improves the application flexibility of the TaskFlow flowchart, greatly reduces the difficulty of development and maintenance, and can handle more complex dialogue management scenarios.
In an optional implementation, the result of the dialogue management method may also be uploaded to a blockchain.
Specifically, the corresponding digest information is obtained based on the result of the dialogue management method; in particular, the digest information is obtained by hashing the result of the dialogue management method, for example with the sha256s algorithm. Uploading the digest information to the blockchain ensures its security and its fairness and transparency to the user. The user can download the digest information from the blockchain to verify whether the result of the dialogue management method has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
Please refer to FIG. 3, which is a schematic structural diagram of the dialogue management system according to an embodiment of this application. The dialogue management system 40 of the embodiment of this application includes:
a flow drawing module 41, configured to draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in a database; the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart includes at least one user intent of the dialogue flow;
a flow acquisition module 42, configured to identify the user intent from the user's voice stream data during a man-machine dialogue and obtain the corresponding taskflow flowchart according to the user intent;
a flow parsing module 43, configured to parse the taskflow flowchart to obtain the dialogue flow;
a flow execution module 44, configured to execute the dialogue logic according to the dialogue flow.
Please refer to FIG. 4, which is a schematic structural diagram of the terminal according to an embodiment of this application. The terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the above dialogue management method.
The processor 51 is configured to execute the program instructions stored in the memory 52 to perform the following steps: draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in a database, the taskflow flowchart being composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node and including at least one user intent of the dialogue flow; during a man-machine dialogue, identify the user intent from the user's voice stream data and obtain the corresponding taskflow flowchart according to the user intent; parse the taskflow flowchart to obtain the dialogue flow; and execute the dialogue logic according to the dialogue flow.
Drawing the taskflow flowchart according to the dialogue logic includes: configuring the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node at their designated positions respectively, and connecting the nodes at different positions according to the dialogue logic to obtain the finished taskflow flowchart. The API interface node is used to obtain business information through remote calls during the dialogue flow; the SLOTS slot-filling node is used to collect slot information and fill slots during execution of the dialogue flow; the SCRIPT script node is used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow and to control and modify the dialogue state information; the NLG reply node is used to generate the reply utterance in a templated manner during execution of the dialogue flow; the JUDGE judgment node is used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
Drawing the taskflow flowchart according to the dialogue logic further includes: the taskflow flowchart is divided into flowcharts containing a complete dialogue flow and sub-flowcharts containing at least one user intent.
Storing the taskflow flowchart in the database is specifically: storing the taskflow flowchart in the database in JSON format.
Identifying the user intent from the user's voice stream data includes: obtaining the user's voice stream data; performing speech-to-text transcription on the voice stream data through automatic speech recognition technology to obtain the corresponding text data; and processing the text data with a natural language understanding algorithm to obtain the user intent.
Executing the dialogue logic according to the dialogue flow includes: starting from the first node in the taskflow flowchart, first push the root node of the TaskFlow flowchart onto the stack and start execution from the node element at the top of the stack; judge whether the current node is a non-leaf node; if it is, push one of its child nodes onto the stack; if the current node is a leaf node, execute the current node's operation and return the dialogue state information; judge whether the current node needs to wait for the user's reply; if not, return the execution state information; if so, create the user's input state information and fill its slots after the user's reply is recognized; push the next triggered node of the TaskFlow flowchart onto the stack, execute the dialogue logic, and clear the already-executed nodes from the stack.
The processor 51 may also be called a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The terminal of the embodiment of this application uses the processor to execute the program instructions stored in the memory and to control the dialogue management method stored in the memory, so as to quantify how strongly each acoustic feature expresses each emotion label; then, when the emotion label changes, the sensitivity of each acoustic feature to the change of emotion label is calculated according to the quantified index, acoustic features whose sensitivity is below the sensitivity threshold are filtered out, and dialogue management is performed according to the filtered acoustic features. The embodiments of this application take application flexibility into account, can improve the accuracy of dialogue management, and at the same time reduce the workload in actual application scenarios.
Please refer to FIG. 5, which is a schematic structural diagram of the storage medium according to an embodiment of this application. The storage medium of the embodiment of this application stores a program file 61 capable of implementing the following steps: draw the taskflow flowchart according to the dialogue logic and store the taskflow flowchart in a database, the taskflow flowchart being composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node and including at least one user intent of the dialogue flow; during a man-machine dialogue, identify the user intent from the user's voice stream data and obtain the corresponding taskflow flowchart according to the user intent; parse the taskflow flowchart to obtain the dialogue flow; and execute the dialogue logic according to the dialogue flow.
Drawing the taskflow flowchart according to the dialogue logic includes: configuring the API interface node, SLOTS slot-filling node, SCRIPT script node, NLG reply node and JUDGE judgment node at their designated positions respectively, and connecting the nodes at different positions according to the dialogue logic to obtain the finished taskflow flowchart. The API interface node is used to obtain business information through remote calls during the dialogue flow; the SLOTS slot-filling node is used to collect slot information and fill slots during execution of the dialogue flow; the SCRIPT script node is used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow and to control and modify the dialogue state information; the NLG reply node is used to generate the reply utterance in a templated manner during execution of the dialogue flow; the JUDGE judgment node is used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
Drawing the taskflow flowchart according to the dialogue logic further includes: the taskflow flowchart is divided into flowcharts containing a complete dialogue flow and sub-flowcharts containing at least one user intent.
Storing the taskflow flowchart in the database is specifically: storing the taskflow flowchart in the database in JSON format.
Identifying the user intent from the user's voice stream data includes: obtaining the user's voice stream data; performing speech-to-text transcription on the voice stream data through automatic speech recognition technology to obtain the corresponding text data; and processing the text data with a natural language understanding algorithm to obtain the user intent.
Executing the dialogue logic according to the dialogue flow includes: starting from the first node in the taskflow flowchart, first push the root node of the TaskFlow flowchart onto the stack and start execution from the node element at the top of the stack; judge whether the current node is a non-leaf node; if it is, push one of its child nodes onto the stack; if the current node is a leaf node, execute the current node's operation and return the dialogue state information; judge whether the current node needs to wait for the user's reply; if not, return the execution state information; if so, create the user's input state information and fill its slots after the user's reply is recognized; push the next triggered node of the TaskFlow flowchart onto the stack, execute the dialogue logic, and clear the already-executed nodes from the stack.
The program file 61 may be stored in the above storage medium in the form of a software product. The computer-readable storage medium may be non-volatile or volatile and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods in the various embodiments of this application. The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, as well as terminal devices such as computers, servers, mobile phones and tablets.
The storage medium of the embodiment of this application executes the dialogue management method through the stored program instructions in the processor to quantify how strongly each acoustic feature expresses each emotion label; then, when the emotion label changes, the sensitivity of each acoustic feature to the change of emotion label is calculated according to the quantified index, acoustic features whose sensitivity is below the sensitivity threshold are filtered out, and dialogue management is performed according to the filtered acoustic features. The embodiments of this application take application flexibility into account, can improve the accuracy of dialogue management, and at the same time reduce the workload in actual application scenarios.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative; for example, the division of units is only a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical or in other forms.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. The above is only an implementation of this application and does not thereby limit the scope of the patent of this application; any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A dialogue management method, comprising:
    drawing a taskflow flowchart according to dialogue logic, and storing the taskflow flowchart in a database, wherein the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart comprises at least one user intent of a dialogue flow;
    during a man-machine dialogue, identifying a user intent from the user's voice stream data, and obtaining the corresponding taskflow flowchart according to the user intent;
    parsing the taskflow flowchart to obtain the dialogue flow;
    executing the dialogue logic according to the dialogue flow.
  2. The dialogue management method according to claim 1, wherein drawing the taskflow flowchart according to the dialogue logic comprises:
    configuring the API interface node, the SLOTS slot-filling node, the SCRIPT script node, the NLG reply node and the JUDGE judgment node at designated positions respectively, and connecting the nodes at different positions according to the dialogue logic to obtain the finished taskflow flowchart;
    the API interface node is used to obtain business information through remote calls during the dialogue flow;
    the SLOTS slot-filling node is used to collect slot information and fill slots during execution of the dialogue flow;
    the SCRIPT script node is used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow, and to control and modify the dialogue state information;
    the NLG reply node is used to generate a reply utterance in a templated manner during execution of the dialogue flow;
    the JUDGE judgment node is used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
  3. The dialogue management method according to claim 2, wherein drawing the taskflow flowchart according to the dialogue logic further comprises:
    the taskflow flowchart is divided into flowcharts containing a complete dialogue flow and sub-flowcharts containing at least one user intent.
  4. The dialogue management method according to claim 3, wherein storing the taskflow flowchart in the database is specifically:
    storing the taskflow flowchart in the database in JSON format.
  5. The dialogue management method according to claim 1, wherein identifying the user intent from the user's voice stream data comprises:
    obtaining the user's voice stream data;
    performing speech-to-text transcription on the voice stream data through automatic speech recognition technology to obtain corresponding text data;
    processing the text data with a natural language understanding algorithm to obtain the user intent.
  6. The dialogue management method according to claim 3, wherein parsing the taskflow flowchart comprises:
    parsing the JSON-format taskflow flowchart with Jackson to generate the Java object of each node and obtain the dialogue flow.
  7. The dialogue management method according to any one of claims 1 to 6, wherein executing the dialogue logic according to the dialogue flow comprises:
    starting from the first node in the taskflow flowchart, first pushing the root node of the TaskFlow flowchart onto the stack and starting execution from the node element at the top of the stack, and judging whether the current node is a non-leaf node; if the current node is a non-leaf node, continuing to push a child node onto the stack; if the current node is a leaf node, executing the operation of the current node and returning the dialogue state information;
    judging whether the current node needs to wait for the user's reply; if not, returning the execution state information; if so, creating the user's input state information and filling its slots after the user's reply information is recognized;
    pushing the next triggered node of the TaskFlow flowchart onto the stack and executing the dialogue logic, and clearing the already-executed nodes from the stack.
  8. A dialogue management system, comprising:
    a flow drawing module, configured to draw a taskflow flowchart according to dialogue logic and store the taskflow flowchart in a database, wherein the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart comprises at least one user intent of a dialogue flow;
    a flow acquisition module, configured to identify a user intent from the user's voice stream data during a man-machine dialogue, and obtain the taskflow flowchart according to the user intent;
    a flow parsing module, configured to parse the taskflow flowchart to obtain the dialogue flow;
    a flow execution module, configured to execute the dialogue logic according to the dialogue flow.
  9. A terminal, comprising a processor and a memory coupled to the processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps: drawing a taskflow flowchart according to dialogue logic, and storing the taskflow flowchart in a database, wherein the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart comprises at least one user intent of a dialogue flow; during a man-machine dialogue, identifying a user intent from the user's voice stream data, and obtaining the corresponding taskflow flowchart according to the user intent; parsing the taskflow flowchart to obtain the dialogue flow; and executing the dialogue logic according to the dialogue flow.
  10. The terminal according to claim 9, wherein drawing the taskflow flowchart according to the dialogue logic comprises:
    configuring the API interface node, the SLOTS slot-filling node, the SCRIPT script node, the NLG reply node and the JUDGE judgment node at designated positions respectively, and connecting the nodes at different positions according to the dialogue logic to obtain the finished taskflow flowchart;
    the API interface node is used to obtain business information through remote calls during the dialogue flow;
    the SLOTS slot-filling node is used to collect slot information and fill slots during execution of the dialogue flow;
    the SCRIPT script node is used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow, and to control and modify the dialogue state information;
    the NLG reply node is used to generate a reply utterance in a templated manner during execution of the dialogue flow;
    the JUDGE judgment node is used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
  11. The terminal according to claim 10, wherein drawing the taskflow flowchart according to the dialogue logic further comprises:
    the taskflow flowchart is divided into flowcharts containing a complete dialogue flow and sub-flowcharts containing at least one user intent.
  12. The terminal according to claim 11, wherein storing the taskflow flowchart in the database is specifically:
    storing the taskflow flowchart in the database in JSON format.
  13. The terminal according to claim 9, wherein identifying the user intent from the user's voice stream data comprises:
    obtaining the user's voice stream data;
    performing speech-to-text transcription on the voice stream data through automatic speech recognition technology to obtain corresponding text data;
    processing the text data with a natural language understanding algorithm to obtain the user intent.
  14. The terminal according to any one of claims 9 to 13, wherein executing the dialogue logic according to the dialogue flow comprises:
    starting from the first node in the taskflow flowchart, first pushing the root node of the TaskFlow flowchart onto the stack and starting execution from the node element at the top of the stack, and judging whether the current node is a non-leaf node; if the current node is a non-leaf node, continuing to push a child node onto the stack; if the current node is a leaf node, executing the operation of the current node and returning the dialogue state information;
    judging whether the current node needs to wait for the user's reply; if not, returning the execution state information; if so, creating the user's input state information and filling its slots after the user's reply information is recognized;
    pushing the next triggered node of the TaskFlow flowchart onto the stack and executing the dialogue logic, and clearing the already-executed nodes from the stack.
  15. A storage medium storing a program file capable of implementing the following steps: drawing a taskflow flowchart according to dialogue logic, and storing the taskflow flowchart in a database, wherein the taskflow flowchart is composed of an API interface node, a SLOTS slot-filling node, a SCRIPT script node, an NLG reply node and a JUDGE judgment node, and the taskflow flowchart comprises at least one user intent of a dialogue flow; during a man-machine dialogue, identifying a user intent from the user's voice stream data, and obtaining the corresponding taskflow flowchart according to the user intent; parsing the taskflow flowchart to obtain the dialogue flow; and executing the dialogue logic according to the dialogue flow.
  16. The storage medium according to claim 15, wherein drawing the taskflow flowchart according to the dialogue logic comprises:
    configuring the API interface node, the SLOTS slot-filling node, the SCRIPT script node, the NLG reply node and the JUDGE judgment node at designated positions respectively, and connecting the nodes at different positions according to the dialogue logic to obtain the finished taskflow flowchart;
    the API interface node is used to obtain business information through remote calls during the dialogue flow;
    the SLOTS slot-filling node is used to collect slot information and fill slots during execution of the dialogue flow;
    the SCRIPT script node is used to obtain dialogue state information through an embedded groovy script during execution of the dialogue flow, and to control and modify the dialogue state information;
    the NLG reply node is used to generate a reply utterance in a templated manner during execution of the dialogue flow;
    the JUDGE judgment node is used to control the direction of the dialogue flow according to configured conditional expressions during execution of the dialogue flow.
  17. The storage medium according to claim 16, wherein drawing the taskflow flowchart according to the dialogue logic further comprises:
    the taskflow flowchart is divided into flowcharts containing a complete dialogue flow and sub-flowcharts containing at least one user intent.
  18. The storage medium according to claim 17, wherein storing the taskflow flowchart in the database is specifically:
    storing the taskflow flowchart in the database in JSON format.
  19. The storage medium according to claim 17, wherein identifying the user intent from the user's voice stream data comprises:
    obtaining the user's voice stream data;
    performing speech-to-text transcription on the voice stream data through automatic speech recognition technology to obtain corresponding text data;
    processing the text data with a natural language understanding algorithm to obtain the user intent.
  20. The storage medium according to any one of claims 15 to 19, wherein executing the dialogue logic according to the dialogue flow comprises:
    starting from the first node in the taskflow flowchart, first pushing the root node of the TaskFlow flowchart onto the stack and starting execution from the node element at the top of the stack, and judging whether the current node is a non-leaf node; if the current node is a non-leaf node, continuing to push a child node onto the stack; if the current node is a leaf node, executing the operation of the current node and returning the dialogue state information;
    judging whether the current node needs to wait for the user's reply; if not, returning the execution state information; if so, creating the user's input state information and filling its slots after the user's reply information is recognized;
    pushing the next triggered node of the TaskFlow flowchart onto the stack and executing the dialogue logic, and clearing the already-executed nodes from the stack.
PCT/CN2022/089566 2021-10-22 2022-04-27 Dialogue management method, system, terminal and storage medium WO2023065629A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111235000.6A CN113935337A (zh) 2021-10-22 2021-10-22 Dialogue management method, system, terminal and storage medium
CN202111235000.6 2021-10-22

Publications (1)

Publication Number Publication Date
WO2023065629A1 true WO2023065629A1 (zh) 2023-04-27

Family

ID=79283877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089566 WO2023065629A1 (zh) 2021-10-22 2022-04-27 Dialogue management method, system, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN113935337A (zh)
WO (1) WO2023065629A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628141A (zh) * 2023-07-24 2023-08-22 科大讯飞股份有限公司 Information processing method, apparatus, device and storage medium
CN117251553A (zh) * 2023-11-15 2023-12-19 知学云(北京)科技股份有限公司 Intelligent learning interaction method based on custom plug-ins and a large language model

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113935337A (zh) * 2021-10-22 2022-01-14 平安科技(深圳)有限公司 一种对话管理方法、系统、终端及存储介质
CN114582314B (zh) * 2022-02-28 2023-06-23 江苏楷文电信技术有限公司 基于asr的人机音视频交互逻辑模型设计方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170213545A1 (en) * 2016-01-22 2017-07-27 Electronics And Telecommunications Research Institute Self-learning based dialogue apparatus and method for incremental dialogue knowledge
CN108763568A (zh) * 2018-06-05 2018-11-06 北京玄科技有限公司 Management method for intelligent robot interaction flows, multi-round dialogue method and apparatus
CN110472030A (zh) * 2019-08-08 2019-11-19 网易(杭州)网络有限公司 Human-computer interaction method, apparatus and electronic device
CN110704594A (zh) * 2019-09-27 2020-01-17 北京百度网讯科技有限公司 Artificial-intelligence-based task-oriented dialogue interaction processing method and apparatus
CN113935337A (zh) 2021-10-22 2022-01-14 平安科技(深圳)有限公司 Dialogue management method, system, terminal and storage medium


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628141A (zh) * 2023-07-24 2023-08-22 科大讯飞股份有限公司 Information processing method, apparatus, device and storage medium
CN116628141B (zh) * 2023-07-24 2023-12-01 科大讯飞股份有限公司 Information processing method, apparatus, device and storage medium
CN117251553A (zh) * 2023-11-15 2023-12-19 知学云(北京)科技股份有限公司 Intelligent learning interaction method based on custom plug-ins and a large language model
CN117251553B (zh) * 2023-11-15 2024-02-27 知学云(北京)科技股份有限公司 Intelligent learning interaction method based on custom plug-ins and a large language model

Also Published As

Publication number Publication date
CN113935337A (zh) 2022-01-14

Similar Documents

Publication Publication Date Title
WO2023065629A1 (zh) Dialogue management method, system, terminal and storage medium
AU2017238633B2 (en) Efficient state machines for real-time dataflow programming
US11321535B2 (en) Hierarchical annotation of dialog acts
JP2023520420A (ja) Batching techniques for handling imbalanced training data for a chatbot
CN112487157A (zh) Template-based intent classification for chatbots
JP7357166B2 (ja) Dialogue robot generation method, dialogue robot management platform and storage medium
CN112199477A (zh) Dialogue management scheme and method for constructing dialogue management corpora
US10673789B2 (en) Bot-invocable software development kits to access legacy systems
US20230095673A1 (en) Extracting key information from document using trained machine-learning models
CN111930912A (zh) Dialogue management method and system, device and storage medium
CN114186108A (zh) Multimodal human-machine interaction system for electric power material business scenarios
CN113378579A (zh) Method, system and electronic device for entering structured data by voice
Sam et al. A robust methodology for building an artificial intelligent (ai) virtual assistant for payment processing
CN112582073B (zh) Medical information acquisition method and apparatus, electronic device and medium
US20230139397A1 (en) Deep learning techniques for extraction of embedded data from documents
US11669307B2 (en) Code injection from natural language derived intent
US11876756B2 (en) Graph-based natural language generation for conversational systems
US11316807B2 (en) Microservice deployment in multi-tenant environments
CN115129865A (zh) Work order classification method and apparatus, electronic device and storage medium
CN114925206A (zh) Artificial intelligence agent, voice information recognition method, storage medium and program product
CN113518160A (zh) Video generation method, apparatus, device and storage medium
CN111104118A (zh) AIML-based natural language instruction execution method and system
US20230186024A1 (en) Text Processing Method, Device and Storage Medium
CN116521155B (zh) Method for dynamically generating RESTful interfaces based on JSON descriptions
US20240061833A1 (en) Techniques for augmenting training data for aggregation and sorting database operations in a natural language to database query system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882254

Country of ref document: EP

Kind code of ref document: A1