WO2021027198A1 - Voice dialogue processing method and device - Google Patents

Voice dialogue processing method and device

Info

Publication number
WO2021027198A1
WO2021027198A1 (PCT/CN2019/123937)
Authority
WO
WIPO (PCT)
Prior art keywords
dialogue
voice
dialog
user
flow
Prior art date
Application number
PCT/CN2019/123937
Other languages
English (en)
French (fr)
Inventor
董鑫
戴中原
初敏
顾寒
Original Assignee
苏州思必驰信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州思必驰信息科技有限公司
Priority to EP19941639.7A (EP4016330A4)
Priority to US17/635,489 (US20220293089A1)
Priority to JP2022510069A (JP7274043B2)
Publication of WO2021027198A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details
    • H04M3/50 Centralised arrangements for answering calls; centralised arrangements for recording messages for absent or busy subscribers
    • H04M3/527 Centralised call answering arrangements not requiring operator intervention
    • H04M2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/39 Using speech synthesis
    • H04M2201/40 Using speech recognition
    • H04M2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M2203/35 Aspects related to information services provided via a voice call
    • H04M2203/355 Interactive dialogue design tools, features or methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; text-to-speech systems
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0638 Interactive training procedures
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures using non-speech characteristics
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; database structures therefor; file system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G06Q30/015 Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016 After-sales

Definitions

  • This application relates to the technical field of intelligent customer service, and in particular to a voice dialogue processing method and device.
  • AI: Artificial Intelligence.
  • Human-computer interaction is a technology that studies the interaction between humans and computer systems.
  • AI dialogue system is a new type of human-computer interaction, which uses natural speech or natural language for input and output.
  • AI dialogue systems have been widely used in scenarios such as smart phones, smart homes, smart cars, and smart customer service.
  • the dialogue business processes and content required by different manufacturers also differ, so each manufacturer needs to customize its own voice dialogue system.
  • the complexity of business logic makes dialogue flow development for an AI dialogue system complex, which leads to inefficient custom development of dialogue services.
  • when a manufacturer needs to maintain or modify the business process or content, it may need to repeatedly develop or modify code logic, which hinders later maintenance of the business process.
  • the present application provides a voice dialogue processing method and device to solve at least one of the above technical problems.
  • the embodiments of the present application provide a voice dialogue processing method, which is applied to a voice customer service server.
  • the method includes: determining the voice semantics corresponding to the user voice to be processed; determining, based on a dialogue management engine, a reply sentence for the voice semantics, where the training sample set of the dialogue management engine is constructed from a dialogue service customization file including at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order; and generating, according to the determined reply sentence, a customer service voice for replying to the user voice.
  • the embodiment of the present application provides a voice customer service method, which is applied to a dialogue flow design server.
  • the method includes: obtaining a dialogue flow design request from a dialogue flow design client and determining at least one dialogue flow corresponding to the request, where the dialogue flow includes a plurality of dialogue nodes in a set order; generating a dialogue service customization file according to the at least one dialogue flow; and sending the dialogue service customization file to the voice customer service server to construct the training sample set of the dialogue management engine, so that the voice customer service server can perform voice customer service based on the dialogue management engine.
  • an embodiment of the present application provides a voice dialogue processing device, which includes: a voice semantic determination unit, configured to determine the voice semantics corresponding to the user voice to be processed; a dialogue management engine invoking unit, configured to determine, based on the dialogue management engine, a reply sentence for the voice semantics, where the training sample set of the dialogue management engine is constructed from a dialogue service customization file including at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order;
  • the customer service voice generating unit is configured to generate a customer service voice for replying to the user's voice according to the determined reply sentence.
  • an embodiment of the present application provides a voice dialogue processing device, including: a dialogue flow determining unit, configured to obtain a dialogue flow design request from a dialogue flow design client and determine at least one dialogue flow corresponding to the request, where the dialogue flow includes a plurality of dialogue nodes in a set order; a business customization file generating unit, configured to generate a dialogue service customization file based on the at least one dialogue flow; and a business customization file sending unit, configured to send the dialogue service customization file to the voice customer service server to construct the training sample set of the dialogue management engine, so that the voice customer service server performs voice customer service based on the dialogue management engine.
  • an embodiment of the present application provides an electronic device, which includes: at least one processor, and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the foregoing method.
  • an embodiment of the present application provides a storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the foregoing method are implemented.
  • This application uses a dialogue management engine to determine the reply sentences corresponding to the voice semantics of the user's voice.
  • the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow with multiple dialogue nodes in a set order; the dialogue service customization file is simple and convenient to produce and does not involve modification of code logic, making the development of dialogue service customization easier and more efficient.
  • by updating the dialogue service customization file, iterative updates of the dialogue management engine can be completed, which facilitates later maintenance of the business process.
  • FIG. 1 is a flowchart of a voice dialogue processing method according to an embodiment of the application;
  • FIG. 2 shows a flowchart of an example of a voice dialogue processing method applied to a voice customer service server;
  • FIG. 3 shows a schematic diagram of an example of a dialogue flow according to this embodiment;
  • FIG. 4 shows an exemplary schematic structural diagram suitable for applying the voice dialogue processing method of an embodiment of the present application;
  • FIG. 5 shows a flowchart of the principle of modeling the dialogue management engine in the online dialogue system of the embodiment of the present application;
  • FIG. 6 shows a flowchart of a voice customer service method applied to the dialogue flow design server;
  • FIGS. 7A-7Z respectively show screenshot examples of the dialogue design client in different states during the process of building a dialogue flow;
  • FIG. 8 shows a structural block diagram of a voice dialogue processing device according to an embodiment of the present application.
  • program modules include routines, programs, objects, elements, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, remote processing devices connected through a communication network perform tasks.
  • program modules can be located in local and remote computer storage media including storage devices.
  • module refers to related entities applied to a computer, such as hardware, a combination of hardware and software, software or software in execution, etc.
  • an element can be, but is not limited to, a process, a processor, an object, an executable element, an execution thread, a program, and/or a computer running on a processor.
  • applications or scripts running on the server, and the server can all be components.
  • one or more elements can reside within a process and/or thread of execution; the elements can be localized on one computer and/or distributed between two or more computers, and can run from various computer-readable media.
  • a component may also communicate by means of local and/or remote processes based on a signal having one or more data packets, for example, a signal from data interacting with another component in a local system or a distributed system, and/or interacting with other systems through a signal over the Internet.
  • the term “engine” refers to a structure formed after encapsulating at least one algorithm model.
  • the term “intention” refers to the classification of sentences. For example, the user sentence “goodbye” should be classified as the intent "end of conversation”.
  • the exemplary process of the voice dialogue processing method of the embodiment of the present application involves communication interaction between multiple execution subjects: the dialogue flow design client 10, the dialogue flow design server 20, the voice customer service server 30, and the user terminal 40 communicate with each other.
  • the client and the user terminal may be any type of terminal device such as a notebook computer, a tablet computer, a mobile phone, etc., for example, they may be a terminal device installed with a specific application program.
  • the server may refer to a server in a central communication network architecture or a master node device for providing services in a peer-to-peer communication network architecture.
  • the process 100 of the voice dialogue processing method of the embodiment of the present application includes:
  • Step 101 The dialog flow design client 10 generates a dialog flow design request.
  • the dialog flow design client 10 receives user operations of the manufacturer's voice service personnel, and generates a corresponding dialog flow design request according to the user operations.
  • the dialog flow design request may be a request instruction for one or more designed dialog flows, and the dialog flow may include a plurality of dialog nodes in a set order.
  • the voice service personnel of the manufacturer can draw a plurality of dialog processes for one dialog item on the dialog process design client 10, and generate a dialog flow design request according to the multiple dialog processes.
  • the dialog flow design client can be configured with a graphical interface on which the user drags and adjusts dialog node frames, so that the manufacturer's voice service personnel can quickly construct a dialogue flow by dragging node frames.
  • Step 102 The dialog flow design client 10 sends a dialog flow design request to the dialog flow design server 20.
  • Step 103 The dialogue flow design server 20 generates a dialogue service customization file according to the dialogue flow design request.
  • the dialogue flow design server 20 parses at least one dialogue flow in the dialogue flow design request, and automatically recognizes the node content and node type of each ordered node in the dialogue flow. Then, a dialog service customization file is generated, and the dialog service customization file may be a json file.
  • the dialog flow design server 20 can determine the dialog business customization file through multiple interactive operations (for example, multiple dialog flow design requests) with the dialog flow design client 10.
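A dialogue service customization file of this kind might be sketched as follows. This is only an illustration: the field names ("scene", "dialogue_flows", the node "type" values) are assumptions made for the example, since the source does not specify the actual json schema.

```python
import json

# Hypothetical dialogue service customization structure: one scene with one
# dialogue flow whose nodes appear in a set order, as described in the text.
customization = {
    "scene": {"name": "Express", "description": "courier customer service"},
    "dialogue_flows": [
        {
            "name": "place_order",
            "nodes": [  # dialogue nodes in a set order
                {"id": 1, "type": "start"},
                {"id": 2, "type": "user_communication",
                 "prompt": "Where should the parcel be delivered?"},
                {"id": 3, "type": "slot_filling", "slot": "delivery_address"},
            ],
        }
    ],
}

# Serialize to the json file that would be sent to the voice customer
# service server, then parse it back to verify round-tripping.
file_text = json.dumps(customization, indent=2, ensure_ascii=False)
parsed = json.loads(file_text)
```

In practice such a file would be produced by the dialogue flow design server from the drawn node graph, but the exact on-disk layout is the platform's own.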
  • Step 104 The dialogue process design server 20 sends the dialogue service customization file to the voice customer service server 30.
  • Step 105 The voice customer service server 30 trains the dialogue management engine based on the dialogue service customization file.
  • the voice customer service server 30 may generate a training sample set for the dialogue management engine based on the dialogue service customization file, and then use the training sample set to train the dialogue management engine.
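One plausible way to turn such a customization file into training samples is to walk each flow's ordered nodes and emit (current state, next action) pairs. This is a sketch under assumed field names; the real sample format used by the dialogue management engine is not described in the source.

```python
# Illustrative only: derive (state, target) training pairs for the dialogue
# management engine by walking each flow's ordered nodes.
def build_training_samples(customization):
    samples = []
    for flow in customization["dialogue_flows"]:
        nodes = flow["nodes"]
        for prev, node in zip(nodes, nodes[1:]):
            samples.append({
                "flow": flow["name"],
                "state": prev["type"],                       # current node
                "target": node.get("prompt", node["type"]),  # next action
            })
    return samples

customization = {
    "dialogue_flows": [{
        "name": "place_order",
        "nodes": [
            {"type": "start"},
            {"type": "user_communication", "prompt": "Where to deliver?"},
            {"type": "slot_filling"},
        ],
    }]
}
samples = build_training_samples(customization)
```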
  • Step 106 The user terminal 40 sends the user's voice to the voice customer service server 30.
  • after the dialogue management engine is trained, it can go online.
  • after going online, the voice customer service server can receive user voice from the user terminal to provide customer service for the user.
  • Step 107 The voice customer service server 30 determines the customer service voice for the user's voice.
  • the voice customer service server 30 can determine the voice semantics corresponding to the user's voice to be processed, for example, by applying ASR (Automatic Speech Recognition) functional modules and NLU (Natural Language Understanding) functional modules. Then, the voice customer service server 30 determines the answer sentence for the voice semantics by calling the dialogue management engine, and generates a customer service voice for answering the user's voice according to the determined answer sentence. In the process of generating the customer service voice using the reply sentence, the voice customer service server 30 may be implemented by applying the NLG (Natural Language Generation) function module and the TTS (Text To Speech, speech synthesis) function module.
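The five-stage pipeline in step 107 can be sketched as a chain of callables. The stages below are toy stand-ins, not a real ASR/NLU/NLG/TTS API; only the ordering of the stages follows the text.

```python
# Minimal sketch of the server-side pipeline in step 107.
def handle_user_voice(audio, asr, nlu, dm_engine, nlg, tts):
    text = asr(audio)             # ASR: user voice -> text
    semantics = nlu(text)         # NLU: text -> voice semantics (intent/slots)
    reply = dm_engine(semantics)  # DM:  semantics -> reply sentence
    reply_text = nlg(reply)       # NLG: structured reply -> natural language
    return tts(reply_text)        # TTS: text -> customer service voice

# Toy stand-ins wiring the stages together.
voice = handle_user_voice(
    b"<pcm audio>",
    asr=lambda a: "goodbye",
    nlu=lambda t: {"intent": "end_of_conversation"},
    dm_engine=lambda s: "Thanks for calling, goodbye!",
    nlg=lambda r: r,
    tts=lambda t: ("audio", t),
)
```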
  • Step 108 The voice customer service server 30 sends the customer service voice to the user terminal 40.
  • the above steps 106-108 can be repeated multiple times until a satisfactory customer service is provided to the user.
  • the embodiment shown in FIG. 1 is only used as an example, and some of the steps in the above process are optional or replaceable.
  • the voice customer service server may also directly obtain the dialogue service customization file locally.
  • the process of the voice dialogue processing method applied to the voice customer service server 30 includes:
  • Step 201 Determine the voice semantics corresponding to the user voice to be processed.
  • Step 202 Determine a reply sentence for voice semantics based on the dialog management engine, where the dialog management engine includes a general dialog management model and a business dialog management model.
  • Step 203 Generate a customer service voice for answering the user's voice according to the determined reply sentence.
  • the dialogue management engine includes a general dialogue management model for handling general conversations and a business dialogue management model for handling specialized services.
  • the general dialogue management model can be shared among different business customer service projects to reuse code logic, which improves the development efficiency of the customer service system.
  • in the call center scenario of intelligent customer service, calls are often accompanied by specific oral replies such as "um", "ah", and "hello", as well as general conversation requests such as greetings, interruptions, repetitions, and clarifications.
  • the current AI dialogue system rarely optimizes this type of conversation request in the call center scenario, which makes the customer service process too rigid and degrades the user experience.
  • with the general dialogue management model, the above-mentioned conversation requests can be handled better and the user experience can be improved.
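The general handling described above can be sketched as a small intent router that answers general conversation requests directly, independent of any business. The intent names and replies here are assumptions for illustration.

```python
# Hedged sketch of a general dialogue management model: it answers general
# conversation requests (greeting, repeat, clarification) itself and returns
# None for anything that should go to the business dialogue management model.
def general_dm(intent, last_system_reply=None):
    if intent == "greeting":
        return "Hello, how can I help you?"
    if intent == "repeat":
        return last_system_reply  # replay the previous system utterance
    if intent == "clarification":
        return "Sorry, could you describe your request in more detail?"
    return None                   # not a general request: defer to business DM
```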
  • Table 1 describes the related processes involved in the general dialogue operations performed for each general intent in the general intent set.
  • the business dialogue management model is used to perform business operations to determine the reply sentence, including: determining the target dialogue flow corresponding to the user's intent, and determining the reply sentence according to the dialogue nodes in the determined target dialogue flow.
  • the dialog node includes a dialog start node, a user communication node, a user information identification node, and a slot filling node.
  • the conversation start node is the node where the conversation flow starts
  • the user communication node is the node where the customer service needs to communicate with the user
  • the user information identification node can identify user information by calling other service APIs (for example, through a function).
  • the slot filling node is the node that adds the finally collected information to the corresponding slot. In this way, the corresponding dialogue node in the target dialogue flow can be called based on the user's intention, so as to perform corresponding operations to determine the reply sentence.
  • for example, in an express delivery scenario, the user information recognition node in the dialogue flow directly identifies whether the caller is an existing user. If so, it directly asks whether the user wants to place an order at the address indicated in the historical express records; if the caller is a new user, the user communication node in the dialogue flow can be called to ask the user for the delivery location. Furthermore, after voice recognition obtains the shipping address from the user's voice feedback, the slot filling operation is completed. If the slot is not recognized or slot filling is unsuccessful, the call can be transferred to manual customer service.
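The branching of that express-delivery flow can be sketched as follows. The function and slot names are assumptions for the example; only the branching mirrors the text: existing users reuse the historical address, new users are asked for one, and failed slot filling transfers to a human agent.

```python
# Illustrative walk-through of the express-delivery dialogue flow.
def express_flow(is_old_user, history_address=None, recognized_address=None):
    slots = {}
    if is_old_user and history_address:
        slots["delivery_address"] = history_address     # reuse history record
    elif recognized_address:
        slots["delivery_address"] = recognized_address  # filled from reply
    else:
        return "transfer_to_human"                      # slot filling failed
    return slots
```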
  • an exemplary principle architecture suitable for applying the voice dialogue processing method of an embodiment of the present application includes an online AI dialogue system and an offline DM (Dialogue Management) customization platform as a whole.
  • the access service module of the online AI dialogue system interfaces with the call center of the user client through SIP (Session Initiation Protocol), so as to provide services for the user indicated by the user client.
  • the online AI dialogue system includes: ASR module, NLU module, NLG module, TTS module and DM engine.
  • the DM engine is divided into business DM and general DM.
  • the dialog management in the AI dialog system is divided into two types: business DM and general DM.
  • Business DM is dialogue management related to specific business processes.
  • the general DM design is to handle the general dialogue management of the man-machine dialogue process, making the entire dialogue smoother, but does not involve specific business scenarios and knowledge.
  • the general DM can be used as a universal module embedded in front of any business DM, making the dialogue closer to the effect of a chat between people. This design can not only improve the user experience, but also save the custom development cost of DMs in different scenarios.
  • Scene (Bot): a dialogue robot, which can be understood as a customer service project.
  • Dialogue flow: the flow of dialogue used to complete a task.
  • a project consists of one or more dialogue flows.
  • Dialogue node: a round of dialogue in the dialogue flow. Different node types are provided, such as the start node, user communication node, slot filling node, and user information recognition node.
  • the start node is the starting node of the dialogue flow;
  • the user communication node is a round of dialogue;
  • the slot filling node is a special node designed to collect slots and complete the slot information;
  • the user information identification node is a wrapper for accessing other service APIs.
  • the dialogue process is designed through the offline DM customization platform.
  • the offline DM customization platform includes a dialog design tool and a dialog test tool, providing a graphical drag-and-drop method for dialog design and testing.
  • the offline DM customization platform will generate a json format file for the dialogue of the project, and the online DM engine will load the file to produce an AI dialogue robot to serve online traffic.
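Loading the exported json to produce an online dialogue robot might look like the sketch below. The field names are assumptions for illustration, not the platform's actual schema.

```python
import json

# Hypothetical loader for the online DM engine: read the exported json file
# and instantiate a dialogue robot description for the scene.
def load_bot(file_text):
    spec = json.loads(file_text)
    return {
        "scene": spec["scene"]["name"],
        "flows": {f["name"]: f["nodes"] for f in spec["dialogue_flows"]},
    }

exported = json.dumps({
    "scene": {"name": "Express"},
    "dialogue_flows": [{"name": "place_order", "nodes": [{"type": "start"}]}],
})
bot = load_bot(exported)
```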
  • the process of the voice customer service method applied to the dialogue flow design server is implemented through interactive operations between the dialogue flow design server and the dialogue flow design client, so as to quickly customize the dialogue flow and construct the corresponding dialogue service customization file.
  • the method includes:
  • Step 601 Obtain a dialog flow design request from the dialog flow design client, and determine at least one dialog flow corresponding to the dialog flow design request, where the dialog flow includes a plurality of dialog nodes in a set order.
  • Step 602 Generate a dialogue service customization file according to at least one dialogue flow.
  • Step 603 Send the dialogue service customization file to the voice customer service server to construct a training sample set on the dialogue management engine, so that the voice customer service server executes the voice customer service service based on the dialogue management engine.
  • the dialog flow design client is configured with a graphical interface on which the user can drag and adjust dialog node frames; the dialog flow design client 10 receives user operations performed by the manufacturer's developers on the graphical interface and generates the corresponding dialog flow design request. In this way, manufacturers can produce business dialogue flows by drag-and-drop on the graphical client interface, producing the corresponding json files to quickly complete business DM customization.
  • a dialog box will pop up, in which the "scene name" and "scene description" are entered.
  • a scene can represent a customer service project.
  • a new dialog flow is created. Clicking the created scene "Express" enters a page with the following tabs on the left: 1) Scene configuration: including the scene name, product ID, global variable addition, general process addition, etc.; 2) Dialogue flow customization: draw the dialogue flow on this page; 3) Intent customization: add the intents parsed from user utterances; 4) Dialogue test: test the built model; 5) Bot debugging: modify and debug the code automatically generated from the dialogue flow model built with the visualization tool.
  • the json file can be saved locally through "Scene Configuration-DM File Export".
  • the constructed dialog stream may also be tested locally to ensure the completeness of the function. As shown in Figure 7Z, the test operation is performed by "clicking the dialogue test-test release-input user voice".
  • a rapid customization platform can greatly accelerate the development of intelligent voice for call centers and greatly reduce development costs, thereby raising the intelligence level of the entire call center industry.
  • the voice dialogue processing apparatus 800 of an embodiment of the present application includes a voice semantic determining unit 810, a dialogue management engine invoking unit 820, and a customer service voice generating unit 830.
  • the voice semantic determining unit 810 is used to determine the voice semantics corresponding to the user voice to be processed
  • the dialog management engine invoking unit 820 is used to determine a reply sentence for the voice semantics based on the dialog management engine, where the training sample set of the dialog management engine is constructed from a dialogue service customization file that includes at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order;
  • the customer service voice generation unit 830 is used to generate, according to the determined reply sentence, the customer service voice for replying to the user voice.
  • the device in the foregoing embodiment of the present application can be used to execute the corresponding method embodiment of the present application, and accordingly achieve the technical effects achieved by the foregoing method embodiment of the present application, which will not be repeated here.
  • a hardware processor may be used to implement related functional modules.
  • an embodiment of the present application provides a storage medium on which a computer program is stored; the program is executed by a processor to perform the steps of the foregoing method executed on the server.
  • the clients in the embodiments of this application exist in various forms, including but not limited to:
  • Mobile communication devices: characterized by mobile communication functions, with voice and data communication as the main goal. Such terminals include smartphones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones.
  • Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., the iPad.
  • Portable entertainment devices: these can display and play multimedia content. Such devices include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, as well as smart toys and portable in-vehicle navigation devices.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each implementation manner can be implemented by means of software plus a general hardware platform, and of course, it can also be implemented by hardware.
  • the above technical solutions, in essence or in the parts contributing to the related technology, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or a CD-ROM, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in each embodiment or in parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A voice dialogue processing method and apparatus. The voice dialogue processing method comprises: determining the voice semantics corresponding to a user voice to be processed (S201); determining, based on a dialogue management engine, a reply sentence for the voice semantics (S202), wherein the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order; and generating, according to the determined reply sentence, a customer service voice for answering the user voice (S203).

Description

Voice dialogue processing method and apparatus
This disclosure claims priority to Chinese patent application No. 201910752714.0, filed with the Chinese Patent Office on August 15, 2019, the entire contents of which are incorporated into this disclosure by reference.
Technical Field
This application belongs to the technical field of intelligent customer service and relates, for example, to a voice dialogue processing method and apparatus.
Background
Human-computer interaction is a technology that studies the interaction between people and computer systems. An AI (Artificial Intelligence) dialogue system is a new form of human-computer interaction in which input and output take the form of natural speech or natural language. At present, AI dialogue systems are widely used in scenarios such as smartphones, smart homes, smart vehicles, and intelligent customer service.
However, different vendors require different dialogue business processes and content, so each vendor needs to customize its own voice dialogue system. At present, the complexity of business logic makes developing dialogue flows for an AI dialogue system complex, resulting in inefficient custom development of dialogue services. In addition, when a vendor needs to maintain or modify a business process or its content, it may have to redevelop or modify code logic, which hinders later maintenance of the business process.
Summary
This application provides a voice dialogue processing method and apparatus to solve at least one of the above technical problems.
In a first aspect, an embodiment of this application provides a voice dialogue processing method applied to a voice customer service server. The method includes: determining the voice semantics corresponding to a user voice to be processed; determining, based on a dialogue management engine, a reply sentence for the voice semantics, wherein the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order; and generating, according to the determined reply sentence, a customer service voice for answering the user voice.
In a second aspect, an embodiment of this application provides a voice customer service method applied to a dialogue flow design server. The method includes: obtaining a dialogue flow design request from a dialogue flow design client, and determining at least one dialogue flow corresponding to the dialogue flow design request, wherein the dialogue flow includes a plurality of dialogue nodes in a set order; generating a dialogue service customization file according to the at least one dialogue flow; and sending the dialogue service customization file to a voice customer service server to construct a training sample set for a dialogue management engine, so that the voice customer service server performs voice customer service based on the dialogue management engine.
In a third aspect, an embodiment of this application provides a voice dialogue processing apparatus, comprising: a voice semantics determining unit, used to determine the voice semantics corresponding to a user voice to be processed; a dialogue management engine invoking unit, used to determine, based on a dialogue management engine, a reply sentence for the voice semantics, wherein the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order; and a customer service voice generating unit, used to generate, according to the determined reply sentence, a customer service voice for answering the user voice.
In a fourth aspect, an embodiment of this application provides a voice dialogue processing apparatus, comprising: a dialogue flow determining unit, used to obtain a dialogue flow design request from a dialogue flow design client and determine at least one dialogue flow corresponding to the dialogue flow design request, wherein the dialogue flow includes a plurality of dialogue nodes in a set order; a service customization file generating unit, used to generate a dialogue service customization file according to the at least one dialogue flow; and a service customization file sending unit, used to send the dialogue service customization file to a voice customer service server to construct a training sample set for a dialogue management engine, so that the voice customer service server performs voice customer service based on the dialogue management engine.
In a fifth aspect, an embodiment of this application provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the steps of the above method.
In a sixth aspect, an embodiment of this application provides a storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the above method.
This application uses a dialogue management engine to determine the reply sentence corresponding to the voice semantics of a user voice, where the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow with a plurality of dialogue nodes in a set order. The dialogue service customization file is simple and convenient to produce and involves no modification of code logic, making custom development of dialogue services simpler and more efficient. In addition, the dialogue management engine can be iterated simply by replacing the service customization file, facilitating later maintenance of the business process.
Brief Description of the Drawings
Fig. 1 is a flowchart of a voice dialogue processing method according to an embodiment of this application;
Fig. 2 is a flowchart of an example of the voice dialogue processing method applied to a voice customer service server;
Fig. 3 is a schematic diagram of an example dialogue flow according to this embodiment;
Fig. 4 is a schematic diagram of an exemplary architecture suitable for applying the voice dialogue processing method of an embodiment of this application;
Fig. 5 is a flowchart of the principle of modeling the dialogue management engine in the online dialogue system of an embodiment of this application;
Fig. 6 is a flowchart of the voice customer service method applied to a dialogue flow design server;
Figs. 7A-7Z are example screenshots of the dialogue design client in different states during the construction of a dialogue flow;
Fig. 8 is a structural block diagram of the voice dialogue processing apparatus of an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are part of the embodiments of this application rather than all of them. It should be noted that, where there is no conflict, the embodiments of this application and the features in the embodiments may be combined with each other.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. This application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In this application, "module", "system", and the like refer to entities applied to a computer, such as hardware, a combination of hardware and software, software, or software in execution. In detail, for example, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or the server itself, may be an element. One or more elements may be in a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers and may be run from various computer-readable media. Elements may also communicate through local and/or remote processes based on signals having one or more data packets, for example signals from data interacting with another element in a local system or a distributed system, and/or interacting with other systems through signals over a network such as the Internet.
Finally, it should also be noted that, herein, the terms "comprise" and "include" cover not only the listed elements but also other elements not explicitly listed, as well as elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "comprising ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Herein, the term "engine" refers to a structure formed by encapsulating at least one algorithm model. The term "intent" refers to the classification of a sentence; for example, the user sentence "goodbye" should be classified under the intent "end of session".
As shown in Fig. 1, an exemplary flow of the voice dialogue processing method of an embodiment of this application involves communication among multiple execution entities, for example among a dialogue flow design client 10, a dialogue flow design server 20, a voice customer service server 30, and a user terminal 40. Here, the client and the user terminal may be any type of terminal device such as a laptop, a tablet, or a mobile phone, for example a terminal device with a specific application installed. In addition, a server may be a server in a centralized communication network architecture or a master node device providing services in a peer-to-peer communication network architecture.
Referring to Fig. 1, the flow 100 of the voice dialogue processing method of an embodiment of this application includes:
Step 101: the dialogue flow design client 10 generates a dialogue flow design request.
The dialogue flow design client 10 receives user operations from the vendor's voice service staff and generates a corresponding dialogue flow design request according to the user operations. Here, the dialogue flow design request may be a request instruction for one or more designed dialogue flows, and a dialogue flow may include a plurality of dialogue nodes in a set order.
Illustratively, the vendor's voice service staff may draw multiple dialogue flows for a dialogue project on the dialogue flow design client 10 and generate a dialogue flow design request from these dialogue flows. In addition, the dialogue flow design client may be configured with a graphical interface on which users can drag and adjust dialogue node boxes, so that the vendor's voice service staff can build dialogue flows quickly by dragging node boxes.
Step 102: the dialogue flow design client 10 sends the dialogue flow design request to the dialogue flow design server 20.
Step 103: the dialogue flow design server 20 generates a dialogue service customization file according to the dialogue flow design request.
Illustratively, the dialogue flow design server 20 parses at least one dialogue flow in the dialogue flow design request and automatically identifies the node content and node type of each ordered node in the dialogue flow, and then generates a dialogue service customization file, which may be a json file. Here, the dialogue flow design server 20 may determine the dialogue service customization file through multiple interactions with the dialogue flow design client 10 (for example, multiple dialogue flow design requests).
Step 104: the dialogue flow design server 20 sends the dialogue service customization file to the voice customer service server 30.
Step 105: the voice customer service server 30 trains a dialogue management engine based on the dialogue service customization file.
Here, the voice customer service server 30 may generate a training sample set for the dialogue management engine based on the dialogue service customization file, and then use the training sample set to train the dialogue management engine.
Step 106: the user terminal 40 sends a user voice to the voice customer service server 30.
Here, once the dialogue management engine has been trained, it can go online. The voice customer service server, once online, can receive user voices from user terminals and provide customer service to users.
Step 107: the voice customer service server 30 determines a customer service voice for the user voice.
The voice customer service server 30 may determine the voice semantics corresponding to the user voice to be processed, for example by applying an ASR (Automatic Speech Recognition) module and an NLU (Natural Language Understanding) module. Then, the voice customer service server 30 determines a reply sentence for the voice semantics by invoking the dialogue management engine, and generates a customer service voice for answering the user voice according to the determined reply sentence. When generating the customer service voice from the reply sentence, the voice customer service server 30 may apply an NLG (Natural Language Generation) module and a TTS (Text To Speech) module.
Step 108: the voice customer service server 30 sends the customer service voice to the user terminal 40.
It can be understood that the above steps 106-108 may be repeated to provide customer service voices multiple times, until satisfactory customer service has been provided to the user. In addition, the embodiment shown in Fig. 1 is only an example; some steps in the above flow are optional or replaceable, for example the voice customer service server may also obtain the dialogue service customization file directly locally.
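The server-side handling in steps 106-108 can be sketched as a chain of the modules named above. This is a minimal illustrative sketch, not the patent's implementation: every function name and stub body here is an assumption, with each stage reduced to a placeholder so the ASR -> NLU -> DM -> NLG -> TTS data flow is visible end to end.

```python
# Hypothetical stubs for the five modules; only the composition is the point.

def asr(user_audio: bytes) -> str:
    """Speech recognition stub: audio in, transcript out."""
    return user_audio.decode("utf-8")  # pretend the audio bytes are the transcript

def nlu(transcript: str) -> dict:
    """Language understanding stub: transcript -> voice semantics."""
    return {"intent": "ask_pickup", "text": transcript}

def dm(semantics: dict) -> str:
    """Dialogue management engine stub: semantics -> reply sentence."""
    if semantics["intent"] == "ask_pickup":
        return "What is your shipping location?"
    return "Transferring you to a human agent."

def nlg(reply: str) -> str:
    """Language generation stub: the reply here is already surface text."""
    return reply

def tts(text: str) -> bytes:
    """Speech synthesis stub: text -> audio."""
    return text.encode("utf-8")

def handle_user_voice(user_audio: bytes) -> bytes:
    """Step 107: determine the customer service voice for a user voice."""
    return tts(nlg(dm(nlu(asr(user_audio)))))

print(handle_user_voice(b"I want to send a parcel").decode())
# prints "What is your shipping location?"
```

In a real deployment each stub would be a separate service; the composition, however, mirrors the order in which step 107 applies the modules.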
As shown in Fig. 2, an example flow of the voice dialogue processing method of an embodiment of this application applied to the voice customer service server 30 includes:
Step 201: determining the voice semantics corresponding to the user voice to be processed.
Step 202: determining a reply sentence for the voice semantics based on a dialogue management engine, where the dialogue management engine includes a generic dialogue management model and a business dialogue management model.
Step 203: generating, according to the determined reply sentence, a customer service voice for answering the user voice.
In this embodiment, the dialogue management engine includes a generic dialogue management model for handling generic conversation and a business dialogue management model for handling specialized business. The generic dialogue management model can be shared across multiple business customer service projects to reuse code logic, improving the development efficiency of the customer service system.
In the call center scenario of intelligent customer service, phone calls are often accompanied by spoken-telephone replies such as "um", "ah", and "hello". Calls are also often accompanied by generic conversational requests from the user, such as greetings, interruptions, repetitions, and clarifications. However, current AI dialogue systems in the call center scenario rarely optimize the handling of such conversational requests, making the customer service process too rigid and harming the user experience.
The generic dialogue management model in the embodiments of this application can handle the above conversational requests well and improve the user experience. The method determines the user intent indicated by the voice semantics of the user voice and, when the user intent belongs to a generic intent set, uses the generic dialogue management model to perform a generic dialogue operation for the user intent, where the generic dialogue operation includes any of the following: transferring to a human agent, repeating an announcement, exiting the dialogue, and interjection handling.
Table 1 describes the flows involved in the generic dialogue operations performed for each generic intent in the generic intent set.
[Table 1: flow descriptions for the generic dialogue operations; reproduced as an image in the original publication.]
Table 1
In some implementations, when the user intent does not belong to the generic intent set, the business dialogue management model is used to perform business operations including the following to determine the reply sentence: determining a target dialogue flow corresponding to the user intent, and determining the reply sentence according to the dialogue nodes in the determined target dialogue flow.
Illustratively, the dialogue nodes include a dialogue start node, a user communication node, a user information identification node, and a slot filling node. The dialogue start node is the node at which the dialogue flow starts; the user communication node is a node at which the customer service needs to communicate with the user; the user information identification node may identify user information by invoking other service APIs (for example, through a function); and the slot filling node is a node that adds the finally collected information to the corresponding slot. In this way, based on the user intent, the corresponding dialogue node in the target dialogue flow can be invoked to perform the corresponding operation to determine the reply sentence.
With reference to the dialogue flow shown in Fig. 3, when a user voice is connected to the customer service platform, the user information identification node in the dialogue flow directly identifies whether the caller is an existing user. For an existing user, the system directly asks whether the user wants to place an order at the location indicated in the historical delivery records; for a new user, the user communication node in the dialogue flow can be invoked to ask the user for the shipping location. Then, after speech recognition obtains the shipping address from the user's reply, the slot filling operation is completed; if the address is not recognized or slot filling fails, the call can be transferred to a human agent.
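The split described above, with generic intents handled ahead of any business flow, can be sketched as a simple dispatch. This is an illustrative sketch only: the intent names, operation texts, and flow names below are assumptions for demonstration and do not come from the patent text.

```python
# Generic intents are handled by the generic DM; everything else falls
# through to the business DM, which picks a target dialogue flow.

GENERIC_INTENTS = {"transfer_to_human", "repeat", "exit_dialogue", "interjection"}

def generic_dm(intent: str) -> str:
    # One canned operation per generic intent (cf. Table 1).
    ops = {
        "transfer_to_human": "Connecting you to a human agent.",
        "repeat": "Let me repeat the last announcement.",
        "exit_dialogue": "Goodbye.",
        "interjection": "I'm listening, please go on.",
    }
    return ops[intent]

def business_dm(intent: str) -> str:
    # Determine the target dialogue flow for the intent, then walk its nodes
    # (collapsed here to a single reply per flow).
    flows = {"send_parcel": "What is your shipping location?"}
    return flows.get(intent, "Sorry, could you rephrase that?")

def reply_for(intent: str) -> str:
    if intent in GENERIC_INTENTS:
        return generic_dm(intent)
    return business_dm(intent)

print(reply_for("repeat"))       # handled by the generic DM
print(reply_for("send_parcel"))  # handled by the business DM
```

Because the generic branch sits in front of the business branch, the same generic module can be reused unchanged across different business projects, which is the reuse benefit the text describes.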
As shown in Fig. 4, an exemplary architecture suitable for applying the voice dialogue processing method of an embodiment of this application comprises an online AI dialogue system and an offline DM (Dialogue Management) customization platform. Here, the access service module of the online AI dialogue system interfaces with the call center of the user client through the SIP protocol (Session Initiation Protocol), thereby serving the user indicated by the user client.
In addition, the online AI dialogue system contains several large component modules, including an ASR module, an NLU module, an NLG module, a TTS module, and a DM engine. The DM engine is divided into a business DM and a generic DM.
In the embodiments of this application, for the call center scenario, the dialogue management in the AI dialogue system is divided into a business DM and a generic DM. The business DM is dialogue management related to a specific business process. Correspondingly, the generic DM is designed to handle dialogue management common to human-machine dialogue, making the whole dialogue smoother, without involving specific business scenarios or knowledge.
In call-center telephone dialogues there are many fairly generic exchanges, such as greetings, interruptions, repetitions, and clarifications. These generic conversational logics can all be handled by the generic DM, as shown in Table 1. It can be understood that the generic DM can be embedded as a generic module in front of any business DM, making the dialogue closer to the effect of person-to-person conversation. This design both improves the user experience and saves the cost of custom DM development for different scenarios.
As shown in Fig. 5, the principle of modeling the DM engine in the online AI dialogue system of an embodiment of this application is as follows. To enable rapid customization of the business DM, the key elements of dialogue management are modeled so that a business dialogue flow can be described in a json file. Specifically:
Scene (Bot): the dialogue robot, which can be understood as a customer service project.
Dialogue flow (Flow): the dialogue process for completing a task. A project consists of one or more dialogue flows.
Dialogue node (Node): one turn of dialogue in a dialogue flow. Different node types are provided, such as a start node, a user communication node, a slot filling node, and a user information identification node. The start node is the starting node of the dialogue flow; the user communication node is one turn of dialogue; the slot filling node is a special node designed to collect slots and complete slot information; and the user information identification node encapsulates access to other service APIs.
Operation: the specific operations involved in a dialogue node, such as the utterance the node replies to the user, the list of intents used to parse the user's speech, and the jump logic. Therefore, once a dialogue node is determined, the corresponding operations can be performed directly according to the content and type indicated by the node.
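A json customization file following the Bot/Flow/Node/Operation model above might look roughly like the sketch below. The patent does not publish the actual schema, so every field name here is an assumption chosen to mirror the four modeled elements; the file is built and round-tripped with the standard `json` module to show it stays machine-loadable.

```python
import json

# Hypothetical structure: one Scene (bot) containing one Flow, whose Nodes
# carry their Operation fields (utterance, intents, jump logic) inline.
bot = {
    "bot": "express_customer_service",            # Scene: one customer service project
    "flows": [
        {
            "name": "pickup_address_collection",  # Flow: one task
            "nodes": [
                {
                    "type": "start",
                    "name": "opening",
                    "say": "Hello, this is xx Express",
                    "jump_to": "new_or_existing_check",
                },
                {
                    "type": "function",           # user information identification node
                    "name": "new_or_existing_check",
                    "input": "phone_num",
                    "output": "historical_orders",
                },
                {
                    "type": "slot_filling",
                    "name": "fill_shipping_location",
                    "slots": ["shipping_city", "shipping_district", "shipping_street"],
                    "max_reasks": 2,
                },
            ],
        }
    ],
}

text = json.dumps(bot, indent=2)   # what the offline platform would export
loaded = json.loads(text)          # what the online DM engine would load
print(loaded["flows"][0]["name"])
# prints "pickup_address_collection"
```

The point of such a file is exactly what the text claims: the business flow lives in data, so iterating on the dialogue means swapping the file rather than modifying code logic.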
Combined with the example of Fig. 4, the dialogue flow is designed through the offline DM customization platform. The offline DM customization platform includes a dialogue design tool and a dialogue testing tool, providing graphical drag-and-drop dialogue design and testing. The offline DM customization platform generates a json-format file of the project's dialogues; the online DM engine loads this file to produce the AI dialogue robot that serves online traffic.
As shown in Fig. 6, the flow of the voice customer service method applied to the dialogue flow design server enables rapid customization of dialogue flows and construction of the corresponding dialogue service customization file through interaction between the dialogue flow design server and the dialogue flow design client. The method includes:
Step 601: obtaining a dialogue flow design request from the dialogue flow design client, and determining at least one dialogue flow corresponding to the dialogue flow design request, where the dialogue flow includes a plurality of dialogue nodes in a set order.
Step 602: generating a dialogue service customization file according to the at least one dialogue flow.
Step 603: sending the dialogue service customization file to the voice customer service server to construct a training sample set for the dialogue management engine, so that the voice customer service server performs voice customer service based on the dialogue management engine.
The dialogue flow design client is configured with a graphical interface on which users can drag and adjust dialogue node boxes, and the dialogue flow design client 10 is used to receive vendor developers' user operations on the graphical interface and generate the corresponding dialogue flow design request. In this way, a vendor can produce business dialogue flows by dragging and dropping in the development client's graphical interface, producing the corresponding json file and quickly completing business DM customization.
The following describes an example of building the DM in the dialogue flow design client 10 for the specific express-delivery customer service scenario shown in Fig. 3, with reference to the development client screenshots in Figs. 7A-7Z.
As shown in Fig. 7A, click "create new scene" on the left; a dialog box pops up in which the "scene name" and "scene description" are entered. Here, one scene can represent one customer service project.
Next, as shown in Fig. 7B, create a new dialogue flow. Click the created scene "Express" to enter the following page, whose left side lists in order: 1) scene configuration: including the scene name, product ID, global variable addition, generic flow addition, etc.; 2) dialogue flow customization: draw dialogue flows on this page; 3) intent customization: add the intents parsed from user utterances; 4) dialogue testing: test the built model; 5) bot debugging: modify and debug the code automatically generated from the dialogue flow model built with the visual tool.
Clicking dialogue flow configuration enters the user interface shown in Fig. 7C. Create a new dialogue flow "pickup address collection" and click to enter it; the interface is as shown below, with four node types provided on the right: a start node, a dialogue node (or user communication node), a function node (or user information identification node), and a slot filling node.
During node configuration, first add the "opening remarks" corresponding to the start node (Fig. 7D). With reference to the interface screenshot in Fig. 7E, the specific operations include: 1) click "start node" on the right to add a start node to the panel; 2) click "edit" on the start node; 3) via "basic settings": rename the node to "opening remarks" and add the utterance text "Hello, this is xx Express"; 4) via "jump connection": add new jump logic ("jump to") and select the next node to jump to (the next node must exist before jumping, so this step can be skipped for now).
Next, add a function node, "new/existing customer check". As shown in Fig. 7F, click "function node" to create a function node on the panel. This includes: 1) click edit and, in the basic settings, rename the node to "new/existing customer check", add the function description "check whether this phone number has historical orders", and save; 2) complete the jump from the start node to the function node: select "new/existing customer check" as the jump target, and the connection line between the dialogue nodes appears; 3) go to "scene configuration" and create two global variables: phone_num and historical_orders. On this page, generic DM flows can also be added; as shown in Fig. 7G, check the generic flows that may be used; 4) return to the dialogue flow and, as shown in Fig. 7H, edit the function node's basic settings: set the input parameter to "phone_num" and the return parameter to "historical_orders".
Next, as shown in Figs. 7I-M, add two user communication nodes, "order at this location?" and "ask for shipping location". 1) Add the two user communication nodes and rename them via "edit - basic settings - node name". 2) Edit "new/existing customer check" and set the jump connections: click "add new jump logic" below, with the condition "global variable historical order status isNotEmpty" jumping to "order at this location?"; click "add new jump logic" again, with the condition "global variable historical order status isEmpty" jumping to "ask for shipping location"; then save. 3) As shown in Fig. 7J, go to "intent customization" and add the new business intents "yes" and "no". 4) Return to dialogue flow customization, edit the dialogue node "order at this location?", and enter the utterance text "The system found that you recently shipped items from No. XX, XX Road, XX District, XX City. Do you still want to use this address?". 5) Add the business intents "yes" and "no". 6) Edit "ask for shipping location" and add the utterance "What is your shipping location?".
Next, as shown in Figs. 7N and 7O, add a dialogue node, "address information received". 1) Add the announcement dialogue node "address information received"; modify the basic settings and add the utterance "OK, please wait patiently for the courier to come to pick up your parcel. Have a nice day"; change "operation after playback" to "dialogue complete" and save. 2) Edit "order at this location?" and set the jump connections: the condition "business intent contains confirm" jumps to "address information received"; the condition "business intent contains deny" jumps to "ask for shipping location".
Next, as shown in Figs. 7P-7U, add a slot filling node, "slot filling: shipping location". 1) Add the slot filling node, rename it "slot filling: shipping location", edit the dialogue node "ask for shipping location", and set its jump connection to jump to "slot filling: shipping location". 2) Go to scene configuration and create the global variables "shipping_city", "shipping_district", and "shipping_street". 3) Go to intent customization and create the intents "only said street", "only said district", "only said city", "only said street and district", "only said district and city", and "only said street and city". 4) Return to dialogue flow customization, edit "slot filling: shipping location", click basic settings, and add the above six intents. 5) Edit "slot filling: shipping location", click slot configuration, add a new slot variable, and set in order: the variable to fill, shipping_city; required; the intents "only said street", "only said district", and "only said street and district"; the follow-up question "Which city are you in?"; and a maximum of 2 follow-up questions. 6) Add the other two slot variables in the same way.
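The behavior configured here, three required slots, each with a follow-up question asked at most twice, can be sketched as follows. This is a minimal illustrative sketch, not the platform's runtime: the `fill_slots` helper and its answer-list convention are assumptions, with the slot names and reask limit mirroring the configuration above.

```python
# Hypothetical slot filling loop: each slot gets an initial ask plus up to
# MAX_REASKS follow-up questions; an unfilled required slot fails the node.

SLOT_CONFIG = [
    ("shipping_city", "Which city are you in?"),
    ("shipping_district", "Which district?"),
    ("shipping_street", "Which street?"),
]
MAX_REASKS = 2

def fill_slots(answers):
    """answers maps slot name -> list of user replies (None = unrecognized)."""
    filled = {}
    for slot, question in SLOT_CONFIG:
        replies = iter(answers.get(slot, []))
        value = None
        for _ in range(MAX_REASKS + 1):  # one initial ask plus up to 2 reasks
            value = next(replies, None)
            if value is not None:
                break
        if value is None:
            return "transfer_to_human", filled   # slot filling failed
        filled[slot] = value
    return "address_received", filled            # slot filling succeeded

status, slots = fill_slots({
    "shipping_city": ["Suzhou"],
    "shipping_district": [None, "Wuzhong"],      # recognized on the follow-up
    "shipping_street": ["Renmin Road"],
})
print(status, slots)
```

The two return labels correspond to the two jump conditions wired up next in the flow: success jumps to "address information received", failure to "transfer to human agent".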
Next, as shown in Figs. 7V and 7W, add a dialogue node, "transfer to human agent". This includes: 1) add the dialogue node "transfer to human agent", set its utterance, and select dialogue complete; 2) edit "slot filling: shipping location" and click jump connection: the condition "slot filling succeeded" jumps to "address information received", and the condition "slot filling failed" jumps to "transfer to human agent". This completes the construction of the above dialogue flow. The resulting dialogue flow is shown in Fig. 7X.
Next, save locally. As shown in Fig. 7Y, the json file can be saved locally via "scene configuration - DM file export".
In some implementations, the built dialogue flow can also be tested locally to ensure functional completeness. As shown in Fig. 7Z, the test is performed via "click dialogue testing - test release - input user voice".
The embodiments of this application provide a rapid customization platform that can greatly accelerate intelligent voice development for call centers and greatly reduce development costs, thereby raising the intelligence level of the entire call center industry.
As shown in Fig. 8, a voice dialogue processing apparatus 800 of an embodiment of this application includes a voice semantics determining unit 810, a dialogue management engine invoking unit 820, and a customer service voice generating unit 830. Here, the voice semantics determining unit 810 is used to determine the voice semantics corresponding to the user voice to be processed; the dialogue management engine invoking unit 820 is used to determine a reply sentence for the voice semantics based on the dialogue management engine, where the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order; and the customer service voice generating unit 830 is used to generate, according to the determined reply sentence, a customer service voice for answering the user voice.
The apparatus of the above embodiment of this application can be used to execute the corresponding method embodiments of this application and accordingly achieves the technical effects achieved by the above method embodiments, which will not be repeated here.
In the embodiments of this application, relevant functional modules may be implemented by a hardware processor.
In another aspect, an embodiment of this application provides a storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the skill local management method performed on the server as described above.
The above product can execute the methods provided by the embodiments of this application and has the corresponding functional modules and beneficial effects of executing the methods. For technical details not exhaustively described in this embodiment, refer to the methods provided by the embodiments of this application.
The clients of the embodiments of this application exist in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication functions and have voice and data communication as their main goal. Such terminals include smartphones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., the iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, as well as smart toys and portable in-vehicle navigation devices.
(4) Other electronic devices with data interaction functions.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
From the above description of the implementations, those skilled in the art can clearly understand that each implementation can be realized by means of software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the parts contributing to the related technology, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or in parts of the embodiments.

Claims (10)

  1. A voice dialogue processing method, applied to a voice customer service server, the method comprising:
    determining the voice semantics corresponding to a user voice to be processed;
    determining, based on a dialogue management engine, a reply sentence for the voice semantics, wherein the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order;
    generating, according to the determined reply sentence, a customer service voice for answering the user voice.
  2. The method according to claim 1, wherein the dialogue management engine includes a generic dialogue management model and a business dialogue management model,
    wherein determining, based on the dialogue management engine, the reply sentence for the voice semantics comprises:
    determining the user intent indicated by the voice semantics; and
    when the user intent belongs to a generic intent set, using the generic dialogue management model to perform a generic dialogue operation for the user intent, wherein the generic dialogue operation includes any of the following: transferring to a human agent, repeating an announcement, exiting the dialogue, and interjection handling.
  3. The method according to claim 2, further comprising:
    when the user intent does not belong to the generic intent set, using the business dialogue management model to perform business operations including the following to determine the reply sentence:
    determining a target dialogue flow corresponding to the user intent;
    determining the reply sentence according to the dialogue nodes in the determined target dialogue flow.
  4. The method according to claim 1, wherein the dialogue service customization file is obtained from a dialogue flow design server, and the dialogue flow design server is configured to construct the dialogue service customization file through interaction with a dialogue flow design client.
  5. The method according to any one of claims 1-4, wherein the dialogue nodes include: a dialogue start node, a user communication node, a user information identification node, and a slot filling node.
  6. A voice customer service method, applied to a dialogue flow design server, the method comprising:
    obtaining a dialogue flow design request from a dialogue flow design client, and determining at least one dialogue flow corresponding to the dialogue flow design request, wherein the dialogue flow includes a plurality of dialogue nodes in a set order;
    generating a dialogue service customization file according to the at least one dialogue flow;
    sending the dialogue service customization file to a voice customer service server to construct a training sample set for a dialogue management engine, so that the voice customer service server performs voice customer service based on the dialogue management engine.
  7. The method according to claim 6, wherein the dialogue flow design client is configured with a graphical interface on which a user can drag and adjust dialogue node boxes, and the dialogue flow design client is used to receive user operations on the graphical interface and generate the corresponding dialogue flow design request.
  8. A voice dialogue processing apparatus, comprising:
    a voice semantics determining unit, used to determine the voice semantics corresponding to a user voice to be processed;
    a dialogue management engine invoking unit, used to determine, based on a dialogue management engine, a reply sentence for the voice semantics, wherein the training sample set of the dialogue management engine is constructed from a dialogue service customization file that includes at least one dialogue flow, and the dialogue flow includes a plurality of dialogue nodes in a set order;
    a customer service voice generating unit, used to generate, according to the determined reply sentence, a customer service voice for answering the user voice.
  9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the steps of the method according to any one of claims 1-7.
  10. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
PCT/CN2019/123937 2019-08-15 2019-12-09 语音对话处理方法及装置 WO2021027198A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP19941639.7A EP4016330A4 (en) 2019-08-15 2019-12-09 VOICE DIALOGUE PROCESSING METHOD AND APPARATUS
US17/635,489 US20220293089A1 (en) 2019-08-15 2019-12-09 Voice dialogue processing method and apparatus
JP2022510069A JP7274043B2 (ja) 2019-08-15 2019-12-09 音声会話処理方法及び装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910752714.0A CN110442701B (zh) 2019-08-15 2019-08-15 语音对话处理方法及装置
CN201910752714.0 2019-08-15

Publications (1)

Publication Number Publication Date
WO2021027198A1 true WO2021027198A1 (zh) 2021-02-18

Family

ID=68435649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/123937 WO2021027198A1 (zh) 2019-08-15 2019-12-09 语音对话处理方法及装置

Country Status (5)

Country Link
US (1) US20220293089A1 (zh)
EP (1) EP4016330A4 (zh)
JP (1) JP7274043B2 (zh)
CN (1) CN110442701B (zh)
WO (1) WO2021027198A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113067952A (zh) * 2021-03-31 2021-07-02 中国工商银行股份有限公司 用于多台机器人的人机协同无感控制方法和装置
CN113326365A (zh) * 2021-06-24 2021-08-31 中国平安人寿保险股份有限公司 回复语句生成方法、装置、设备及存储介质
CN113506565A (zh) * 2021-07-12 2021-10-15 北京捷通华声科技股份有限公司 语音识别的方法、装置、计算机可读存储介质与处理器
CN114582314A (zh) * 2022-02-28 2022-06-03 江苏楷文电信技术有限公司 基于asr的人机音视频交互逻辑模型设计方法
CN115659994A (zh) * 2022-12-09 2023-01-31 深圳市人马互动科技有限公司 人机交互系统中的数据处理方法及相关装置

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442701B (zh) * 2019-08-15 2022-08-05 思必驰科技股份有限公司 语音对话处理方法及装置
CN111128147A (zh) * 2019-11-18 2020-05-08 云知声智能科技股份有限公司 一种终端设备自动接入ai多轮对话能力的系统及方法
CN111107156A (zh) * 2019-12-26 2020-05-05 苏州思必驰信息科技有限公司 用于主动发起对话的服务端处理方法及服务器、能够主动发起对话的语音交互系统
CN111182117B (zh) * 2019-12-30 2021-07-13 深圳追一科技有限公司 通话处理方法、装置、计算机设备和计算机可读存储介质
CN111402872B (zh) * 2020-02-11 2023-12-19 升智信息科技(南京)有限公司 用于智能语音对话系统的语音数据处理方法及装置
CN111654582A (zh) * 2020-06-05 2020-09-11 中国银行股份有限公司 一种智能外呼方法及装置
CN111683182B (zh) * 2020-06-11 2022-05-27 中国银行股份有限公司 一种业务节点的处理方法及系统
CN111916111B (zh) * 2020-07-20 2023-02-03 中国建设银行股份有限公司 带情感的智能语音外呼方法及装置、服务器、存储介质
CN111653262B (zh) * 2020-08-06 2020-11-17 上海荣数信息技术有限公司 一种智能语音交互系统及方法
CN112364140B (zh) * 2020-11-04 2022-09-13 北京致远互联软件股份有限公司 一种通过配置单实现语音识别意图定制的方法
CN113064987A (zh) * 2021-04-30 2021-07-02 中国工商银行股份有限公司 数据处理方法、装置、电子设备、介质和程序产品
CN114493513B (zh) * 2022-01-14 2023-04-18 杭州盈兴科技有限公司 一种基于语音处理的酒店管理方法、装置和电子设备
CN114691852B (zh) * 2022-06-01 2022-08-12 阿里巴巴达摩院(杭州)科技有限公司 人机对话系统及方法
CN116476092B (zh) * 2023-04-26 2024-01-23 上饶高投智城科技有限公司 基于asr及nlp技术实现小区智慧服务的方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063823A1 (en) * 2008-09-09 2010-03-11 Industrial Technology Research Institute Method and system for generating dialogue managers with diversified dialogue acts
CN105845137A (zh) * 2016-03-18 2016-08-10 中国科学院声学研究所 一种语音对话管理系统
CN108664568A (zh) * 2018-04-24 2018-10-16 科大讯飞股份有限公司 语义技能创建方法及装置
CN109408800A (zh) * 2018-08-23 2019-03-01 优视科技(中国)有限公司 对话机器人系统及相关技能配置方法
CN109597607A (zh) * 2018-10-31 2019-04-09 拓科(武汉)智能技术股份有限公司 任务型人机对话系统及其实现方法、装置与电子设备
CN110442701A (zh) * 2019-08-15 2019-11-12 苏州思必驰信息科技有限公司 语音对话处理方法及装置

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4785408A (en) * 1985-03-11 1988-11-15 AT&T Information Systems Inc. American Telephone and Telegraph Company Method and apparatus for generating computer-controlled interactive voice services
EP1224569A4 (en) * 1999-05-28 2005-08-10 Sehda Inc PHRASE BASED DIALOGUE MODELING WITH SPECIAL APPLICATION FOR GENERATING RECOGNITION GRAMMARK FOR LANGUAGE-CONTROLLED USER INTERFACE
US9092733B2 (en) * 2007-12-28 2015-07-28 Genesys Telecommunications Laboratories, Inc. Recursive adaptive interaction management system
US8953764B2 (en) * 2012-08-06 2015-02-10 Angel.Com Incorporated Dynamic adjustment of recommendations using a conversation assistant
JP6027476B2 (ja) * 2013-03-28 2016-11-16 Kddi株式会社 対話シナリオに動的対話ノードを挿入する対話プログラム、サーバ及び方法
US10455088B2 (en) * 2015-10-21 2019-10-22 Genesys Telecommunications Laboratories, Inc. Dialogue flow optimization and personalization
CN109891410B (zh) * 2016-11-04 2023-06-23 微软技术许可有限责任公司 用于新的会话对话系统的数据收集
CN107135247B (zh) * 2017-02-16 2019-11-29 江苏南大电子信息技术股份有限公司 一种人与人工智能协同工作的服务系统及方法
JP6824795B2 (ja) * 2017-03-17 2021-02-03 ヤフー株式会社 修正装置、修正方法および修正プログラム
JP6857581B2 (ja) * 2017-09-13 2021-04-14 株式会社日立製作所 成長型対話装置
CN107657017B (zh) * 2017-09-26 2020-11-13 百度在线网络技术(北京)有限公司 用于提供语音服务的方法和装置
CN108053023A (zh) * 2017-12-01 2018-05-18 北京物灵智能科技有限公司 一种自动式意图分类方法及装置
CN108427722A (zh) * 2018-02-09 2018-08-21 卫盈联信息技术(深圳)有限公司 智能交互方法、电子装置及存储介质
CN109739605A (zh) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 生成信息的方法和装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063823A1 (en) * 2008-09-09 2010-03-11 Industrial Technology Research Institute Method and system for generating dialogue managers with diversified dialogue acts
CN105845137A (zh) * 2016-03-18 2016-08-10 中国科学院声学研究所 一种语音对话管理系统
CN108664568A (zh) * 2018-04-24 2018-10-16 科大讯飞股份有限公司 语义技能创建方法及装置
CN109408800A (zh) * 2018-08-23 2019-03-01 优视科技(中国)有限公司 对话机器人系统及相关技能配置方法
CN109597607A (zh) * 2018-10-31 2019-04-09 拓科(武汉)智能技术股份有限公司 任务型人机对话系统及其实现方法、装置与电子设备
CN110442701A (zh) * 2019-08-15 2019-11-12 苏州思必驰信息科技有限公司 语音对话处理方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4016330A4

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113067952A (zh) * 2021-03-31 2021-07-02 中国工商银行股份有限公司 用于多台机器人的人机协同无感控制方法和装置
CN113326365A (zh) * 2021-06-24 2021-08-31 中国平安人寿保险股份有限公司 回复语句生成方法、装置、设备及存储介质
CN113326365B (zh) * 2021-06-24 2023-11-07 中国平安人寿保险股份有限公司 回复语句生成方法、装置、设备及存储介质
CN113506565A (zh) * 2021-07-12 2021-10-15 北京捷通华声科技股份有限公司 语音识别的方法、装置、计算机可读存储介质与处理器
CN113506565B (zh) * 2021-07-12 2024-06-04 北京捷通华声科技股份有限公司 语音识别的方法、装置、计算机可读存储介质与处理器
CN114582314A (zh) * 2022-02-28 2022-06-03 江苏楷文电信技术有限公司 基于asr的人机音视频交互逻辑模型设计方法
CN115659994A (zh) * 2022-12-09 2023-01-31 深圳市人马互动科技有限公司 人机交互系统中的数据处理方法及相关装置
CN115659994B (zh) * 2022-12-09 2023-03-03 深圳市人马互动科技有限公司 人机交互系统中的数据处理方法及相关装置

Also Published As

Publication number Publication date
US20220293089A1 (en) 2022-09-15
EP4016330A4 (en) 2023-11-29
JP7274043B2 (ja) 2023-05-15
EP4016330A1 (en) 2022-06-22
CN110442701A (zh) 2019-11-12
JP2022544969A (ja) 2022-10-24
CN110442701B (zh) 2022-08-05

Similar Documents

Publication Publication Date Title
WO2021027198A1 (zh) 语音对话处理方法及装置
EP3510593B1 (en) Task initiation using long-tail voice commands
US20180025726A1 (en) Creating coordinated multi-chatbots using natural dialogues by means of knowledge base
CN114730429A (zh) 用于管理联系中心系统和其用户之间的对话的系统和方法
CN107112016A (zh) 多模态状态循环
CN107004411A (zh) 话音应用架构
US20170032027A1 (en) Contact Center Virtual Assistant
CN105723360A (zh) 利用情感调节改进自然语言交互
CN110998526B (zh) 用户配置的且自定义的交互式对话应用
CN108664568A (zh) 语义技能创建方法及装置
CN108536733A (zh) 人工智能数字代理
CN111462726B (zh) 一种外呼应答方法、装置、设备及介质
CN111933118B (zh) 进行语音识别优化的方法、装置及应用其的智能语音对话系统
CN112069830B (zh) 一种智能会话方法及装置
WO2023226767A1 (zh) 模型训练方法和装置及语音含义的理解方法和装置
CN110442698A (zh) 对话内容生成方法及系统
WO2021077528A1 (zh) 人机对话打断方法
US20230169273A1 (en) Systems and methods for natural language processing using a plurality of natural language models
CN111966803B (zh) 对话模拟方法、装置、存储介质及电子设备
CN114860910A (zh) 智能对话方法及系统
CN114201596A (zh) 虚拟数字人使用方法、电子设备和存储介质
CN111091011B (zh) 领域预测方法、领域预测装置及电子设备
Li et al. A speech-enabled virtual assistant for efficient human–robot interaction in industrial environments
CN113157241A (zh) 交互设备、交互装置及交互系统
CN112527987A (zh) 用于自助一体机的交互方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19941639

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022510069

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019941639

Country of ref document: EP

Effective date: 20220315