WO2021159904A1 - Voice data processing method and device for intelligent voice conversation system - Google Patents

Voice data processing method and device for intelligent voice conversation system

Info

Publication number
WO2021159904A1
WO2021159904A1 PCT/CN2021/071367 CN2021071367W WO2021159904A1 WO 2021159904 A1 WO2021159904 A1 WO 2021159904A1 CN 2021071367 W CN2021071367 W CN 2021071367W WO 2021159904 A1 WO2021159904 A1 WO 2021159904A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
expansion function
initial expansion
voice data
dialogue system
Prior art date
Application number
PCT/CN2021/071367
Other languages
French (fr)
Chinese (zh)
Inventor
彭殷路
孔冬兵
Original Assignee
升智信息科技(南京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 升智信息科技(南京)有限公司
Publication of WO2021159904A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to the technical field of voice signal processing, and in particular to a voice data processing method, device, computer equipment and storage medium for an intelligent voice dialogue system.
  • An intelligent voice dialogue system, also known as an intelligent conversation agent or intelligent chat system, is a system that interacts with humans in natural language by means of artificial intelligence technology, building on speech recognition, natural language processing and speech synthesis.
  • By application scenario, intelligent voice dialogue systems are mainly divided into task-oriented and non-task-oriented dialogue systems. Typical task-oriented systems include intelligent voice assistants and intelligent outbound telephone call systems; typical non-task-oriented systems include smart speakers and chatbots.
  • The human-computer interaction link of a traditional intelligent voice dialogue system mainly comprises three stages: speech recognition, semantic understanding and speech synthesis.
  • Speech recognition converts the user's speech into the corresponding text; semantic understanding extracts the user's intention from the text of the user's utterance, the dialogue context and other information, and generates the response text; speech synthesis converts the response text into speech and plays it to the user.
  • Speech recognition and speech synthesis are highly general-purpose: differences in the type and application field of the intelligent voice dialogue system, or even in the configuration of the interactive dialogue script templates, do not greatly affect their performance.
  • Semantic understanding in a traditional intelligent voice dialogue system, by contrast, is strongly tied to the dialogue domain and the dialogue scenario.
  • Although general natural language understanding models solve text intention determination and named entity recognition to a certain extent, some problems remain.
  • Traditional semantic understanding solutions cannot meet the needs of many user scenarios across different domains, which directly leads to unintelligent dialogue and a very poor dialogue experience.
  • Experienced dialogue script configuration engineers can alleviate the dialogue experience problem to some extent through script configuration, but this also makes individual scripts more complex and makes logic problems more likely to appear during dialogue with users.
  • Task-oriented intelligent voice dialogue systems often need to interface with external systems to obtain user-related data, or to send instructions to external systems to help users complete actual task operations.
  • The traditional solution is to implement the related functions through customized development.
  • Its main problems are a long development and integration cycle, functions that cannot meet the requirements of complex script configuration, and an inability to handle complex business events during the dialogue. The scalability and maintainability of the system are also very poor.
  • System functions are mixed with script configuration, so updating a dialogue script requires updating the system.
  • Traditional intelligent voice dialogue systems bring dialogue services online and deliver them through dialogue scripts and script flows.
  • For example, for an intelligent outbound telephone sales system, operators collect, organize and summarize the scripts and script flows of top salespeople for each sales scenario.
  • The system then performs intent recognition and conversation management according to the designed scripts and script flows.
  • A simple keyword-based interactive structure: the user's intention is determined by matching keywords and key phrases, and the system responds according to that intention.
  • A typical implementation is AIML (Artificial Intelligence Markup Language). This method supports simple context understanding and multi-turn dialogue based on a limited set of keywords, and is common in early non-task-oriented intelligent voice dialogue systems.
  • A structured template based on a tree or a finite state machine: dialogue scripts and script flows are modeled as a tree structure or as the graph structure of a finite state machine.
  • Compared with the simple keyword-based interactive structure, a tree or finite-state-machine script flow can integrate more conversation context during the conversation, and can combine the information obtained in the conversation with user information obtained through other means to provide more flexible and personalized conversation services.
  • This method requires the dialogue flow to be defined manually for each dialogue scenario. It suits simple, task-oriented scenarios in which the dialogue is guided entirely by the system. Its disadvantages are that it is hard to extend, script flows easily become complicated and hard to maintain, user input is relatively restricted, and the script flow offers little operational flexibility.
  • A framework template based on named entity recognition, that is, a framework script flow template based on slot value extraction.
  • This technical solution usually models the script flow as a slot value extraction process.
  • Slot value extraction extracts from the user's utterance, by information type, the pieces of information needed to fully understand the user's intention, and turns them into a clear instruction or response once all slot values required by the task have been filled.
  • The framework based on named entity recognition is usually used as an extension of the finite-state-machine script flow template: it can gather relatively complex information, supports varied types and orders of information input, and strengthens the system's ability to handle mixed task-oriented and non-task-oriented scenarios.
  • The present invention proposes a voice data processing method, device, computer equipment and storage medium for an intelligent voice dialogue system.
  • A voice data processing method for an intelligent voice dialogue system includes the following steps:
  • S10: Define each service component of the intelligent voice dialogue system as an initial extension function, so that the initial extension function can complete independent logic calls or business calls and supports modular reuse;
  • S20: Implement and publish the initial extension function, so that it is available to users in the function library of the intelligent voice dialogue system;
  • S30: Configure the initial extension function in the function library to obtain a target extension function;
  • S40: Use the target extension function to process the voice data of the user during the call, so as to obtain the content represented by the voice data.
  • Implementing and publishing the initial extension function includes:
  • Configuring the initial extension function in the function library to obtain the target extension function includes:
  • Using the target extension function to process the voice data input by the user to obtain the content represented by the voice data includes:
  • Function configuration is performed on the nodes of the dialogue script, and the configured functions are used to determine the intention and information represented by the user's voice data during the call.
  • A voice data processing device for an intelligent voice dialogue system includes:
  • a definition module, used to define each service component of the intelligent voice dialogue system as an initial extension function, so that the initial extension function can complete independent logic calls or business calls and supports modular reuse;
  • an implementation module, used to implement and publish the initial extension function, so that it is available to users in the function library of the intelligent voice dialogue system;
  • a configuration module, used to configure the initial extension function in the function library to obtain the target extension function;
  • a processing module, used to process the voice data of the user during the call by using the target extension function to obtain the content represented by the voice data.
  • the implementation module is further used for:
  • the configuration module is further used for:
  • processing module is further used for:
  • Function configuration is performed on the nodes of the dialogue script, and the configured functions are used to determine the intention and information represented by the user's voice data during the call.
  • A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, it implements the steps of the voice data processing method for an intelligent voice dialogue system of any one of the above embodiments.
  • A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the steps of the voice data processing method for an intelligent voice dialogue system of any one of the above embodiments are implemented.
  • In the above method, each service component of the intelligent voice dialogue system is defined as an initial extension function that can complete independent logic calls or business calls and supports modular reuse. The initial extension function is implemented and published so that it is available to users in the function library of the system, and is then configured to obtain a target extension function, which processes the user's voice data during the call to obtain the content represented by that data. This improves the efficiency of voice data processing in the intelligent voice dialogue system and the flexibility of the related session management.
  • The common logic components, rule components and business domain components of the intelligent voice dialogue system are implemented as modular components and combined services, and dialogue scripts and script flows are assembled through dynamic configuration. This strengthens the business description ability of the script template while reducing its complexity and improving scalability and reusability.
  • FIG. 1 is a flowchart of a voice data processing method for an intelligent voice dialogue system according to an embodiment;
  • FIG. 2 is a schematic diagram of the call flow of an execution example of an extension-function-based script flow template according to an embodiment;
  • FIG. 3 is a schematic structural diagram of a voice data processing device for an intelligent voice dialogue system according to an embodiment;
  • FIG. 4 is a schematic diagram of a computer device according to an embodiment.
  • The voice data processing method for the intelligent voice dialogue system can be applied to related intelligent voice dialogue systems.
  • The voice data processing terminal defines each service component of the intelligent voice dialogue system as an initial extension function, so that the initial extension function can complete independent logic calls or business calls and supports modular reuse. It implements and publishes the initial extension function so that it is available to users in the function library of the system, configures the initial extension function in the function library to obtain the target extension function, and uses the target extension function to process the user's voice data during the call to obtain the content represented by that data. This reduces the complexity of processing the voice data and improves the flexibility of the related session management.
  • The voice data processing terminal can be, but is not limited to, a personal computer, a notebook computer or other intelligent processing equipment.
  • A voice data processing method for an intelligent voice dialogue system is provided. Taking application of the method to a voice data processing terminal as an example, the method includes the following steps:
  • S10: Define each service component of the intelligent voice dialogue system as an initial extension function, so that the initial extension function can complete independent logic calls or business calls and supports modular reuse.
  • An extension function (initial extension function) can complete independent, simple logic calls or business calls and supports modular reuse.
  • Extension functions can be classified by purpose, for example logic functions, system functions, named entity recognition functions, business domain functions and external service call functions.
  • An extension function must define its input, including the acceptable input parameters and their types, and its output, including the output values and their types. Output types include numeric, Boolean, string, enumeration and so on.
  • The classification of extension functions is used to manage the functions and to improve the interactive experience of building dialogue scripts.
  • The input and output definitions of an extension function determine what the function needs in order to execute and what it finally outputs.
  • S20: Implement and publish the initial extension function, so that it is available to users in the function library of the intelligent voice dialogue system.
  • The above-mentioned users may include operators of the intelligent voice dialogue system.
  • Implementing and publishing the initial extension function includes:
  • The specific definition of the initial extension function states what the function needs to accomplish, that is, which problem it is used to solve.
  • The implementation of the initial extension function is the process by which a developer implements the function according to the functional requirement.
  • Based on the specific definition of the initial extension function and the functional requirements of the business function, a developer can implement the business function.
  • After an extension function is implemented and published, it is registered in the available function library of the intelligent voice dialogue system for use by users such as the operators of the system.
  • S30: Configure the initial extension function in the function library to obtain the target extension function.
  • Configuring the initial extension function in the function library to obtain the target extension function includes:
  • The extension functions provided by the intelligent voice dialogue system can be combined and configured into custom extension components with complex functionality, which yields the target extension functions. These custom extension components (target extension functions) are likewise registered in the system's available function library as custom extension functions, and users such as operators can call them in different business scenarios and script templates.
  • S40: Use the target extension function to process the voice data of the user during the call, so as to obtain the content represented by the voice data.
  • Using the target extension function to process the voice data input by the user to obtain the content represented by the voice data includes:
  • Function configuration is performed on the nodes of the dialogue script, and the configured functions are used to determine the intention and information represented by the user's voice data during the call.
  • This step defines, in the script template of the intelligent voice dialogue system, how the system's extension functions and custom extension functions are combined and called, and finally configures them into a usable script template.
  • The service execution engine of the intelligent voice dialogue system then calls the extension functions according to the defined script template to implement intent recognition and session management during the dialogue.
  • The dialogue script builder of the intelligent voice dialogue system can perform function configuration on script nodes, based on the extension functions provided in the function library of the system and on the custom extension functions the builder has configured.
  • The configuration covers the functions to execute, their execution order, the input data source of each function and the assignment of its output data.
  • In the above method, each service component of the intelligent voice dialogue system is defined as an initial extension function that can complete independent logic calls or business calls and supports modular reuse. The initial extension function is implemented and published so that it is available to users in the function library of the system, and is then configured to obtain a target extension function, which processes the user's voice data during the call to obtain the content represented by that data. This improves the efficiency of voice data processing in the intelligent voice dialogue system and the flexibility of the related session management.
  • The common logic components, rule components and business domain components of the intelligent voice dialogue system are implemented as modular components and combined services, and dialogue scripts and script flows are assembled through dynamic configuration. This strengthens the business description ability of the script template while reducing its complexity and improving scalability and reusability.
  • Users of the intelligent voice dialogue system can be divided into two types: extension function developers and business dialogue script builders.
  • An extension function developer has professional knowledge of function composition services and of the business domain. The developer's main responsibilities are to provide the concrete implementation of extension functions for the intelligent voice dialogue system and to maintain the system's function library, including adding and updating extension functions and providing detailed service descriptions for them.
  • A business dialogue script builder has knowledge of the business domain in which the scripts are applied and the ability to build intelligent dialogue scripts, and can use the extension function library and the script flow structure template to build scripts and script flows according to the characteristics of the domain.
  • The extension function developer encapsulates the function and provides its interface definition and implementation description. Take an extension function that extracts city-name named entities as an example.
  • The input of the function is of string type, typically the text of the user's utterance.
  • The output of the function is the extracted city name and a prediction score, defined as string type and numeric type respectively.
  • the function definition can be described in the following ways:
  • The dialogue script builder uses the extension function library to configure custom extension functions and to build intelligent dialogue scripts.
  • The script builder assembles the key processing steps of a script node, such as intent recognition and session management, from valid combinations of extension functions. For example, for the user utterance "what's the weather tomorrow", the script builder can use, among others, the following extension functions to produce an intelligent answer.
  • The keyword-based script domain filtering extension function extracts the user's intention from the user's utterance.
  • In this example, the keyword "weather" yields the "inquire weather" domain node as the output script domain.
  • The semantic-similarity-based domain filtering extension function also extracts the user's intention from the user's utterance.
  • In this example, the output script domain is the "inquire weather" domain node, with a similarity score of 0.99.
  • The domain script node matching extension function takes a list of candidate domain nodes as input and outputs the domain script node with the highest score.
  • The date named entity extraction extension function extracts the date from the user's utterance.
  • The date entity extracted in this example is "tomorrow".
  • The city location named entity extraction extension function extracts locations from the user's utterance.
  • The date natural-language formatting function takes the date entity extracted from the user's utterance as input and outputs the formatted date, such as "2019-10-28".
  • The conversation context information extraction extension function retrieves information of the same type from the conversation context when a required named entity could not be extracted.
  • The weather query extension function is called once the slot values of all required parameters (date, location and city) have been extracted, and outputs the weather information.
  • The reply text generation extension function outputs the reply text based on the output of the weather query extension function and the definition in the script template, such as "It will rain tomorrow in Nanjing, remember to bring an umbrella".
  • FIG. 2 shows a schematic diagram of the call flow of an execution example of an extension-function-based script flow template of the present invention.
  • FIG. 3 is a schematic structural diagram of a voice data processing device for an intelligent voice dialogue system according to an embodiment, including:
  • the definition module 10, used to define each service component of the intelligent voice dialogue system as an initial extension function, so that the initial extension function can complete independent logic calls or business calls and supports modular reuse;
  • the implementation module 20, used to implement and publish the initial extension function, so that it is available to users in the function library of the intelligent voice dialogue system;
  • the configuration module 30, used to configure the initial extension function in the function library to obtain the target extension function;
  • the processing module 40, used to process the voice data of the user during the call by using the target extension function to obtain the content represented by the voice data.
  • the implementation module is further used for:
  • the configuration module is further used for:
  • the processing module is further used for:
  • Function configuration is performed on the nodes of the dialogue script, and the configured functions are used to determine the intention and information represented by the user's voice data during the call.
  • Each module of the voice data processing device for the intelligent voice dialogue system can be implemented in whole or in part by software, hardware or a combination thereof.
  • The above modules may be embedded in, or independent of, the processor of the computer equipment in hardware form, or may be stored in the memory of the computer equipment in software form, so that the processor can call and execute the operations corresponding to each module.
  • A computer device is provided.
  • The computer device may be a terminal, and its internal structure may be as shown in FIG. 4.
  • The computer device includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus.
  • The processor of the computer device provides calculation and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system and a computer program.
  • The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • The network interface of the computer device is used to communicate with an external terminal through a network connection.
  • When executed by the processor, the computer program implements a voice data processing method for the intelligent voice dialogue system.
  • The display screen of the computer device can be a liquid crystal display or an electronic ink display.
  • The input device of the computer device can be a touch layer covering the display screen, a button, trackball or touchpad on the housing of the computer device, or an external keyboard, touchpad or mouse.
  • FIG. 4 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer device to which the solution is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine some components, or arrange the components differently.
  • A computer device is further provided.
  • The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the voice data processing method for the intelligent voice dialogue system of any of the above embodiments.
  • The program can be stored in a non-volatile computer-readable storage medium.
  • For example, the program can be stored in a storage medium of a computer system and executed by at least one processor of that computer system to implement the voice data processing method for the intelligent voice dialogue system described above.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
  • A computer storage medium, that is, a computer-readable storage medium, is also provided, on which a computer program is stored; when executed by a processor, the program implements the voice data processing method for the intelligent voice dialogue system of any one of the above embodiments.
  • The terms "first/second/third" in the embodiments of this application only distinguish similar objects and do not represent a specific order of the objects. Where permitted, the specific order or precedence of "first/second/third" can be interchanged, so that the embodiments of the present application described herein can be implemented in an order other than those illustrated or described herein.
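The extension-function workflow described in the definitions above (steps S10 to S40 and the weather example) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the registry `FUNCTION_LIBRARY`, the decorator `extension_function`, the helper `call`, the toy city list and the stubbed weather result are all hypothetical names and data introduced here.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical function library (S20): name -> (category, callable).
FUNCTION_LIBRARY: Dict[str, Tuple[str, Callable[..., Any]]] = {}


def extension_function(name: str, category: str):
    """Publish a callable into the library under a name and a category."""
    def decorator(fn):
        FUNCTION_LIBRARY[name] = (category, fn)
        return fn
    return decorator


def call(name: str, *args: Any) -> Any:
    """Look an extension function up by name and execute it."""
    return FUNCTION_LIBRARY[name][1](*args)


# S10: each service component is a small, independently callable function.
@extension_function("keyword_domain_filter", "logic")
def keyword_domain_filter(text: str) -> Optional[str]:
    # Keyword match only; a similarity-based filter could sit alongside it.
    return "inquire_weather" if "weather" in text.lower() else None


KNOWN_CITIES = ("Nanjing", "Beijing", "Shanghai")  # toy gazetteer (assumption)


@extension_function("extract_city", "named_entity")
def extract_city(text: str) -> Tuple[Optional[str], float]:
    """Input: user text (string). Output: city name (string), score (numeric)."""
    for city in KNOWN_CITIES:
        if city.lower() in text.lower():
            return city, 1.0
    return None, 0.0


@extension_function("extract_date", "named_entity")
def extract_date(text: str) -> Optional[str]:
    for word in ("today", "tomorrow"):
        if word in text.lower():
            return word
    return None


@extension_function("context_lookup", "system")
def context_lookup(context: Dict[str, str], slot: str) -> Optional[str]:
    # Fall back to the conversation context when extraction returns nothing.
    return context.get(slot)


@extension_function("query_weather", "external_service")
def query_weather(date: str, city: str) -> str:
    return "rain"  # stub standing in for a real external service call


# S30: a target extension function composed from library functions and
# registered back into the same library as a custom function.
@extension_function("answer_weather", "custom")
def answer_weather(text: str, context: Dict[str, str]) -> str:
    if call("keyword_domain_filter", text) != "inquire_weather":
        return "Sorry, I did not understand."
    date = call("extract_date", text) or call("context_lookup", context, "date")
    city, _score = call("extract_city", text)
    city = city or call("context_lookup", context, "city")
    if not (date and city):
        return "Which city and date do you mean?"
    return f"It will {call('query_weather', date, city)} {date} in {city}."
```

In this sketch, S40 corresponds to calling `answer_weather` on the recognized text of the user's utterance, for example `answer_weather("What's the weather tomorrow", {"city": "Nanjing"})`, where the city is taken from the conversation context because the utterance does not name one. Registering the composed function in the same library mirrors the statement that custom extension components are themselves registered as extension functions.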

Abstract

A voice data processing method and device for an intelligent voice conversation system, a computer apparatus, and a storage medium. The method comprises: defining each service component of an intelligent voice conversation system as an initial expansion function, and enabling the initial expansion function to complete independent logic calling or service calling, and to support modular reuse (S10); implementing and releasing the initial expansion function, such that the initial expansion function is present in a function library of the intelligent voice conversation system and is available to a user (S20); configuring the initial expansion function in the function library so as to acquire a target expansion function (S30); and using the target expansion function to process voice data of the user during a call so as to acquire content represented by the voice data (S40). The method improves efficiency of corresponding voice data processing in an intelligent voice conversation system, and improves flexibility of related session management.

Description

Voice data processing method and device for intelligent voice dialogue system
Technical field
The present invention relates to the technical field of voice signal processing, and in particular to a voice data processing method, apparatus, computer device, and storage medium for an intelligent voice dialogue system.
Background
An intelligent voice dialogue system, also called an intelligent conversational agent or intelligent chat system, is a system that uses artificial intelligence, built on speech recognition, natural language processing, and speech synthesis, to interact with humans in spoken language. By application scenario, such systems are mainly divided into task-oriented and non-task-oriented dialogue systems. Typical task-oriented systems include intelligent voice assistants and intelligent outbound telephone call systems; typical non-task-oriented systems include smart speakers and chatbots.
The human-computer interaction pipeline of a traditional intelligent voice dialogue system mainly comprises three stages: speech recognition, semantic understanding, and speech synthesis. Speech recognition converts the user's speech into the corresponding text. Semantic understanding extracts the user's intent from the user's text, the dialogue context, and other information, and generates the response text. Speech synthesis converts the response text into speech and plays it to the user. Speech recognition and speech synthesis are highly general: differences in the type and application field of the dialogue system, or even in the configuration of the interaction script templates, do not greatly affect their performance.
Semantic understanding in a traditional intelligent voice dialogue system, by contrast, depends strongly on the dialogue domain and the dialogue scenario. General natural language understanding models solve technical problems such as text intent determination and named entity recognition to some extent, but many user-scenario requirements in different domains still cannot be met by traditional semantic understanding solutions, which directly results in unintelligent dialogue and a very poor conversational experience. Experienced script configuration engineers can alleviate the problem to some degree through script configuration, but this increases the complexity of each individual script and makes logic errors in the script more likely during conversations with users. Task-oriented intelligent voice dialogue systems in particular often need to connect to external systems to obtain user-related data, or to send instructions to external systems to help the user complete actual task operations. The traditional solution implements these functions through custom development. Its main problems are long development and integration cycles, functions that cannot meet the requirements of complex script configuration, and an inability to handle complex business events during a dialogue. In addition, the scalability and maintainability of the system are poor: system functions and script configuration are mixed together, so script capabilities can be updated only by updating the system.
Generally, a traditional intelligent voice dialogue system brings a dialogue service online and delivers it through dialogue scripts and script flows. For an intelligent outbound telephone sales system, for example, operators collect, organize, and summarize the scripts and script flows of top salespeople according to the sales scenario. During an outbound call, the system performs intent recognition and session management according to the design of the scripts and script flows.
Common ways of structuring scripts and script flows, and the corresponding session management methods, include:
A simple keyword-based interaction structure, which determines the user's intent by matching keywords and key phrases and responds according to that intent. A typical implementation is AIML (Artificial Intelligence Markup Language). This approach supports simple context understanding and multi-turn dialogue based on a limited set of keywords, and is common in early non-task-oriented intelligent voice dialogue systems.
A structured template based on a tree or a finite state machine, which models the script and script flow as a tree structure or as the graph structure of a finite state machine. Compared with the simple keyword-based interaction structure, tree and finite-state-machine script flows can incorporate more session context during the dialogue and can combine information obtained in the session with user information obtained through other channels, providing more flexible, personalized dialogue services. This approach requires the dialogue flow to be defined manually for each dialogue scenario. It suits task-oriented scenarios in which the system fully guides the dialogue, and simple tasks. Its drawbacks are that it is difficult to extend, the script flow easily becomes complex and hard to maintain, the accepted input is limited, and the script flow runs with little flexibility.
A frame-based template built on named entity recognition, that is, a frame-based script flow template based on slot value extraction. This solution usually models the script flow as a slot value extraction process. Slot value extraction means extracting from the user's utterance, by information type, the information that must be filled in to understand the user's intent, and converting it into an explicit instruction or response according to the completion status of all slot values the task requires. In practice, a framework based on named entity recognition usually serves as an extension of the finite-state-machine script flow template. It is used to obtain relatively complex information and to support varied types and orders of information input, improving the system's ability to support mixed task-oriented and non-task-oriented scenarios.
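As a rough illustration of the slot value extraction idea described above, the following Python sketch tracks which slots a task still needs and reports the ones that remain empty after each utterance. The slot names, the toy extractors, and the `fill_slots` helper are assumptions made for illustration; they are not taken from the original disclosure.

```python
import re
from typing import Callable, Dict, List, Optional


# Hypothetical toy extractors: each returns a slot value or None.
def extract_date(text: str) -> Optional[str]:
    match = re.search(r"\b(today|tomorrow)\b", text.lower())
    return match.group(1) if match else None


def extract_city(text: str) -> Optional[str]:
    for city in ("Nanjing", "Beijing", "Shanghai"):
        if city.lower() in text.lower():
            return city
    return None


EXTRACTORS: Dict[str, Callable[[str], Optional[str]]] = {
    "date": extract_date,
    "city": extract_city,
}


def fill_slots(slots: Dict[str, Optional[str]], utterance: str) -> List[str]:
    """Fill empty slots from the utterance; return the names still missing."""
    for name, extractor in EXTRACTORS.items():
        if slots.get(name) is None:
            slots[name] = extractor(utterance)
    return [name for name, value in slots.items() if value is None]


slots: Dict[str, Optional[str]] = {"date": None, "city": None}
missing_after_first = fill_slots(slots, "What is the weather tomorrow?")
missing_after_second = fill_slots(slots, "In Nanjing")
```

In this sketch the task can be turned into an explicit instruction only once the returned list of missing slots is empty; until then, a dialogue system would typically prompt the user for the missing information.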
Because human language is complex, random, and irrational, traditional ways of structuring scripts and script flows, and traditional session management, often suffer from complicated processes and low flexibility.
Summary of the invention
In view of the above problems, the present invention proposes a voice data processing method, apparatus, computer device, and storage medium for an intelligent voice dialogue system.
To achieve the purpose of the present invention, a voice data processing method for an intelligent voice dialogue system is provided, comprising the following steps:
S10: define each service component of the intelligent voice dialogue system as an initial expansion function, such that the initial expansion function completes an independent logic call or service call and supports modular reuse;
S20: implement and publish the initial expansion function, so that it resides in the function library of the intelligent voice dialogue system and is available to users;
S30: configure the initial expansion functions in the function library to obtain a target expansion function;
S40: use the target expansion function to process the user's voice data during a call, so as to obtain the content represented by the voice data.
In one embodiment, implementing and publishing the initial expansion function comprises:
developing and implementing the initial expansion function according to its specific definition and its functional requirements.
In one embodiment, configuring the initial expansion functions in the function library to obtain the target expansion function comprises:
using one initial expansion function in the function library as the input of another initial expansion function to obtain a custom target expansion function.
In one embodiment, using the target expansion function to process the voice data input by the user to obtain the content represented by the voice data comprises:
configuring functions at the nodes of a dialogue script according to the initial expansion functions and target expansion functions provided in the function library, and using the configured functions to define the intent and information represented by the user's voice data during the call.
A voice data processing apparatus for an intelligent voice dialogue system comprises:
a definition module, configured to define each service component of the intelligent voice dialogue system as an initial expansion function, such that the initial expansion function completes an independent logic call or service call and supports modular reuse;
an implementation module, configured to implement and publish the initial expansion function, so that it resides in the function library of the intelligent voice dialogue system and is available to users;
a configuration module, configured to configure the initial expansion functions in the function library to obtain a target expansion function;
a processing module, configured to use the target expansion function to process the user's voice data during a call, so as to obtain the content represented by the voice data.
In one embodiment, the implementation module is further configured to:
develop and implement the initial expansion function according to its specific definition and its functional requirements.
In one embodiment, the configuration module is further configured to:
use one initial expansion function in the function library as the input of another initial expansion function to obtain a custom target expansion function.
In one embodiment, the processing module is further configured to:
configure functions at the nodes of a dialogue script according to the initial expansion functions and target expansion functions provided in the function library, and use the configured functions to define the intent and information represented by the user's voice data during the call.
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the voice data processing method for an intelligent voice dialogue system of any one of the above embodiments.
A computer-readable storage medium has a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the voice data processing method for an intelligent voice dialogue system of any one of the above embodiments.
The voice data processing method, apparatus, computer device, and storage medium described above define each service component of the intelligent voice dialogue system as an initial expansion function that completes an independent logic call or service call and supports modular reuse; implement and publish the initial expansion function so that it resides in the function library of the system and is available to users; configure the initial expansion functions in the function library to obtain a target expansion function; and use the target expansion function to process the user's voice data during a call to obtain the content represented by the voice data. This improves the efficiency of voice data processing in the intelligent voice dialogue system and the flexibility of session management. Specifically, the common logic components, rule components, and business-domain components of the system are implemented through modular componentization and service composition, and scripts and script flows are assembled through dynamic configuration. This strengthens the ability of script templates to describe business logic while reducing their complexity and improving scalability and reusability.
Description of the drawings
Fig. 1 is a flowchart of a voice data processing method for an intelligent voice dialogue system according to an embodiment;
Fig. 2 is a schematic diagram of the call flow of an execution instance of an expansion-function-based script flow template according to an embodiment;
Fig. 3 is a schematic structural diagram of a voice data processing apparatus for an intelligent voice dialogue system according to an embodiment;
Fig. 4 is a schematic diagram of a computer device according to an embodiment.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The voice data processing method for an intelligent voice dialogue system provided in this application can be applied to an intelligent voice dialogue system. A voice data processing terminal defines each service component of the intelligent voice dialogue system as an initial expansion function that completes an independent logic call or service call and supports modular reuse; implements and publishes the initial expansion function so that it resides in the function library of the system and is available to users; configures the initial expansion functions in the function library to obtain a target expansion function; and uses the target expansion function to process the user's voice data during a call to obtain the content represented by the voice data. This reduces the complexity of processing the voice data and improves the flexibility of the session management scheme. The voice data processing terminal may be, but is not limited to, an intelligent processing device such as a personal computer or a notebook computer.
In one embodiment, as shown in FIG. 1, a voice data processing method for an intelligent voice dialogue system is provided. The method is described below as applied to a voice data processing terminal, and comprises the following steps:
S10: define each service component of the intelligent voice dialogue system as an initial expansion function, such that the initial expansion function completes an independent logic call or service call and supports modular reuse.
This step may define the principal business roles of the intelligent voice dialogue system, and the interface components of the external systems to which the dialogue system must connect, as expansion functions (initial expansion functions). In essence, it abstracts the intelligent voice dialogue system and the business scenarios in which it is applied: commonly used service components of the system are defined as expansion functions, each of which completes a simple, independent logic call or service call and supports modular reuse.
Further, expansion functions (initial expansion functions) may be classified by purpose, for example as logic functions, system functions, named entity recognition functions, business-domain functions, and external service call functions. An expansion function must define its input, including the acceptable input parameters and their types, and its output, including the output values and their types. Output types include numeric, Boolean, string, and enumeration types.
The classification of expansion functions is used to manage the functions and improves the interactive experience of building scripts. The input and output definitions of an expansion function determine what input it requires to execute and what it ultimately outputs.
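The definition described above (a category plus typed inputs and outputs) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `ExpansionFunction` class, the `Category` enum, and the `greater_than` example are hypothetical names chosen for illustration and do not come from the original disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict


class Category(Enum):
    """Hypothetical categories mirroring the classification in the text."""
    LOGIC = "logic"
    SYSTEM = "system"
    NAMED_ENTITY = "named_entity_recognition"
    BUSINESS_DOMAIN = "business_domain"
    EXTERNAL_SERVICE = "external_service_call"


@dataclass
class ExpansionFunction:
    name: str
    category: Category
    inputs: Dict[str, type]    # parameter name -> accepted type
    outputs: Dict[str, type]   # output name -> produced type
    impl: Callable[..., Dict[str, Any]]

    def __call__(self, **kwargs: Any) -> Dict[str, Any]:
        # Check the arguments against the declared input definition.
        for param, expected in self.inputs.items():
            if not isinstance(kwargs.get(param), expected):
                raise TypeError(f"{self.name}: '{param}' must be {expected.__name__}")
        result = self.impl(**kwargs)
        # Check the result against the declared output definition.
        for key, expected in self.outputs.items():
            if not isinstance(result.get(key), expected):
                raise TypeError(f"{self.name}: output '{key}' must be {expected.__name__}")
        return result


# A simple logic function: is a numeric value above a threshold?
greater_than = ExpansionFunction(
    name="greater_than",
    category=Category.LOGIC,
    inputs={"value": float, "threshold": float},
    outputs={"result": bool},
    impl=lambda value, threshold: {"result": value > threshold},
)
```

Declaring the types alongside the implementation lets a script-building tool check a configuration before a call takes place, which is one plausible way the input and output definitions could constrain execution.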
S20: implement and publish the initial expansion function, so that it resides in the function library of the intelligent voice dialogue system and is available to users.
The users may include, for example, operators of the intelligent voice dialogue system.
In one embodiment, implementing and publishing the initial expansion function comprises:
developing and implementing the initial expansion function according to its specific definition and its functional requirements.
The specific definition of an initial expansion function states the functionality that the function must provide, that is, what problem the function solves. Implementing an initial expansion function is the process by which a developer implements that functionality according to the functional requirements.
Specifically, in this embodiment the function is developed and implemented according to the specific definition of the initial expansion function and its functional requirements. Once implemented and published, the expansion function is registered in the available function library of the intelligent voice dialogue system for use by users such as the operators of the system.
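Publishing a function into the function library can be pictured as a registration step. The sketch below assumes a simple in-memory registry; the `FunctionLibrary` class, its `publish` decorator, and the `contains_keyword` function are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, List


class FunctionLibrary:
    """Hypothetical in-memory function library of the dialogue system."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable] = {}
        self._descriptions: Dict[str, str] = {}

    def publish(self, name: str, description: str) -> Callable[[Callable], Callable]:
        """Decorator that registers an implemented function under a unique name."""
        def decorator(func: Callable) -> Callable:
            if name in self._functions:
                raise ValueError(f"function '{name}' is already published")
            self._functions[name] = func
            self._descriptions[name] = description
            return func
        return decorator

    def get(self, name: str) -> Callable:
        return self._functions[name]

    def catalog(self) -> List[str]:
        """Names that script builders can choose from, in sorted order."""
        return sorted(self._functions)


library = FunctionLibrary()


@library.publish("contains_keyword", "True if the text contains the keyword.")
def contains_keyword(text: str, keyword: str) -> bool:
    return keyword.lower() in text.lower()
```

Operators would then look functions up by name, for example `library.get("contains_keyword")`, without needing to know how they are implemented.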
S30: configure the initial expansion functions in the function library to obtain a target expansion function.
In one embodiment, configuring the initial expansion functions in the function library to obtain the target expansion function comprises:
using one initial expansion function in the function library as the input of another initial expansion function to obtain a custom target expansion function.
In this embodiment, the expansion functions provided by the intelligent voice dialogue system can be combined through configuration into custom expansion components with complex functionality, yielding target expansion functions. These custom expansion components (target expansion functions) are likewise registered in the available function library of the system as custom expansion functions, and users such as operators can call them in different business scenarios and script templates.
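Using one function as the input of another amounts to function composition. The following sketch assumes a plain dictionary as the function library; `normalize_text`, `extract_date_entity`, and `compose` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict

# Hypothetical function library: name -> callable.
library: Dict[str, Callable[[str], str]] = {}


def normalize_text(text: str) -> str:
    """Initial function: trim whitespace and lowercase the text."""
    return text.strip().lower()


def extract_date_entity(text: str) -> str:
    """Initial function: expects lowercase text; returns "" if no date word is found."""
    for word in ("today", "tomorrow"):
        if word in text:
            return word
    return ""


library["normalize_text"] = normalize_text
library["extract_date_entity"] = extract_date_entity


def compose(inner_name: str, outer_name: str, target_name: str) -> Callable[[str], str]:
    """Feed the output of `inner` into `outer` and register the result."""
    inner = library[inner_name]
    outer = library[outer_name]

    def target(value: str) -> str:
        return outer(inner(value))

    library[target_name] = target  # the custom function is registered like any other
    return target


compose("normalize_text", "extract_date_entity", "extract_date_normalized")
```

Because the composed function is stored in the same library, it can itself be used as a building block for further compositions.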
S40: use the target expansion function to process the user's voice data during a call, so as to obtain the content represented by the voice data.
In one embodiment, using the target expansion function to process the voice data input by the user to obtain the content represented by the voice data comprises:
configuring functions at the nodes of a dialogue script according to the initial expansion functions and target expansion functions provided in the function library, and using the configured functions to define the intent and information represented by the user's voice data during the call.
In this embodiment, the way in which the expansion functions of the intelligent voice dialogue system and its custom expansion functions are combined and called is defined in a script template of the system, which is finally configured into a usable script template. The service execution engine of the intelligent voice dialogue system then calls the expansion functions according to the defined script template to implement functions such as intent recognition and session management during the dialogue.
Further, when building a script, the script builder of the intelligent voice dialogue system can configure functions at the nodes of the script using the expansion functions provided in the function library of the system together with custom expansion functions of their own configuration. The configuration specifies the functions to execute, their execution order, the data sources of each function's inputs, and the variables to which its outputs are assigned. These functions make it possible to define clearly how the user's intent is recognized and how information is extracted during the call. Because the expansion functions can be reused across different nodes of a script, and even across different scripts, they effectively reduce the complexity of script configuration.
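The node configuration described above (which functions run, in what order, where their inputs come from, and where their outputs go) can be sketched as data plus a small interpreter. The configuration format, the `$` prefix for context references, and the two toy functions below are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, List

# Hypothetical function library used by the node.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "contains_keyword": lambda text, keyword: keyword in text.lower(),
    "pick_domain": lambda is_weather: "ask_weather" if is_weather else "unknown",
}

# Steps run in list order. "inputs" maps each parameter either to a context
# key (marked with a leading "$") or to a literal value; "output" names the
# context key that receives the function's return value.
NODE_CONFIG: List[Dict[str, Any]] = [
    {
        "function": "contains_keyword",
        "inputs": {"text": "$utterance", "keyword": "weather"},
        "output": "is_weather",
    },
    {
        "function": "pick_domain",
        "inputs": {"is_weather": "$is_weather"},
        "output": "domain",
    },
]


def run_node(config: List[Dict[str, Any]], context: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the configured functions in order, reading and writing the context."""
    for step in config:
        kwargs = {}
        for param, source in step["inputs"].items():
            if isinstance(source, str) and source.startswith("$"):
                kwargs[param] = context[source[1:]]  # value taken from the context
            else:
                kwargs[param] = source               # literal value from the config
        context[step["output"]] = FUNCTIONS[step["function"]](**kwargs)
    return context


context = run_node(NODE_CONFIG, {"utterance": "What will the weather be like tomorrow?"})
```

Keeping the configuration as plain data means the same functions can be rewired for another node or another script without changing any code.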
The voice data processing method described above defines each service component of the intelligent voice dialogue system as an initial expansion function that completes an independent logic call or service call and supports modular reuse; implements and publishes the initial expansion function so that it resides in the function library of the system and is available to users; configures the initial expansion functions in the function library to obtain a target expansion function; and uses the target expansion function to process the user's voice data during a call to obtain the content represented by the voice data. This improves the efficiency of voice data processing in the intelligent voice dialogue system and the flexibility of session management. Specifically, the common logic components, rule components, and business-domain components of the system are implemented through modular componentization and service composition, and scripts and script flows are assembled through dynamic configuration. This strengthens the ability of script templates to describe business logic while reducing their complexity and improving scalability and reusability.
In one embodiment, following the principle of separating the construction of expansion functions from their use, the producers of the intelligent voice dialogue system can be divided into two kinds of users: expansion function developers and business script builders. Expansion function developers have expertise in function composition and servitization and in the business domain. Their main responsibilities are to provide concrete implementations of expansion functions for the intelligent voice dialogue system and to maintain the function library of the system, including adding and updating expansion functions and providing detailed descriptions of the services that the functions offer.
Business script builders have knowledge of the business domain in which the scripts are applied and the ability to build intelligent scripts. They use the expansion function library and structured script flow templates to build scripts and script flows according to the characteristics of the domain.
In this embodiment, the expansion function developer encapsulates each function and provides its interface definition and implementation description. Take an expansion function that extracts city-name named entities as an example: the input is of string type, usually the text of the user's utterance, and the outputs are the extracted city name and a prediction score, defined as string type and numeric type respectively.
In one example, the function definition can be described as follows:
Figure PCTCN2021071367-appb-000001
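The original definition appears only as a figure and is not available in this text. As a purely hypothetical sketch of what a definition with the stated interface (string input; city name and prediction score as outputs) could look like, the following Python pairs a declarative description with a toy implementation. The field names, the `extract_city_name` function, and the small city list are all assumptions and should not be read as the content of the figure.

```python
from typing import Dict, Tuple

# Hypothetical declarative description of the function's interface.
CITY_EXTRACTION_DEFINITION: Dict[str, object] = {
    "name": "extract_city_name",
    "category": "named_entity_recognition",
    "inputs": {"text": "string"},
    "outputs": {"city": "string", "score": "number"},
    "description": "Extract a city name from the text of the user's utterance.",
}

# Toy gazetteer standing in for a real named entity recognition model.
KNOWN_CITIES = ("Nanjing", "Beijing", "Shanghai")


def extract_city_name(text: str) -> Tuple[str, float]:
    """Return (city, score); ("", 0.0) if no known city is mentioned."""
    lowered = text.lower()
    for city in KNOWN_CITIES:
        if city.lower() in lowered:
            # Exact gazetteer hit; a trained model would predict this score.
            return city, 1.0
    return "", 0.0
```

For instance, `extract_city_name("Will it rain in Nanjing tomorrow?")` yields the city name together with its score, matching the string and numeric output types named in the text.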
本实施例由话术搭建者使用扩展函数库进行自定义扩展函数的配置以及智能化 话术的搭建配置。话术搭建者根据业务领域的话术要求和扩展函数的定义描述,通过扩展函数的合法组合搭建话术节点的关键处理步骤如意图识别和会话管理的行为能力。如针对用户“明天天气怎么样”的表述,话术搭建者需要且不限于使用以下扩展函数完成智能化的答复。In this embodiment, the speech art builder uses the extension function library to configure the custom expansion function and the intelligent speech art construction configuration. According to the requirements of the business field and the definition and description of the extension function, the speech builder builds the key processing steps of the speech node through the legal combination of the expansion function, such as the behavioral ability of intention recognition and session management. For example, for the user's expression of "what's the weather tomorrow", the speech builder needs and is not limited to use the following extension functions to complete an intelligent answer.
下面列出几个相关语音数据处理的示例:Several examples of related voice data processing are listed below:
A keyword-based script-domain filtering extension function, which extracts the user's intent from the user's utterance. In this example, the keyword "weather" yields the domain node for the "ask about the weather" script domain.
A semantic-similarity-based domain filtering extension function, which extracts the user's intent from the user's utterance. In this example, it outputs the domain node for the "ask about the weather" script domain together with a similarity score of 0.99.
A domain script-node matching extension function, whose input is the list of candidate domain nodes and whose output is the domain script node with the highest score.
A date named-entity extraction extension function, which extracts dates from the user's utterance. In this example, the extracted date entity is "tomorrow".
A city (location) named-entity extraction extension function, which extracts locations from the user's utterance.
A formatting function for natural-language date expressions, whose input is the date entity extracted from the user's utterance and whose output is a formatted date such as "2019-10-28".
A session-context information extraction extension function, which retrieves usable information of the same type from the conversation context when a required named entity was not extracted.
A weather query extension function, which is called to output weather information once the slot values of all required parameters (date and city) have been extracted.
A reply-text generation extension function, which outputs the text of the reply to the user based on the output of the weather query extension function and the script template definition, for example "It will rain in Nanjing tomorrow; remember to bring an umbrella."
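The chain of extension functions listed above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: every function name, the keyword table, the city list, and the stubbed weather lookup are hypothetical, and the semantic-similarity filter is omitted for brevity.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def keyword_domain_filter(text: str) -> List[Tuple[str, float]]:
    """Keyword-based domain filter: return candidate (domain node, score) pairs."""
    keywords = {"weather": "ask_weather"}  # hypothetical keyword table
    lowered = text.lower()
    return [(node, 1.0) for kw, node in keywords.items() if kw in lowered]


def match_domain_node(candidates: List[Tuple[str, float]]) -> Optional[str]:
    """Pick the candidate domain node with the highest score."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]


def extract_date_entity(text: str) -> Optional[str]:
    """Toy date entity extractor (assumption: only two expressions are known)."""
    lowered = text.lower()
    for word in ("today", "tomorrow"):
        if word in lowered:
            return word
    return None


def extract_city_entity(text: str) -> Optional[str]:
    """Toy city entity extractor backed by a fixed list (assumption)."""
    lowered = text.lower()
    for city in ("Nanjing", "Beijing"):
        if city.lower() in lowered:
            return city
    return None


def format_date(entity: str, today: date) -> str:
    """Turn a natural-language date entity into an ISO-formatted date."""
    offsets = {"today": 0, "tomorrow": 1}
    return (today + timedelta(days=offsets[entity])).isoformat()


def from_context(context: Dict[str, str], slot: str) -> Optional[str]:
    """Fall back to the conversation context when a slot was not extracted."""
    return context.get(slot)


def query_weather(day: str, city: str) -> str:
    """Stub standing in for a real weather service; always reports rain."""
    return "rain"


def render_reply(city: str, when: str, weather: str) -> str:
    """Fill a script template with the query result."""
    templates = {
        "rain": "It will rain in {city} {when}; remember to bring an umbrella."
    }
    return templates[weather].format(city=city, when=when)


def handle_utterance(text: str, context: Dict[str, str], today: date) -> Optional[str]:
    """Run the chain: intent, slots (with context fallback), query, reply."""
    if match_domain_node(keyword_domain_filter(text)) != "ask_weather":
        return None
    date_entity = extract_date_entity(text) or from_context(context, "date")
    city = extract_city_entity(text) or from_context(context, "city")
    if date_entity is None or city is None:
        return None  # a required slot is still missing
    weather = query_weather(format_date(date_entity, today), city)
    return render_reply(city, date_entity, weather)
```

For the utterance "What's the weather tomorrow", no city is present in the text, so the sketch takes the city from the conversation context, which is the role the session-context function plays in the list above.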
In practice, the invention ultimately generates a script execution instance from the extension functions used in the script flow template and their calling logic, according to the script builder's configuration. In one example, FIG. 2 shows a schematic diagram of the call flow of an execution instance of an extension-function-based script flow template.
Referring to FIG. 3, which is a schematic structural diagram of a voice data processing device for an intelligent voice dialogue system according to one embodiment, the device includes:
a definition module 10, configured to define each business component of the intelligent voice dialogue system as an initial extension function, such that each initial extension function can complete an independent logic call or business call and supports modular reuse;
an implementation module 20, configured to implement and publish the initial extension functions, so that they are available to users in the function library of the intelligent voice dialogue system;
a configuration module 30, configured to configure the initial extension functions in the function library to obtain a target extension function; and
a processing module 40, configured to process the user's voice data during a call using the target extension function, to obtain the content represented by the voice data.
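One way to picture how the four modules relate is a small function library with define, publish, and configure operations, followed by a processing step. The sketch below is an assumption-laden illustration: `FunctionLibrary`, `keyword_intent`, and `process` are hypothetical names, configuration is modeled as binding parameters with `functools.partial`, and the voice data is assumed to have already been transcribed to text by a speech recognizer that is not shown.

```python
from functools import partial
from typing import Any, Callable, Dict


class FunctionLibrary:
    """Hypothetical function library of the dialogue system."""

    def __init__(self) -> None:
        self._specs: Dict[str, str] = {}                  # defined functions
        self._impls: Dict[str, Callable[..., Any]] = {}   # published functions

    def define(self, name: str, description: str) -> None:
        """Role of definition module 10: declare an initial extension function."""
        self._specs[name] = description

    def publish(self, name: str, impl: Callable[..., Any]) -> None:
        """Role of implementation module 20: publish an implementation."""
        if name not in self._specs:
            raise KeyError(f"{name!r} must be defined before it is published")
        self._impls[name] = impl

    def configure(self, name: str, **settings: Any) -> Callable[..., Any]:
        """Role of configuration module 30: bind settings to get a target function."""
        return partial(self._impls[name], **settings)


def keyword_intent(text: str, keywords: Dict[str, str]) -> str:
    """Initial extension function: map an utterance to an intent by keyword."""
    lowered = text.lower()
    for keyword, intent in keywords.items():
        if keyword in lowered:
            return intent
    return "unknown"


library = FunctionLibrary()
library.define("keyword_intent", "Map an utterance to an intent by keyword.")
library.publish("keyword_intent", keyword_intent)
target = library.configure("keyword_intent", keywords={"weather": "ask_weather"})


def process(transcript: str) -> str:
    """Role of processing module 40: apply the target function to the transcript."""
    return target(transcript)
```

Keeping definition separate from publication means a script builder can see what a function promises before any implementation exists, which matches the division of labor between developer and script builder described earlier.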
In one embodiment, the implementation module is further configured to:
implement and develop the initial extension function according to its specific definition and its functional requirements.
In one embodiment, the configuration module is further configured to:
use one initial extension function in the function library as the input of another initial extension function to obtain a custom target extension function.
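Feeding one initial extension function into another amounts to function composition. The following sketch assumes two toy initial functions, `extract_date_entity` and `format_date`, and a hypothetical `compose` helper; the fixed reference date is bound only so that the example is deterministic.

```python
from datetime import date, timedelta
from functools import partial
from typing import Callable


def extract_date_entity(text: str) -> str:
    """Initial function 1: pull a date expression out of the utterance."""
    lowered = text.lower()
    for word in ("today", "tomorrow"):
        if word in lowered:
            return word
    raise ValueError("no date expression found")


def format_date(entity: str, today: date) -> str:
    """Initial function 2: format a date expression as an ISO date."""
    offsets = {"today": 0, "tomorrow": 1}
    return (today + timedelta(days=offsets[entity])).isoformat()


def compose(outer: Callable, inner: Callable) -> Callable:
    """Build a target function that passes the output of inner into outer."""
    def target(value):
        return outer(inner(value))
    return target


# Target extension function: utterance text in, formatted date out.
date_from_text = compose(
    partial(format_date, today=date(2019, 10, 27)),  # fixed date for the example
    extract_date_entity,
)
```

Calling `date_from_text("What's the weather tomorrow")` under these assumptions yields "2019-10-28", the formatted date used in the weather example.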
In one embodiment, the processing module is further configured to:
configure functions at the script nodes using the initial extension functions and target extension functions provided in the function library, and use the configured functions to define the intent and information represented by the user's voice data during the call.
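Configuring functions at a script node could look roughly like the sketch below, in which a node holds one function for intent recognition and one function per slot. `ScriptNode` and the three toy functions are hypothetical, and the input is again assumed to be a transcript of the user's voice data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ScriptNode:
    """Hypothetical script node configured with extension functions."""
    name: str
    intent_fn: Callable[[str], str]
    slot_fns: Dict[str, Callable[[str], Optional[str]]] = field(default_factory=dict)

    def interpret(self, transcript: str) -> Dict[str, object]:
        """Return the intent and slot information found in the transcript."""
        slots = {slot: fn(transcript) for slot, fn in self.slot_fns.items()}
        return {"intent": self.intent_fn(transcript), "slots": slots}


def weather_intent(text: str) -> str:
    """Toy intent function keyed on a single keyword (assumption)."""
    return "ask_weather" if "weather" in text.lower() else "unknown"


def find_date(text: str) -> Optional[str]:
    """Toy date slot function (assumption)."""
    return "tomorrow" if "tomorrow" in text.lower() else None


def find_city(text: str) -> Optional[str]:
    """Toy city slot function (assumption)."""
    return "Nanjing" if "nanjing" in text.lower() else None


node = ScriptNode(
    name="ask_weather",
    intent_fn=weather_intent,
    slot_fns={"date": find_date, "city": find_city},
)
result = node.interpret("What's the weather in Nanjing tomorrow?")
```

Because the node only stores callables, either an initial function or a composed target function can be plugged into the same position.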
For the specific limitations of the voice data processing device for an intelligent voice dialogue system, refer to the limitations of the voice data processing method for an intelligent voice dialogue system described above; they are not repeated here. Each module of the device may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a voice data processing method for an intelligent voice dialogue system. The display screen may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen; a button, trackball, or touchpad provided on the housing of the computer device; or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structure shown in FIG. 4 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
Based on the examples described above, one embodiment further provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the voice data processing method for an intelligent voice dialogue system of any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium; in the embodiments of the present invention, for example, the program may be stored in a storage medium of a computer system and executed by at least one processor of that computer system to carry out the processes of the embodiments of the voice data processing method for an intelligent voice dialogue system described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Accordingly, one embodiment further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the voice data processing method for an intelligent voice dialogue system of any of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.
It should be noted that the terms "first\second\third" in the embodiments of this application merely distinguish similar objects and do not denote a specific ordering of the objects. Where permitted, the objects distinguished by "first\second\third" may be interchanged in order or sequence, so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein.
The terms "include" and "have" and any variations of them in the embodiments of this application are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or modules is not limited to the listed steps or modules, but optionally also includes steps or modules that are not listed, or optionally also includes other steps or modules inherent to the process, method, product, or device.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

  1. A voice data processing method for an intelligent voice dialogue system, characterized by comprising the following steps:
    S10: defining each business component of the intelligent voice dialogue system as an initial extension function, such that the initial extension function can complete an independent logic call or business call and supports modular reuse;
    S20: implementing and publishing the initial extension function, so that the initial extension function is available to users in the function library of the intelligent voice dialogue system;
    S30: configuring the initial extension function in the function library to obtain a target extension function;
    S40: processing the user's voice data during a call using the target extension function, to obtain the content represented by the voice data.
  2. The voice data processing method for an intelligent voice dialogue system according to claim 1, characterized in that implementing and publishing the initial extension function comprises:
    implementing and developing the initial extension function according to its specific definition and its functional requirements.
  3. The voice data processing method for an intelligent voice dialogue system according to claim 1, characterized in that configuring the initial extension function in the function library to obtain the target extension function comprises:
    using one initial extension function in the function library as the input of another initial extension function to obtain a custom target extension function.
  4. The voice data processing method for an intelligent voice dialogue system according to claim 1, characterized in that processing the voice data input by the user using the target extension function, to obtain the content represented by the voice data, comprises:
    configuring functions at the script nodes using the initial extension functions and target extension functions provided in the function library, and using the configured functions to define the intent and information represented by the user's voice data during the call.
  5. A voice data processing device for an intelligent voice dialogue system, characterized by comprising:
    a definition module, configured to define each business component of the intelligent voice dialogue system as an initial extension function, such that the initial extension function can complete an independent logic call or business call and supports modular reuse;
    an implementation module, configured to implement and publish the initial extension function, so that the initial extension function is available to users in the function library of the intelligent voice dialogue system;
    a configuration module, configured to configure the initial extension function in the function library to obtain a target extension function;
    a processing module, configured to process the user's voice data during a call using the target extension function, to obtain the content represented by the voice data.
  6. The voice data processing device for an intelligent voice dialogue system according to claim 5, characterized in that the implementation module is further configured to:
    implement and develop the initial extension function according to its specific definition and its functional requirements.
  7. The voice data processing device for an intelligent voice dialogue system according to claim 5, characterized in that the configuration module is further configured to:
    use one initial extension function in the function library as the input of another initial extension function to obtain a custom target extension function.
  8. The voice data processing device for an intelligent voice dialogue system according to claim 5, characterized in that the processing module is further configured to:
    configure functions at the script nodes using the initial extension functions and target extension functions provided in the function library, and use the configured functions to define the intent and information represented by the user's voice data during the call.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 4.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 4.
PCT/CN2021/071367 2020-02-11 2021-01-13 Voice data processing method and device for intelligent voice conversation system WO2021159904A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010086219.3A CN111402872B (en) 2020-02-11 2020-02-11 Voice data processing method and device for intelligent voice dialogue system
CN202010086219.3 2020-02-11

Publications (1)

Publication Number Publication Date
WO2021159904A1 (en) 2021-08-19

Family

ID=71428357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071367 WO2021159904A1 (en) 2020-02-11 2021-01-13 Voice data processing method and device for intelligent voice conversation system

Country Status (2)

Country Link
CN (1) CN111402872B (en)
WO (1) WO2021159904A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402872B (en) * 2020-02-11 2023-12-19 升智信息科技(南京)有限公司 Voice data processing method and device for intelligent voice dialogue system
CN112800199A (en) * 2021-01-20 2021-05-14 广州佰锐网络科技有限公司 Method and system for supporting dynamic flexible configuration of verbal text content
CN113468303B (en) * 2021-06-25 2022-05-17 贝壳找房(北京)科技有限公司 Dialogue interaction processing method and computer-readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006130508A2 (en) * 2005-05-31 2006-12-07 Computer Associates Think, Inc. Executing a dialog using one or more xml components and one or more embedded scripts
CN106406844A (en) * 2015-08-03 2017-02-15 腾讯科技(深圳)有限公司 A method and a device for realizing a communication interaction platform official account menu
CN109002510A (en) * 2018-06-29 2018-12-14 北京百度网讯科技有限公司 A kind of dialog process method, apparatus, equipment and medium
CN109308320A (en) * 2018-07-20 2019-02-05 北京智能点科技有限公司 Conversation process configuration method is taken turns more by a kind of robot
CN109325150A (en) * 2018-08-06 2019-02-12 北京京东金融科技控股有限公司 Big data processing method based on expression formula, device, electronic equipment, storage medium
CN109885666A (en) * 2019-01-18 2019-06-14 科大国创软件股份有限公司 A kind of method and system of the intelligent sound customer service robot based on HTML5
CN110457011A (en) * 2019-08-15 2019-11-15 苏州思必驰信息科技有限公司 Software application method for customizing and exploitation server-side
CN111402872A (en) * 2020-02-11 2020-07-10 升智信息科技(南京)有限公司 Voice data processing method and device for intelligent voice conversation system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844499A (en) * 2016-12-26 2017-06-13 网易(杭州)网络有限公司 Many wheel session interaction method and devices
CN108153902A (en) * 2018-01-16 2018-06-12 和美(深圳)信息技术股份有限公司 More wheel session interaction method, apparatus, computer equipment and storage medium
US11056107B2 (en) * 2018-03-30 2021-07-06 International Business Machines Corporation Conversational framework
CN109101545A (en) * 2018-06-29 2018-12-28 北京百度网讯科技有限公司 Natural language processing method, apparatus, equipment and medium based on human-computer interaction
CN110534104A (en) * 2019-07-03 2019-12-03 平安科技(深圳)有限公司 Voice match method, electronic device, the computer equipment of Intelligent dialogue system
CN110442701B (en) * 2019-08-15 2022-08-05 思必驰科技股份有限公司 Voice conversation processing method and device

Also Published As

Publication number Publication date
CN111402872B (en) 2023-12-19
CN111402872A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
WO2021159904A1 (en) Voice data processing method and device for intelligent voice conversation system
CN111860753B (en) Directed acyclic graph-based framework for training models
US11816435B1 (en) Applied artificial intelligence technology for contextualizing words to a knowledge base using natural language processing
US20200395008A1 (en) Personality-Based Conversational Agents and Pragmatic Model, and Related Interfaces and Commercial Models
CN114424185A (en) Stop word data augmentation for natural language processing
JP2023520420A (en) A batching technique for handling imbalanced training data for chatbots
JP2023520416A (en) Improved techniques for out-of-domain (OOD) detection
JP2020140210A (en) Method and system to handle queries whose intention are unclear in conversational system
CN110046169A (en) Calculating based on structured query language sentence services implementation
KR102429407B1 (en) User-configured and customized interactive dialog application
JP6725535B2 (en) Computer-implemented method for displaying software type applications based on design specifications
CN115917553A (en) Entity-level data augmentation to enable robust named entity recognition in chat robots
CN111145745B (en) Conversation process customizing method and device
US20210073254A1 (en) Search-based natural language intent determination
US20200210505A1 (en) Electronic apparatus and controlling method thereof
CN115398436A (en) Noise data augmentation for natural language processing
JP2004530973A (en) Automatic SQL generation for frame completion
EP3550449A1 (en) Search method and electronic device using the method
CN116615727A (en) Keyword data augmentation tool for natural language processing
WO2023122444A1 (en) Language model prediction of api call invocations and verbal responses
Inupakutika et al. Integration of NLP and Speech-to-text Applications with Chatbots
Ouaddi et al. Architecture, tools, and dsls for developing conversational agents: An overview
CN113268593A (en) Intention classification and model training method and device, terminal and storage medium
US20230177263A1 (en) Identifying chat correction pairs for trainig model to automatically correct chat inputs
Hochberg et al. A flexible framework for developing mixed-initiative dialog systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21754589

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21754589

Country of ref document: EP

Kind code of ref document: A1