WO2021212929A1 - Multilingual interaction method and apparatus for active outbound intelligent speech robot - Google Patents

Multilingual interaction method and apparatus for active outbound intelligent speech robot

Info

Publication number
WO2021212929A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
language
recognition engine
recognition
multilingual
Application number
PCT/CN2021/071368
Other languages
French (fr)
Chinese (zh)
Inventor
李训林
王帅
张晋
Original Assignee
升智信息科技(南京)有限公司
Application filed by 升智信息科技(南京)有限公司
Publication of WO2021212929A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention relates to the technical field of voice signal processing, and in particular to a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a corresponding computer device and storage medium.
  • An active outbound intelligent voice robot, given preset dialogue scenes, guides the user through multi-turn dialogue in order to achieve a marketing purpose.
  • Its main core functional modules are speech recognition (ASR), speech synthesis (TTS), dialogue management (DM), natural language processing (NLP), and natural language understanding (NLU).
  • The present invention proposes a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a computer device and a storage medium.
  • A multilingual interaction method for an active outbound intelligent voice robot includes the following steps:
  • S10: when the user enters the multilingual setting scene, detect the voice data uttered by the user;
  • S20: send the voice data to each language recognition engine to obtain the recognized text returned by each engine;
  • S30: when none of the recognized texts is empty, detect whether each recognized text carries a preset weight word, and determine the text carrying a weight word as the valid text;
  • S40: input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine.
  • Sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
  • sending the voice data to the English language recognition engine to obtain the English recognized text, and sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text.
  • In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
  • if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
  • In one embodiment, after obtaining the recognized texts returned by the language recognition engines, the method further includes:
  • if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
  • In one embodiment, after obtaining the recognized texts, the method further includes:
  • if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
  • In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it includes:
  • inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
  • A multilingual interaction apparatus for an active outbound intelligent voice robot includes:
  • a first detection module, configured to detect the voice data uttered by the user when the user enters the multilingual setting scene;
  • a sending module, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each engine;
  • a second detection module, configured to detect, when none of the recognized texts is empty, whether each recognized text carries a preset weight word and to determine the text carrying a weight word as the valid text;
  • an input module, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • A computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments.
  • A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments are implemented.
  • The above multilingual interaction method, apparatus, computer device, and storage medium can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, they detect whether each recognized text carries a preset weight word and determine the text carrying a weight word as the valid text.
  • The valid text is then input into the NLU (Natural Language Understanding) system, which performs intent recognition on it and triggers an interactive action according to the intent recognition result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
  • Fig. 1 is a flowchart of a multilingual interaction method for an active outbound intelligent voice robot according to an embodiment;
  • Fig. 2 is a schematic diagram of the working process of an intelligent voice robot according to an embodiment;
  • Fig. 3 is a language decision flowchart according to an embodiment;
  • Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to an embodiment;
  • Fig. 5 is a schematic diagram of a computer device according to an embodiment.
  • The multilingual interaction method for an active outbound intelligent voice robot provided in this application can be applied to such intelligent voice robots.
  • The intelligent voice robot can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, and performs intent recognition on the valid text in the NLU system.
  • The intent recognition result triggers an interactive action, providing multilingual service from the intelligent voice robot and increasing its value, thereby improving the user experience.
  • In one embodiment, a multilingual interaction method for an active outbound intelligent voice robot is provided; taking its application to an intelligent voice robot as an example, the method includes the following steps.
  • S10: when the user enters the multilingual setting scene, detect the voice data uttered by the user. The intelligent voice robot can use a language recognition configurator to preset its dialogue-script scenes; for scenes that require multilingual recognition, the possible languages of the dialogue, such as Chinese and English, are configured. When the user enters the scene handled by the intelligent voice robot, the robot uses the preset language recognition engines to identify the language spoken by the user.
  • S20: send the voice data to each language recognition engine to obtain the recognized text returned by each engine.
  • In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine.
  • The English language recognition engine may be the default language recognition engine, and correspondingly English may be the default language.
  • Sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
  • sending the voice data to the English language recognition engine to obtain the English recognized text, which may be denoted TXT-EN;
  • sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text, which may be denoted TXT-CN.
  • S30: when none of the recognized texts is empty, detect whether each recognized text carries a preset weight word, and determine the text carrying a weight word as the valid text. The weight words can be preset in the intelligent voice robot. Specifically, particular weight words need to be set for the calculation; if the preset dialogue-script scenes of the intelligent voice robot include a Chinese scene and an English scene, the weight-word setup process may include:
  • analyzing the Chinese and English expressions likely to be used in the scene and setting the weight words accordingly.
  • The logic behind the weight words can draw on users' speech habits and the related psychology in each context. For example, when a person receives a call and the first sentence is in a familiar language, they answer normally. If the opening statement of the robot (the intelligent voice robot) is "Hello, this is XX calling from XX, may I speak to XXX?" and the person answering understands English, they will answer it smoothly. This can be understood on two levels: first, the answer is semantically consistent with the opening statement; second, the answering speed is normal, generally within 200 ms to 500 ms.
  • The intelligent voice robot uses this response "range" to set an adjustment coefficient for each language and determines the hesitation time coefficient from the time interval of the user's response.
  • The core role of a weight word is that, if the speech recognition result contains a preset weight word, the user is very likely speaking that language; the weight-word rule is one part of the core logic of language determination. Analysis of real scenes shows that in an unfamiliar-language scenario a user hesitates for about 300-500 ms, and the more information there is, the longer the hesitation; a time threshold T is therefore set based on scene analysis, and the longer the response time beyond T, the lower the language familiarity.
  • S40: input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • In this step, the valid text and the language corresponding to the valid text can both be input into the NLU system.
  • In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it includes:
  • inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
  • Specifically, the NLU system receives the valid text; since different languages require different natural language processing models, the NLU system selects its processing rules and models according to the language. It takes the recognition result as the input for the intent response and performs intent recognition with a pre-trained algorithm model for that language. Once intent recognition is complete, the action corresponding to the intent is triggered, such as an announcement; the action is processed according to the language, and the text-to-speech (TTS) service for that language is called to generate the corresponding speech for playback, completing the feedback exchange with the user.
  • The above multilingual interaction method can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine.
  • When none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, performs intent recognition on the valid text in the NLU system, and triggers an interactive action according to the intent recognition result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
  • In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
  • if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
  • In one embodiment, after obtaining the recognized texts, the method further includes:
  • if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
  • In one embodiment, after obtaining the recognized texts, the method further includes:
  • if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
  • Specifically, taking the case where the language recognition engines include an English engine and a Chinese engine as an example:
  • the intelligent voice robot sends the user's speech (voice data) to both the English and the Chinese language recognition engine; the English engine returns TXT-EN and the Chinese engine returns TXT-CN.
  • Determining the recognition result (the valid text) can proceed as follows:
  • if TXT-EN and TXT-CN are both empty (no valid result recognized), the current audio is most likely noise, and the language in use is recorded as the default language (for example English);
  • if TXT-EN and TXT-CN both return non-empty text, a weight calculation is performed: check whether TXT-EN or TXT-CN contains the preset weight words. The weight words are chosen to fit the scene, so if a weight word appears, the recognition result is very likely the user's actual answer; therefore, if exactly one of TXT-EN and TXT-CN contains a weight word, that result is taken as the optimal result. If both contain a weight word, or neither does, TXT-EN and TXT-CN are passed to the English and Chinese language models respectively, and the returned results are scored to obtain sourceEN(TXT-EN) and sourceCN(TXT-CN). Because the English and Chinese models score on different scales, an adjustment coefficient s is found from statistics of real scene data such that sourceEN(TXT-EN)*s is on a scale comparable with sourceCN; the hesitation time coefficient is also taken into account.
  • The above multilingual interaction method completes the determination of the interaction language through this recognition-and-decision pattern, finally addressing the shortcomings of existing intelligent voice robots in multilingual scenes; the way the intelligent voice robot passively switches the dialogue language in a specific scene may involve a speech synthesis module, a natural speech processing module, a natural language understanding module, a dialogue management module, and a speech recognition module.
  • At the speech recognition level, different language recognition engines are set up according to the scene configuration, and the multilingual speech recognition results are evaluated and scored by language-specific models; from the evaluation the most suitable result is selected, the language of that result is taken as the language used by the user, and the best result together with its language is provided to the NLU as the basis for the NLU's decisions.
  • At the natural language understanding level, the intelligent voice robot uses different semantic matching algorithms to improve semantic recognition and adjusts its output language (TTS) to match the user's language.
  • Definition of node hot words for specific dialogue-management scenes: this step analyzes the specific scene in which a user with weak English expression may ask the AI to switch languages, and summarizes the corresponding high-frequency hot words for the language-switching intent.
  • The high-frequency hot words are: 中文, 华语, chinese, English.
  • The relevant decision process is shown in Fig. 3.
  • A quick decision is made first; a simple implementation of the quick-decision logic calls two or more ASR engines for recognition and judges the language to be that of whichever engine returns a result. If no quick decision can be made, an acoustic-model decision is made; the acoustic model mainly addresses similar pronunciations across languages, where different ASR engines may each return a plausible result. For example, the Chinese filler word "那(nei)个" sounds very similar to an offensive English word, and an English ASR engine may recognize it as that word. The sound is decomposed into IPA symbols, and matching is then performed on the IPA sequence.
  • Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to an embodiment, including:
  • a first detection module 10, configured to detect the voice data uttered by the user when the user enters the multilingual setting scene;
  • a sending module 20, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each engine;
  • a second detection module 30, configured to detect, when none of the recognized texts is empty, whether each recognized text carries a preset weight word, and to determine the text carrying a weight word as the valid text;
  • an input module 40, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • For the specific limitations of the multilingual interaction apparatus for the active outbound intelligent voice robot, reference may be made to the limitations of the multilingual interaction method above, which are not repeated here.
  • The various modules in the above apparatus can be implemented in whole or in part by software, by hardware, or by a combination of the two.
  • The above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
  • In one embodiment, a computer device is provided.
  • The computer device may be a terminal, and its internal structure may be as shown in Fig. 5.
  • The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus.
  • The processor of the computer device provides computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system and a computer program.
  • The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • The network interface of the computer device is used to communicate with external terminals through a network connection.
  • The computer program, when executed by the processor, implements a multilingual interaction method for an active outbound intelligent voice robot.
  • The display screen of the computer device may be a liquid crystal display or an electronic ink display.
  • The input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • Those skilled in the art will understand that Fig. 5 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer devices to which the solution can be applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • In one embodiment, a computer device is further provided.
  • The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the multilingual interaction method for the active outbound intelligent voice robot of any of the foregoing embodiments.
  • The program can be stored in a non-volatile computer-readable storage medium.
  • The program can be stored in the storage medium of the computer system and executed by at least one processor in the computer system to implement the multilingual interaction method for the active outbound intelligent voice robot described above.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM), among others.
  • In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, it implements the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments.
  • The terms "first", "second", and "third" in the embodiments of this application merely distinguish similar objects and do not imply a particular order among them. Where permitted, "first", "second", and "third" may be interchanged in specific order or precedence, so that the embodiments described here can be implemented in orders other than those illustrated or described.

Abstract

A multilingual interaction method and apparatus for an active outbound intelligent speech robot, together with a computer device and a storage medium. The method comprises: when the user enters the multilingual setting scene, detecting the voice data uttered by the user (S10); sending the voice data to each language recognition engine to obtain the recognized text returned by each engine (S20); when none of the recognized texts is empty, detecting whether each recognized text carries a preset weight word and determining the text carrying a weight word as the valid text (S30); and inputting the valid text into an NLU system, performing intent recognition on the valid text in the NLU system, and triggering an interactive action according to the intent recognition result (S40). The method enables multilingual service from the intelligent speech robot and thus improves the user experience.

Description

Multilingual interaction method and apparatus for an active outbound intelligent voice robot
Technical Field
The present invention relates to the technical field of voice signal processing, and in particular to a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a corresponding computer device and storage medium.
Background Art
With the advent of the cloud era and continuous innovation in artificial intelligence technology, voice-based intelligent robots have entered all kinds of industries. Intelligent voice robots now take over a large amount of tedious, repetitive customer-service work, freeing human labor and providing extensive support for automated replies across industries.
An active outbound intelligent voice robot, working from preset dialogue scenes, guides the user through multi-turn dialogue in order to achieve a marketing purpose. Its main core functional modules are speech recognition (ASR), speech synthesis (TTS), dialogue management (DM), natural language processing (NLP), and natural language understanding (NLU).
In overseas markets, most intelligent voice robots support a single language, which covers about 95% of users. In real outbound-call scenarios, however, some users have weak expressive ability in that single language: in Southeast Asia, for example, the main language is English, while about 5% of users are overseas Chinese who are more comfortable with Chinese. When such users hear the voice robot speaking English, they will ask whether it can provide service in another language, such as Chinese. In this situation the language barrier lowers the value of the product and leads to a poor user experience.
Summary of the Invention
In view of the above problems, the present invention proposes a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a computer device and a storage medium.
To achieve the object of the present invention, a multilingual interaction method for an active outbound intelligent voice robot is provided, including the following steps:
S10: when the user enters the multilingual setting scene, detecting the voice data uttered by the user;
S20: sending the voice data to each language recognition engine to obtain the recognized text returned by each engine;
S30: when none of the recognized texts is empty, detecting whether each recognized text carries a preset weight word, and determining the text carrying a weight word as the valid text;
S40: inputting the valid text into the NLU system, performing intent recognition on the valid text in the NLU system, and triggering an interactive action according to the intent recognition result.
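For orientation, steps S10-S40 can be pictured as the minimal sketch below. It is an illustration only, not the disclosed implementation: the `asr_engines`, `nlu`, and weight-word interfaces are assumed placeholders, and the fallback when both texts carry (or lack) a weight word is simplified to "take the first result"; the comprehensive scoring that the invention actually prescribes for that case is sketched separately further down.

```python
# Minimal sketch of steps S10-S40 (interfaces and fallback behaviour are assumptions).

def handle_user_turn(audio, asr_engines, weight_words, nlu, default_language):
    """asr_engines: dict mapping language -> recognizer with a transcribe(audio) -> str method."""
    # S20: send the voice data to every language recognition engine.
    recognized = {lang: engine.transcribe(audio) for lang, engine in asr_engines.items()}

    non_empty = {lang: txt for lang, txt in recognized.items() if txt.strip()}
    if not non_empty:
        # All texts empty: treat the audio as noise and fall back to the default language.
        return nlu.respond(text="", language=default_language)
    if len(non_empty) == 1:
        # Exactly one engine returned text: that text and its language are taken as valid.
        (language, valid_text), = non_empty.items()
    else:
        # S30: prefer the text that carries a preset weight word; when both or neither do,
        # a comprehensive score would be computed (see the scoring sketch further below).
        hits = {lang: txt for lang, txt in non_empty.items()
                if any(word in txt for word in weight_words.get(lang, []))}
        language, valid_text = (next(iter(hits.items())) if len(hits) == 1
                                else next(iter(non_empty.items())))
    # S40: intent recognition in the NLU system, in the chosen language.
    return nlu.respond(text=valid_text, language=language)
```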
In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine.
As an embodiment, sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
sending the voice data to the English language recognition engine to obtain the English recognized text returned by it;
sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text returned by it.
In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it in the NLU system includes:
inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
A multilingual interaction apparatus for an active outbound intelligent voice robot includes:
a first detection module, configured to detect the voice data uttered by the user when the user enters the multilingual setting scene;
a sending module, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each engine;
a second detection module, configured to detect, when none of the recognized texts is empty, whether each recognized text carries a preset weight word and to determine the text carrying a weight word as the valid text;
an input module, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
A computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments.
A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments are implemented.
The above multilingual interaction method, apparatus, computer device, and storage medium can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, they detect whether each recognized text carries a preset weight word and determine the text carrying a weight word as the valid text; the valid text is then input into the NLU (Natural Language Understanding) system, which performs intent recognition on it and triggers an interactive action according to the result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
Brief Description of the Drawings
Fig. 1 is a flowchart of a multilingual interaction method for an active outbound intelligent voice robot according to an embodiment;
Fig. 2 is a schematic diagram of the working process of an intelligent voice robot according to an embodiment;
Fig. 3 is a language decision flowchart according to an embodiment;
Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to an embodiment;
Fig. 5 is a schematic diagram of a computer device according to an embodiment.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with that embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The multilingual interaction method for an active outbound intelligent voice robot provided in this application can be applied to such intelligent voice robots. The intelligent voice robot can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, performs intent recognition on the valid text in the NLU system, and triggers an interactive action according to the intent recognition result, thereby providing multilingual service, increasing the value of the intelligent voice robot, and improving the user experience.
In one embodiment, as shown in Fig. 1, a multilingual interaction method for an active outbound intelligent voice robot is provided; taking its application to an intelligent voice robot as an example, the method includes the following steps.
S10: when the user enters the multilingual setting scene, detect the voice data uttered by the user.
The intelligent voice robot can use a language recognition configurator to preset its dialogue-script scenes; for scenes that require multilingual recognition, the possible languages of the dialogue, such as Chinese and English, are configured. When the user enters the scene handled by the intelligent voice robot, the robot uses the preset language recognition engines to identify the language spoken by the user.
S20: send the voice data to each language recognition engine to obtain the recognized text returned by each engine.
In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine. The English engine may be the default language recognition engine, and correspondingly English may be the default language.
Specifically, sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
sending the voice data to the English language recognition engine to obtain the English recognized text, which may be denoted TXT-EN;
sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text, which may be denoted TXT-CN.
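The same audio can be dispatched to the English and Chinese engines in parallel so that neither blocks the other; the text above does not mandate concurrency, so the sketch below is just one convenient way of obtaining TXT-EN and TXT-CN, with an assumed `transcribe` method on each engine.

```python
# Parallel dispatch to the two engines (concurrency is an implementation choice, not required).
from concurrent.futures import ThreadPoolExecutor

def recognize_both(audio, english_engine, chinese_engine):
    """Return (TXT-EN, TXT-CN); each engine is assumed to expose transcribe(audio) -> str."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_en = pool.submit(english_engine.transcribe, audio)
        future_cn = pool.submit(chinese_engine.transcribe, audio)
        return future_en.result(), future_cn.result()
```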
S30: when none of the recognized texts is empty, detect whether each recognized text carries a preset weight word, and determine the text carrying a weight word as the valid text.
The weight words can be preset in the intelligent voice robot. Specifically, particular weight words need to be set for the calculation; if the preset dialogue-script scenes of the intelligent voice robot include a Chinese scene and an English scene, the weight-word setup process may include:
analyzing the Chinese and English expressions likely to be used in the scene and setting the weight words accordingly. The logic behind the weight words can draw on users' speech habits and the related psychology in each context. For example, when a person receives a call and the first sentence is in a familiar language, they answer normally. If the opening statement of the robot (the intelligent voice robot) is "Hello, this is XX calling from XX, may I speak to XXX?" and the person answering understands English, they will answer it smoothly. This can be understood on two levels: first, the answer is semantically consistent with the opening statement; second, the answering speed is normal, generally within 200 ms to 500 ms. A person who does not understand English will first hesitate and then answer in Chinese with something like "请问可以说中文吗" ("May I speak Chinese?") or "你打错了" ("You have the wrong number"). For an opening statement in a given language, a typical person's response falls within a certain range. The intelligent voice robot uses this "range" to set an adjustment coefficient for each language and determines the hesitation time coefficient from the time interval of the user's response.
The core role of a weight word is that, if the speech recognition result contains a preset weight word, the user is very likely speaking that language; the weight-word rule is therefore one part of the core logic of language determination. Analysis of real scenes shows that in an unfamiliar-language scenario a user hesitates for about 300-500 ms, and the more information there is, the longer the hesitation; a time threshold T is therefore set based on scene analysis, and the longer the response time beyond T, the lower the language familiarity.
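As a concrete illustration of the weight-word check and the hesitation threshold T just described, the following sketch computes a weight-word hit and the hesitation time coefficient Δt. The word lists and the 400 ms threshold are assumed example values chosen here for illustration, not values fixed by the text.

```python
# Illustrative weight-word check and hesitation coefficient (example values are assumptions).

WEIGHT_WORDS = {
    "zh": ["中文", "华语", "打错"],    # assumed example weight words for the Chinese scene
    "en": ["speak", "wrong number"],   # assumed example weight words for the English scene
}
T_MS = 400  # assumed time threshold T, taken from the 300-500 ms hesitation range above

def carries_weight_word(text: str, language: str) -> bool:
    """True if the recognized text contains any preset weight word for that language."""
    return any(word in text for word in WEIGHT_WORDS[language])

def hesitation_coefficient(delay_ms: float, threshold_ms: float = T_MS) -> float:
    """Δt = DelayTime - T; it can be negative for quick answers and grows as familiarity drops."""
    return delay_ms - threshold_ms
```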
S40: input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
In this step, the valid text and the language corresponding to the valid text can both be input into the NLU system.
In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it includes:
inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
Specifically, the NLU system receives the valid text; since different languages require different natural language processing models, the NLU system selects its processing rules and models according to the language. It takes the recognition result as the input for the intent response and performs intent recognition with a pre-trained algorithm model for that language. Once intent recognition is complete, the action corresponding to the intent is triggered, such as an announcement; the action is processed according to the language, and the text-to-speech (TTS) service for that language is called to generate the corresponding speech for playback, completing the feedback exchange with the user.
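The language-dependent routing in this step can be pictured as below; it matches the `nlu.respond` call assumed in the earlier pipeline sketch. The model registry, intent classifier interface, reply table, and TTS call are all assumptions made for the sketch; the text only requires that the model and the TTS voice be selected by language.

```python
# Sketch of language-dependent NLU routing and TTS playback (all interfaces are assumed).

class MultilingualNLU:
    def __init__(self, intent_models, replies, tts_engines):
        self.intent_models = intent_models  # language -> pre-trained intent classifier
        self.replies = replies              # language -> {intent name: reply text}
        self.tts_engines = tts_engines      # language -> text-to-speech (TTS) engine

    def respond(self, text: str, language: str) -> bytes:
        """Recognize the intent of the valid text and synthesize the reply in the same language."""
        intent = self.intent_models[language].predict(text)       # per-language algorithm model
        reply_text = self.replies[language][intent]               # action bound to the recognized intent
        return self.tts_engines[language].synthesize(reply_text)  # announce in the user's language
```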
The multilingual interaction method described above can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, performs intent recognition on the valid text, and triggers an interactive action according to the result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
Specifically, taking the case where the language recognition engines include an English engine and a Chinese engine as an example, the intelligent voice robot sends the user's speech (voice data) to both the English and the Chinese language recognition engine; the English engine returns TXT-EN and the Chinese engine returns TXT-CN. Determining the recognition result (the valid text) can proceed as follows:
Scenario 1: TXT-EN and TXT-CN are both empty (no valid result recognized). The current audio is most likely noise, and the language in use is recorded as the default language (for example English).
Scenario 2: exactly one of TXT-EN and TXT-CN is empty (no valid result recognized). The text that was returned is considered to be in the correct language, and that language type is recorded.
Scenario 3: TXT-EN and TXT-CN both return non-empty text, so a weight calculation is performed: check whether TXT-EN or TXT-CN contains the weight words set in step two. The weight words are chosen to fit the scene, so if a weight word appears, the recognition result is very likely the user's actual answer; therefore, if exactly one of TXT-EN and TXT-CN contains a weight word, that result is taken as the optimal result. If both contain a weight word, or neither does, TXT-EN and TXT-CN are passed to the English and Chinese language models respectively, and the returned results are scored to obtain sourceEN(TXT-EN) and sourceCN(TXT-CN). Because the English and Chinese models score on different scales, an adjustment coefficient s is found from statistics of real scene data, such that sourceEN(TXT-EN)*s is on a scale comparable with sourceCN. The hesitation time coefficient Δt = DelayTime (the user's hesitation time) - T (the preset time threshold) is also considered and is combined with the sensitivity coefficient to obtain the most suitable score. The empirical values a (sensitivity coefficient) and s (adjustment coefficient) are validated against a large amount of data, giving the final scoring formulas (the comprehensive scores):
The English comprehensive score is: sourceEN(TXT-EN) * s - a^Δt
The Chinese comprehensive score is: sourceCN(TXT-CN)
The English comprehensive score and the Chinese comprehensive score are compared, and the higher-scoring result is taken as the user's recognition result and language.
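Putting the three scenarios and the scoring formulas together, a decision function might look like the sketch below. The language-model scorers, the coefficient values, and the helper names are assumptions (it reuses `carries_weight_word` from the earlier weight-word sketch); the formulas themselves follow the text above, reading the flattened "a △t" as the exponent a^Δt.

```python
# Sketch of the language decision over TXT-EN / TXT-CN (scorers and coefficients are assumed).

def decide_language(txt_en, txt_cn, delay_ms, score_en, score_cn,
                    s=1.2, a=1.5, T_ms=400, default="en"):
    """score_en / score_cn: language-model scorers returning a fluency score for a text;
    s, a, and T_ms are assumed example values for the adjustment coefficient, the
    sensitivity coefficient, and the time threshold."""
    en_empty, cn_empty = not txt_en.strip(), not txt_cn.strip()
    if en_empty and cn_empty:                        # Scenario 1: probably noise
        return default, ""
    if en_empty or cn_empty:                         # Scenario 2: only one engine answered
        return ("zh", txt_cn) if en_empty else ("en", txt_en)

    # Scenario 3: both engines returned text -> weight words first, then comprehensive scores.
    en_hit = carries_weight_word(txt_en, "en")       # from the weight-word sketch above
    cn_hit = carries_weight_word(txt_cn, "zh")
    if en_hit != cn_hit:                             # exactly one text carries a weight word
        return ("en", txt_en) if en_hit else ("zh", txt_cn)

    dt = (delay_ms - T_ms) / 1000.0                  # Δt = DelayTime - T (unit scaling assumed)
    english_score = score_en(txt_en) * s - a ** dt   # sourceEN(TXT-EN) * s - a^Δt
    chinese_score = score_cn(txt_cn)                 # sourceCN(TXT-CN)
    return ("en", txt_en) if english_score >= chinese_score else ("zh", txt_cn)
```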
In one embodiment, the multilingual interaction method above completes the determination of the interaction language through this recognition-and-decision pattern, finally addressing the shortcomings of existing intelligent voice robots in multilingual scenes; the way the intelligent voice robot passively switches the dialogue language in a specific scene may involve a speech synthesis module, a natural speech processing module, a natural language understanding module, a dialogue management module, and a speech recognition module.
Referring to Fig. 2, when the intelligent voice robot is deployed, the voice languages that need to be supported are configured for each scene, for example English and Chinese. When the intelligent robot reaches that scene it reads this configuration and processes the different languages accordingly; in general, the configuration is the condition that decides which language processing logic is executed.
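A per-scene language configuration of the kind described here could be as simple as the following; the scene identifier and field names are illustrative assumptions, not a schema given in the text.

```python
# Illustrative per-scene configuration; the keys are assumptions, not the disclosed schema.

SCENE_CONFIG = {
    "outbound_marketing_sea": {            # hypothetical scene identifier
        "languages": ["en", "zh"],         # languages the scene must support
        "default_language": "en",          # language used when nothing valid is recognized
        "opening_line": "Hello, this is XX calling from XX, may I speak to XXX?",
    },
}

def supported_languages(scene_id: str) -> list[str]:
    """The robot reads this configuration when it reaches the scene and branches on it."""
    return SCENE_CONFIG[scene_id]["languages"]
```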
At the speech recognition level, different language recognition engines are set up according to the scene configuration. The multilingual speech recognition results are evaluated and scored by language-specific models; from the evaluation the most suitable result is selected, the language of that result is taken as the language used by the user, and the best result together with its language is provided to the NLU as the basis for the NLU's decisions.
At the natural language understanding level, different languages are supported. According to the language determined in the speech recognition flow, the intelligent voice robot uses different semantic matching algorithms to improve semantic recognition and adjusts its output language (TTS) to match the user's language.
At the psychological level, when a person receives a call and the first sentence is in a familiar language, they answer normally. If the opening statement of the robot is "Hello, this is XX calling from XX, may I speak to XXX?" and the person answering understands English, they will answer it smoothly. This can be understood on two levels: first, the answer is semantically consistent with the opening statement; second, the answering speed is normal, generally within 200 ms to 500 ms. A person who does not understand English will first hesitate and then answer in Chinese with something like "请问可以说中文吗" ("May I speak Chinese?") or "你打错了" ("You have the wrong number"). For an opening statement in a given language, a typical person's response falls within a certain range, and the system makes use of this "range" to set the per-language coefficients described above.
中文话术的搭建:在对话管理系统预设话术为英文的情况下,为了覆盖更多语言场景,针对此英文话术预设场景前提下,新增一套对应的中文话术场景;Construction of Chinese dialect: In the case that the dialogue management system defaults to English, in order to cover more language scenarios, a set of corresponding Chinese dialect scenarios is added under the premise of this English dialect preset scenario;
中文意图搭建:原有主动外呼开场白场景下,新增一个领域即客户意图场景分支为:客户想说中文的意图分支。该意图分支下需要配置客户对应场景的可能说法,如:可以说华语吗?can you speak chinese?Chinese intention construction: In the original active outbound opening scene, a new field is added, namely the customer intention scene branch: the intention branch of the customer who wants to speak Chinese. Under this intention branch, you need to configure the possible statements of the customer's corresponding scenario, such as: Can you speak Chinese? can you speak chinese?
Definition of node hot words for specific dialogue-management scenarios: this step analyzes the specific scenario in which a user with weak English may answer the AI with a question of their own, and summarizes the high-frequency hot words of this language-switching intent as: 中文, 华语, chinese, English.
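These hot words play the role of the preset weight words checked in step S30. A minimal sketch of the check, under assumed names, might be:

```python
# Hypothetical hot-word (weight-word) check; the word list follows the text above,
# the matching logic is an illustrative assumption.
NODE_HOT_WORDS = {"中文", "华语", "chinese", "english"}

def carries_hot_word(recognized_text: str) -> bool:
    """True if the recognized text contains any configured hot word for this node."""
    lowered = recognized_text.lower()
    return any(word in lowered for word in NODE_HOT_WORDS)

# Example: a reply to the English opening such as "可以说中文吗" is flagged,
# so the Chinese transcript is treated as the valid text for the NLU module.
print(carries_hot_word("可以说中文吗"))   # True
```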
Adding a layer of Chinese engine for specific scenarios on top of the original English engine: for an English opening, while the prompt is played in English, a customer who cannot speak English will generally answer the voice system with a question such as "Can you speak Chinese?". For this specific scenario, an additional layer of Chinese ASR engine is added as a supplement covering the remaining roughly 5% of cases;
The advantages of this approach are: it covers multilingual user groups while avoiding, by a simple trick, the negative recognition effects of relying on a bilingual engine, such as a degraded overall system response rate and lower recognition accuracy for the users in the 95% majority scenario.
In one example, during application of the above multilingual interaction method for the active outbound intelligent voice robot, after the user's voice data is obtained, the decision process may be as shown in Figure 3. A quick decision is made first; a simple implementation of the quick-decision logic calls two or more ASR engines for recognition, and whichever engine returns a result determines the language. If no result allows a decision, an acoustic-model decision is made. The acoustic model mainly addresses similar pronunciations across languages, for which several ASR engines may all return plausible results. For example, the spoken Chinese filler word "那 (nèi) 个" is pronounced much like an offensive English word, and an English ASR engine will generally transcribe it as that word. The sound is therefore decomposed into IPA symbols, which are then matched.
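A minimal sketch of this two-stage decision, with the IPA conversion and distance functions treated as assumed helpers (they are not specified in the original text), could be:

```python
# Hypothetical sketch of the Figure 3 decision cascade: quick decision first,
# acoustic (IPA) comparison as a fallback when the quick decision is ambiguous.
from typing import Callable, Dict, Optional, Tuple

def quick_decision(results: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """results maps language code to recognized text (possibly empty),
    e.g. {"en-US": "", "zh-CN": "可以说中文吗"}."""
    non_empty = {lang: text for lang, text in results.items() if text}
    if len(non_empty) == 1:                     # exactly one engine returned text
        return next(iter(non_empty.items()))    # (language, text)
    return None                                 # ambiguous: fall through

def acoustic_decision(audio: bytes,
                      results: Dict[str, str],
                      audio_to_ipa: Callable,
                      text_to_ipa: Callable,
                      ipa_distance: Callable) -> Optional[Tuple[str, str]]:
    """Fallback: compare the IPA of the audio with the IPA of each candidate
    transcript and keep the closest one. All three helpers are assumptions."""
    audio_ipa = audio_to_ipa(audio)
    scored = [(ipa_distance(audio_ipa, text_to_ipa(text, lang)), lang, text)
              for lang, text in results.items() if text]
    if not scored:
        return None
    _, lang, text = min(scored)                 # smallest IPA distance wins
    return lang, text
```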
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to one embodiment, which includes:
a first detection module 10, configured to detect voice data uttered by the user when the user enters a multilingual setting scenario;
a sending module 20, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each language recognition engine;
a second detection module 30, configured to, when none of the recognized texts is an empty text, detect whether each recognized text carries a preset weight word, and determine the text carrying the weight word as the valid text;
an input module 40, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interaction action according to the intent recognition result.
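The four modules above could be wired together in software roughly as follows; this is a sketch under assumed interfaces (recognize, intent), not the claimed apparatus itself.

```python
# Hypothetical wiring of the four modules into one apparatus class.
from typing import Dict, Optional, Set

class MultilingualInteractionApparatus:
    def __init__(self, engines: Dict[str, object], weight_words: Set[str], nlu):
        self.engines = engines            # e.g. {"en-US": asr_en, "zh-CN": asr_zh}
        self.weight_words = weight_words  # preset weight words, e.g. {"中文", "chinese"}
        self.nlu = nlu                    # assumed NLU object with an .intent(text) method

    def detect(self, audio: bytes) -> bytes:                 # first detection module 10
        return audio

    def send(self, audio: bytes) -> Dict[str, str]:          # sending module 20
        return {lang: eng.recognize(audio) for lang, eng in self.engines.items()}

    def select_valid(self, results: Dict[str, str]) -> str:  # second detection module 30
        weighted = [t for t in results.values()
                    if t and any(w in t.lower() for w in self.weight_words)]
        return weighted[0] if weighted else ""

    def understand(self, valid_text: str) -> Optional[str]:  # input module 40
        return self.nlu.intent(valid_text) if valid_text else None
```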
For specific limitations of the multilingual interaction apparatus for the active outbound intelligent voice robot, reference may be made to the limitations of the multilingual interaction method for the active outbound intelligent voice robot above, which are not repeated here. The modules in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input apparatus connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a multilingual interaction method for an active outbound intelligent voice robot. The display screen of the computer device may be a liquid-crystal display or an electronic-ink display, and the input apparatus may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art can understand that the structure shown in FIG. 5 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
Based on the examples described above, in one embodiment a computer device is further provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it implements the multilingual interaction method for an active outbound intelligent voice robot of any of the foregoing embodiments.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the related hardware. The program may be stored in a non-volatile computer-readable storage medium; as in the embodiments of the present invention, the program may be stored in a storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the embodiments of the multilingual interaction method for the active outbound intelligent voice robot described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Accordingly, in one embodiment a computer-readable storage medium is further provided, on which a computer program is stored, wherein when the program is executed by a processor, it implements the multilingual interaction method for an active outbound intelligent voice robot of any of the foregoing embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features contains no contradiction, it should be regarded as falling within the scope of this specification.
It should be noted that the terms "first/second/third" in the embodiments of this application merely distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the objects so distinguished can be interchanged where appropriate and the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
The terms "include" and "have" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or modules is not limited to the listed steps or modules, but may optionally further include steps or modules that are not listed, or other steps or modules inherent to the process, method, product, or device.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

  1. A multilingual interaction method for an active outbound intelligent voice robot, characterized in that it comprises the following steps:
    S10: when a user enters a multilingual setting scenario, detecting voice data uttered by the user;
    S20: sending the voice data to each language recognition engine to obtain recognized text returned by each language recognition engine;
    S30: when none of the recognized texts is an empty text, detecting whether each recognized text carries a preset weight word, and determining the text carrying the weight word as valid text;
    S40: inputting the valid text into an NLU system, performing intent recognition on the valid text in the NLU system, and triggering an interaction action according to the intent recognition result.
  2. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that the language recognition engines comprise an English language recognition engine and a Chinese language recognition engine.
  3. The multilingual interaction method for an active outbound intelligent voice robot according to claim 2, characterized in that sending the voice data to each language recognition engine to obtain the recognized text returned by each language recognition engine comprises:
    sending the voice data to the English language recognition engine to obtain English recognized text returned by the English language recognition engine;
    sending the voice data to the Chinese language recognition engine to obtain Chinese recognized text returned by the Chinese language recognition engine.
  4. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that, after detecting whether each recognized text carries a preset weight word, the method further comprises:
    if all of the recognized texts carry the preset weight word, or none of them does, invoking a corresponding speech model for each recognized text, using each speech model to determine a text score of each recognized text, determining a comprehensive score of each recognized text according to its text score, a hesitation time coefficient, and an adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
  5. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that, after sending the voice data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further comprises:
    if all of the recognized texts are empty texts, recording the language in use as a default language, and using the default language to trigger the interaction action.
  6. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that, after sending the voice data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further comprises:
    if exactly one non-empty text exists among the recognized texts, determining the non-empty text as the valid text.
  7. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that inputting the valid text into the NLU system and performing intent recognition on the valid text in the NLU system comprises:
    inputting the valid text into the NLU system, so that the NLU system identifies the language corresponding to the valid text to obtain a current language, and performing intent recognition on the valid text by using a language algorithm model corresponding to the current language.
  8. A multilingual interaction apparatus for an active outbound intelligent voice robot, characterized in that it comprises:
    a first detection module, configured to detect voice data uttered by a user when the user enters a multilingual setting scenario;
    a sending module, configured to send the voice data to each language recognition engine and obtain recognized text returned by each language recognition engine;
    a second detection module, configured to, when none of the recognized texts is an empty text, detect whether each recognized text carries a preset weight word, and determine the text carrying the weight word as valid text;
    an input module, configured to input the valid text into an NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interaction action according to the intent recognition result.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
PCT/CN2021/071368 2020-04-21 2021-01-13 Multilingual interaction method and apparatus for active outbound intelligent speech robot WO2021212929A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010316400.9 2020-04-21
CN202010316400.9A CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device

Publications (1)

Publication Number Publication Date
WO2021212929A1 true WO2021212929A1 (en) 2021-10-28

Family

ID=72258977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071368 WO2021212929A1 (en) 2020-04-21 2021-01-13 Multilingual interaction method and apparatus for active outbound intelligent speech robot

Country Status (2)

Country Link
CN (1) CN111627432B (en)
WO (1) WO2021212929A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114918950A (en) * 2022-01-12 2022-08-19 国网吉林省电力有限公司延边供电公司 Intelligent robot for power supply of Xinji Jianfrontier
CN115134466A (en) * 2022-06-07 2022-09-30 马上消费金融股份有限公司 Intention recognition method and device and electronic equipment
CN116343786A (en) * 2023-03-07 2023-06-27 南方电网人工智能科技有限公司 Customer service voice analysis method, system, computer equipment and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627432B (en) * 2020-04-21 2023-10-20 升智信息科技(南京)有限公司 Active outbound intelligent voice robot multilingual interaction method and device
CN112562640B (en) * 2020-12-01 2024-04-12 北京声智科技有限公司 Multilingual speech recognition method, device, system, and computer-readable storage medium
CN113571064B (en) * 2021-07-07 2024-01-30 肇庆小鹏新能源投资有限公司 Natural language understanding method and device, vehicle and medium
CN114464179B (en) * 2022-01-28 2024-03-19 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105957516A (en) * 2016-06-16 2016-09-21 百度在线网络技术(北京)有限公司 Switching method and device for multiple voice identification models
US20170011735A1 (en) * 2015-07-10 2017-01-12 Electronics And Telecommunications Research Institute Speech recognition system and method
US20180137109A1 (en) * 2016-11-11 2018-05-17 The Charles Stark Draper Laboratory, Inc. Methodology for automatic multilingual speech recognition
CN108335692A (en) * 2018-03-21 2018-07-27 上海木爷机器人技术有限公司 A kind of method for switching languages, server and system
CN109065020A (en) * 2018-07-28 2018-12-21 重庆柚瓣家科技有限公司 The identification storehouse matching method and system of multilingual classification
CN109522564A (en) * 2018-12-17 2019-03-26 北京百度网讯科技有限公司 Voice translation method and device
CN109712607A (en) * 2018-12-30 2019-05-03 联想(北京)有限公司 A kind of processing method, device and electronic equipment
CN111627432A (en) * 2020-04-21 2020-09-04 升智信息科技(南京)有限公司 Active call-out intelligent voice robot multi-language interaction method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5073024B2 (en) * 2010-08-10 2012-11-14 株式会社東芝 Spoken dialogue device
CN107818781B (en) * 2017-09-11 2021-08-10 远光软件股份有限公司 Intelligent interaction method, equipment and storage medium
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
KR20210009596A (en) * 2019-07-17 2021-01-27 엘지전자 주식회사 Intelligent voice recognizing method, apparatus, and intelligent computing device

Also Published As

Publication number Publication date
CN111627432B (en) 2023-10-20
CN111627432A (en) 2020-09-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21793069
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21793069
    Country of ref document: EP
    Kind code of ref document: A1