CN111627432A - Active outbound intelligent voice robot multilingual interaction method and device


Info

Publication number
CN111627432A
Authority
CN
China
Prior art keywords
text
recognition
language
robot
intelligent voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010316400.9A
Other languages
Chinese (zh)
Other versions
CN111627432B (en)
Inventor
李训林
王帅
张晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengzhi Information Technology (Nanjing) Co., Ltd.
Original Assignee
Shengzhi Information Technology (Nanjing) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shengzhi Information Technology (Nanjing) Co., Ltd.
Priority to CN202010316400.9A priority Critical patent/CN111627432B/en
Publication of CN111627432A publication Critical patent/CN111627432A/en
Priority to PCT/CN2021/071368 priority patent/WO2021212929A1/en
Application granted granted Critical
Publication of CN111627432B publication Critical patent/CN111627432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/1822: Parsing for meaning understanding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The invention discloses a multilingual interaction method for an active outbound intelligent voice robot, together with a corresponding apparatus, computer device, and storage medium.

Description

Active outbound intelligent voice robot multilingual interaction method and device
Technical Field
The invention relates to the technical field of voice signal processing, and in particular to a multilingual interaction method and apparatus for an active outbound intelligent voice robot, as well as a corresponding computer device and storage medium.
Background
With the arrival of the cloud era and continuous innovation in artificial intelligence, intelligent robots built on voice systems have entered all kinds of industries. Intelligent voice robots now take over a large amount of tedious, repetitive customer-service work, freeing human labor and bringing great convenience to automated replies across industries.
the active outbound intelligent voice robot guides a user to have a conversation in a Torontal conversation mode on the premise of presetting a conversation scene, so that the marketing purpose is achieved. Its main core functional modules Are Speech Recognition (ASR), speech synthesis (TTS), Dialog Management (DM), Natural Language Processing (NLP), Natural Language Understanding (NLU).
In overseas markets, most intelligent voice robots work in a single language, which serves about 95% of users. In real outbound scenarios, however, some users express themselves poorly in that single language. In Southeast Asia, for example, the main language is English, yet roughly 5% of users, largely overseas Chinese residents, are more comfortable with Chinese. On hearing the voice robot broadcast in English, such a user will ask whether it offers service in another language, such as Chinese. In these scenarios the language barrier lowers the product's value and degrades the user experience.
Disclosure of Invention
To solve these problems, the invention provides a multilingual interaction method for an active outbound intelligent voice robot, together with a corresponding apparatus, a computer device, and a storage medium.
To achieve this aim, the invention provides a multilingual interaction method for an active outbound intelligent voice robot, comprising the following steps:
S10, when a user enters a multilingual setting scene, detecting voice data sent by the user;
S20, sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine;
S30, when none of the recognition texts is empty, detecting whether each recognition text carries preset weight words, and determining the text carrying the weight words as the valid text;
and S40, inputting the valid text into the NLU system, performing intention recognition on the valid text in the NLU system, and triggering an interactive action according to the intention recognition result.
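As an illustration only, the following Python sketch wires steps S10 to S40 together. Every object and method name here (recognize(), detect_intent(), trigger_action(), trigger_default_action()) is an assumed placeholder, and the weight-word examples are invented; the patent does not prescribe a concrete API.

```python
# Hedged sketch of steps S10-S40; names are illustrative placeholders.

WEIGHT_WORDS = {"en": ["English"], "zh": ["Chinese", "中文"]}  # assumed examples

def multilingual_interaction(voice_data, engines, nlu):
    # S10 has already captured voice_data in the multilingual setting scene.
    # S20: send the voice data to every language recognition engine.
    texts = {lang: eng.recognize(voice_data) for lang, eng in engines.items()}
    non_empty = {lang: t for lang, t in texts.items() if t}
    if not non_empty:                    # all empty: keep the default language
        return nlu.trigger_default_action()
    if len(non_empty) == 1:              # exactly one engine returned text
        lang, valid_text = next(iter(non_empty.items()))
    else:
        # S30: prefer the recognition text that carries a preset weight word;
        # ambiguous cases fall through to the comprehensive-score embodiment
        # described later, which this sketch omits.
        hits = {lang: t for lang, t in non_empty.items()
                if any(w in t for w in WEIGHT_WORDS.get(lang, []))}
        pool = hits if len(hits) == 1 else non_empty
        lang, valid_text = next(iter(pool.items()))
    # S40: intention recognition in the NLU system, then the interactive action.
    intent = nlu.detect_intent(valid_text, lang)
    return nlu.trigger_action(intent, lang)
```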
In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine.
As an embodiment, sending the voice data to each language recognition engine and obtaining the recognition text returned by each language recognition engine includes:
sending the voice data to an English language recognition engine to obtain an English recognition text returned by the English language recognition engine;
and sending the voice data to a Chinese language recognition engine to obtain a Chinese recognition text returned by the Chinese language recognition engine.
In one embodiment, after detecting whether each recognition text carries a preset weight word, the method further includes:
if every recognition text carries a preset weight word, or none does, calling the corresponding language model for each recognition text, using each language model to obtain a text score for its recognition text, determining the comprehensive score of each recognition text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognition text with the highest comprehensive score as the valid text.
In one embodiment, after sending the speech data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further includes:
and if all the recognition texts are empty, recording the language in use as the default language, and triggering the interactive action in the default language.
In one embodiment, after sending the speech data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further includes:
and if exactly one non-empty text exists among the recognition texts, determining that non-empty text as the valid text.
In one embodiment, inputting the valid text into the NLU system and performing intention recognition on the valid text comprises:
inputting the valid text into the NLU system, having the NLU system identify the language corresponding to the valid text to obtain the current language, and performing intention recognition on the valid text with the language algorithm model corresponding to the current language.
A multilingual interaction apparatus for an active outbound intelligent voice robot, comprising:
a first detection module, configured to detect voice data sent by a user when the user enters a multilingual setting scene;
a sending module, configured to send the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine;
a second detection module, configured to detect, when none of the recognition texts is empty, whether each recognition text carries preset weight words, and to determine the text carrying the weight words as the valid text;
and an input module, configured to input the valid text into the NLU system, perform intention recognition on the valid text in the NLU system, and trigger an interactive action according to the intention recognition result.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the active outbound intelligent voice robot multilingual interaction method of any of the above embodiments when executing the computer program.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the active outbound intelligent voice robot multilingual interaction method of any of the above embodiments.
With the above multilingual interaction method, apparatus, computer device, and storage medium, when a user enters a multilingual setting scene the robot detects the voice data sent by the user; sends the voice data to each language recognition engine to obtain the recognition text returned by each engine; when none of the recognition texts is empty, detects whether each recognition text carries preset weight words and determines the text carrying the weight words as the valid text; and inputs the valid text into the NLU (natural language understanding) system, where intention recognition is performed and an interactive action is triggered according to the intention recognition result. The intelligent voice robot can thus provide multilingual service, which raises its value and improves the user experience.
Drawings
FIG. 1 is a flowchart of the multilingual interaction method for an active outbound intelligent voice robot of an embodiment;
FIG. 2 is a schematic diagram of the working process of an intelligent voice robot of an embodiment;
FIG. 3 is a language decision flowchart of an embodiment;
FIG. 4 is a schematic diagram of the multilingual interaction apparatus for an active outbound intelligent voice robot of an embodiment;
FIG. 5 is a schematic diagram of a computer device of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The multilingual interaction method for an active outbound intelligent voice robot can be applied to a suitable intelligent voice robot. When a user enters a multilingual setting scene, the robot detects the voice data sent by the user; sends the voice data to each language recognition engine to obtain the recognition text returned by each engine; when none of the recognition texts is empty, detects whether each recognition text carries preset weight words and determines the text carrying the weight words as the valid text; and inputs the valid text into the NLU (natural language understanding) system, where intention recognition is performed and an interactive action is triggered according to the intention recognition result. The robot can thus provide multilingual service, which raises its value and improves the user experience.
In one embodiment, as shown in FIG. 1, a multilingual interaction method for an active outbound intelligent voice robot is provided. Using its application to an intelligent voice robot as an example, the method includes the following steps:
s10, when the user enters the multilingual setting scenario, voice data uttered by the user is detected.
The intelligent voice robot can preset a language scene through a language-recognition configurator: in any scene that requires multilingual recognition, the possible languages, such as Chinese and English, are configured in advance. When a user enters a scene the intelligent voice robot is handling, the robot uses the preset language recognition engines to identify the language the user is speaking.
And S20, sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine.
In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine. The English language recognition engine may serve as the default language recognition engine, and correspondingly English may be the default language.
Specifically, sending the voice data to each language recognition engine and obtaining the recognition text returned by each language recognition engine includes:
sending the voice data to the English language recognition engine to obtain the English recognition text returned by the English language recognition engine; this English recognition text may further be recorded as TXT-EN;
and sending the voice data to the Chinese language recognition engine to obtain the Chinese recognition text returned by the Chinese language recognition engine, which may further be recorded as TXT-CN.
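Since the two engines are queried independently with the same audio, an implementation can issue both requests in parallel. Below is a minimal sketch using Python's standard concurrent.futures; recognize_en and recognize_cn stand in for the two engines' recognition calls, which the patent leaves unspecified.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(voice_data, recognize_en, recognize_cn):
    # Send the same audio to both engines at once and collect TXT-EN / TXT-CN.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_en = pool.submit(recognize_en, voice_data)
        fut_cn = pool.submit(recognize_cn, voice_data)
        txt_en = fut_en.result()   # English recognition text, TXT-EN
        txt_cn = fut_cn.result()   # Chinese recognition text, TXT-CN
    return txt_en, txt_cn
```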
And S30, when none of the recognition texts is empty, detecting whether each recognition text carries preset weight words, and determining the text carrying the weight words as the valid text.
The weight words can be preset in the intelligent voice robot. Specifically, concrete weight words must be set for the calculation. If the conversational scenes preset by the intelligent voice robot include a Chinese scene and an English scene, the weight words may be set as follows:
according to the analysis of an actual scene, language expression analysis which is possibly used by Chinese and English in the scene sets weight words, and the related logic of the weight words can be according to the speech habits and related psychological levels of a user in each context; for example, a normal person receiving a call will answer normally the first sentence if the first sentence is in a familiar language. If an open-field white robot (intelligent speech robot) asks "Hello, this is XX calling from XX, may I speak to XXX? If the person receiving the phone understands english, the person will answer the word smoothly. This is understood from two levels. First, the answer words semantically conform to the answer of the open field white; secondly, the speed of the answer is normal. Typically within 200ms to 500 ms. If the person who does not know English answers the phone, the person who does not know English firstly takes "stupid" and then answers "ask can say Chinese" or "you make a mistake". For different languages, there is a range of responses from the average person. The intelligent voice robot utilizes the 'range' to set the adjustment coefficients of all languages, and determines the hesitation time coefficient according to the time interval in the response process of the corresponding user.
The core function of a weight word is this: if a speech recognition result contains a preset weight word, the user is most likely expressing that language, so the weight-word rule forms part of the core logic of language judgment. Scene analysis further shows that in an unfamiliar-language scene a user hesitates, and the less familiar the language, the longer the hesitation; a time threshold T, for example 500 ms, is therefore set according to the scene analysis, and the more time used beyond T, the lower the language familiarity.
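The two signals described above, a weight-word hit in the recognition text and the hesitation time measured against the threshold T, can be sketched as follows; the 500 ms value and the sample weight words are illustrative assumptions rather than values fixed by the patent.

```python
T_MS = 500                             # assumed hesitation threshold T
WEIGHT_WORDS_CN = ["中文", "Chinese"]   # assumed weight words, Chinese scene
WEIGHT_WORDS_EN = ["English"]          # assumed weight words, English scene

def carries_weight_word(recognition_text, weight_words):
    """True if the recognition text contains any preset weight word."""
    return any(word in recognition_text for word in weight_words)

def hesitation_dt(delay_ms):
    """Δt = DelayTime - T; a larger value indicates lower language familiarity."""
    return delay_ms - T_MS
```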
And S40, inputting the valid text into the NLU system, performing intention recognition on the valid text in the NLU system, and triggering an interactive action according to the intention recognition result.
In this step, the valid text and the language corresponding to the valid text can be input into the NLU system together.
In one embodiment, inputting the valid text into the NLU system and performing intention recognition on the valid text comprises:
inputting the valid text into the NLU system, having the NLU system identify the language corresponding to the valid text to obtain the current language, and performing intention recognition on the valid text with the language algorithm model corresponding to the current language.
Specifically, the NLU system receives the valid text. Considering that different languages require different natural language processing models, the NLU selects its processing rules and models according to the language it receives. The recognition result is used as the input for intention matching, and intention recognition is performed by pre-trained algorithm models for each language. After the intention is identified, the action corresponding to the intention is triggered, such as a broadcast: the action is processed according to the language, and the text-to-speech service (TTS) of that language is called to generate the corresponding speech for broadcasting, completing the feedback exchange with the user.
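A minimal sketch of this language-routed NLU step follows; the per-language model and TTS registries and their predict() and synthesize() interfaces are assumptions made for illustration.

```python
# Hedged sketch of the NLU routing and same-language TTS reply.

def nlu_step(valid_text, lang, intent_models, tts_services, action_replies):
    model = intent_models[lang]                # pre-trained model for this language
    intent = model.predict(valid_text)         # intention recognition
    reply_text = action_replies[intent]        # action mapped to the intention
    audio = tts_services[lang].synthesize(reply_text)  # same-language TTS
    return audio                               # speech to broadcast to the user
```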
With the above multilingual interaction method for an active outbound intelligent voice robot, when a user enters a multilingual setting scene the robot detects the voice data sent by the user; sends the voice data to each language recognition engine to obtain the recognition text returned by each engine; when none of the recognition texts is empty, detects whether each recognition text carries preset weight words and determines the text carrying the weight words as the valid text; and inputs the valid text into the NLU (natural language understanding) system, where intention recognition is performed and an interactive action is triggered according to the intention recognition result. The intelligent voice robot can thus provide multilingual service, which raises its value and improves the user experience.
In one embodiment, after detecting whether each recognition text carries a preset weight word, the method further includes:
if every recognition text carries a preset weight word, or none does, calling the corresponding language model for each recognition text, using each language model to obtain a text score for its recognition text, determining the comprehensive score of each recognition text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognition text with the highest comprehensive score as the valid text.
In one embodiment, after sending the speech data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further includes:
and if all the recognition texts are empty, recording the language in use as the default language, and triggering the interactive action in the default language.
In one embodiment, after sending the speech data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further includes:
and if exactly one non-empty text exists among the recognition texts, determining that non-empty text as the valid text.
Specifically, taking the case where the language recognition engines are an English engine and a Chinese engine, the intelligent voice robot sends the user's voice (the voice data) to the English language recognition engine and the Chinese language recognition engine for recognition; the English engine returns TXT-EN and the Chinese engine returns TXT-CN. The corresponding recognition result (the valid text) is then determined as follows:
Scene one: if both TXT-EN and TXT-CN are empty (no valid result is recognized), the captured audio is most probably noise, and the language in use is recorded as the default language (for example English);
Scene two: if exactly one of TXT-EN and TXT-CN is empty, the returned non-empty text is considered to be in the correct language, and that language is recorded;
Scene three: if both TXT-EN and TXT-CN return non-empty text, weight calculation is performed. It is first judged whether TXT-EN or TXT-CN contains the weight words set in step two; a match against the weight words indicates with high probability that the recognition result is what the user answered. Therefore, if exactly one of TXT-EN and TXT-CN contains a weight word, that recognition result is taken as the optimal result. If both contain a weight word, or neither does, the English language model and the Chinese language model are called for TXT-EN and TXT-CN respectively, and the returned results are scored to obtain scoreEN(TXT-EN) and scoreCN(TXT-CN). Since the Chinese and English models score on different scales, an adjustment coefficient s is found from actual scene statistics such that scoreEN(TXT-EN) × s lies on approximately the same scale as scoreCN. The hesitation time Δt = DelayTime − T (the preset time threshold) of the different language processing is also considered, together with a sensitivity coefficient, to obtain the most suitable score treatment; the sensitivity coefficient a and the adjustment coefficient s are empirical values obtained by verification against a large amount of data. The comprehensive scores are finally computed as:
The English comprehensive score: scoreEN(TXT-EN) × s × a^(−Δt)
The Chinese comprehensive score: scoreCN(TXT-CN)
The English comprehensive score and the Chinese comprehensive score are then compared, and the result with the higher score is selected as the user's recognition result and language; a code sketch of this selection follows.
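The three scenes and the score formulas above can be combined into one selection routine. The sketch below reads the garbled original formula as scoreEN(TXT-EN) × s × a^(−Δt); the constant values, the scorer callables, and the weight-word lists are placeholders, not values fixed by the patent.

```python
S = 1.2     # adjustment coefficient s (placeholder; fitted from scene statistics)
A = 1.05    # sensitivity coefficient a (placeholder; empirical)
T_MS = 500  # preset time threshold T (placeholder)

def pick_result(txt_en, txt_cn, score_en, score_cn, delay_ms,
                weight_en, weight_cn):
    # Scene one: both texts empty, treat as noise, keep the default language.
    if not txt_en and not txt_cn:
        return None, "en"
    # Scene two: exactly one non-empty text, its language is recorded.
    if not txt_cn:
        return txt_en, "en"
    if not txt_en:
        return txt_cn, "zh"
    # Scene three: weight words decide if exactly one side matches ...
    hit_en = any(w in txt_en for w in weight_en)
    hit_cn = any(w in txt_cn for w in weight_cn)
    if hit_en != hit_cn:
        return (txt_en, "en") if hit_en else (txt_cn, "zh")
    # ... otherwise compare the comprehensive scores.
    dt = (delay_ms - T_MS) / 1000.0                # hesitation time Δt in seconds
    en_total = score_en(txt_en) * S * A ** (-dt)   # English comprehensive score
    cn_total = score_cn(txt_cn)                    # Chinese comprehensive score
    return (txt_en, "en") if en_total >= cn_total else (txt_cn, "zh")
```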
In an embodiment, the multilingual interaction method for an active outbound intelligent voice robot determines the interaction language through the recognition-and-judgment procedure above, solving the shortcomings existing intelligent voice robots show in multilingual scenes.
Referring to FIG. 2, when the intelligent voice robot is used, the voice languages to be supported, such as English and Chinese, are configured per scene. When the robot executes a scene it reads this configuration and handles the different languages accordingly; in general, the configuration provides the conditions the language-processing logic evaluates.
The speech recognition layer instantiates different language recognition engines according to the scene configuration, evaluates and scores the multilingual recognition results with language-specific models, selects the most appropriate result, marks the language the user used for that result, and hands the best result together with its language to the NLU as the basis for NLU judgment.
The intelligent voice robot then applies different semantic matching algorithms according to the language determined during speech recognition, improving semantic recognition, and adjusts its output speech (TTS) to the user's language.
On a psychological level, when an ordinary person receives a call whose first sentence is in a familiar language, they answer normally. If the robot's opening line asks "Hello, this is XX calling from XX, may I speak to XXX?", a listener who understands English answers fluently. This can be understood on two levels: first, the answer semantically matches the opening line; second, the response time is normal, typically within 200 ms to 500 ms. A listener who does not understand English is first stunned and then answers something like "Can you speak Chinese?" or "You have dialed the wrong number." For each language there is thus a characteristic response range for the average person, and the decision logic above makes use of this "range".
Construction of the Chinese script: where the script management system presets the script in English, a corresponding set of Chinese script scenes is added for each preset English-script scene, in order to cover more language scenarios;
building a Chinese intention: under the original active outbound scene, a new field, namely a client intention scene branch, is added: the customer wants to speak the branch of the intent of Chinese. Possible expressions of corresponding scenes of the client need to be configured under the intention branch, such as: can one say Chinese? can you spread Chinese?
Definition of node hot words for specific dialogue-management scenes: this step analyzes the counter-questions the AI may be asked in a specific scene when the user's English expression ability is weak, and summarizes the high-frequency hot words of the language-switching intention, here: "Chinese", "English".
Adding a Chinese engine layer for a specific scene on top of the original English engine scene: with an English opening line, a client who cannot speak English will, while English is being broadcast, typically ask the voice system back, "Can you speak Chinese?" For this specific scene an additional Chinese ASR engine is added to cover the remaining 5% of cases;
the advantages are that: the method covers multi-language user groups, and avoids the identification negative effects caused by using a support bilingual engine by using a smart method, such as the reaction rate of the whole system, the user identification accuracy rate aiming at 95% of scenes and the like.
In one example of applying the multilingual interaction method, after the voice data is obtained the decision process can proceed as shown in FIG. 3. A fast decision is made first; a simple implementation calls the two (or more) ASR engines separately, and whichever engine returns a result determines the language. If the fast decision yields no result, an acoustic-model decision is made. Acoustic models mainly address pronunciations that are similar across languages, where different ASRs may each return a plausible result; for instance, the Chinese demonstrative "那 (nèi)" is acoustically very close to an offensive English word, which an English ASR will typically return. The sound is therefore decomposed into IPA (International Phonetic Alphabet) symbols, and matching is performed at the IPA level.
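A sketch of this decision flow follows: a fast decision first, then an IPA-level comparison when both engines return text. The ipa_of transcriber and the simple symbol-distance are assumptions; the patent only states that the sound is decomposed into IPA and matched at the IPA level.

```python
# Hedged sketch of the decision flow in FIG. 3. ipa_of() stands for a
# phone-to-IPA transcriber, which the patent does not specify.

def fast_decide(txt_en, txt_cn, audio_ipa, ipa_of):
    # Fast decision: whichever engine returned text decides the language.
    if txt_en and not txt_cn:
        return "en"
    if txt_cn and not txt_en:
        return "zh"
    if not txt_en and not txt_cn:
        return None                      # leave to the default-language rule
    # Both engines returned text (similar pronunciations): match at IPA level.
    def dist(text):
        a, b = ipa_of(text), audio_ipa
        return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return "en" if dist(txt_en) <= dist(txt_cn) else "zh"
```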
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to an embodiment, comprising:
the first detection module 10, configured to detect voice data sent by a user when the user enters a multilingual setting scene;
the sending module 20, configured to send the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine;
the second detection module 30, configured to detect, when none of the recognition texts is empty, whether each recognition text carries preset weight words, and to determine the text carrying the weight words as the valid text;
and the input module 40, configured to input the valid text into the NLU system, perform intention recognition on the valid text in the NLU system, and trigger an interactive action according to the intention recognition result.
For specific limitations of the multilingual interaction apparatus, reference may be made to the limitations of the multilingual interaction method above, which are not repeated here. Each module of the apparatus can be implemented wholly or partly in software, hardware, or a combination of the two. The modules can be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute their corresponding operations.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure is shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements the multilingual interaction method for an active outbound intelligent voice robot. The display screen can be a liquid-crystal or electronic-ink display, and the input device can be a touch layer over the display, a key, trackball, or touchpad on the housing, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the architecture shown in FIG. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
Based on the above embodiments, a computer device is also provided in one embodiment, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the multilingual interaction method for an active outbound intelligent voice robot of any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the above methods may be implemented by a computer program stored in a non-volatile computer-readable storage medium. In embodiments of the present invention the program may be stored in the storage medium of a computer system and executed by at least one processor in that system to implement the processes of the embodiments of the multilingual interaction method above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Accordingly, in one embodiment a computer storage medium and a computer-readable storage medium are provided, on which a computer program is stored; when executed by a processor, the program implements the multilingual interaction method for an active outbound intelligent voice robot of any of the above embodiments.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It should be noted that the terms "first", "second", and "third" in the embodiments of the present application merely distinguish similar objects and do not imply a specific ordering; where permitted, "first", "second", and "third" may be interchanged, so that the embodiments described herein can be implemented in an order other than that illustrated or described.
The terms "comprising" and "having" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, product, or device.
The above embodiments express only several implementations of the present application, and while their description is relatively specific and detailed, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these fall within its scope of protection. The scope of protection of this patent is therefore subject to the appended claims.

Claims (10)

1. A multilingual interaction method for an active outbound intelligent voice robot, characterized by comprising the following steps:
S10, when a user enters a multilingual setting scene, detecting voice data sent by the user;
S20, sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine;
S30, when none of the recognition texts is empty, detecting whether each recognition text carries preset weight words, and determining the text carrying the weight words as the valid text;
and S40, inputting the valid text into the NLU system, performing intention recognition on the valid text in the NLU system, and triggering an interactive action according to the intention recognition result.
2. The multilingual interaction method for an active outbound intelligent voice robot of claim 1, wherein the language recognition engines comprise an English language recognition engine and a Chinese language recognition engine.
3. The multilingual interaction method for an active outbound intelligent voice robot of claim 2, wherein sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine comprises:
sending the voice data to the English language recognition engine to obtain the English recognition text returned by the English language recognition engine;
and sending the voice data to the Chinese language recognition engine to obtain the Chinese recognition text returned by the Chinese language recognition engine.
4. The multilingual interaction method for an active outbound intelligent voice robot of claim 1, further comprising, after detecting whether each recognition text carries a preset weight word:
if every recognition text carries a preset weight word, or none does, calling the corresponding language model for each recognition text, using each language model to obtain a text score for its recognition text, determining the comprehensive score of each recognition text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognition text with the highest comprehensive score as the valid text.
5. The multilingual interaction method for an active outbound intelligent voice robot of claim 1, wherein after sending the voice data to each language recognition engine and obtaining the recognition text returned by each language recognition engine, the method further comprises:
if all the recognition texts are empty, recording the language in use as the default language, and triggering the interactive action in the default language.
6. The multilingual interaction method for an active outbound intelligent voice robot of claim 1, wherein after sending the voice data to each language recognition engine and obtaining the recognition text returned by each language recognition engine, the method further comprises:
if exactly one non-empty text exists among the recognition texts, determining that non-empty text as the valid text.
7. The multilingual interaction method for an active outbound intelligent voice robot of claim 1, wherein inputting the valid text into the NLU system and performing intention recognition on the valid text in the NLU system comprises:
inputting the valid text into the NLU system, having the NLU system identify the language corresponding to the valid text to obtain the current language, and performing intention recognition on the valid text with the language algorithm model corresponding to the current language.
8. A multilingual interaction apparatus for an active outbound intelligent voice robot, characterized by comprising:
a first detection module, configured to detect voice data sent by a user when the user enters a multilingual setting scene;
a sending module, configured to send the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine;
a second detection module, configured to detect, when none of the recognition texts is empty, whether each recognition text carries preset weight words, and to determine the text carrying the weight words as the valid text;
and an input module, configured to input the valid text into the NLU system, perform intention recognition on the valid text in the NLU system, and trigger an interactive action according to the intention recognition result.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202010316400.9A 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device Active CN111627432B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010316400.9A CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device
PCT/CN2021/071368 WO2021212929A1 (en) 2020-04-21 2021-01-13 Multilingual interaction method and apparatus for active outbound intelligent speech robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010316400.9A CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device

Publications (2)

Publication Number Publication Date
CN111627432A (en) 2020-09-04
CN111627432B CN111627432B (en) 2023-10-20

Family

ID=72258977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010316400.9A Active CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device

Country Status (2)

Country Link
CN (1) CN111627432B (en)
WO (1) WO2021212929A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112562640A (en) * 2020-12-01 2021-03-26 北京声智科技有限公司 Multi-language speech recognition method, device, system and computer readable storage medium
WO2021212929A1 (en) * 2020-04-21 2021-10-28 升智信息科技(南京)有限公司 Multilingual interaction method and apparatus for active outbound intelligent speech robot
CN113571064A (en) * 2021-07-07 2021-10-29 肇庆小鹏新能源投资有限公司 Natural language understanding method and device, vehicle and medium
CN114464179A (en) * 2022-01-28 2022-05-10 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114918950A (en) * 2022-01-12 2022-08-19 国网吉林省电力有限公司延边供电公司 Intelligent robot for power supply of Xinji Jianfrontier
CN115134466A (en) * 2022-06-07 2022-09-30 马上消费金融股份有限公司 Intention recognition method and device and electronic equipment
CN116343786A (en) * 2023-03-07 2023-06-27 南方电网人工智能科技有限公司 Customer service voice analysis method, system, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012037790A (en) * 2010-08-10 2012-02-23 Toshiba Corp Voice interaction device
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
CN109522564A (en) * 2018-12-17 2019-03-26 北京百度网讯科技有限公司 Voice translation method and device
CN109712607A (en) * 2018-12-30 2019-05-03 联想(北京)有限公司 A kind of processing method, device and electronic equipment
US20200118544A1 (en) * 2019-07-17 2020-04-16 Lg Electronics Inc. Intelligent voice recognizing method, apparatus, and intelligent computing device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170011735A1 (en) * 2015-07-10 2017-01-12 Electronics And Telecommunications Research Institute Speech recognition system and method
CN105957516B (en) * 2016-06-16 2019-03-08 百度在线网络技术(北京)有限公司 More voice identification model switching method and device
US20180137109A1 (en) * 2016-11-11 2018-05-17 The Charles Stark Draper Laboratory, Inc. Methodology for automatic multilingual speech recognition
CN108335692B (en) * 2018-03-21 2021-03-05 上海智蕙林医疗科技有限公司 Voice switching method, server and system
CN109065020B (en) * 2018-07-28 2020-11-20 重庆柚瓣家科技有限公司 Multi-language category recognition library matching method and system
CN111627432B (en) * 2020-04-21 2023-10-20 升智信息科技(南京)有限公司 Active outbound intelligent voice robot multilingual interaction method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012037790A (en) * 2010-08-10 2012-02-23 Toshiba Corp Voice interaction device
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
CN109522564A (en) * 2018-12-17 2019-03-26 北京百度网讯科技有限公司 Voice translation method and device
CN109712607A (en) * 2018-12-30 2019-05-03 联想(北京)有限公司 A kind of processing method, device and electronic equipment
US20200118544A1 (en) * 2019-07-17 2020-04-16 Lg Electronics Inc. Intelligent voice recognizing method, apparatus, and intelligent computing device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021212929A1 (en) * 2020-04-21 2021-10-28 升智信息科技(南京)有限公司 Multilingual interaction method and apparatus for active outbound intelligent speech robot
CN112562640A (en) * 2020-12-01 2021-03-26 北京声智科技有限公司 Multi-language speech recognition method, device, system and computer readable storage medium
CN112562640B (en) * 2020-12-01 2024-04-12 北京声智科技有限公司 Multilingual speech recognition method, device, system, and computer-readable storage medium
CN113571064A (en) * 2021-07-07 2021-10-29 肇庆小鹏新能源投资有限公司 Natural language understanding method and device, vehicle and medium
CN113571064B (en) * 2021-07-07 2024-01-30 肇庆小鹏新能源投资有限公司 Natural language understanding method and device, vehicle and medium
CN114464179A (en) * 2022-01-28 2022-05-10 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium
CN114464179B (en) * 2022-01-28 2024-03-19 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111627432B (en) 2023-10-20
WO2021212929A1 (en) 2021-10-28

Similar Documents

Publication Publication Date Title
CN111627432B (en) Active outbound intelligent voice robot multilingual interaction method and device
CN110998717B (en) Automatically determining a language for speech recognition of a spoken utterance received through an automated assistant interface
CN110661927B (en) Voice interaction method and device, computer equipment and storage medium
KR102348904B1 (en) Method for providing chatting service with chatbot assisted by human counselor
US7529667B1 (en) Automated dialog system and method
US7127395B1 (en) Method and system for predicting understanding errors in a task classification system
US8862478B2 (en) Speech translation system, first terminal apparatus, speech recognition server, translation server, and speech synthesis server
US8144838B2 (en) Automated task classification system
CN112037799B (en) Voice interrupt processing method and device, computer equipment and storage medium
CN112262430A (en) Automatically determining language for speech recognition of a spoken utterance received via an automated assistant interface
KR20150085145A (en) System for translating a language based on user's reaction and method thereof
CN111429899A (en) Speech response processing method, device, equipment and medium based on artificial intelligence
KR102326853B1 (en) User adaptive conversation apparatus based on monitoring emotion and ethic and method for thereof
CN111159364A (en) Dialogue system, dialogue device, dialogue method, and storage medium
CN111986651A (en) Man-machine interaction method and device and intelligent interaction terminal
CN113643684A (en) Speech synthesis method, speech synthesis device, electronic equipment and storage medium
CN112087726A (en) Method and system for identifying polyphonic ringtone, electronic equipment and storage medium
JP4000828B2 (en) Information system, electronic equipment, program
CN113345437B (en) Voice interruption method and device
JP2005520194A (en) Generating text messages
CN111667829A (en) Information processing method and device, and storage medium
CN115662430B (en) Input data analysis method, device, electronic equipment and storage medium
KR102268376B1 (en) Apparatus and method for providing multilingual conversation service
CN116306660A (en) Man-machine conversation breaking method, device, equipment and storage medium
CN116895275A (en) Dialogue system and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant