CN111627432B - Active outbound intelligent voice robot multilingual interaction method and device


Info

Publication number
CN111627432B
CN111627432B (application CN202010316400.9A)
Authority
CN
China
Prior art keywords
text
recognition
language
texts
engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010316400.9A
Other languages
Chinese (zh)
Other versions
CN111627432A (en)
Inventor
李训林
王帅
张晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengzhi Information Technology Nanjing Co ltd
Original Assignee
Shengzhi Information Technology Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shengzhi Information Technology Nanjing Co ltd filed Critical Shengzhi Information Technology Nanjing Co ltd
Priority to CN202010316400.9A priority Critical patent/CN111627432B/en
Publication of CN111627432A publication Critical patent/CN111627432A/en
Priority to PCT/CN2021/071368 priority patent/WO2021212929A1/en
Application granted granted Critical
Publication of CN111627432B publication Critical patent/CN111627432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an active outbound intelligent voice robot multilingual interaction method and device, computer equipment and a storage medium.

Description

Active outbound intelligent voice robot multilingual interaction method and device
Technical Field
The application relates to the technical field of voice signal processing, and in particular to an active outbound intelligent voice robot multilingual interaction method and device, computer equipment and a storage medium.
Background
With the advent of the cloud era and continuous innovation in artificial intelligence technology, intelligent robots based on voice systems are entering many industries. The current intelligent voice robots take over much of the tedious, repetitive customer service work, freeing up human labor and providing broad coverage for intelligent replies across industries.
On the premise of a preset conversation scene, the active outbound intelligent voice robot guides the user through the conversation following a preset script, so as to achieve a marketing purpose. Its core functional modules are speech recognition (ASR), speech synthesis (TTS), dialogue management (DM), natural language processing (NLP) and natural language understanding (NLU).
In overseas markets, intelligent voice robots mostly operate in a single language, which serves about 95% of users. In actual outbound scenes, however, some users have weaker expressive ability in that single language: in Southeast Asia, for example, the main language is English, but the overseas Chinese community, about 5% of users, is more comfortable with Chinese. When the voice robot's opening announcement is heard in English, such a user will ask the voice robot whether it can provide service in another language, such as Chinese. In such a scenario the language barrier reduces the product's value and results in a poor user experience.
Disclosure of Invention
In view of these problems, the application provides an active outbound intelligent voice robot multilingual interaction method and device, computer equipment and a storage medium.
To achieve the purpose of the application, the active outbound intelligent voice robot multilingual interaction method comprises the following steps:
S10, detecting voice data sent by a user when the user enters a multilingual setting scene;
S20, sending the voice data to each language recognition engine to obtain the recognition texts returned by each language recognition engine;
S30, when none of the recognition texts is an empty text, detecting whether each recognition text carries a preset weight word, and determining the text carrying the weight word as the valid text;
S40, inputting the valid text into an NLU system, performing intention recognition on the valid text in the NLU system, and triggering an interaction action according to the intention recognition result.
In one embodiment, the language recognition engine includes an English language recognition engine and a Chinese language recognition engine.
In one embodiment, sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine includes:
the voice data is sent to an English language recognition engine to obtain English recognition text returned by the English language recognition engine;
and sending the voice data to a Chinese language recognition engine to obtain a Chinese recognition text returned by the Chinese language recognition engine.
In one embodiment, after detecting whether each recognition text carries a preset weight word, the method further comprises:
if all the recognition texts carry preset weight words, or none of them does, respectively invoking the language model corresponding to each recognition text, scoring each recognition text with its language model, determining the composite score of each recognition text according to its text score, hesitation time coefficient and adjustment coefficient, and determining the recognition text with the highest composite score as the valid text.
In one embodiment, after sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine, the method further comprises:
if every recognition text is an empty text, recording the language in use as the default language, and triggering the interaction action in the default language.
In one embodiment, after sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine, the method further comprises:
if exactly one non-empty text exists among the recognition texts, determining that non-empty text as the valid text.
In one embodiment, inputting the valid text into the NLU system and performing intention recognition on the valid text in the NLU system includes:
inputting the valid text into the NLU system, having the NLU system identify the language corresponding to the valid text to obtain the current language, and performing intention recognition on the valid text with the language algorithm model corresponding to the current language.
An active outbound intelligent voice robot multilingual interaction device, comprising:
the first detection module is used for detecting voice data sent by a user when the user enters a multilingual setting scene;
the sending module is used for sending the voice data to each language recognition engine to obtain the recognition texts returned by each language recognition engine;
the second detection module is used for detecting, when none of the recognition texts is an empty text, whether each recognition text carries a preset weight word, and determining the text carrying the weight word as the valid text;
and the input module is used for inputting the valid text into the NLU system, performing intention recognition on the valid text in the NLU system, and triggering the interaction action according to the intention recognition result.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the active outbound intelligent voice robot multilingual interaction method of any of the embodiments described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the active outbound intelligent voice robot multilingual interaction method of any of the above embodiments.
According to the active outbound intelligent voice robot multilingual interaction method, device, computer equipment and storage medium, when a user enters a multilingual setting scene, the voice data sent by the user can be detected and sent to each language recognition engine to obtain the recognition texts returned by each engine. When none of the recognition texts is an empty text, whether each recognition text carries a preset weight word is detected, and the text carrying the weight word is determined as the valid text. The valid text is input into an NLU (natural language understanding) system, intention recognition is performed on it there, and the interaction action is triggered according to the intention recognition result. This realizes multilingual service for the intelligent voice robot and raises its value, thereby improving the user experience.
Drawings
FIG. 1 is a flow chart of a method of active outbound intelligent voice robot multilingual interaction of one embodiment;
FIG. 2 is a schematic diagram of the intelligent voice robot operation of one embodiment;
FIG. 3 is a language decision flow diagram of one embodiment;
FIG. 4 is a schematic diagram of a multi-lingual interaction device of an active outbound intelligent voice robot according to one embodiment;
FIG. 5 is a schematic diagram of a computer device of an embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The active outbound intelligent voice robot multilingual interaction method provided by the application can be applied to related intelligent voice robots. When a user enters a multilingual setting scene, the intelligent voice robot detects the voice data sent by the user and sends it to each language recognition engine to obtain the recognition texts returned by each engine. When none of the recognition texts is an empty text, it detects whether each recognition text carries a preset weight word and determines the text carrying the weight word as the valid text; the valid text is then input into an NLU (natural language understanding) system, intention recognition is performed on it there, and the interaction action is triggered according to the intention recognition result. This realizes multilingual service for the intelligent voice robot and raises its value, thereby improving the user experience.
In one embodiment, as shown in fig. 1, an active outbound intelligent voice robot multilingual interaction method is provided. The method is described here, by way of illustration, as applied to an intelligent voice robot, and includes the following steps:
s10, detecting voice data sent by a user when the user enters a multilingual setting scene.
The intelligent voice robot can be preconfigured with a speaking scene through a language recognition setting, and the possible languages of the conversation, such as Chinese and English, can be set for scenes requiring multilingual recognition. When a user enters the scene where the intelligent voice robot operates, the robot uses the preset language recognition engines to identify the language the user is speaking.
S20, sending the voice data to each language recognition engine to obtain the recognition texts returned by each language recognition engine.
In one embodiment, the language recognition engine includes an English language recognition engine and a Chinese language recognition engine. The English language recognition engine may be the default language recognition engine, and correspondingly, English may be the default language.
Specifically, sending the voice data to each language recognition engine, and obtaining the recognition text returned by each language recognition engine includes:
the voice data is sent to the English language recognition engine to obtain the English recognition text returned by the English language recognition engine; the English recognition text may further be denoted TXT-EN;
and the voice data is sent to the Chinese language recognition engine to obtain the Chinese recognition text returned by the Chinese language recognition engine; the Chinese recognition text may further be denoted TXT-CN.
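As an illustrative sketch of this bilingual dispatch, assuming each engine exposes a blocking `recognize` call (a hypothetical client API), the same audio can be sent to both engines in parallel so that neither delays the language decision:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_bilingual(voice_data, en_engine, cn_engine, timeout_s=5.0):
    # Dispatch the same audio to the English and Chinese ASR engines
    # concurrently and collect TXT-EN and TXT-CN.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_en = pool.submit(en_engine.recognize, voice_data)
        fut_cn = pool.submit(cn_engine.recognize, voice_data)
        txt_en = fut_en.result(timeout=timeout_s)  # TXT-EN
        txt_cn = fut_cn.result(timeout=timeout_s)  # TXT-CN
    return txt_en, txt_cn
```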
S30, when none of the recognition texts is an empty text, detecting whether each recognition text carries a preset weight word, and determining the text carrying the weight word as the valid text.
The weight words can be preset in the intelligent voice robot. Specifically, particular weight words need to be set for the calculation; if the preset speaking scenes of the intelligent voice robot include a Chinese scene and an English scene, the process of setting the weight words may include:
Weight words are set according to analysis of the actual scene and of the expressions Chinese and English speakers are likely to use in it; they reflect the user's speech habits and the underlying psychology of each context. For example, when a person receives a call and the first sentence is in a familiar language, they answer normally. If the robot's opening line asks "Hello, this is XX calling from XX, may I speak to XXX?", a listener who understands English answers the sentence smoothly. This shows on two planes: first, the answer semantically fits the opening scene; second, the answer comes at normal speed, typically within 200 ms to 500 ms. A listener who does not understand English will first hesitate, and then reply in Chinese with something like "Can you speak Chinese?" or "You have the wrong number." For openings in different languages, a typical person's response falls within such a range. The intelligent voice robot uses this "range" to set the adjustment coefficient of each language, and determines the hesitation time coefficient from the time interval of the corresponding user's response.
The core function of a weight word is this: a speech recognition result containing a preset weight word indicates that the user is most likely expressing themselves in that language, so the weight word rule forms part of the core logic of the language decision. Scene analysis also shows that in an unfamiliar language scene a user hesitates 300-500 ms, and the more information is involved, the longer the hesitation; a time threshold T is therefore set from the scene analysis. The further the response time exceeds T, the less familiar the language.
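A minimal sketch of these two signals (the weight-word hit and the hesitation beyond the threshold T), assuming recognition results arrive as plain strings:

```python
def weight_word_hits(text, weight_words):
    # Preset weight words found in a recognition result; substring matching
    # is a simplification, a production system would tokenize per language.
    return [w for w in weight_words if w in text]

def hesitation_coefficient(delay_ms, threshold_ms):
    # Δt = (user delay time) - (preset time threshold T); the further the
    # response exceeds T, the less familiar the language is assumed to be.
    return delay_ms - threshold_ms
```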
S40, inputting the valid text into an NLU system, performing intention recognition on the valid text in the NLU system, and triggering the interaction action according to the intention recognition result.
This step can input the valid text, together with the language corresponding to the valid text, into the NLU system.
In one embodiment, inputting the valid text into the NLU system and performing intention recognition on the valid text in the NLU system includes:
inputting the valid text into the NLU system, having the NLU system identify the language corresponding to the valid text to obtain the current language, and performing intention recognition on the valid text with the language algorithm model corresponding to the current language.
Specifically, the NLU system acquires the valid text. Since different languages require different natural language processing models, the NLU selects its processing rules and models according to the acquired language. The recognition result is taken as the input for intention matching, and intention recognition is performed with pre-trained algorithm models for the different languages. Once intention recognition is complete, the action corresponding to the intention, such as a broadcast, is triggered; the action is processed according to the language, and the text-to-speech (TTS) service corresponding to that language is invoked to generate the corresponding voice for broadcasting, completing the feedback exchange with the user.
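A minimal sketch of this language-routed handling, assuming per-language model, action and TTS registries keyed by language code (all names are hypothetical):

```python
def handle_valid_text(valid_text, language, intent_models, actions, tts_services):
    # Select the pre-trained algorithm model for the current language.
    model = intent_models[language]
    intent = model.predict(valid_text)      # intention recognition
    reply_text = actions[language][intent]  # action bound to the recognized intent
    # Call the TTS service of the same language to generate the broadcast audio.
    return tts_services[language].synthesize(reply_text)
```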
According to the active outbound intelligent voice robot multilingual interaction method, when a user enters a multilingual setting scene, the voice data sent by the user can be detected and sent to each language recognition engine to obtain the recognition texts returned by each engine. When none of the recognition texts is an empty text, whether each recognition text carries a preset weight word is detected, and the text carrying the weight word is determined as the valid text. The valid text is input into an NLU (natural language understanding) system, intention recognition is performed on it there, and the interaction action is triggered according to the intention recognition result. This realizes multilingual service for the intelligent voice robot and raises its value, thereby improving the user experience.
In one embodiment, after detecting whether each recognition text carries a preset weight word, the method further comprises:
if all the recognition texts carry preset weight words, or none of them does, respectively invoking the language model corresponding to each recognition text, scoring each recognition text with its language model, determining the composite score of each recognition text according to its text score, hesitation time coefficient and adjustment coefficient, and determining the recognition text with the highest composite score as the valid text.
In one embodiment, after sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine, the method further comprises:
if every recognition text is an empty text, recording the language in use as the default language, and triggering the interaction action in the default language.
In one embodiment, after sending the voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine, the method further comprises:
if exactly one non-empty text exists among the recognition texts, determining that non-empty text as the valid text.
Specifically, taking as an example the case where the language recognition engines comprise an English language recognition engine and a Chinese language recognition engine, the intelligent voice robot sends the user's voice (the voice data) to both engines for speech recognition; the English engine returns TXT-EN and the Chinese engine returns TXT-CN. Determining the corresponding recognition result (the valid text) may then include:
Scene one: both TXT-EN and TXT-CN are empty texts (no valid result is recognized); the current audio is judged, with high probability, to be noise, and the language in use is recorded as the default language (such as English);
Scene two: exactly one of TXT-EN and TXT-CN is an empty text (that engine recognized no valid result); the returned non-empty text is taken to be in the correct language, and that language is recorded;
Scene three: both TXT-EN and TXT-CN return non-empty texts, and the weights are calculated. It is checked whether TXT-EN or TXT-CN contains the weight words set in step two; a weight word fitting the scene indicates with high probability that the recognition result is the user's answer, so when exactly one of TXT-EN and TXT-CN contains a weight word, that recognition result is the optimal result. If both TXT-EN and TXT-CN contain weight words, or neither does, the English language model and the Chinese language model are invoked for TXT-EN and TXT-CN respectively, and the returned results are scored, giving scoreEN(TXT-EN) and scoreCN(TXT-CN). Because the Chinese and English model scores lie on different scales, an adjustment coefficient s is found from statistics of actual scene data, chosen so that scoreEN(TXT-EN) × s approaches the scale of scoreCN. At the same time, a hesitation time coefficient Δt = (user delay time) − T (the preset time threshold) is considered for the different language processing, paired with a sensitivity coefficient a. The empirical values of a (the sensitivity coefficient) and s (the adjustment coefficient) are obtained through extensive data verification, finally yielding the scoring formulas (the computation of the composite scores):
the English comprehensive score is as follows: sourcen (TXT-EN) s-a △t
The Chinese comprehensive score is as follows: sourceCN (TXT-CN)
The English composite score is compared with the Chinese composite score, and the higher-scoring result is selected as the user's recognition result and language.
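By way of illustration, the scene-three comparison can be sketched in Python under the linear reading of the formulas above; the values of s, a and T below are placeholders, not the empirical values the application refers to:

```python
def pick_by_composite_score(txt_en, txt_cn, delay_ms, score_en, score_cn,
                            s=1.0, a=0.001, threshold_ms=400):
    # score_en/score_cn are the English and Chinese language-model scorers;
    # s (adjustment coefficient) and a (sensitivity coefficient) stand in
    # for the empirical values found through data verification.
    delta_t = delay_ms - threshold_ms          # hesitation time coefficient Δt
    en = score_en(txt_en) * s - a * delta_t    # English composite score
    cn = score_cn(txt_cn)                      # Chinese composite score
    return ("en", txt_en) if en >= cn else ("zh", txt_cn)
```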
In one embodiment, the method for passively switching dialogue languages in a specific scene of the intelligent voice robot can comprise a speech synthesis module, a natural language processing module, a natural language understanding module, a dialogue management module and a speech recognition module.
Referring to fig. 2, when the intelligent voice robot is used, the voice types to be supported, for example English and Chinese, are configured for the different scenes. When the intelligent robot executes a scene, it acquires this configuration and processes the different languages in the corresponding scene according to it; in general, the configuration serves as the decision condition on which the language processing logic executes.
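Such a configuration might be expressed as a simple per-scene mapping; every key and value below is illustrative, not prescribed by the application:

```python
# Illustrative per-scene language configuration; all keys and values are
# assumptions, not values taken from the original disclosure.
SCENE_CONFIG = {
    "outbound_opening": {
        "languages": ["en", "zh"],         # languages supported in this scene
        "default_language": "en",
        "weight_words": {
            "en": ["yes", "speaking", "wrong number"],
            "zh": ["中文", "说中文", "打错了"],
        },
        "hesitation_threshold_ms": 400,    # the preset time threshold T
    },
}

def scene_languages(scene_name):
    # The language processing logic uses the configuration as its
    # decision condition for which recognition engines to invoke.
    return SCENE_CONFIG[scene_name]["languages"]
```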
Different language recognition engines are set according to the scene configuration; the multilingual speech recognition results are evaluated and scored by language-specific language models; the most suitable result is found from the evaluation, the language used by the user is marked according to that result, and the best result together with its language is provided to the NLU as the basis for the NLU's judgment.
The natural language understanding layer is compatible with different languages: according to the language determined in the speech recognition flow, the intelligent voice robot uses different semantic matching algorithms to improve the degree of semantic recognition, and adjusts its output speech (TTS) to the user's language.
At the psychological level, as noted above, a person receiving a call in a familiar language answers the opening line smoothly and at normal speed, typically within 200 ms to 500 ms, while a person who does not understand the language hesitates first and then asks back in their own language. For openings in different languages, a typical person's response falls within a characteristic range, and it is this "range" that is used to set the coefficients described above.
Building the Chinese script: with the dialogue management system's script preset in English, a corresponding set of Chinese language scenes is newly added on top of the preset English scene, in order to cover more language scenarios;
Building the Chinese intention: under the original active outbound scene, a new field is added, namely a client intention branch: the customer wants to speak Chinese. Possible customer utterances to configure under this intention branch include: "Can say Chinese?" and "Can you speak Chinese?"
Defining node hotwords for the specific dialogue management scene: this step mainly reflects the fact that, in a specific scene, a user whose English expression ability is weak may ask the AI back during the dialogue; the corresponding high-frequency hotwords in the language-switching intention dialogue are summarized and generalized, for example "Chinese" and "English".
Adding a Chinese engine for the specific scene under the original English engine scene: with the opening played in English, a customer who does not speak English will generally ask the speech system back, "Do you speak Chinese?". For this specific scene, a layer of Chinese ASR engine is added to cover the remaining 5% of cases;
The benefits of this approach: the multilingual user group is covered, while the negative recognition effects of supporting a bilingual engine throughout, such as a slower overall system response and lower recognition accuracy for the 95% of users in the single-language scene, are avoided by this targeted method.
In an example of applying the active outbound intelligent voice robot multilingual interaction method, after the voice data is obtained the decision process may refer to fig. 3. A fast decision is made first: two or more ASR engines are invoked for recognition respectively, and whichever engine returns a result determines the language. If no engine returns a decisive result, an acoustic model decision is made. The acoustic model mainly addresses similar pronunciations across different languages, where different ASRs may each return a plausible result; for example, the Chinese spoken word "that" (nei) sounds very similar to a trigger word in English, and an English ASR will typically recognize it as that trigger word. The sound is therefore decomposed into IPA, and the matching is then performed on the IPA.
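A hedged sketch of that fallback, with `to_ipa` standing in for a hypothetical grapheme-to-IPA converter and a deliberately simple similarity measure:

```python
def acoustic_decision(audio_ipa, candidates, to_ipa):
    # candidates: {"en": txt_en, "zh": txt_cn}. Pick the language whose
    # transcript, rendered as an IPA sequence, best matches the IPA
    # sequence decomposed from the audio itself.
    def similarity(a, b):
        # Position-wise match ratio as a placeholder for a real phonetic
        # alignment (e.g. edit distance over IPA symbols).
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / max(len(a), len(b), 1)

    return max(candidates,
               key=lambda lang: similarity(audio_ipa, to_ipa(candidates[lang], lang)))
```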
Referring to fig. 4, fig. 4 is a schematic structural diagram of an active outbound intelligent voice robot multilingual interaction device according to an embodiment, including:
the first detection module 10 is configured to detect voice data sent by a user when the user enters a multilingual setting scene;
the sending module 20 is configured to send the voice data to each language recognition engine, so as to obtain a recognition text returned by each language recognition engine;
the second detection module 30 is configured to detect, when none of the recognition texts is an empty text, whether each recognition text carries a preset weight word, and to determine the text carrying the weight word as the valid text;
and the input module 40 is used for inputting the effective text into the NLU system, carrying out intention recognition on the effective text in the NLU system, and triggering interaction according to the result of the intention recognition.
The specific limitations of the active outbound intelligent voice robot multilingual interaction device can be understood with reference to the limitations of the corresponding method above, and are not repeated here. Each module in the device can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in software in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by the processor is used for realizing an active outbound intelligent voice robot multilingual interaction method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Based on the examples described above, in one embodiment there is also provided a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the active outbound intelligent voice robot multilingual interaction method as in any of the embodiments described above when the program is executed by the processor.
Those skilled in the art will appreciate that implementing all or part of the above-described embodiments of the method may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a non-volatile computer readable storage medium, and as in the embodiment of the present application, the program may be stored in a storage medium of a computer system and executed by at least one processor in the computer system to implement the embodiment of the method for multilingual interaction of an active outbound intelligent voice robot as described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
Accordingly, in one embodiment, there is also provided a computer storage medium, on which is stored a computer program, wherein the program when executed by a processor implements the active outbound intelligent voice robot multilingual interaction method according to any one of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this description.
It should be noted that the terms "first", "second" and "third" in the embodiments of the present application merely distinguish similar objects and do not denote a particular order; where permitted, objects so labelled may be interchanged, so that the embodiments described herein can be implemented in sequences other than those illustrated or described.
The terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article or device that comprises a list of steps or modules is not limited to the steps or modules listed, and may include other steps or modules not listed, or steps or modules inherent to such a process, method, article or device.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (9)

1. An active outbound intelligent voice robot multilingual interaction method is characterized by comprising the following steps:
s10, detecting voice data sent by a user when the user enters a multilingual setting scene;
s20, sending the voice data to each language recognition engine to obtain recognition texts returned by each language recognition engine;
s30, when all the identification texts are not blank texts, detecting whether all the identification texts carry preset weight words, and determining the texts carrying the weight words as effective texts;
s40, inputting the effective text into an NLU system, carrying out intention recognition on the effective text in the NLU system, and triggering interaction according to an intention recognition result;
after detecting whether each recognition text carries a preset weight word, the method further comprises the following steps:
if all the recognition texts carry preset weight words, or none of them does, respectively invoking the language model corresponding to each recognition text, scoring each recognition text with its language model, determining the composite score of each recognition text according to its text score, hesitation time coefficient and adjustment coefficient, and determining the recognition text with the highest composite score as the valid text.
2. The method of claim 1, wherein the language recognition engine comprises an english language recognition engine and a chinese language recognition engine.
3. The method for multilingual interaction of an active outbound intelligent voice robot according to claim 2, wherein the step of transmitting voice data to each of the language recognition engines to obtain the recognition text returned by each of the language recognition engines comprises the steps of:
the voice data is sent to an English language recognition engine to obtain English recognition text returned by the English language recognition engine;
and sending the voice data to a Chinese language recognition engine to obtain a Chinese recognition text returned by the Chinese language recognition engine.
4. The method for multi-lingual interaction of an active outbound intelligent voice robot according to claim 1, wherein after sending voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine, further comprising:
if every recognition text is an empty text, recording the language in use as the default language, and triggering the interaction action in the default language.
5. The method for multi-lingual interaction of an active outbound intelligent voice robot according to claim 1, wherein after sending voice data to each language recognition engine to obtain the recognition text returned by each language recognition engine, further comprising:
if exactly one non-empty text exists among the recognition texts, determining that non-empty text as the valid text.
6. The active outbound intelligent voice robot multilingual interaction method of claim 1, wherein inputting the valid text into the NLU system and performing intention recognition on the valid text in the NLU system comprises:
inputting the valid text into the NLU system, having the NLU system identify the language corresponding to the valid text to obtain the current language, and performing intention recognition on the valid text with the language algorithm model corresponding to the current language.
7. An apparatus for implementing the active outbound intelligent voice robot multilingual interaction method of claim 1, comprising:
the first detection module is used for detecting voice data sent by a user when the user enters a multilingual setting scene;
the sending module is used for sending the voice data to each language recognition engine to obtain the recognition texts returned by each language recognition engine;
the second detection module is used for detecting, when none of the recognition texts is an empty text, whether each recognition text carries a preset weight word, and determining the text carrying the weight word as the valid text;
and the input module is used for inputting the valid text into the NLU system, performing intention recognition on the valid text in the NLU system, and triggering the interaction action according to the intention recognition result.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed by the processor.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202010316400.9A 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device Active CN111627432B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010316400.9A CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device
PCT/CN2021/071368 WO2021212929A1 (en) 2020-04-21 2021-01-13 Multilingual interaction method and apparatus for active outbound intelligent speech robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010316400.9A CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device

Publications (2)

Publication Number Publication Date
CN111627432A CN111627432A (en) 2020-09-04
CN111627432B true CN111627432B (en) 2023-10-20

Family

ID=72258977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010316400.9A Active CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device

Country Status (2)

Country Link
CN (1) CN111627432B (en)
WO (1) WO2021212929A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627432B (en) * 2020-04-21 2023-10-20 升智信息科技(南京)有限公司 Active outbound intelligent voice robot multilingual interaction method and device
CN112562640B (en) * 2020-12-01 2024-04-12 北京声智科技有限公司 Multilingual speech recognition method, device, system, and computer-readable storage medium
CN113571064B (en) * 2021-07-07 2024-01-30 肇庆小鹏新能源投资有限公司 Natural language understanding method and device, vehicle and medium
CN114918950A (en) * 2022-01-12 2022-08-19 国网吉林省电力有限公司延边供电公司 Intelligent robot for power supply of Xinji Jianfrontier
CN114464179B (en) * 2022-01-28 2024-03-19 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium
CN115134466A (en) * 2022-06-07 2022-09-30 马上消费金融股份有限公司 Intention recognition method and device and electronic equipment
CN116343786A (en) * 2023-03-07 2023-06-27 南方电网人工智能科技有限公司 Customer service voice analysis method, system, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012037790A (en) * 2010-08-10 2012-02-23 Toshiba Corp Voice interaction device
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
CN109522564A (en) * 2018-12-17 2019-03-26 北京百度网讯科技有限公司 Voice translation method and device
CN109712607A (en) * 2018-12-30 2019-05-03 联想(北京)有限公司 A kind of processing method, device and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170011735A1 (en) * 2015-07-10 2017-01-12 Electronics And Telecommunications Research Institute Speech recognition system and method
CN105957516B (en) * 2016-06-16 2019-03-08 百度在线网络技术(北京)有限公司 More voice identification model switching method and device
US20180137109A1 (en) * 2016-11-11 2018-05-17 The Charles Stark Draper Laboratory, Inc. Methodology for automatic multilingual speech recognition
CN108335692B (en) * 2018-03-21 2021-03-05 上海智蕙林医疗科技有限公司 Voice switching method, server and system
CN109065020B (en) * 2018-07-28 2020-11-20 重庆柚瓣家科技有限公司 Multi-language category recognition library matching method and system
KR20210009596A (en) * 2019-07-17 2021-01-27 엘지전자 주식회사 Intelligent voice recognizing method, apparatus, and intelligent computing device
CN111627432B (en) * 2020-04-21 2023-10-20 升智信息科技(南京)有限公司 Active outbound intelligent voice robot multilingual interaction method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012037790A (en) * 2010-08-10 2012-02-23 Toshiba Corp Voice interaction device
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
CN109522564A (en) * 2018-12-17 2019-03-26 北京百度网讯科技有限公司 Voice translation method and device
CN109712607A (en) * 2018-12-30 2019-05-03 联想(北京)有限公司 A kind of processing method, device and electronic equipment

Also Published As

Publication number Publication date
CN111627432A (en) 2020-09-04
WO2021212929A1 (en) 2021-10-28

Similar Documents

Publication Publication Date Title
CN111627432B (en) Active outbound intelligent voice robot multilingual interaction method and device
US20220335930A1 (en) Utilizing pre-event and post-event input streams to engage an automated assistant
CN112262430A (en) Automatically determining language for speech recognition of a spoken utterance received via an automated assistant interface
CN110148416A (en) Audio recognition method, device, equipment and storage medium
CN111052229A (en) Automatically determining a language for speech recognition of a spoken utterance received via an automated assistant interface
US11978432B2 (en) On-device speech synthesis of textual segments for training of on-device speech recognition model
US20150199340A1 (en) System for translating a language based on user's reaction and method thereof
CN112673421A (en) Training and/or using language selection models to automatically determine a language for voice recognition of spoken utterances
CN107093425A (en) Speech guide system, audio recognition method and the voice interactive method of power system
KR20220088926A (en) Use of Automated Assistant Function Modifications for On-Device Machine Learning Model Training
KR102140391B1 (en) Search method and electronic device using the method
KR20220166848A (en) User Moderation for Hotword/Keyword Detection
US20240021207A1 (en) Multi-factor audio watermarking
KR20230005966A (en) Detect close matching hotwords or phrases
CN115083412B (en) Voice interaction method and related device, electronic equipment and storage medium
KR20210042520A (en) An electronic apparatus and Method for controlling the electronic apparatus thereof
CN112037772B (en) Response obligation detection method, system and device based on multiple modes
CN113077790B (en) Multi-language configuration method, multi-language interaction method, device and electronic equipment
CN111062200A (en) Phonetics generalization method, phonetics identification method, device and electronic equipment
CN115116442B (en) Voice interaction method and electronic equipment
CN115662430B (en) Input data analysis method, device, electronic equipment and storage medium
CN111324703A (en) Man-machine conversation method and doll simulating human voice to carry out man-machine conversation
CN117711389A (en) Voice interaction method, device, server and storage medium
JP2005122194A (en) Voice recognition and dialog device and voice recognition and dialog processing method
CN118020100A (en) Voice data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant