WO2021212929A1 - Multilingual interaction method and apparatus for active outbound intelligent speech robot - Google Patents

Multilingual interaction method and apparatus for active outbound intelligent speech robot

Info

Publication number
WO2021212929A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
language
recognition engine
recognition
multilingual
Application number
PCT/CN2021/071368
Other languages
French (fr)
Chinese (zh)
Inventor
李训林
王帅
张晋
Original Assignee
升智信息科技(南京)有限公司
Application filed by 升智信息科技(南京)有限公司
Publication of WO2021212929A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention relates to the technical field of voice signal processing, and in particular to a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a corresponding computer device and storage medium.
  • An active outbound intelligent voice robot, given preset dialogue scenes, guides the user through multi-turn dialogue in order to achieve a marketing purpose.
  • Its main core functional modules are speech recognition (ASR), speech synthesis (TTS), dialogue management (DM), natural language processing (NLP), and natural language understanding (NLU).
  • The present invention proposes a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a computer device and a storage medium.
  • A multilingual interaction method for an active outbound intelligent voice robot includes the following steps:
  • S10: when the user enters the multilingual setting scene, detect the voice data uttered by the user;
  • S20: send the voice data to each language recognition engine to obtain the recognized text returned by each engine;
  • S30: when none of the recognized texts is empty, detect whether each recognized text carries a preset weight word, and determine the text carrying a weight word as the valid text;
  • S40: input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine.
  • Sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
  • sending the voice data to the English language recognition engine to obtain the English recognized text, and sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text.
  • In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
  • if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
  • In one embodiment, after obtaining the recognized texts returned by the language recognition engines, the method further includes:
  • if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
  • In one embodiment, after obtaining the recognized texts, the method further includes:
  • if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
  • In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it includes:
  • inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
  • A multilingual interaction apparatus for an active outbound intelligent voice robot includes:
  • a first detection module, configured to detect the voice data uttered by the user when the user enters the multilingual setting scene;
  • a sending module, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each engine;
  • a second detection module, configured to detect, when none of the recognized texts is empty, whether each recognized text carries a preset weight word and to determine the text carrying a weight word as the valid text;
  • an input module, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • A computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments.
  • A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments are implemented.
  • The above multilingual interaction method, apparatus, computer device, and storage medium can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, they detect whether each recognized text carries a preset weight word and determine the text carrying a weight word as the valid text.
  • The valid text is then input into the NLU (Natural Language Understanding) system, which performs intent recognition on it and triggers an interactive action according to the intent recognition result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
  • Fig. 1 is a flowchart of a multilingual interaction method for an active outbound intelligent voice robot according to an embodiment;
  • Fig. 2 is a schematic diagram of the working process of an intelligent voice robot according to an embodiment;
  • Fig. 3 is a language decision flowchart according to an embodiment;
  • Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to an embodiment;
  • Fig. 5 is a schematic diagram of a computer device according to an embodiment.
  • The multilingual interaction method for an active outbound intelligent voice robot provided in this application can be applied to such intelligent voice robots.
  • The intelligent voice robot can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, and performs intent recognition on the valid text in the NLU system.
  • The intent recognition result triggers an interactive action, providing multilingual service from the intelligent voice robot and increasing its value, thereby improving the user experience.
  • In one embodiment, a multilingual interaction method for an active outbound intelligent voice robot is provided; taking its application to an intelligent voice robot as an example, the method includes the following steps.
  • S10: when the user enters the multilingual setting scene, detect the voice data uttered by the user. The intelligent voice robot can use a language recognition configurator to preset its dialogue-script scenes; for scenes that require multilingual recognition, the possible languages of the dialogue, such as Chinese and English, are configured. When the user enters the scene handled by the intelligent voice robot, the robot uses the preset language recognition engines to identify the language spoken by the user.
  • S20: send the voice data to each language recognition engine to obtain the recognized text returned by each engine.
  • In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine.
  • The English language recognition engine may be the default language recognition engine, and correspondingly English may be the default language.
  • Sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
  • sending the voice data to the English language recognition engine to obtain the English recognized text, which may be denoted TXT-EN;
  • sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text, which may be denoted TXT-CN.
  • S30: when none of the recognized texts is empty, detect whether each recognized text carries a preset weight word, and determine the text carrying a weight word as the valid text. The weight words can be preset in the intelligent voice robot. Specifically, particular weight words need to be set for the calculation; if the preset dialogue-script scenes of the intelligent voice robot include a Chinese scene and an English scene, the weight-word setup process may include:
  • analyzing the Chinese and English expressions likely to be used in the scene and setting the weight words accordingly.
  • The logic behind the weight words can draw on users' speech habits and the related psychology in each context. For example, when a person receives a call and the first sentence is in a familiar language, they answer normally. If the opening statement of the robot (the intelligent voice robot) is "Hello, this is XX calling from XX, may I speak to XXX?" and the person answering understands English, they will answer it smoothly. This can be understood on two levels: first, the answer is semantically consistent with the opening statement; second, the answering speed is normal, generally within 200 ms to 500 ms.
  • The intelligent voice robot uses this response "range" to set an adjustment coefficient for each language and determines the hesitation time coefficient from the time interval of the user's response.
  • The core role of a weight word is that, if the speech recognition result contains a preset weight word, the user is very likely speaking that language; the weight-word rule is one part of the core logic of language determination. Analysis of real scenes shows that in an unfamiliar-language scenario a user hesitates for about 300-500 ms, and the more information there is, the longer the hesitation; a time threshold T is therefore set based on scene analysis, and the longer the response time beyond T, the lower the language familiarity.
  • S40: input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • In this step, the valid text and the language corresponding to the valid text can both be input into the NLU system.
  • In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it includes:
  • inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
  • Specifically, the NLU system receives the valid text; since different languages require different natural language processing models, the NLU system selects its processing rules and models according to the language. It takes the recognition result as the input for the intent response and performs intent recognition with a pre-trained algorithm model for that language. Once intent recognition is complete, the action corresponding to the intent is triggered, such as an announcement; the action is processed according to the language, and the text-to-speech (TTS) service for that language is called to generate the corresponding speech for playback, completing the feedback exchange with the user.
  • The above multilingual interaction method can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine.
  • When none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, performs intent recognition on the valid text in the NLU system, and triggers an interactive action according to the intent recognition result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
  • In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
  • if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
  • In one embodiment, after obtaining the recognized texts, the method further includes:
  • if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
  • In one embodiment, after obtaining the recognized texts, the method further includes:
  • if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
  • Specifically, taking the case where the language recognition engines include an English engine and a Chinese engine as an example:
  • the intelligent voice robot sends the user's speech (voice data) to both the English and the Chinese language recognition engine; the English engine returns TXT-EN and the Chinese engine returns TXT-CN.
  • Determining the recognition result (the valid text) can proceed as follows:
  • if TXT-EN and TXT-CN are both empty (no valid result recognized), the current audio is most likely noise, and the language in use is recorded as the default language (for example English);
  • if TXT-EN and TXT-CN both return non-empty text, a weight calculation is performed: check whether TXT-EN or TXT-CN contains the preset weight words. The weight words are chosen to fit the scene, so if a weight word appears, the recognition result is very likely the user's actual answer; therefore, if exactly one of TXT-EN and TXT-CN contains a weight word, that result is taken as the optimal result. If both contain a weight word, or neither does, TXT-EN and TXT-CN are passed to the English and Chinese language models respectively, and the returned results are scored to obtain sourceEN(TXT-EN) and sourceCN(TXT-CN). Because the English and Chinese models score on different scales, an adjustment coefficient s is found from statistics of real scene data such that sourceEN(TXT-EN)*s is on a scale comparable with sourceCN; the hesitation time coefficient is also taken into account.
  • The above multilingual interaction method completes the determination of the interaction language through this recognition-and-decision pattern, finally addressing the shortcomings of existing intelligent voice robots in multilingual scenes; the way the intelligent voice robot passively switches the dialogue language in a specific scene may involve a speech synthesis module, a natural speech processing module, a natural language understanding module, a dialogue management module, and a speech recognition module.
  • At the speech recognition level, different language recognition engines are set up according to the scene configuration, and the multilingual speech recognition results are evaluated and scored by language-specific models; from the evaluation the most suitable result is selected, the language of that result is taken as the language used by the user, and the best result together with its language is provided to the NLU as the basis for the NLU's decisions.
  • At the natural language understanding level, the intelligent voice robot uses different semantic matching algorithms to improve semantic recognition and adjusts its output language (TTS) to match the user's language.
  • Definition of node hot words for specific dialogue-management scenes: this step analyzes the specific scene in which a user with weak English expression may ask the AI to switch languages, and summarizes the corresponding high-frequency hot words for the language-switching intent.
  • The high-frequency hot words are: 中文, 华语, chinese, English.
  • The relevant decision process is shown in Fig. 3.
  • A quick decision is made first; a simple implementation of the quick-decision logic calls two or more ASR engines for recognition and judges the language to be that of whichever engine returns a result. If no quick decision can be made, an acoustic-model decision is made; the acoustic model mainly addresses similar pronunciations across languages, where different ASR engines may each return a plausible result. For example, the Chinese filler word "那(nei)个" sounds very similar to an offensive English word, and an English ASR engine may recognize it as that word. The sound is decomposed into IPA symbols, and matching is then performed on the IPA sequence.
  • Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to an embodiment, including:
  • a first detection module 10, configured to detect the voice data uttered by the user when the user enters the multilingual setting scene;
  • a sending module 20, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each engine;
  • a second detection module 30, configured to detect, when none of the recognized texts is empty, whether each recognized text carries a preset weight word, and to determine the text carrying a weight word as the valid text;
  • an input module 40, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
  • For the specific limitations of the multilingual interaction apparatus for the active outbound intelligent voice robot, reference may be made to the limitations of the multilingual interaction method above, which are not repeated here.
  • The various modules in the above apparatus can be implemented in whole or in part by software, by hardware, or by a combination of the two.
  • The above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
  • In one embodiment, a computer device is provided.
  • The computer device may be a terminal, and its internal structure may be as shown in Fig. 5.
  • The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus.
  • The processor of the computer device provides computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system and a computer program.
  • The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • The network interface of the computer device is used to communicate with external terminals through a network connection.
  • The computer program, when executed by the processor, implements a multilingual interaction method for an active outbound intelligent voice robot.
  • The display screen of the computer device may be a liquid crystal display or an electronic ink display.
  • The input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • Those skilled in the art will understand that Fig. 5 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer devices to which the solution can be applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • In one embodiment, a computer device is further provided.
  • The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the multilingual interaction method for the active outbound intelligent voice robot of any of the foregoing embodiments.
  • The program can be stored in a non-volatile computer-readable storage medium.
  • The program can be stored in the storage medium of the computer system and executed by at least one processor in the computer system to implement the multilingual interaction method for the active outbound intelligent voice robot described above.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM), among others.
  • In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, it implements the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments.
  • The terms "first", "second", and "third" in the embodiments of this application merely distinguish similar objects and do not imply a particular order among them. Where permitted, "first", "second", and "third" may be interchanged in specific order or precedence, so that the embodiments described here can be implemented in orders other than those illustrated or described.

Abstract

A multilingual interaction method and apparatus for an active outbound intelligent speech robot, together with a computer device and a storage medium. The method comprises: when the user enters the multilingual setting scene, detecting the voice data uttered by the user (S10); sending the voice data to each language recognition engine to obtain the recognized text returned by each engine (S20); when none of the recognized texts is empty, detecting whether each recognized text carries a preset weight word and determining the text carrying a weight word as the valid text (S30); and inputting the valid text into an NLU system, performing intent recognition on the valid text in the NLU system, and triggering an interactive action according to the intent recognition result (S40). The method enables multilingual service from the intelligent speech robot and thus improves the user experience.

Description

Multilingual interaction method and apparatus for an active outbound intelligent voice robot
Technical Field
The present invention relates to the technical field of voice signal processing, and in particular to a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a corresponding computer device and storage medium.
Background Art
With the advent of the cloud era and continuous innovation in artificial intelligence technology, voice-based intelligent robots have entered all kinds of industries. Intelligent voice robots now take over a large amount of tedious, repetitive customer-service work, freeing human labor and providing extensive support for automated replies across industries.
An active outbound intelligent voice robot, working from preset dialogue scenes, guides the user through multi-turn dialogue in order to achieve a marketing purpose. Its main core functional modules are speech recognition (ASR), speech synthesis (TTS), dialogue management (DM), natural language processing (NLP), and natural language understanding (NLU).
In overseas markets, most intelligent voice robots support a single language, which covers about 95% of users. In real outbound-call scenarios, however, some users have weak expressive ability in that single language: in Southeast Asia, for example, the main language is English, while about 5% of users are overseas Chinese who are more comfortable with Chinese. When such users hear the voice robot speaking English, they will ask whether it can provide service in another language, such as Chinese. In this situation the language barrier lowers the value of the product and leads to a poor user experience.
Summary of the Invention
In view of the above problems, the present invention proposes a multilingual interaction method and apparatus for an active outbound intelligent voice robot, together with a computer device and a storage medium.
To achieve the object of the present invention, a multilingual interaction method for an active outbound intelligent voice robot is provided, including the following steps:
S10: when the user enters the multilingual setting scene, detecting the voice data uttered by the user;
S20: sending the voice data to each language recognition engine to obtain the recognized text returned by each engine;
S30: when none of the recognized texts is empty, detecting whether each recognized text carries a preset weight word, and determining the text carrying a weight word as the valid text;
S40: inputting the valid text into the NLU system, performing intent recognition on the valid text in the NLU system, and triggering an interactive action according to the intent recognition result.
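For orientation, steps S10-S40 can be pictured as the minimal sketch below. It is an illustration only, not the disclosed implementation: the `asr_engines`, `nlu`, and weight-word interfaces are assumed placeholders, and the fallback when both texts carry (or lack) a weight word is simplified to "take the first result"; the comprehensive scoring that the invention actually prescribes for that case is sketched separately further down.

```python
# Minimal sketch of steps S10-S40 (interfaces and fallback behaviour are assumptions).

def handle_user_turn(audio, asr_engines, weight_words, nlu, default_language):
    """asr_engines: dict mapping language -> recognizer with a transcribe(audio) -> str method."""
    # S20: send the voice data to every language recognition engine.
    recognized = {lang: engine.transcribe(audio) for lang, engine in asr_engines.items()}

    non_empty = {lang: txt for lang, txt in recognized.items() if txt.strip()}
    if not non_empty:
        # All texts empty: treat the audio as noise and fall back to the default language.
        return nlu.respond(text="", language=default_language)
    if len(non_empty) == 1:
        # Exactly one engine returned text: that text and its language are taken as valid.
        (language, valid_text), = non_empty.items()
    else:
        # S30: prefer the text that carries a preset weight word; when both or neither do,
        # a comprehensive score would be computed (see the scoring sketch further below).
        hits = {lang: txt for lang, txt in non_empty.items()
                if any(word in txt for word in weight_words.get(lang, []))}
        language, valid_text = (next(iter(hits.items())) if len(hits) == 1
                                else next(iter(non_empty.items())))
    # S40: intent recognition in the NLU system, in the chosen language.
    return nlu.respond(text=valid_text, language=language)
```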
In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine.
As an embodiment, sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
sending the voice data to the English language recognition engine to obtain the English recognized text returned by it;
sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text returned by it.
In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it in the NLU system includes:
inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
A multilingual interaction apparatus for an active outbound intelligent voice robot includes:
a first detection module, configured to detect the voice data uttered by the user when the user enters the multilingual setting scene;
a sending module, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each engine;
a second detection module, configured to detect, when none of the recognized texts is empty, whether each recognized text carries a preset weight word and to determine the text carrying a weight word as the valid text;
an input module, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
A computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments.
A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps of the multilingual interaction method for the active outbound intelligent voice robot of any of the above embodiments are implemented.
The above multilingual interaction method, apparatus, computer device, and storage medium can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, they detect whether each recognized text carries a preset weight word and determine the text carrying a weight word as the valid text; the valid text is then input into the NLU (Natural Language Understanding) system, which performs intent recognition on it and triggers an interactive action according to the result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
Brief Description of the Drawings
Fig. 1 is a flowchart of a multilingual interaction method for an active outbound intelligent voice robot according to an embodiment;
Fig. 2 is a schematic diagram of the working process of an intelligent voice robot according to an embodiment;
Fig. 3 is a language decision flowchart according to an embodiment;
Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to an embodiment;
Fig. 5 is a schematic diagram of a computer device according to an embodiment.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with that embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The multilingual interaction method for an active outbound intelligent voice robot provided in this application can be applied to such intelligent voice robots. The intelligent voice robot can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, performs intent recognition on the valid text in the NLU system, and triggers an interactive action according to the intent recognition result, thereby providing multilingual service, increasing the value of the intelligent voice robot, and improving the user experience.
In one embodiment, as shown in Fig. 1, a multilingual interaction method for an active outbound intelligent voice robot is provided; taking its application to an intelligent voice robot as an example, the method includes the following steps.
S10: when the user enters the multilingual setting scene, detect the voice data uttered by the user.
The intelligent voice robot can use a language recognition configurator to preset its dialogue-script scenes; for scenes that require multilingual recognition, the possible languages of the dialogue, such as Chinese and English, are configured. When the user enters the scene handled by the intelligent voice robot, the robot uses the preset language recognition engines to identify the language spoken by the user.
S20: send the voice data to each language recognition engine to obtain the recognized text returned by each engine.
In one embodiment, the language recognition engines include an English language recognition engine and a Chinese language recognition engine. The English engine may be the default language recognition engine, and correspondingly English may be the default language.
Specifically, sending the voice data to each language recognition engine and obtaining the recognized text returned by each engine includes:
sending the voice data to the English language recognition engine to obtain the English recognized text, which may be denoted TXT-EN;
sending the voice data to the Chinese language recognition engine to obtain the Chinese recognized text, which may be denoted TXT-CN.
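The same audio can be dispatched to the English and Chinese engines in parallel so that neither blocks the other; the text above does not mandate concurrency, so the sketch below is just one convenient way of obtaining TXT-EN and TXT-CN, with an assumed `transcribe` method on each engine.

```python
# Parallel dispatch to the two engines (concurrency is an implementation choice, not required).
from concurrent.futures import ThreadPoolExecutor

def recognize_both(audio, english_engine, chinese_engine):
    """Return (TXT-EN, TXT-CN); each engine is assumed to expose transcribe(audio) -> str."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_en = pool.submit(english_engine.transcribe, audio)
        future_cn = pool.submit(chinese_engine.transcribe, audio)
        return future_en.result(), future_cn.result()
```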
S30: when none of the recognized texts is empty, detect whether each recognized text carries a preset weight word, and determine the text carrying a weight word as the valid text.
The weight words can be preset in the intelligent voice robot. Specifically, particular weight words need to be set for the calculation; if the preset dialogue-script scenes of the intelligent voice robot include a Chinese scene and an English scene, the weight-word setup process may include:
analyzing the Chinese and English expressions likely to be used in the scene and setting the weight words accordingly. The logic behind the weight words can draw on users' speech habits and the related psychology in each context. For example, when a person receives a call and the first sentence is in a familiar language, they answer normally. If the opening statement of the robot (the intelligent voice robot) is "Hello, this is XX calling from XX, may I speak to XXX?" and the person answering understands English, they will answer it smoothly. This can be understood on two levels: first, the answer is semantically consistent with the opening statement; second, the answering speed is normal, generally within 200 ms to 500 ms. A person who does not understand English will first hesitate and then answer in Chinese with something like "请问可以说中文吗" ("May I speak Chinese?") or "你打错了" ("You have the wrong number"). For an opening statement in a given language, a typical person's response falls within a certain range. The intelligent voice robot uses this "range" to set an adjustment coefficient for each language and determines the hesitation time coefficient from the time interval of the user's response.
The core role of a weight word is that, if the speech recognition result contains a preset weight word, the user is very likely speaking that language; the weight-word rule is therefore one part of the core logic of language determination. Analysis of real scenes shows that in an unfamiliar-language scenario a user hesitates for about 300-500 ms, and the more information there is, the longer the hesitation; a time threshold T is therefore set based on scene analysis, and the longer the response time beyond T, the lower the language familiarity.
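As a concrete illustration of the weight-word check and the hesitation threshold T just described, the following sketch computes a weight-word hit and the hesitation time coefficient Δt. The word lists and the 400 ms threshold are assumed example values chosen here for illustration, not values fixed by the text.

```python
# Illustrative weight-word check and hesitation coefficient (example values are assumptions).

WEIGHT_WORDS = {
    "zh": ["中文", "华语", "打错"],    # assumed example weight words for the Chinese scene
    "en": ["speak", "wrong number"],   # assumed example weight words for the English scene
}
T_MS = 400  # assumed time threshold T, taken from the 300-500 ms hesitation range above

def carries_weight_word(text: str, language: str) -> bool:
    """True if the recognized text contains any preset weight word for that language."""
    return any(word in text for word in WEIGHT_WORDS[language])

def hesitation_coefficient(delay_ms: float, threshold_ms: float = T_MS) -> float:
    """Δt = DelayTime - T; it can be negative for quick answers and grows as familiarity drops."""
    return delay_ms - threshold_ms
```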
S40: input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interactive action according to the intent recognition result.
In this step, the valid text and the language corresponding to the valid text can both be input into the NLU system.
In one embodiment, inputting the valid text into the NLU system and performing intent recognition on it includes:
inputting the valid text into the NLU system so that the NLU system identifies the language of the valid text, obtains the current language, and performs intent recognition on the valid text using the algorithm model corresponding to the current language.
Specifically, the NLU system receives the valid text; since different languages require different natural language processing models, the NLU system selects its processing rules and models according to the language. It takes the recognition result as the input for the intent response and performs intent recognition with a pre-trained algorithm model for that language. Once intent recognition is complete, the action corresponding to the intent is triggered, such as an announcement; the action is processed according to the language, and the text-to-speech (TTS) service for that language is called to generate the corresponding speech for playback, completing the feedback exchange with the user.
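The language-dependent routing in this step can be pictured as below; it matches the `nlu.respond` call assumed in the earlier pipeline sketch. The model registry, intent classifier interface, reply table, and TTS call are all assumptions made for the sketch; the text only requires that the model and the TTS voice be selected by language.

```python
# Sketch of language-dependent NLU routing and TTS playback (all interfaces are assumed).

class MultilingualNLU:
    def __init__(self, intent_models, replies, tts_engines):
        self.intent_models = intent_models  # language -> pre-trained intent classifier
        self.replies = replies              # language -> {intent name: reply text}
        self.tts_engines = tts_engines      # language -> text-to-speech (TTS) engine

    def respond(self, text: str, language: str) -> bytes:
        """Recognize the intent of the valid text and synthesize the reply in the same language."""
        intent = self.intent_models[language].predict(text)       # per-language algorithm model
        reply_text = self.replies[language][intent]               # action bound to the recognized intent
        return self.tts_engines[language].synthesize(reply_text)  # announce in the user's language
```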
The multilingual interaction method described above can detect the voice data uttered by the user when the user enters the multilingual setting scene, send the voice data to each language recognition engine, and obtain the recognized text returned by each engine; when none of the recognized texts is empty, it detects whether each recognized text carries a preset weight word, determines the text carrying a weight word as the valid text, inputs the valid text into the NLU (Natural Language Understanding) system, performs intent recognition on the valid text, and triggers an interactive action according to the result, thereby providing multilingual service from the intelligent voice robot, increasing its value, and improving the user experience.
In one embodiment, after detecting whether each recognized text carries a preset weight word, the method further includes:
if every recognized text carries a preset weight word, or none of them does, passing each recognized text to the language model of the corresponding language, using each model to compute a text score for that recognized text, determining a comprehensive score for each recognized text from its text score, the hesitation time coefficient, and the adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if every recognized text is empty, recording the language in use as the default language and triggering the interactive action in the default language.
In one embodiment, after sending the voice data to each language recognition engine and obtaining the recognized texts, the method further includes:
if exactly one non-empty text exists among the recognized texts, determining that non-empty text as the valid text.
Specifically, taking the case where the language recognition engines include an English engine and a Chinese engine as an example, the intelligent voice robot sends the user's speech (voice data) to both the English and the Chinese language recognition engine; the English engine returns TXT-EN and the Chinese engine returns TXT-CN. Determining the recognition result (the valid text) can proceed as follows:
Scenario 1: TXT-EN and TXT-CN are both empty (no valid result recognized). The current audio is most likely noise, and the language in use is recorded as the default language (for example English).
Scenario 2: exactly one of TXT-EN and TXT-CN is empty (no valid result recognized). The text that was returned is considered to be in the correct language, and that language type is recorded.
Scenario 3: TXT-EN and TXT-CN both return non-empty text, so a weight calculation is performed: check whether TXT-EN or TXT-CN contains the weight words set in step two. The weight words are chosen to fit the scene, so if a weight word appears, the recognition result is very likely the user's actual answer; therefore, if exactly one of TXT-EN and TXT-CN contains a weight word, that result is taken as the optimal result. If both contain a weight word, or neither does, TXT-EN and TXT-CN are passed to the English and Chinese language models respectively, and the returned results are scored to obtain sourceEN(TXT-EN) and sourceCN(TXT-CN). Because the English and Chinese models score on different scales, an adjustment coefficient s is found from statistics of real scene data, such that sourceEN(TXT-EN)*s is on a scale comparable with sourceCN. The hesitation time coefficient Δt = DelayTime (the user's hesitation time) - T (the preset time threshold) is also considered and is combined with the sensitivity coefficient to obtain the most suitable score. The empirical values a (sensitivity coefficient) and s (adjustment coefficient) are validated against a large amount of data, giving the final scoring formulas (the comprehensive scores):
The English comprehensive score is: sourceEN(TXT-EN) * s - a^Δt
The Chinese comprehensive score is: sourceCN(TXT-CN)
The English comprehensive score and the Chinese comprehensive score are compared, and the higher-scoring result is taken as the user's recognition result and language.
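Putting the three scenarios and the scoring formulas together, a decision function might look like the sketch below. The language-model scorers, the coefficient values, and the helper names are assumptions (it reuses `carries_weight_word` from the earlier weight-word sketch); the formulas themselves follow the text above, reading the flattened "a △t" as the exponent a^Δt.

```python
# Sketch of the language decision over TXT-EN / TXT-CN (scorers and coefficients are assumed).

def decide_language(txt_en, txt_cn, delay_ms, score_en, score_cn,
                    s=1.2, a=1.5, T_ms=400, default="en"):
    """score_en / score_cn: language-model scorers returning a fluency score for a text;
    s, a, and T_ms are assumed example values for the adjustment coefficient, the
    sensitivity coefficient, and the time threshold."""
    en_empty, cn_empty = not txt_en.strip(), not txt_cn.strip()
    if en_empty and cn_empty:                        # Scenario 1: probably noise
        return default, ""
    if en_empty or cn_empty:                         # Scenario 2: only one engine answered
        return ("zh", txt_cn) if en_empty else ("en", txt_en)

    # Scenario 3: both engines returned text -> weight words first, then comprehensive scores.
    en_hit = carries_weight_word(txt_en, "en")       # from the weight-word sketch above
    cn_hit = carries_weight_word(txt_cn, "zh")
    if en_hit != cn_hit:                             # exactly one text carries a weight word
        return ("en", txt_en) if en_hit else ("zh", txt_cn)

    dt = (delay_ms - T_ms) / 1000.0                  # Δt = DelayTime - T (unit scaling assumed)
    english_score = score_en(txt_en) * s - a ** dt   # sourceEN(TXT-EN) * s - a^Δt
    chinese_score = score_cn(txt_cn)                 # sourceCN(TXT-CN)
    return ("en", txt_en) if english_score >= chinese_score else ("zh", txt_cn)
```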
In one embodiment, the multilingual interaction method above completes the determination of the interaction language through this recognition-and-decision pattern, finally addressing the shortcomings of existing intelligent voice robots in multilingual scenes; the way the intelligent voice robot passively switches the dialogue language in a specific scene may involve a speech synthesis module, a natural speech processing module, a natural language understanding module, a dialogue management module, and a speech recognition module.
Referring to Fig. 2, when the intelligent voice robot is deployed, the voice languages that need to be supported are configured for each scene, for example English and Chinese. When the intelligent robot reaches that scene it reads this configuration and processes the different languages accordingly; in general, the configuration is the condition that decides which language processing logic is executed.
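A per-scene language configuration of the kind described here could be as simple as the following; the scene identifier and field names are illustrative assumptions, not a schema given in the text.

```python
# Illustrative per-scene configuration; the keys are assumptions, not the disclosed schema.

SCENE_CONFIG = {
    "outbound_marketing_sea": {            # hypothetical scene identifier
        "languages": ["en", "zh"],         # languages the scene must support
        "default_language": "en",          # language used when nothing valid is recognized
        "opening_line": "Hello, this is XX calling from XX, may I speak to XXX?",
    },
}

def supported_languages(scene_id: str) -> list[str]:
    """The robot reads this configuration when it reaches the scene and branches on it."""
    return SCENE_CONFIG[scene_id]["languages"]
```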
At the speech recognition level, different language recognition engines are set up according to the scene configuration. The multilingual speech recognition results are evaluated and scored by language-specific models; from the evaluation the most suitable result is selected, the language of that result is taken as the language used by the user, and the best result together with its language is provided to the NLU as the basis for the NLU's decisions.
At the natural language understanding level, different languages are supported. According to the language determined in the speech recognition flow, the intelligent voice robot uses different semantic matching algorithms to improve semantic recognition and adjusts its output language (TTS) to match the user's language.
At the psychological level, when a person receives a call and the first sentence is in a familiar language, they answer normally. If the opening statement of the robot is "Hello, this is XX calling from XX, may I speak to XXX?" and the person answering understands English, they will answer it smoothly. This can be understood on two levels: first, the answer is semantically consistent with the opening statement; second, the answering speed is normal, generally within 200 ms to 500 ms. A person who does not understand English will first hesitate and then answer in Chinese with something like "请问可以说中文吗" ("May I speak Chinese?") or "你打错了" ("You have the wrong number"). For an opening statement in a given language, a typical person's response falls within a certain range, and the system makes use of this "range" to set the per-language coefficients described above.
中文话术的搭建:在对话管理系统预设话术为英文的情况下,为了覆盖更多语言场景,针对此英文话术预设场景前提下,新增一套对应的中文话术场景;Construction of Chinese dialect: In the case that the dialogue management system defaults to English, in order to cover more language scenarios, a set of corresponding Chinese dialect scenarios is added under the premise of this English dialect preset scenario;
中文意图搭建:原有主动外呼开场白场景下,新增一个领域即客户意图场景分支为:客户想说中文的意图分支。该意图分支下需要配置客户对应场景的可能说法,如:可以说华语吗?can you speak chinese?Chinese intention construction: In the original active outbound opening scene, a new field is added, namely the customer intention scene branch: the intention branch of the customer who wants to speak Chinese. Under this intention branch, you need to configure the possible statements of the customer's corresponding scenario, such as: Can you speak Chinese? can you speak chinese?
Definition of node hot words for specific dialogue-management scenarios: this step analyzes the specific scenario in which a user with weak English may answer the AI with a question of their own, and summarizes the high-frequency hot words of this language-switching intent as: 中文, 华语, chinese, English.
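These hot words play the role of the preset weight words checked in step S30. A minimal sketch of the check, under assumed names, might be:

```python
# Hypothetical hot-word (weight-word) check; the word list follows the text above,
# the matching logic is an illustrative assumption.
NODE_HOT_WORDS = {"中文", "华语", "chinese", "english"}

def carries_hot_word(recognized_text: str) -> bool:
    """True if the recognized text contains any configured hot word for this node."""
    lowered = recognized_text.lower()
    return any(word in lowered for word in NODE_HOT_WORDS)

# Example: a reply to the English opening such as "可以说中文吗" is flagged,
# so the Chinese transcript is treated as the valid text for the NLU module.
print(carries_hot_word("可以说中文吗"))   # True
```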
Adding a layer of Chinese engine for specific scenarios on top of the original English engine: for an English opening, while the prompt is played in English, a customer who cannot speak English will generally answer the voice system with a question such as "Can you speak Chinese?". For this specific scenario, an additional layer of Chinese ASR engine is added as a supplement covering the remaining roughly 5% of cases;
The advantages of this approach are: it covers multilingual user groups while avoiding, by a simple trick, the negative recognition effects of relying on a bilingual engine, such as a degraded overall system response rate and lower recognition accuracy for the users in the 95% majority scenario.
In one example, during application of the above multilingual interaction method for the active outbound intelligent voice robot, after the user's voice data is obtained, the decision process may be as shown in Figure 3. A quick decision is made first; a simple implementation of the quick-decision logic calls two or more ASR engines for recognition, and whichever engine returns a result determines the language. If no result allows a decision, an acoustic-model decision is made. The acoustic model mainly addresses similar pronunciations across languages, for which several ASR engines may all return plausible results. For example, the spoken Chinese filler word "那 (nèi) 个" is pronounced much like an offensive English word, and an English ASR engine will generally transcribe it as that word. The sound is therefore decomposed into IPA symbols, which are then matched.
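A minimal sketch of this two-stage decision, with the IPA conversion and distance functions treated as assumed helpers (they are not specified in the original text), could be:

```python
# Hypothetical sketch of the Figure 3 decision cascade: quick decision first,
# acoustic (IPA) comparison as a fallback when the quick decision is ambiguous.
from typing import Callable, Dict, Optional, Tuple

def quick_decision(results: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """results maps language code to recognized text (possibly empty),
    e.g. {"en-US": "", "zh-CN": "可以说中文吗"}."""
    non_empty = {lang: text for lang, text in results.items() if text}
    if len(non_empty) == 1:                     # exactly one engine returned text
        return next(iter(non_empty.items()))    # (language, text)
    return None                                 # ambiguous: fall through

def acoustic_decision(audio: bytes,
                      results: Dict[str, str],
                      audio_to_ipa: Callable,
                      text_to_ipa: Callable,
                      ipa_distance: Callable) -> Optional[Tuple[str, str]]:
    """Fallback: compare the IPA of the audio with the IPA of each candidate
    transcript and keep the closest one. All three helpers are assumptions."""
    audio_ipa = audio_to_ipa(audio)
    scored = [(ipa_distance(audio_ipa, text_to_ipa(text, lang)), lang, text)
              for lang, text in results.items() if text]
    if not scored:
        return None
    _, lang, text = min(scored)                 # smallest IPA distance wins
    return lang, text
```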
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a multilingual interaction apparatus for an active outbound intelligent voice robot according to one embodiment, which includes:
a first detection module 10, configured to detect voice data uttered by the user when the user enters a multilingual setting scenario;
a sending module 20, configured to send the voice data to each language recognition engine and obtain the recognized text returned by each language recognition engine;
a second detection module 30, configured to, when none of the recognized texts is an empty text, detect whether each recognized text carries a preset weight word, and determine the text carrying the weight word as the valid text;
an input module 40, configured to input the valid text into the NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interaction action according to the intent recognition result.
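The four modules above could be wired together in software roughly as follows; this is a sketch under assumed interfaces (recognize, intent), not the claimed apparatus itself.

```python
# Hypothetical wiring of the four modules into one apparatus class.
from typing import Dict, Optional, Set

class MultilingualInteractionApparatus:
    def __init__(self, engines: Dict[str, object], weight_words: Set[str], nlu):
        self.engines = engines            # e.g. {"en-US": asr_en, "zh-CN": asr_zh}
        self.weight_words = weight_words  # preset weight words, e.g. {"中文", "chinese"}
        self.nlu = nlu                    # assumed NLU object with an .intent(text) method

    def detect(self, audio: bytes) -> bytes:                 # first detection module 10
        return audio

    def send(self, audio: bytes) -> Dict[str, str]:          # sending module 20
        return {lang: eng.recognize(audio) for lang, eng in self.engines.items()}

    def select_valid(self, results: Dict[str, str]) -> str:  # second detection module 30
        weighted = [t for t in results.values()
                    if t and any(w in t.lower() for w in self.weight_words)]
        return weighted[0] if weighted else ""

    def understand(self, valid_text: str) -> Optional[str]:  # input module 40
        return self.nlu.intent(valid_text) if valid_text else None
```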
For specific limitations of the multilingual interaction apparatus for the active outbound intelligent voice robot, reference may be made to the limitations of the multilingual interaction method for the active outbound intelligent voice robot above, which are not repeated here. The modules in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input apparatus connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a multilingual interaction method for an active outbound intelligent voice robot. The display screen of the computer device may be a liquid-crystal display or an electronic-ink display, and the input apparatus may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art can understand that the structure shown in FIG. 5 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
Based on the examples described above, in one embodiment a computer device is further provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it implements the multilingual interaction method for an active outbound intelligent voice robot of any of the foregoing embodiments.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the related hardware. The program may be stored in a non-volatile computer-readable storage medium; as in the embodiments of the present invention, the program may be stored in a storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the embodiments of the multilingual interaction method for the active outbound intelligent voice robot described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Accordingly, in one embodiment a computer-readable storage medium is further provided, on which a computer program is stored, wherein when the program is executed by a processor, it implements the multilingual interaction method for an active outbound intelligent voice robot of any of the foregoing embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features contains no contradiction, it should be regarded as falling within the scope of this specification.
It should be noted that the terms "first/second/third" in the embodiments of this application merely distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the objects so distinguished can be interchanged where appropriate and the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
The terms "include" and "have" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or modules is not limited to the listed steps or modules, but may optionally further include steps or modules that are not listed, or other steps or modules inherent to the process, method, product, or device.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

  1. A multilingual interaction method for an active outbound intelligent voice robot, characterized in that it comprises the following steps:
    S10: when a user enters a multilingual setting scenario, detecting voice data uttered by the user;
    S20: sending the voice data to each language recognition engine to obtain recognized text returned by each language recognition engine;
    S30: when none of the recognized texts is an empty text, detecting whether each recognized text carries a preset weight word, and determining the text carrying the weight word as valid text;
    S40: inputting the valid text into an NLU system, performing intent recognition on the valid text in the NLU system, and triggering an interaction action according to the intent recognition result.
  2. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that the language recognition engines comprise an English language recognition engine and a Chinese language recognition engine.
  3. The multilingual interaction method for an active outbound intelligent voice robot according to claim 2, characterized in that sending the voice data to each language recognition engine to obtain the recognized text returned by each language recognition engine comprises:
    sending the voice data to the English language recognition engine to obtain English recognized text returned by the English language recognition engine;
    sending the voice data to the Chinese language recognition engine to obtain Chinese recognized text returned by the Chinese language recognition engine.
  4. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that, after detecting whether each recognized text carries a preset weight word, the method further comprises:
    if all of the recognized texts carry the preset weight word, or none of them does, invoking a corresponding speech model for each recognized text, using each speech model to determine a text score of each recognized text, determining a comprehensive score of each recognized text according to its text score, a hesitation time coefficient, and an adjustment coefficient, and determining the recognized text with the highest comprehensive score as the valid text.
  5. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that, after sending the voice data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further comprises:
    if all of the recognized texts are empty texts, recording the language in use as a default language, and using the default language to trigger the interaction action.
  6. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that, after sending the voice data to each language recognition engine and obtaining the recognized text returned by each language recognition engine, the method further comprises:
    if exactly one non-empty text exists among the recognized texts, determining the non-empty text as the valid text.
  7. The multilingual interaction method for an active outbound intelligent voice robot according to claim 1, characterized in that inputting the valid text into the NLU system and performing intent recognition on the valid text in the NLU system comprises:
    inputting the valid text into the NLU system, so that the NLU system identifies the language corresponding to the valid text to obtain a current language, and performing intent recognition on the valid text by using a language algorithm model corresponding to the current language.
  8. A multilingual interaction apparatus for an active outbound intelligent voice robot, characterized in that it comprises:
    a first detection module, configured to detect voice data uttered by a user when the user enters a multilingual setting scenario;
    a sending module, configured to send the voice data to each language recognition engine and obtain recognized text returned by each language recognition engine;
    a second detection module, configured to, when none of the recognized texts is an empty text, detect whether each recognized text carries a preset weight word, and determine the text carrying the weight word as valid text;
    an input module, configured to input the valid text into an NLU system, perform intent recognition on the valid text in the NLU system, and trigger an interaction action according to the intent recognition result.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
PCT/CN2021/071368 2020-04-21 2021-01-13 Multilingual interaction method and apparatus for active outbound intelligent speech robot WO2021212929A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010316400.9 2020-04-21
CN202010316400.9A CN111627432B (en) 2020-04-21 2020-04-21 Active outbound intelligent voice robot multilingual interaction method and device

Publications (1)

Publication Number Publication Date
WO2021212929A1 true WO2021212929A1 (en) 2021-10-28

Family

ID=72258977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071368 WO2021212929A1 (en) 2020-04-21 2021-01-13 Multilingual interaction method and apparatus for active outbound intelligent speech robot

Country Status (2)

Country Link
CN (1) CN111627432B (en)
WO (1) WO2021212929A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114918950A (en) * 2022-01-12 2022-08-19 国网吉林省电力有限公司延边供电公司 Intelligent robot for power supply of Xinji Jianfrontier
CN115134466A (en) * 2022-06-07 2022-09-30 马上消费金融股份有限公司 Intention recognition method and device and electronic equipment
CN116343786A (en) * 2023-03-07 2023-06-27 南方电网人工智能科技有限公司 Customer service voice analysis method, system, computer equipment and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627432B (en) * 2020-04-21 2023-10-20 升智信息科技(南京)有限公司 Active outbound intelligent voice robot multilingual interaction method and device
CN112562640B (en) * 2020-12-01 2024-04-12 北京声智科技有限公司 Multilingual speech recognition method, device, system, and computer-readable storage medium
CN113571064B (en) * 2021-07-07 2024-01-30 肇庆小鹏新能源投资有限公司 Natural language understanding method and device, vehicle and medium
CN114464179B (en) * 2022-01-28 2024-03-19 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105957516A (en) * 2016-06-16 2016-09-21 百度在线网络技术(北京)有限公司 Switching method and device for multiple voice identification models
US20170011735A1 (en) * 2015-07-10 2017-01-12 Electronics And Telecommunications Research Institute Speech recognition system and method
US20180137109A1 (en) * 2016-11-11 2018-05-17 The Charles Stark Draper Laboratory, Inc. Methodology for automatic multilingual speech recognition
CN108335692A (en) * 2018-03-21 2018-07-27 上海木爷机器人技术有限公司 A kind of method for switching languages, server and system
CN109065020A (en) * 2018-07-28 2018-12-21 重庆柚瓣家科技有限公司 The identification storehouse matching method and system of multilingual classification
CN109522564A (en) * 2018-12-17 2019-03-26 北京百度网讯科技有限公司 Voice translation method and device
CN109712607A (en) * 2018-12-30 2019-05-03 联想(北京)有限公司 A kind of processing method, device and electronic equipment
CN111627432A (en) * 2020-04-21 2020-09-04 升智信息科技(南京)有限公司 Active call-out intelligent voice robot multi-language interaction method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5073024B2 (en) * 2010-08-10 2012-11-14 株式会社東芝 Spoken dialogue device
CN107818781B (en) * 2017-09-11 2021-08-10 远光软件股份有限公司 Intelligent interaction method, equipment and storage medium
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A kind of system for improving outgoing call robot and being intended to Detection accuracy, recall rate
KR20210009596A (en) * 2019-07-17 2021-01-27 엘지전자 주식회사 Intelligent voice recognizing method, apparatus, and intelligent computing device

Also Published As

Publication number Publication date
CN111627432B (en) 2023-10-20
CN111627432A (en) 2020-09-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21793069
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21793069
    Country of ref document: EP
    Kind code of ref document: A1