CN113053389A

CN113053389A - Voice interaction system and method for switching languages by one key and electronic equipment

Info

Publication number: CN113053389A
Application number: CN202110268209.6A
Authority: CN
Inventors: 李旭滨; 陈晓松; 陈吉胜
Original assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date: 2021-03-12
Filing date: 2021-03-12
Publication date: 2021-06-29

Abstract

The application discloses a voice interaction system and method for switching languages by one key, and an electronic device, and belongs to the field of voice interaction equipment. The system comprises: a switching information receiving module, configured to receive language switching information, the language switching information comprising a first target language and a second target language; a voice recognition module, configured to unload the voice recognition engine corresponding to the first target language and call the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language; a translation module, configured to switch a translation mode according to the language switching information and translate the recognition result corresponding to the second target language to generate a translation result; and a voice generation module, configured to unload the voice generation engine corresponding to the first target language and call the voice generation engine corresponding to the second target language to generate audio information corresponding to the second target language.

Description

Voice interaction system and method for switching languages by one key and electronic equipment

Technical Field

The application belongs to the field of voice interaction equipment, and particularly relates to a voice interaction system and method for switching languages by one key and electronic equipment.

Background

In economically developed cities, intelligent interaction devices are needed on many occasions to enable communication among people who speak different languages. For example, a high-end hotel in Hong Kong may need to receive guests who speak Mandarin, Cantonese, English, or Japanese. This requires intelligent interaction devices to support multi-language switching.

However, existing voice recognition devices usually support wake-up and recognition in only one language. Even where multi-language switching is supported, switching usually requires manually modifying the configuration and restarting the device. This approach is neither flexible nor real-time enough to meet users' need for instant communication.

Summary of the Application

The embodiments of the application aim to provide a voice interaction system and method for switching languages by one key, and an electronic device, which can solve the problem that existing voice interaction equipment is insufficiently flexible and inconvenient to operate.

In order to solve the technical problem, the present application is implemented as follows:

In a first aspect, an embodiment of the present application provides a voice interaction system for switching languages by one key, the system including: a switching information receiving module, configured to receive language switching information, the language switching information comprising a first target language and a second target language; a voice recognition module, configured to unload the voice recognition engine corresponding to the first target language and call the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language; a translation module, configured to switch a translation mode according to the language switching information and translate the recognition result corresponding to the second target language to generate a translation result; and a voice generation module, configured to unload the voice generation engine corresponding to the first target language and call the voice generation engine corresponding to the second target language to generate audio information corresponding to the second target language.

Further, the speech recognition module is also configured to: receive voice data corresponding to the second target language; generate a text recognition result corresponding to the voice data; and send the text recognition result to the translation module.

Further, the translation module is also configured to: receive the text recognition result; and generate a translation result corresponding to the first target language according to the text recognition result.

Further, the system also comprises: a semantic analysis module, configured to analyze corresponding semantic information according to the translation result; and a display switching module, configured to invoke a corresponding language display mode according to the language switching information.

Further, the system also comprises a prompting module, configured to generate prompt information when the language switching succeeds.

In a second aspect, an embodiment of the present application provides a voice interaction method for switching languages by one key, where the method includes: receiving language switching information, wherein the language switching information comprises a first target language and a second target language; unloading the voice recognition engine corresponding to the first target language, and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language; switching a translation mode according to the language switching information, and translating the recognition result corresponding to the second target language to generate a translation result; and unloading the voice generating engine corresponding to the first target language, and calling the voice generating engine corresponding to the second target language to generate the audio information corresponding to the second target language.

Further, the method also comprises: analyzing corresponding semantic information according to the translation result, wherein the semantic information comprises keywords contained in the translation result.

Further, the method also comprises: invoking language display modes according to the language switching information, wherein the language display modes correspond one-to-one to the target languages in the language switching information.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the second aspect.

In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the second aspect.

The embodiments of the application provide a voice interaction system for switching languages by one key. The language switching information can be sent to the voice recognition module, the translation module, and the voice generation module at the same time, and the voice recognition module and the voice generation module each call the engine corresponding to the language switching information to perform voice recognition or generate voice. Language switching is therefore achieved without the user resetting the system configuration or restarting the device, which meets the user's need to switch languages quickly and flexibly.

Drawings

FIG. 1 is a schematic structural diagram of a voice interaction system for switching languages by one key according to the present embodiment;

FIG. 2 is a schematic diagram of engine invocation of a one-key language-switching speech interaction system according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a voice interaction method for switching languages by one key according to the embodiment.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first", "second", and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the preceding and succeeding related objects are in an "or" relationship.

The following describes in detail a voice interaction system, a method and an electronic device for switching languages by one key according to an embodiment of the present application with reference to the accompanying drawings.

The present embodiment provides a voice interaction system for switching languages by one key, and referring to fig. 1, the system includes:

The switching information receiving module 01 is connected to the voice recognition module, the translation module, and the voice generation module, and is configured to receive language switching information, wherein the language switching information comprises a first target language and a second target language.

For example, the language switching information may indicate a switch from the Chinese mode to the English mode, in which case the first target language is Chinese and the second target language is English.

The user can input the switching information through an operation interface (such as the operation interface of the voice interaction device) or through a control key of the system of this embodiment. For example, if "English mode" is input through the touch screen, the system accepts the switching information and switches from whatever language mode is current to the English mode.
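As an illustration only, the language switching information described above could be represented as a small record built from the user's selection. The sketch below assumes a hypothetical `LanguageSwitchInfo` type, a hypothetical `MODE_LABELS` table, and short language codes such as "zh" and "en"; none of these names come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSwitchInfo:
    """Hypothetical record for the language switching information."""
    first_target: str   # language mode being switched away from
    second_target: str  # language mode being switched to


# Hypothetical mapping from on-screen mode labels to language codes.
MODE_LABELS = {"chinese mode": "zh", "english mode": "en"}


def build_switch_info(current_language: str, selected_mode: str) -> LanguageSwitchInfo:
    """Turn a touch-screen selection into language switching information."""
    key = selected_mode.strip().lower()
    if key not in MODE_LABELS:
        raise ValueError(f"unsupported language mode: {selected_mode!r}")
    # Whatever mode is current becomes the first target language.
    return LanguageSwitchInfo(first_target=current_language,
                              second_target=MODE_LABELS[key])
```

With the current mode set to Chinese, `build_switch_info("zh", "English mode")` yields a record whose first target language is "zh" and whose second target language is "en".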

The speech recognition module 02 is configured to unload the speech recognition engine corresponding to the first target language and call the speech recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language. In one possible example, the current recognition mode is Chinese, and the speech recognition engine in use is a Chinese recognition engine that recognizes the received Chinese speech. When language switching information for switching from the Chinese mode to the English mode is received, that is, the first target language is Chinese and the second target language is English, the speech recognition engine needs to be replaced. The speech recognition module therefore unloads the Chinese recognition engine, then calls and starts the English recognition engine, so that the English recognition engine recognizes English voice data and generates a recognition result in English.
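The unload-then-load behaviour of the speech recognition module can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: `RecognitionEngine` is a stand-in that returns placeholder text, and the factory dictionary, class names, and method names are all hypothetical.

```python
from typing import Callable, Dict, Optional


class RecognitionEngine:
    """Stand-in for a per-language recognition engine (hypothetical)."""

    def __init__(self, language: str) -> None:
        self.language = language
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def recognize(self, voice_data: bytes) -> str:
        if not self.running:
            raise RuntimeError("engine is not running")
        # Placeholder result; a real engine would decode the audio.
        return f"<{self.language} text for {len(voice_data)} bytes>"


class SpeechRecognitionModule:
    """Holds exactly one loaded engine and swaps it at run time."""

    def __init__(self, factories: Dict[str, Callable[[], RecognitionEngine]],
                 initial_language: str) -> None:
        self._factories = factories
        self.engine: Optional[RecognitionEngine] = None
        self._load(initial_language)

    def _load(self, language: str) -> None:
        engine = self._factories[language]()
        engine.start()
        self.engine = engine

    def switch(self, second_target: str) -> None:
        # Validate first so a bad request leaves the current engine running.
        if second_target not in self._factories:
            raise KeyError(second_target)
        if self.engine is not None:
            self.engine.stop()   # stop the current recognition process
            self.engine = None   # unload the first-target-language engine
        self._load(second_target)  # load and start the second-target engine
```

Because the swap happens inside the running process, no configuration file is edited and the device is not restarted, which is the property the embodiment relies on.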

The translation module 03 is configured to switch a translation mode according to the language switching information, and translate the recognition result corresponding to the second target language to generate a translation result. Specifically, the speech recognition module recognizes only the second target language, whereas the interaction is two-way. Continuing the above example, when an English-speaking user and a Chinese-speaking user interact, content must pass between the two languages, that is, English is translated into Chinese or Chinese is translated into English. The recognition result in the second target language therefore needs to be translated.
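One way to picture the translation mode is as a (source, target) pair that is replaced when the language switching information arrives. The sketch below assumes that, after the switch, the source is the second target language and the target is the first target language, which matches the English-to-Chinese example in this description. The `TranslationModule` class and the toy phrase table are hypothetical and stand in for a real translation engine.

```python
from typing import Dict, Tuple

PhraseTable = Dict[str, str]


class TranslationModule:
    """Toy translator keyed by a (source, target) translation mode."""

    def __init__(self, tables: Dict[Tuple[str, str], PhraseTable],
                 source: str, target: str) -> None:
        self._tables = tables
        self.source = source
        self.target = target

    def switch_mode(self, first_target: str, second_target: str) -> None:
        # Recognized text now arrives in the second target language and is
        # translated into the first target language.
        self.source, self.target = second_target, first_target

    def translate(self, text: str) -> str:
        table = self._tables.get((self.source, self.target), {})
        # Unknown phrases are returned unchanged in this sketch.
        return table.get(text.strip().lower(), text)
```

For example, with a table for ("en", "zh") containing "i want a cup of water", calling `switch_mode("zh", "en")` followed by `translate("I want a cup of water")` returns the Chinese entry from that table.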

The speech generation module 04 is configured to unload the speech generation engine corresponding to the first target language, and call the speech generation engine corresponding to the second target language to generate the audio information corresponding to the second target language. The speech generation engine may be a TTS (Text To Speech) engine, which outputs received text information as speech, thereby enabling communication with the interaction partner.

In addition, after the language switching is completed, the speech recognition module of this embodiment is further configured to: receive voice data corresponding to the second target language; generate a text recognition result corresponding to the voice data; and send the text recognition result to the translation module.

Further, the translation module is further configured to receive the text recognition result, and generate a translation result corresponding to the first target language according to the text recognition result.
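The data flow after the switch (voice data, then a text recognition result, then a translation result) can be sketched as a two-stage function. The `handle_utterance` name and the use of plain callables for the two stages are assumptions made for illustration.

```python
from typing import Callable


def handle_utterance(voice_data: bytes,
                     recognize: Callable[[bytes], str],
                     translate: Callable[[str], str]) -> str:
    """Pass one utterance through recognition, then translation."""
    # Text recognition result in the second target language.
    text_result = recognize(voice_data)
    # Translation result in the first target language.
    return translate(text_result)
```

Keeping the stages as injected callables means the same function works unchanged after the engines behind `recognize` and `translate` have been swapped.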

The system of this embodiment further comprises a semantic analysis module 05, configured to analyze corresponding semantic information according to the translation result. For example, if the translation result is "I want a cup of water", the semantic analysis module can extract the main information "water" so that the user can quickly understand the other party's intention. The system can also be linked with other devices according to the semantic information. For example, when the semantic information is "water", the system is linked with an intelligent water delivery robot to deliver water immediately, so the system is highly extensible.
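A very simple form of this keyword extraction and device linkage is a vocabulary lookup, sketched below. It assumes the translation result is English text and that keywords map directly to action names. The `LINKED_ACTIONS` table and the action name are hypothetical, and the application does not specify how semantic analysis is actually performed.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical mapping from keyword to a linked-device action name.
LINKED_ACTIONS: Dict[str, str] = {"water": "dispatch_water_delivery_robot"}


def extract_keywords(translation_result: str, vocabulary: Iterable[str]) -> List[str]:
    """Return known keywords in the order they appear in the text."""
    known = set(vocabulary)
    words = re.findall(r"[a-z]+", translation_result.lower())
    return [word for word in words if word in known]


def linked_actions(keywords: Iterable[str]) -> List[str]:
    """Look up the device action linked to each keyword, if any."""
    return [LINKED_ACTIONS[k] for k in keywords if k in LINKED_ACTIONS]
```

For the sentence "I want a cup of water", `extract_keywords` with the vocabulary of `LINKED_ACTIONS` returns `["water"]`, and `linked_actions` maps that keyword to the hypothetical robot action.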

The system of this embodiment further comprises a display switching module 06, configured to invoke a corresponding language display mode according to the language switching information. The language display mode may be a display skin corresponding to a given language. For example, when the language switching information indicates Chinese, the display mode is Chinese, that is, the display skin is the Chinese skin.

The system of this embodiment further comprises a prompting module 07, which is connected to the display switching module and configured to generate prompt information when the language switching succeeds. For example, after the language is successfully switched to Chinese, the voice interaction device generates the voice message "The system has been successfully switched to Chinese mode".

The voice interaction system for switching languages by one key provided by this embodiment can send language switching information to the voice recognition module, the translation module, and the voice generation module. The voice recognition module and the voice generation module each call the engine corresponding to the language switching information to perform voice recognition or generate voice. Language switching is therefore achieved without the user resetting the system configuration or restarting the device, which meets the user's need to switch languages quickly and flexibly.

In a specific example, the process of the method is described by taking the example of switching from Chinese to English:

After a user presses the language switching key of the voice interaction device, the system uses a key callback function to send the language switching information to the voice recognition module, the translation module, the voice generation module, and the display switching module respectively. After receiving the language switching information, the voice recognition module stops the current recognition process, resets and unloads the Chinese recognition engine, and loads and starts the English recognition engine. After receiving the language switching information, the translation module switches the translation mode to the English translation mode, in which all input recognition content is translated from English into Mandarin Chinese. After receiving the language switching information, the voice generation module stops the current speech generation process, resets and unloads the Chinese TTS engine, and loads and starts the English TTS engine. The display switching module switches to the English theme skin. The prompting module then indicates, by voice and on the UI, that the language has been successfully switched to English.
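The key callback in this example can be pictured as a small dispatcher that forwards the same switching information to every registered module and produces the prompt only once all of them have returned. The sketch below is an assumption-laden illustration: the `LanguageSwitchKey` class, the handler signature, and the prompt wording are hypothetical.

```python
from typing import Callable, List, Tuple

# A handler receives (first_target, second_target).
SwitchHandler = Callable[[str, str], None]


class LanguageSwitchKey:
    """Forwards one key press to every registered module handler."""

    def __init__(self) -> None:
        self._handlers: List[Tuple[str, SwitchHandler]] = []

    def register(self, module_name: str, handler: SwitchHandler) -> None:
        self._handlers.append((module_name, handler))

    def on_pressed(self, first_target: str, second_target: str) -> str:
        # Handlers run in registration order; an exception raised by any
        # of them propagates, so no success prompt is produced.
        for _module_name, handler in self._handlers:
            handler(first_target, second_target)
        return f"Successfully switched from {first_target} to {second_target}"
```

In use, the recognition, translation, speech generation, and display modules would each register a handler that performs its own switch, and the returned string would be handed to the prompting module for voice and UI output.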

The embodiment also provides a voice interaction method for switching languages by one key. Referring to FIG. 3, the method includes:

Step S1: receiving language switching information, wherein the language switching information comprises a first target language and a second target language;

The language switching information may indicate a switch from the Chinese mode to the English mode, in which case the first target language is Chinese and the second target language is English.

Step S2: unloading the voice recognition engine corresponding to the first target language, and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language;

In one possible example, the current recognition mode is Chinese, and the speech recognition engine in use is a Chinese recognition engine that recognizes the received Chinese speech. When language switching information for switching from the Chinese mode to the English mode is received, that is, the first target language is Chinese and the second target language is English, the speech recognition engine needs to be replaced. The speech recognition module therefore unloads the Chinese recognition engine, then calls and starts the English recognition engine, so that the English recognition engine recognizes English voice data and generates a recognition result in English.

Step S3: switching a translation mode according to the language switching information, and translating the recognition result corresponding to the second target language to generate a translation result;

The speech recognition module recognizes only the second target language, whereas the interaction is two-way. Continuing the above example, when an English-speaking user and a Chinese-speaking user interact, content must pass between the two languages, that is, English is translated into Chinese or Chinese is translated into English. The recognition result in the second target language therefore needs to be translated.

Step S4: unloading the voice generation engine corresponding to the first target language, and calling the voice generation engine corresponding to the second target language to generate the audio information corresponding to the second target language.

The speech generation engine may be a TTS (Text To Speech) engine, which outputs received text information as speech, thereby enabling communication with the interaction partner.

The method further comprises: analyzing corresponding semantic information according to the translation result, wherein the semantic information comprises keywords contained in the translation result. For example, if the translation result is "I want a cup of water", the main information "water" can be extracted so that the user can quickly understand the other party's intention. Linkage with other devices can also be performed according to the semantic information. For example, when the semantic information is "water", the system is linked with an intelligent water delivery robot to deliver water immediately, so the method is highly extensible.

In addition, the embodiment may also call up a language display mode according to the language switching information, where the language display mode corresponds to all target languages in the language switching information one to one, that is, each language has a corresponding skin.
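The one-to-one correspondence between languages and skins can be expressed as a lookup table plus a uniqueness check. The sketch below is illustrative only; the `SKINS` table, the skin identifiers, and the language codes are hypothetical.

```python
from typing import Dict

# Hypothetical table: each supported language has exactly one skin.
SKINS: Dict[str, str] = {
    "zh": "skin_chinese",
    "en": "skin_english",
    "ja": "skin_japanese",
}


def is_one_to_one(skins: Dict[str, str]) -> bool:
    """True when no two languages share the same skin."""
    return len(set(skins.values())) == len(skins)


def select_skin(second_target: str, skins: Dict[str, str] = SKINS) -> str:
    """Return the display skin for the language being switched to."""
    if second_target not in skins:
        raise KeyError(f"no skin registered for language {second_target!r}")
    return skins[second_target]
```

Checking `is_one_to_one(SKINS)` once at start-up is a cheap way to catch a table in which two languages were accidentally given the same skin.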

The voice interaction method for switching languages by one key provided by this embodiment can send language switching information to the voice recognition module, the translation module and the voice generation module at the same time, and the voice recognition module and the voice generation module call an engine corresponding to the language switching information to execute a voice recognition function or generate voice, so that switching of languages can be realized without resetting system configuration and restarting equipment by a user, and the requirement of the user for switching languages rapidly and flexibly is met.

The embodiment also provides an electronic device, which comprises a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the voice interaction method for switching languages by one key.

The embodiment further provides a readable storage medium on which a program or instructions are stored. When executed by a processor, the program or instructions implement the steps of the voice interaction method for switching languages by one key and achieve the same technical effect; to avoid repetition, the details are not repeated here.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A voice interactive system for switching languages by one key, the system comprising:

a switching information receiving module, configured to receive language switching information, the language switching information comprising a first target language and a second target language;

a voice recognition module, configured to unload the voice recognition engine corresponding to the first target language and call the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language;

a translation module, configured to switch a translation mode according to the language switching information and translate the recognition result corresponding to the second target language to generate a translation result; and

a voice generation module, configured to unload the voice generation engine corresponding to the first target language and call the voice generation engine corresponding to the second target language to generate audio information corresponding to the second target language.

2. The system according to claim 1, wherein the speech recognition module is further configured to:

receive voice data corresponding to the second target language;

generate a text recognition result corresponding to the voice data; and

send the text recognition result to the translation module.

3. The system according to claim 2, wherein the translation module is further configured to:

receive the text recognition result; and

generate a translation result corresponding to the first target language according to the text recognition result.

4. The system of claim 1, further comprising:

a semantic analysis module, configured to analyze corresponding semantic information according to the translation result; and

a display switching module, configured to invoke a corresponding language display mode according to the language switching information.

5. The system of claim 1, further comprising:

a prompting module, configured to generate prompt information when the language switching succeeds.

6. A voice interaction method for switching languages by one key is characterized by comprising the following steps:

receiving language switching information, wherein the language switching information comprises a first target language and a second target language;

unloading the voice recognition engine corresponding to the first target language, and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language;

switching a translation mode according to the language switching information, and translating the recognition result corresponding to the second target language to generate a translation result; and

unloading the voice generation engine corresponding to the first target language, and calling the voice generation engine corresponding to the second target language to generate the audio information corresponding to the second target language.

7. The method for voice interaction with language switching by one key according to claim 6, further comprising:

analyzing corresponding semantic information according to the translation result, wherein the semantic information comprises keywords contained in the translation result.

8. The method for voice interaction with language switching by one key according to claim 6, further comprising:

invoking language display modes according to the language switching information, wherein the language display modes correspond one-to-one to the target languages in the language switching information.

9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the voice interaction method for switching languages by one key according to any one of claims 6 to 8.

10. A readable storage medium, on which a program or instructions are stored, which, when executed by a processor, implement the steps of the voice interaction method for switching languages by one key according to any one of claims 6 to 8.