CN113053389A - Voice interaction system and method for switching languages by one key and electronic equipment - Google Patents

Voice interaction system and method for switching languages by one key and electronic equipment

Info

Publication number
CN113053389A
CN113053389A CN202110268209.6A CN202110268209A CN113053389A CN 113053389 A CN113053389 A CN 113053389A CN 202110268209 A CN202110268209 A CN 202110268209A CN 113053389 A CN113053389 A CN 113053389A
Authority
CN
China
Prior art keywords
language
switching
target language
voice
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110268209.6A
Other languages
Chinese (zh)
Inventor
李旭滨
陈晓松
陈吉胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd and Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority to CN202110268209.6A
Publication of CN113053389A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/1815: Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

Abstract

The application discloses a voice interaction system and method for switching languages by one key and electronic equipment, and belongs to the field of voice interaction equipment. The system comprises: a switching information receiving module for receiving language switching information, the language switching information comprising a first target language and a second target language; a voice recognition module for unloading the voice recognition engine corresponding to the first target language and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language; a translation module for switching a translation mode according to the language switching information and translating the recognition result corresponding to the second target language to generate a translation result; and a voice generation module for unloading the voice generation engine corresponding to the first target language and calling the voice generation engine corresponding to the second target language to generate audio information corresponding to the second target language.

Description

Voice interaction system and method for switching languages by one key and electronic equipment
Technical Field
The application belongs to the field of voice interaction equipment, and particularly relates to a voice interaction system and method for switching languages by one key and electronic equipment.
Background
In economically developed cities, intelligent interaction equipment is needed on many occasions to enable communication among people who speak different languages. For example, a high-end hotel in Hong Kong may need to receive guests who speak Mandarin, Cantonese, English or Japanese. This requires intelligent interaction devices that support multi-language switching.
However, existing voice recognition devices usually support wake-up and recognition in only one language. Even where multi-language switching is supported, switching typically requires manually modifying the configuration and restarting the device. This approach is neither flexible nor real-time enough to meet users' need for instant communication.
Summary
The embodiments of the present application aim to provide a voice interaction system and method for switching languages by one key and electronic equipment, which can solve the problem that existing voice interaction equipment is insufficiently flexible and inconvenient to operate.
In order to solve the technical problem, the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides a voice interaction system for switching languages by one key, where the system includes: a switching information receiving module for receiving language switching information, the language switching information comprising a first target language and a second target language; a voice recognition module for unloading the voice recognition engine corresponding to the first target language and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language; a translation module for switching a translation mode according to the language switching information and translating the recognition result corresponding to the second target language to generate a translation result; and a voice generation module for unloading the voice generation engine corresponding to the first target language and calling the voice generation engine corresponding to the second target language to generate audio information corresponding to the second target language.
Further, the speech recognition module is further configured to: receive voice data corresponding to the second target language; generate a text recognition result corresponding to the voice data; and send the text recognition result to the translation module.
Further, the translation module is further configured to: receive the text recognition result; and generate a translation result corresponding to the first target language according to the text recognition result.
Further, the system further comprises: a semantic analysis module for analyzing corresponding semantic information according to the translation result; and a display switching module for invoking a corresponding language display mode according to the language switching information.
Further, the system further comprises: a prompting module for generating prompt information when the language switching succeeds.
In a second aspect, an embodiment of the present application provides a voice interaction method for switching languages by one key, where the method includes: receiving language switching information, wherein the language switching information comprises a first target language and a second target language; unloading the voice recognition engine corresponding to the first target language, and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language; switching a translation mode according to the language switching information, and translating the recognition result corresponding to the second target language to generate a translation result; and unloading the voice generating engine corresponding to the first target language, and calling the voice generating engine corresponding to the second target language to generate the audio information corresponding to the second target language.
Further, the method further comprises: analyzing corresponding semantic information according to the translation result, wherein the semantic information comprises keywords contained in the translation result.
Further, the method further comprises: invoking language display modes according to the language switching information, wherein the language display modes correspond one-to-one to the target languages in the language switching information.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the second aspect.
In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the second aspect.
The embodiments of the application provide a voice interaction system for switching languages by one key, in which the language switching information can be sent to the voice recognition module, the translation module and the voice generation module at the same time. The voice recognition module and the voice generation module call the engines corresponding to the language switching information to perform voice recognition or generate voice. Languages can thus be switched without the user resetting the system configuration or restarting the equipment, which meets the user's need to switch languages quickly and flexibly.
Drawings
FIG. 1 is a schematic structural diagram of a voice interaction system for switching languages by one key according to the present embodiment;
FIG. 2 is a schematic diagram of engine invocation of a one-key language-switching speech interaction system according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a voice interaction method for switching languages by one key according to the embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments of the application are capable of operation in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the preceding and succeeding related objects are in an "or" relationship.
The following describes in detail a voice interaction system, a method and an electronic device for switching languages by one key according to an embodiment of the present application with reference to the accompanying drawings.
The present embodiment provides a voice interaction system for switching languages by one key. Referring to FIG. 1, the system includes:
The switching information receiving module 01, which is connected to the voice recognition module, the translation module and the voice generation module, and is used for receiving language switching information, wherein the language switching information comprises a first target language and a second target language.
For example, the language switching information may be an instruction to switch from a Chinese mode to an English mode, in which case the first target language is Chinese and the second target language is English.
The user can input the switching information through an operation interface (such as an operation interface of the voice interaction device) or a control key of the system of this embodiment. For example, if "English mode" is entered through the touch screen, the system accepts the switching information and switches from whatever language mode is current to the English mode.
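As an illustration only, the language switching information described above could be modeled as a small immutable record. The class name, field names and language codes below are assumptions made for this sketch; the application itself does not define a data format.

```python
from dataclasses import dataclass

# Hypothetical language codes; the application names languages only informally.
SUPPORTED_LANGUAGES = {"zh", "yue", "en", "ja"}


@dataclass(frozen=True)
class LanguageSwitchInfo:
    """Language switching information: the language being left and the one being entered."""
    first_target_language: str   # language mode currently in use
    second_target_language: str  # language mode to switch to

    def __post_init__(self):
        for lang in (self.first_target_language, self.second_target_language):
            if lang not in SUPPORTED_LANGUAGES:
                raise ValueError(f"unsupported language: {lang}")


def make_switch_info(current_language: str, requested_language: str) -> LanguageSwitchInfo:
    """Build the switching information from the current mode and the user's selection."""
    return LanguageSwitchInfo(current_language, requested_language)


# Example: the device is in Chinese mode and the user selects "English mode".
info = make_switch_info("zh", "en")
print(info)
```

Carrying both languages in one record lets every downstream module see what to unload as well as what to load.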
The speech recognition module 02 is configured to unload the speech recognition engine corresponding to the first target language and call the speech recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language. In one possible example, the current recognition mode is Chinese, so the active speech recognition engine is a Chinese recognition engine that recognizes the received Chinese speech. When language switching information for switching from the Chinese mode to the English mode is received, that is, the first target language is Chinese and the second target language is English, the speech recognition engine must be replaced. The speech recognition module therefore unloads the Chinese recognition engine, then loads and starts the English recognition engine, which recognizes English voice data and generates a recognition result in English.
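The unload-then-load behavior of the recognition module can be sketched as follows. This is a minimal sketch with a stub engine: `RecognitionEngine`, `SpeechRecognitionModule` and their methods are hypothetical names, and no real speech recognition library is used.

```python
class RecognitionEngine:
    """Stand-in for a per-language speech recognition engine (hypothetical)."""

    def __init__(self, language: str):
        self.language = language
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def recognize(self, audio: bytes) -> str:
        if not self.running:
            raise RuntimeError("engine is not running")
        # A real engine would decode the audio; this stub only tags the language.
        return f"<{self.language} text from {len(audio)} bytes>"


class SpeechRecognitionModule:
    def __init__(self, initial_language: str):
        self.engine = RecognitionEngine(initial_language)
        self.engine.start()

    def switch_language(self, first: str, second: str) -> None:
        """Unload the engine for `first`, then load and start the engine for `second`."""
        if self.engine.language != first:
            raise ValueError(f"current engine is {self.engine.language}, not {first}")
        self.engine.stop()                       # stop the current recognition process
        self.engine = RecognitionEngine(second)  # the old engine object is dropped
        self.engine.start()

    def recognize(self, audio: bytes) -> str:
        return self.engine.recognize(audio)


asr = SpeechRecognitionModule("zh")
asr.switch_language("zh", "en")
print(asr.recognize(b"\x00\x01"))
```

Because the swap happens inside the running module, no configuration edit or device restart is involved, which is the point the description makes.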
The translation module 03 is configured to switch a translation mode according to the language switching information, and translate the recognition result corresponding to the second target language to generate a translation result. Specifically, the speech recognition module recognizes only the second target language, while the interaction is two-way. Continuing the above example, when an English-speaking user and a Chinese-speaking user interact, two languages are necessarily involved: English must be translated into Chinese, or Chinese into English. The recognition result in the second target language therefore needs to be translated.
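One way to read the translation mode switch is as a change of translation direction: after the switch, input in the second target language is translated into the first target language. The sketch below assumes that reading; the class, its method names and the one-entry phrase table are hypothetical stand-ins for a real translation backend.

```python
class TranslationModule:
    """Tracks the translation direction; the translation itself is a stub."""

    # Hypothetical toy phrase table standing in for a real translation backend.
    PHRASES = {("en", "zh"): {"I want a cup of water": "我想要一杯水"}}

    def __init__(self, language: str):
        # Before any switch, input and output are in the same language.
        self.source = language
        self.target = language

    def switch_mode(self, first: str, second: str) -> None:
        # Input now arrives in the second target language and is
        # translated into the first target language.
        self.source = second
        self.target = first

    def translate(self, text: str) -> str:
        # Phrases missing from the toy table are returned unchanged.
        return self.PHRASES.get((self.source, self.target), {}).get(text, text)


translator = TranslationModule("zh")
translator.switch_mode("zh", "en")
print(translator.translate("I want a cup of water"))
```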
The speech generating module 04 is configured to unload the speech generating engine corresponding to the first target language, and call the speech generating engine corresponding to the second target language to generate the audio information corresponding to the second target language. The speech generating engine may be a TTS (Text To Speech) engine, which converts received text information into speech output, thereby enabling communication with the interaction partner.
In addition, after the language switching is completed, the speech recognition module of this embodiment is further configured to: receive voice data corresponding to the second target language; generate a text recognition result corresponding to the voice data; and send the text recognition result to the translation module.
The translation module is further configured to receive the text recognition result and generate a translation result corresponding to the first target language according to the text recognition result.
The system of this embodiment further comprises a semantic analysis module 05, which is used for analyzing corresponding semantic information according to the translation result. For example, if the translation result is "I want a cup of water", the semantic analysis module can extract the main information "water" so that the user can quickly understand the other party's intention. The system can also be linked with other devices according to the semantic information: if the semantic information is "water", the system is linked with an intelligent water delivery robot so that water is delivered immediately. The module is therefore highly extensible.
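The keyword extraction and device linkage in this example can be sketched with a simple lookup. The application does not specify how semantic analysis is performed, so the plain keyword matching and the `LINKED_ACTIONS` table below are assumptions chosen for illustration.

```python
from typing import List

# Hypothetical keyword-to-action table; the description's example links
# "water" to an intelligent water delivery robot.
LINKED_ACTIONS = {"water": "dispatch_water_delivery_robot"}


def extract_keywords(translation_result: str) -> List[str]:
    """Return the known keywords that occur in the translation result."""
    words = [w.strip(".,!?\"'").lower() for w in translation_result.split()]
    return [w for w in words if w in LINKED_ACTIONS]


def linked_actions(translation_result: str) -> List[str]:
    """Map each extracted keyword to the device action linked to it."""
    return [LINKED_ACTIONS[k] for k in extract_keywords(translation_result)]


print(extract_keywords("I want a cup of water."))
print(linked_actions("I want a cup of water."))
```

Adding a linkage to another device amounts to adding one entry to the table, which is one way to picture the extensibility the description mentions.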
The system of this embodiment further comprises a display switching module 06, which is configured to invoke a corresponding language display mode according to the language switching information. The language display mode may be a display skin corresponding to a given language. For example, when the language is switched to Chinese, the display mode is Chinese, that is, the Chinese skin is displayed.
The system of this embodiment further comprises a prompting module 07, which is connected with the display switching module and is used for generating prompt information when the language switching succeeds. For example, after the language is successfully switched to Chinese, the voice interaction equipment plays the voice message "The system has successfully switched to Chinese mode".
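The display switching and prompting behavior can be pictured as two per-language lookups. The skin names and prompt strings below are placeholders invented for this sketch, not values given in the application.

```python
from typing import Optional

# Hypothetical per-language skins and success prompts.
SKINS = {"zh": "chinese_skin", "en": "english_skin"}
PROMPTS = {
    "zh": "系统已成功切换为中文模式",
    "en": "The system has successfully switched to English mode",
}


def switch_display(language: str) -> str:
    """Return the display skin that corresponds to the given language."""
    return SKINS[language]


def prompt_on_success(language: str, switched: bool) -> Optional[str]:
    """Return prompt text only when the language switch succeeded."""
    return PROMPTS[language] if switched else None


print(switch_display("en"))
print(prompt_on_success("en", switched=True))
```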
The voice interaction system for switching languages by one key provided by this embodiment can send language switching information to the voice recognition module, the translation module and the voice generation module. The voice recognition module and the voice generation module call the engines corresponding to the language switching information to perform voice recognition or generate voice. Languages can thus be switched without the user resetting the system configuration or restarting the equipment, which meets the user's need to switch languages quickly and flexibly.
In a specific example, the process is described by taking switching from Chinese to English as an example:
After a user presses the language switching key of the voice interaction equipment, the system uses a key callback function to send language switching information to the voice recognition module, the translation module, the voice generation module and the display switching module respectively. After receiving the language switching information, the voice recognition module stops the current recognition process, resets and unloads the Chinese recognition engine, and loads and starts the English recognition engine. After receiving the language switching information, the translation module switches the translation mode to the English translation mode, in which all input recognition content is translated from English to Mandarin Chinese. After receiving the language switching information, the voice generation module stops the current speech generation process, resets and unloads the Chinese TTS engine, and loads and starts the English TTS engine. The display switching module switches to the English theme skin. Finally, the prompting module indicates by voice and on the UI that the system has successfully switched to English.
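The key callback in this example fans one switching message out to several modules. A minimal sketch of that dispatch, assuming a simple registry of handlers, is shown below; `SwitchDispatcher` and its methods are hypothetical, and each module is reduced to a function that records what it was told.

```python
from typing import Callable, Dict, List

SwitchHandler = Callable[[str, str], None]


class SwitchDispatcher:
    """Forwards one language switch to every registered module."""

    def __init__(self) -> None:
        # A dict keeps handlers in registration order.
        self._handlers: Dict[str, SwitchHandler] = {}

    def register(self, name: str, handler: SwitchHandler) -> None:
        self._handlers[name] = handler

    def on_key_pressed(self, first: str, second: str) -> None:
        """Key callback: send the switching information to each module in turn."""
        for handler in self._handlers.values():
            handler(first, second)


log: List[str] = []
dispatcher = SwitchDispatcher()
for module in ("recognition", "translation", "generation", "display"):
    # Bind `module` as a default argument so each handler keeps its own name.
    dispatcher.register(
        module, lambda first, second, module=module: log.append(f"{module}: {first}->{second}")
    )

dispatcher.on_key_pressed("zh", "en")
log.append("prompt: switched to en")  # the prompting module reports success last
print("\n".join(log))
```

The handlers run sequentially here for simplicity; the description only says the information is sent to the modules, so a real device could equally notify them concurrently.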
This embodiment also provides a voice interaction method for switching languages by one key. Referring to FIG. 3, the method includes:
Step S1: receiving language switching information, wherein the language switching information comprises a first target language and a second target language.
For example, the language switching information may be an instruction to switch from the Chinese mode to the English mode, in which case the first target language is Chinese and the second target language is English.
The user can input the switching information through an operation interface (such as an operation interface of the voice interaction device) or a control key of the system of this embodiment. For example, if "English mode" is entered through the touch screen, the system accepts the switching information and switches from whatever language mode is current to the English mode.
Step S2: unloading the voice recognition engine corresponding to the first target language, and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language.
In one possible example, the current recognition mode is Chinese, so the active speech recognition engine is a Chinese recognition engine that recognizes the received Chinese speech. When language switching information for switching from the Chinese mode to the English mode is received, that is, the first target language is Chinese and the second target language is English, the speech recognition engine must be replaced. The speech recognition module therefore unloads the Chinese recognition engine, then loads and starts the English recognition engine, which recognizes English voice data and generates a recognition result in English.
Step S3: switching a translation mode according to the language switching information, and translating the recognition result corresponding to the second target language to generate a translation result.
The speech recognition module recognizes only the second target language, while the interaction is two-way. Continuing the above example, when an English-speaking user and a Chinese-speaking user interact, two languages are necessarily involved: English must be translated into Chinese, or Chinese into English. The recognition result in the second target language therefore needs to be translated.
Step S4: unloading the voice generating engine corresponding to the first target language, and calling the voice generating engine corresponding to the second target language to generate the audio information corresponding to the second target language.
The speech generating engine may be a TTS (Text To Speech) engine, which converts received text information into speech output, thereby enabling communication with the interaction partner.
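Steps S1 to S4 can be summarized as a single state transition. The sketch below models the device state as a plain dictionary, which is an assumption made for illustration; the key names and the `"none"` initial translation mode are hypothetical.

```python
from typing import Dict


def switch_language(state: Dict[str, str], first: str, second: str) -> Dict[str, str]:
    """Return a new state after applying steps S1-S4 for a first -> second switch."""
    # S1: receive the switching information and check it against the current state.
    if first == second:
        raise ValueError("first and second target language are identical")
    if state["asr_engine"] != first or state["tts_engine"] != first:
        raise ValueError(f"device is not currently in {first} mode")

    new_state = dict(state)
    # S2: unload the recognition engine for `first`, call the one for `second`.
    new_state["asr_engine"] = second
    # S3: switch the translation mode; recognized `second` text is translated into `first`.
    new_state["translation_mode"] = f"{second}->{first}"
    # S4: unload the speech generating engine for `first`, call the one for `second`.
    new_state["tts_engine"] = second
    return new_state


before = {"asr_engine": "zh", "translation_mode": "none", "tts_engine": "zh"}
after = switch_language(before, "zh", "en")
print(after)
```

Returning a new dictionary rather than mutating the old one leaves the previous state intact if any step raises.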
The method further comprises: analyzing corresponding semantic information according to the translation result, wherein the semantic information comprises keywords contained in the translation result. For example, if the translation result is "I want a cup of water", the main information "water" can be extracted so that the user can quickly understand the other party's intention. Linkage with other devices can also be performed according to the semantic information: when the semantic information is "water", the system is linked with an intelligent water delivery robot so that water is delivered immediately. The method is therefore highly extensible.
In addition, this embodiment may also invoke a language display mode according to the language switching information, where the language display modes correspond one-to-one to the target languages in the language switching information, that is, each language has its own skin.
The voice interaction method for switching languages by one key provided by this embodiment can send language switching information to the voice recognition module, the translation module and the voice generation module at the same time. The voice recognition module and the voice generation module call the engines corresponding to the language switching information to perform voice recognition or generate voice. Languages can thus be switched without the user resetting the system configuration or restarting the equipment, which meets the user's need to switch languages quickly and flexibly.
This embodiment also provides an electronic device, which comprises a processor, a memory and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the voice interaction method for switching languages by one key.
This embodiment further provides a readable storage medium on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the voice interaction method for switching languages by one key and can achieve the same technical effect. To avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A voice interactive system for switching languages by one key, the system comprising:
the switching information receiving module is used for receiving language switching information, and the language switching information comprises a first target language and a second target language;
the voice recognition module is used for unloading the voice recognition engine corresponding to the first target language and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language;
the translation module is used for switching a translation mode according to the language switching information and translating the recognition result corresponding to the second target language to generate a translation result;
and the voice generating module is used for unloading the voice generating engine corresponding to the first target language and calling the voice generating engine corresponding to the second target language to generate the audio information corresponding to the second target language.
2. The system according to claim 1, wherein the speech recognition module is further configured to:
receiving voice data corresponding to the second target language;
generating a character recognition result corresponding to the voice data;
and sending the character recognition result to a translation module.
3. The system according to claim 2, wherein the translation module is further configured to:
receiving the character recognition result;
and generating a translation result corresponding to the first target language according to the character recognition result.
4. The system of claim 1, further comprising:
the semantic analysis module is used for analyzing corresponding semantic information according to the translation result;
and the display switching module is used for calling a corresponding language display mode according to the language switching information.
5. The system of claim 1, further comprising:
and the prompting module is used for generating prompting information under the condition that the language switching is successful.
6. A voice interaction method for switching languages by one key is characterized by comprising the following steps:
receiving language switching information, wherein the language switching information comprises a first target language and a second target language;
unloading the voice recognition engine corresponding to the first target language, and calling the voice recognition engine corresponding to the second target language to generate a recognition result corresponding to the second target language;
switching a translation mode according to the language switching information, and translating the recognition result corresponding to the second target language to generate a translation result; and
and unloading the voice generating engine corresponding to the first target language, and calling the voice generating engine corresponding to the second target language to generate the audio information corresponding to the second target language.
7. The method for voice interaction with language switching by one key according to claim 6, further comprising:
and analyzing corresponding semantic information according to the translation result, wherein the semantic information comprises keywords contained in the translation result.
8. The method for voice interaction with language switching by one key according to claim 6, further comprising:
and calling language display modes according to the language switching information, wherein the language display modes correspond to all target languages in the language switching information one by one.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the voice interaction method for switching languages by one key according to any one of claims 6 to 8.
10. A readable storage medium, on which a program or instructions are stored, which when executed by a processor, implement the steps of the voice interaction method for switching languages by one key according to any one of claims 6 to 8.
CN202110268209.6A 2021-03-12 2021-03-12 Voice interaction system and method for switching languages by one key and electronic equipment Pending CN113053389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110268209.6A CN113053389A (en) 2021-03-12 2021-03-12 Voice interaction system and method for switching languages by one key and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110268209.6A CN113053389A (en) 2021-03-12 2021-03-12 Voice interaction system and method for switching languages by one key and electronic equipment

Publications (1)

Publication Number Publication Date
CN113053389A (en) 2021-06-29

Family

ID=76511634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110268209.6A Pending CN113053389A (en) 2021-03-12 2021-03-12 Voice interaction system and method for switching languages by one key and electronic equipment

Country Status (1)

Country Link
CN (1) CN113053389A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335692A (en) * 2018-03-21 2018-07-27 上海木爷机器人技术有限公司 Language switching method, server and system
WO2019111346A1 (en) * 2017-12-06 2019-06-13 ソースネクスト株式会社 Full-duplex speech translation system, full-duplex speech translation method, and program
CN109949795A (en) * 2019-03-18 2019-06-28 北京猎户星空科技有限公司 Method and device for controlling smart device interaction
CN109977429A (en) * 2019-04-03 2019-07-05 新疆语视未来信息科技有限公司 Information interaction method based on real-time display of translated content
CN111325039A (en) * 2020-01-21 2020-06-23 陈刚 Language translation method, system, program and handheld terminal based on real-time call
CN111798836A (en) * 2020-08-03 2020-10-20 上海茂声智能科技有限公司 Method, device, system, equipment and storage medium for automatically switching languages

Similar Documents

Publication Publication Date Title
CN101207656B (en) Method and system for switching between modalities in speech application environment
CN1333385C (en) Voice browser dialog enabler for a communication system
US8442563B2 (en) Automated text-based messaging interaction using natural language understanding technologies
US9807243B2 (en) Method and system for voice transmission control
JP2002125047A (en) Method and device for interpretation service
CN104575499B (en) Voice control method of mobile terminal and mobile terminal
CN101052069B (en) Translation method for voice conversation
CN103744836A (en) Man-machine conversation method and device
WO2015188454A1 (en) Method and device for quickly accessing ivr menu
JP2010026686A (en) Interactive communication terminal with integrative interface, and communication system using the same
JP3322262B2 (en) Wireless mobile terminal communication system
CN111554280A (en) Real-time interpretation service system for mixing interpretation contents using artificial intelligence and interpretation contents of interpretation experts
CN111681650A (en) Intelligent conference control method and device
CN104732982A (en) Method and device for recognizing voice in interactive voice response (IVR) service
US10984229B2 (en) Interactive sign language response system and method
CN113053389A (en) Voice interaction system and method for switching languages by one key and electronic equipment
JP2003316383A (en) Voice response system
CN113449197A (en) Information processing method, information processing apparatus, electronic device, and storage medium
CN108962246B (en) Voice control method, device and computer readable storage medium
CN112040326A (en) Bullet screen control method and system, television and storage medium
CN112133306A (en) Response method and device based on express delivery user and computer equipment
CN114493513B (en) Voice processing-based hotel management method and device and electronic equipment
CN117059082B (en) Outbound call conversation method, device, medium and computer equipment based on large model
JP2002297646A (en) System, method, and program for service
KR100574231B1 (en) Method For Telephone Interpretation In Intelligent Peripheral System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210629