CN110457716B - Voice output method and mobile terminal - Google Patents

Voice output method and mobile terminal

Info

Publication number: CN110457716B (application CN201910660382.3A)
Authority: CN (China)
Prior art keywords: voice, mobile terminal, voice data, earphone, user
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other versions: CN110457716A (original Chinese-language publication)
Inventor: 蔡展望
Assignee: Vivo Mobile Communication Co Ltd

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403: User interfaces with means for local support of applications that increase the functionality
    • H04M1/7243: User interfaces with interactive means for internal management of messages
    • H04M1/72433: User interfaces for voice messaging, e.g. dictaphones
    • H04M2250/00: Details of telephonic subscriber devices
    • H04M2250/74: Details of telephonic subscriber devices with voice recognition means


Abstract

An embodiment of the invention provides a voice output method and a mobile terminal. The mobile terminal is in communication connection with a headset; either the mobile terminal is the voice input end and the headset is the voice output end, or the headset is the voice input end and the mobile terminal is the voice output end. The method comprises the following steps: acquiring first voice data of a user from the voice input end, and determining a target translation language corresponding to the first voice data; translating the first voice data according to the target translation language to obtain second voice data; and playing the second voice data through the voice output end. With this embodiment, when users speaking different languages communicate, the efficiency of their real-time communication can be improved.

Description

Voice output method and mobile terminal
Technical Field
The present invention relates to the field of communications, and in particular, to a voice output method and a mobile terminal.
Background
With increasingly diverse social and economic activity, people encounter speakers of foreign languages more and more frequently in life and work. When a user cannot communicate directly in the other party's language, the user relies on the translation software of an intelligent mobile terminal to translate what is spoken.
At present, when two users who speak different languages communicate, the same application program with a voice translation function must be installed on both users' mobile terminals. One user inputs voice into his or her mobile terminal; the application uploads the voice data to a server for translation, and the server sends the translated text to the other user's copy of the application, thereby enabling communication between users of different languages. Both users therefore need to install the same software; if either user has not installed it, translation cannot be achieved, which reduces the efficiency of the users' real-time communication.
Disclosure of Invention
The embodiment of the invention aims to provide a voice output method and a mobile terminal, which are used for improving the efficiency of real-time communication of users when the users in different languages communicate with each other.
In order to solve the technical problems, the embodiment of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a voice output method applied to a mobile terminal, where the mobile terminal is in communication connection with an earphone; either the mobile terminal is the voice input end and the earphone is the voice output end, or the earphone is the voice input end and the mobile terminal is the voice output end. The method comprises the following steps:
acquiring first voice data of a user from a voice input end, and determining a target translation language corresponding to the first voice data;
translating the first voice data according to the target translation language to obtain second voice data;
and playing the second voice data through the voice output terminal.
In a second aspect, an embodiment of the present invention provides a mobile terminal, where the mobile terminal is in communication connection with an earphone, and the mobile terminal is a voice input end, and the earphone is a voice output end, or the earphone is a voice input end, and the mobile terminal is a voice output end; comprising the following steps:
the voice data acquisition module is used for acquiring first voice data of a user from a voice input end and determining a target translation language corresponding to the first voice data;
the target voice translation module is used for translating the first voice data according to the target translation language to obtain second voice data;
and the voice data playing module is used for playing the second voice data through the voice output end.
In a third aspect, an embodiment of the present invention provides a mobile terminal, including: a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the speech output method as described in the first aspect above.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the speech output method according to the first aspect described above.
In the embodiment of the invention, the mobile terminal is a voice input end, the earphone is a voice output end, or the earphone is a voice input end, the mobile terminal is a voice output end, first voice data of a user are obtained from the voice input end, and a target translation language corresponding to the first voice data is determined; translating the first voice data according to the target translation language to obtain second voice data; and playing the second voice data through the voice output terminal. In this embodiment, the same translation software is not required to be installed on the mobile terminals of the two users, and the translation of the first voice data and the playing of the translated voice data can be completed through the mobile terminal of one user, so that the efficiency of real-time communication of the users is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a voice output method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of module composition of a mobile terminal according to an embodiment of the present invention;
fig. 3 is a schematic hardware structure of a mobile terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, shall fall within the scope of the invention.
The embodiment of the invention provides a voice output method, a mobile terminal, and a computer-readable storage medium. The voice output method is applied to and can be executed by a mobile terminal, where mobile terminals include intelligent terminals such as mobile phones, tablet computers, and wearable devices.
Fig. 1 is a flowchart of a voice output method according to an embodiment of the present invention. The method is applied to a mobile terminal in communication connection with an earphone, where either the mobile terminal is the voice input end and the earphone is the voice output end, or the earphone is the voice input end and the mobile terminal is the voice output end. As shown in fig. 1, the method includes the following steps:
s102, acquiring first voice data of a user from a voice input end, and determining a target translation language corresponding to the first voice data;
s104, translating the first voice data according to the target translation language to obtain second voice data;
s106, playing the second voice data through the voice output terminal.
In the embodiment of the invention, the mobile terminal is a voice input end, the earphone is a voice output end, or the earphone is a voice input end, the mobile terminal is a voice output end, first voice data of a user are obtained from the voice input end, and a target translation language corresponding to the first voice data is determined; translating the first voice data according to the target translation language to obtain second voice data; and playing the second voice data through the voice output terminal. In this embodiment, the same translation software is not required to be installed on the mobile terminals of the two users, and the translation of the first voice data and the playing of the translated voice data can be completed through the mobile terminal of one user, so that the efficiency of real-time communication of the users is improved.
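The S102-S106 flow above can be sketched in a few lines of Python. This is a minimal illustration only: the `translate_speech` and `play` functions below are made-up stand-ins, since the patent does not specify a concrete API, and the phrasebook "translation" merely marks where a real speech-translation engine would run.

```python
played = []  # records what the voice output end has played

def translate_speech(voice_data, target_language):
    # S104: toy "translation" keyed on a phrasebook; a real system would
    # invoke a speech-translation engine here.
    phrasebook = {("你好", "en"): "Hello"}
    return phrasebook.get((voice_data, target_language), voice_data)

def play(voice_data):
    # S106: stand-in for the loudspeaker of the voice output end.
    played.append(voice_data)

def voice_output_method(first_voice_data, target_language):
    # S102: first_voice_data has already been acquired from the voice input
    # end, and target_language has been determined for it.
    second_voice_data = translate_speech(first_voice_data, target_language)
    play(second_voice_data)
    return second_voice_data
```

Whichever device (terminal or earphone) holds the input/output role, the three steps themselves are unchanged; only where `play` routes the audio differs.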
In the step S102, the mobile terminal obtains the first voice data of the user from the voice input end, and determines the target translation language corresponding to the first voice data, where obtaining the first voice data of the user from the voice input end includes:
(S1) acquiring first voice data of a user, and taking equipment for inputting the first voice data in the mobile terminal and the earphone as a voice input end;
(S2) using the other device of the mobile terminal and the earphone as a voice output terminal.
In the above actions (S1) and (S2), when the first voice data of the user is acquired, either device may take either role. If the mobile terminal is the voice input end, it receives the voice data input by the user, and the earphone, as the voice output end, plays the translated voice data. Conversely, if the earphone is the voice input end, it receives the voice data input by the user, and the mobile terminal, as the voice output end, plays the translated voice data. In a preferred embodiment, the earphone is a wireless earphone, such as a Bluetooth headset.
In one embodiment, pressing the pressure-sensing button of the Bluetooth headset twice in succession is configured to start the headset's translation mode; after the Bluetooth headset enters the translation mode, the mobile terminal uses the Bluetooth headset as the voice input end and itself as the voice output end.
In the embodiment of the invention, when the mobile terminal is used as a voice input end and the earphone is used as a voice output end, the mobile terminal acquires first voice data of a user from a microphone of the mobile terminal, determines a target translation language to be translated for the first voice data, for example, the user needs to translate the first voice data into English, and determines that the English is the target translation language corresponding to the first voice data; when the earphone is used as a voice input end and the mobile terminal is used as a voice output end, the mobile terminal acquires first voice data input by a user from the earphone, determines a target translation language to be translated for the first voice data, for example, the user needs to translate the first voice data into a french, and determines that the french is the target translation language corresponding to the first voice data.
In step S104, the mobile terminal translates the first voice data according to the target translation language to obtain the second voice data. In this embodiment, the mobile terminal is pre-installed with translation software implementing a multi-language translation function, such as Youdao Dictionary or Kingsoft PowerWord. In one embodiment, the target translation language is English and the first voice data is the Chinese speech "你好"; the mobile terminal then translates it to obtain the English speech "Hello".
In step S106, the mobile terminal plays the second voice data through the voice output end. When the voice output end is the mobile terminal, the mobile terminal plays the second voice data through its own loudspeaker. When the voice output end is the earphone, the mobile terminal sends the second voice data to the connected earphone, and the second voice data is played through the earphone's loudspeaker. In one embodiment, when the voice output end is the earphone, after obtaining the second voice data by translation and before it is played, the mobile terminal encodes the second voice data according to a preset encoding rule and sends the encoded second voice data to the earphone; the earphone's voice decoding module decodes it and sends the decoded second voice data to the earphone's loudspeaker, which plays it.
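The terminal-side encode and earphone-side decode hand-off can be illustrated as follows. The patent does not name a specific codec, so a trivial 4-byte length-header framing stands in for the "preset encoding rule"; it only shows where each side of the link sits in the flow.

```python
def encode(pcm: bytes) -> bytes:
    # Mobile terminal side: frame the translated voice data according to a
    # (hypothetical) preset encoding rule before sending it to the earphone.
    return len(pcm).to_bytes(4, "big") + pcm

def decode(frame: bytes) -> bytes:
    # Earphone voice-decoding module: recover the payload for the speaker.
    length = int.from_bytes(frame[:4], "big")
    return frame[4:4 + length]
```

The round trip is lossless: whatever the terminal encodes, the earphone decodes back to the original samples before playback.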
In the embodiment of the present invention, translating the first voice data according to the target translation language to obtain the second voice data includes:
(a1) Converting the first voice data into a first text according to the language to which the first voice data belongs;
(a2) Translating the first text according to the target translation language to obtain a second text;
(a3) And converting the second text into second voice data corresponding to the target translation language.
In the above actions (a1) and (a2), the mobile terminal converts the first voice data into the first text according to the language to which the first voice data belongs, and then translates the first text according to the target translation language to obtain the second text. For example, the voice input end receives the Chinese speech "谢谢", which is converted into the Chinese text "谢谢"; if the target translation language is English, that text is translated into the English text "Thank you".
In the above-mentioned action (a3), the mobile terminal converts the second text into second voice data corresponding to the target translation language. In this embodiment, the conversion is performed by a text-to-speech module, which may be a TTS (text-to-speech) module or another text-to-speech module; this is not particularly limited here.
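Steps (a1)-(a3) can be walked through with toy stand-ins. Speech is represented here as plain strings, and the ASR, translation, and TTS stages are stubs with a tiny phrasebook; none of this reflects the patent's actual modules, only the ordering of the three conversions.

```python
PHRASES = {"谢谢": "Thank you", "你好": "Hello"}  # toy zh -> en phrasebook

def speech_to_text(voice_data):
    # (a1) convert first voice data into first text (identity stub for ASR)
    return voice_data

def translate_text(text, target_language):
    # (a2) translate the first text into the second text
    assert target_language == "en", "toy phrasebook only covers English"
    return PHRASES[text]

def text_to_speech(text):
    # (a3) convert second text into second voice data (identity stub for TTS)
    return text

def translate_voice(first_voice_data, target_language):
    first_text = speech_to_text(first_voice_data)
    second_text = translate_text(first_text, target_language)
    return text_to_speech(second_text)
```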
In the embodiment of the invention, the method further comprises the following steps:
(b1) Determining a first distance between the mobile terminal and the user, and determining a second distance between the earphone and the user;
(b2) If the first distance is greater than the second distance, the earphone is used as a voice input end, the mobile terminal is used as a voice output end, and otherwise, the mobile terminal is used as the voice input end, and the earphone is used as the voice output end.
In the above actions (b1) and (b2), a first distance between the mobile terminal and the user is determined, and a second distance between the earphone and the user is determined. If the first distance is greater than the second distance, the earphone serves as the voice input end and the mobile terminal as the voice output end; otherwise, the mobile terminal serves as the voice input end and the earphone as the voice output end. For example, the mobile terminal is in the hand of user A, so the first distance is determined to be 0; the earphone is on user B, 2 m away from user A, so the second distance is determined to be 2 m. Since the second distance is greater than the first distance, the mobile terminal serves as the voice input end and the earphone as the voice output end.
In a specific embodiment, the mobile terminal and the earphone are both located on the same user, that is, the first distance equals the second distance. In this case either assignment may be used: the earphone as the voice input end and the mobile terminal as the voice output end, or the mobile terminal as the voice input end and the earphone as the voice output end.
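The distance rule of (b1)-(b2) reduces to a single comparison. In this sketch the distances are taken as given numbers, since the patent does not say how they are measured; ties fall through to the "otherwise" branch, which is one of the two assignments the equal-distance embodiment permits.

```python
def assign_roles(terminal_to_user, earphone_to_user):
    # (b2) whichever device is farther from the speaking user becomes the
    # voice output end; the nearer one captures the voice input.
    if terminal_to_user > earphone_to_user:
        return {"input": "earphone", "output": "mobile terminal"}
    return {"input": "mobile terminal", "output": "earphone"}
```

For the worked example above (terminal in user A's hand, earphone on user B two metres away), `assign_roles(0, 2)` selects the mobile terminal as the voice input end.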
In the embodiment of the invention, determining the target translation language corresponding to the first voice data comprises the following steps: and receiving a language setting instruction of the user, and determining a target translation language corresponding to the first voice data according to the language setting instruction. The mobile terminal receives the language setting instruction of the user, determines the target translation language corresponding to the first voice data according to the language setting instruction, for example, the mobile terminal receives the English setting instruction of the user, and determines that the target translation language corresponding to the first voice data is English.
In the embodiment of the invention, determining the target translation language corresponding to the first voice data comprises the following steps:
(c1) And obtaining the geographic position information of the mobile terminal, and determining the target translation language corresponding to the first voice data according to the geographic position information.
In the above-mentioned action (c1), the mobile terminal obtains its geographic location information and determines the target translation language corresponding to the first voice data according to that information. For example, if the mobile terminal obtains the geographic location "Washington, United States", the language corresponding to that location is English, so English is determined as the target translation language corresponding to the first voice data.
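Action (c1) amounts to a location-to-language lookup. The country codes and table below are illustrative assumptions; a real implementation would combine a positioning or geocoding service with a locale database rather than a hard-coded dictionary.

```python
# Hypothetical mapping from a coarse location (country code) to a default
# target translation language.
COUNTRY_LANGUAGE = {"US": "en", "FR": "fr", "CN": "zh"}

def target_language_from_location(country_code, default="en"):
    # Fall back to a default when the location's language is unknown.
    return COUNTRY_LANGUAGE.get(country_code, default)
```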
In the embodiment of the present invention, when the mobile terminal is a voice input terminal and the earphone is a voice output terminal, after obtaining the first voice data of the user from the voice input terminal, the method further includes:
(d1) Noise reduction processing is carried out on the first voice data through a first processor in the mobile terminal;
or,
when the earphone is a voice input end and the mobile terminal is a voice output end, acquiring first voice data of a user from the voice input end, including:
(d2) And acquiring first voice data of the user from the earphone, wherein the first voice data is voice data obtained by performing noise reduction processing on original voice data input by the user by a second processor in the earphone.
In the above-mentioned action (d1), when the mobile terminal is the voice input end and the earphone is the voice output end, after the first voice data of the user is obtained from the voice input end, the first voice data is subjected to noise reduction processing by a first processor in the mobile terminal. The first processor may be a DSP (digital signal processor) or another processor capable of reducing noise in voice data; this is not particularly limited here.
In the above-mentioned action (d2), when the earphone is the voice input end and the mobile terminal is the voice output end, the mobile terminal obtains the first voice data of the user from the earphone, where the first voice data is the voice data obtained after a second processor in the earphone performs noise reduction on the original voice data input by the user. The second processor may be a DSP or another processor capable of reducing noise in voice data; this is not particularly limited here.
In one embodiment, the earphone is the voice input end and the mobile terminal is the voice output end. The earphone receives the original voice data input by the user, performs noise reduction and encoding on it to obtain the first voice data, and sends the first voice data to the mobile terminal. After acquiring the first voice data from the earphone, the mobile terminal first decodes it to obtain the decoded first voice data.
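As a toy stand-in for the DSP noise reduction in (d1)/(d2), the sketch below smooths PCM samples with a 3-tap moving average. Real terminals and earphones use far more sophisticated filters; this only marks where the noise-reduction step sits in the pipeline.

```python
def denoise(samples):
    # Average each sample with its immediate neighbours (edges use the
    # samples that exist), damping isolated spikes.
    smoothed = []
    for i in range(len(samples)):
        window = samples[max(0, i - 1): i + 2]  # up to 3 neighbouring samples
        smoothed.append(sum(window) / len(window))
    return smoothed
```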
In the embodiment of the invention, the mobile terminal is a voice input end, the earphone is a voice output end, or the earphone is a voice input end, the mobile terminal is a voice output end, first voice data of a user are obtained from the voice input end, and a target translation language corresponding to the first voice data is determined; translating the first voice data according to the target translation language to obtain second voice data; and playing the second voice data through the voice output terminal. In this embodiment, the same translation software is not required to be installed on the mobile terminals of the two users, and the translation of the first voice data and the playing of the translated voice data can be completed through the mobile terminal of one user, so that the efficiency of real-time communication of the users is improved.
Fig. 2 is a schematic diagram of module components of a mobile terminal according to an embodiment of the present invention, where the mobile terminal is in communication connection with an earphone, the mobile terminal is a voice input end, the earphone is a voice output end, or the earphone is a voice input end, and the mobile terminal is a voice output end, as shown in fig. 2, where the mobile terminal includes:
a voice data obtaining module 21, configured to obtain first voice data of a user from a voice input end, and determine a target translation language corresponding to the first voice data;
a target speech translation module 22, configured to translate the first speech data according to the target translation language, to obtain second speech data;
the voice data playing module 23 is configured to play the second voice data through a voice output terminal.
Optionally, the target speech translation module 22 is specifically configured to:
according to the language of the first voice data, converting the first voice data into a first text;
translating the first text according to the target translation language to obtain a second text;
and converting the second text into second voice data corresponding to the target translation language.
Optionally, the voice data acquisition module 21 is specifically configured to:
acquiring first voice data of a user, and taking equipment used for inputting the first voice data in the mobile terminal and the earphone as the voice input end;
and taking the other device of the mobile terminal and the earphone as the voice output end.
Optionally, the mobile terminal further includes:
a distance determining module, configured to determine a first distance between the mobile terminal and a user, and determine a second distance between the earphone and the user;
and the port setting module is used for taking the earphone as the voice input end and the mobile terminal as the voice output end if the first distance is larger than the second distance, otherwise taking the mobile terminal as the voice input end and the earphone as the voice output end.
Optionally, the voice data acquisition module 21 is specifically configured to:
and obtaining the geographic position information of the mobile terminal, and determining a target translation language corresponding to the first voice data according to the geographic position information.
Optionally, when the mobile terminal is the voice input end and the earphone is the voice output end, the mobile terminal further includes:
and the first unit is used for carrying out noise reduction processing on the first voice data by a first processor in the mobile terminal after acquiring the first voice data of the user from the voice input end.
Or,
when the earphone is the voice input end and the mobile terminal is the voice output end, the voice data acquisition module 21 includes:
and the second unit is used for acquiring first voice data of the user from the earphone, wherein the first voice data is voice data obtained by performing noise reduction processing on original voice data input by the user by a second processor in the earphone.
In the embodiment of the invention, the mobile terminal is a voice input end, the earphone is a voice output end, or the earphone is a voice input end, the mobile terminal is a voice output end, first voice data of a user are obtained from the voice input end, and a target translation language corresponding to the first voice data is determined; translating the first voice data according to the target translation language to obtain second voice data; and playing the second voice data through the voice output terminal. In this embodiment, the same translation software is not required to be installed on the mobile terminals of the two users, and the translation of the first voice data and the playing of the translated voice data can be completed through the mobile terminal of one user, so that the efficiency of real-time communication of the users is improved.
The mobile terminal in this embodiment can implement each process implemented in the foregoing embodiment of the voice output method, and achieve the same effect, which is not repeated here.
Fig. 3 is a schematic hardware structure of a mobile terminal according to an embodiment of the present invention, and as shown in fig. 3, the mobile terminal 800 includes, but is not limited to: radio frequency unit 801, network module 802, audio output unit 803, input unit 804, sensor 805, display unit 806, user input unit 807, interface unit 808, memory 809, processor 810, and power supply 811. Those skilled in the art will appreciate that the mobile terminal structure shown in fig. 3 is not limiting of the mobile terminal and that the mobile terminal may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. In the embodiment of the invention, the mobile terminal comprises, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer and the like.
The memory 809 stores a computer program that, when executed by the processor 810, can implement the following procedures:
acquiring first voice data of a user from a voice input end, and determining a target translation language corresponding to the first voice data;
translating the first voice data according to the target translation language to obtain second voice data;
and playing the second voice data through the voice output terminal.
Optionally, when the computer program is executed by the processor 810, translating the first voice data according to the target translation language to obtain second voice data, including:
according to the language of the first voice data, converting the first voice data into a first text;
translating the first text according to the target translation language to obtain a second text;
and converting the second text into second voice data corresponding to the target translation language.
Optionally, the computer program is executed by the processor 810,
acquiring first voice data of a user from a voice input end, wherein the first voice data comprises:
acquiring first voice data of a user, and taking equipment used for inputting the first voice data in the mobile terminal and the earphone as the voice input end;
the method further comprises the steps of: and taking the other device of the mobile terminal and the earphone as the voice output end.
Optionally, when executed by the processor 810, the computer program further comprises:
determining a first distance between the mobile terminal and a user, and determining a second distance between the earphone and the user;
and if the first distance is larger than the second distance, taking the earphone as the voice input end, taking the mobile terminal as the voice output end, otherwise, taking the mobile terminal as the voice input end and taking the earphone as the voice output end.
Optionally, when the computer program is executed by the processor 810, determining the target translation language corresponding to the first voice data includes:
and obtaining the geographic position information of the mobile terminal, and determining a target translation language corresponding to the first voice data according to the geographic position information.
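A sketch of the geolocation rule, assuming the geographic position information has already been reduced to a country code; the country-to-language table and the fallback default are made-up examples, not part of the patent.

```python
# Illustrative mapping from region to the language a local conversation
# partner is likely to speak; a real implementation would use a fuller
# table or a geolocation service.
REGION_LANGUAGE = {"FR": "fr", "JP": "ja", "DE": "de"}


def target_language_from_location(country_code, default="en"):
    """Pick the target translation language from the terminal's location."""
    return REGION_LANGUAGE.get(country_code, default)


print(target_language_from_location("FR"))  # → fr
print(target_language_from_location("US"))  # → en (fallback default)
```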
Optionally, when the mobile terminal is a voice input end and the earphone is a voice output end, and the computer program is executed by the processor 810, after the first voice data of the user is obtained from the voice input end, the method further comprises:
performing noise reduction processing on the first voice data through a first processor in the mobile terminal;
or,
when the earphone is a voice input end and the mobile terminal is a voice output end, acquiring first voice data of a user from the voice input end comprises: acquiring the first voice data of the user from the earphone, wherein the first voice data is voice data obtained by a second processor in the earphone performing noise reduction processing on original voice data input by the user.
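The noise-reduction routing above can be sketched as follows: when the terminal is the input end, its own (first) processor denoises after acquisition; when the earphone is the input end, the earphone's (second) processor denoises before the data reaches the terminal. `denoise` is a trivial placeholder for a real noise-reduction algorithm, and the string-based "audio" is purely illustrative.

```python
def denoise(voice_data):
    # Stand-in for the first/second processor's noise-reduction algorithm.
    return voice_data.replace("[noise]", "")


def acquire_first_voice_data(voice_input_end, raw_voice):
    """Return first voice data ready for translation, denoised either way."""
    if voice_input_end == "mobile terminal":
        # Terminal records raw audio; its first processor denoises
        # after acquisition.
        return denoise(raw_voice)
    # Earphone case: the second processor denoises before transmission,
    # so the terminal receives already-clean data.
    transmitted = denoise(raw_voice)
    return transmitted


print(acquire_first_voice_data("mobile terminal", "[noise]hello"))  # → hello
print(acquire_first_voice_data("earphone", "hi[noise]"))            # → hi
```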
In the embodiment of the invention, the mobile terminal is the voice input end and the earphone is the voice output end, or the earphone is the voice input end and the mobile terminal is the voice output end. First voice data of a user is obtained from the voice input end, and a target translation language corresponding to the first voice data is determined; the first voice data is translated according to the target translation language to obtain second voice data; and the second voice data is played through the voice output end. In this embodiment, the two users do not need to install the same translation software on both of their mobile terminals; the translation of the first voice data and the playing of the translated voice data can both be completed through one user's mobile terminal, which improves the efficiency of real-time communication between the users.
The mobile terminal in this embodiment can implement each process implemented in the foregoing embodiment of the voice output method, and achieve the same effect, which is not repeated here.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 801 may be used to receive and transmit signals during information transmission and reception or during a call; specifically, it receives downlink data from a base station and forwards it to the processor 810 for processing, and transmits uplink data to the base station. In general, the radio frequency unit 801 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 801 may also communicate with networks and other devices through a wireless communication system.
The mobile terminal provides wireless broadband internet access to the user through the network module 802, such as helping the user to send and receive e-mail, browse web pages, access streaming media, etc.
The audio output unit 803 may convert audio data received by the radio frequency unit 801 or the network module 802 or stored in the memory 809 into an audio signal and output as sound. Also, the audio output unit 803 may also provide audio output (e.g., a call signal reception sound, a message reception sound, etc.) related to a specific function performed by the mobile terminal 800. The audio output unit 803 includes a speaker, a buzzer, a receiver, and the like.
The input unit 804 is used for receiving an audio or video signal. The input unit 804 may include a graphics processor (Graphics Processing Unit, GPU) 8041 and a microphone 8042. The graphics processor 8041 processes image data of still pictures or video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 806. The image frames processed by the graphics processor 8041 may be stored in the memory 809 (or other storage medium) or transmitted via the radio frequency unit 801 or the network module 802. The microphone 8042 can receive sound and process it into audio data. In a telephone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 801, and output.
The mobile terminal 800 also includes at least one sensor 805, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 8061 according to the brightness of ambient light, and the proximity sensor can turn off the display panel 8061 and/or the backlight when the mobile terminal 800 moves to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for recognizing the attitude of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration), vibration-recognition related functions (such as a pedometer and tapping), and the like. The sensor 805 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which are not described herein.
The display unit 806 is used to display information input by a user or information provided to the user. The display unit 806 may include a display panel 8061, and the display panel 8061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 807 is operable to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 807 includes a touch panel 8071 and other input devices 8072. The touch panel 8071, also referred to as a touch screen, can collect touch operations by a user on or near it (for example, operations by the user on or near the touch panel 8071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 8071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 810, and receives and executes commands sent from the processor 810. The touch panel 8071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 8071, the user input unit 807 can include other input devices 8072. Specifically, the other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail herein.
Further, the touch panel 8071 may be overlaid on the display panel 8061. When the touch panel 8071 detects a touch operation on or near it, it transmits the operation to the processor 810 to determine the type of the touch event, and the processor 810 then provides a corresponding visual output on the display panel 8061 according to the type of the touch event. Although in fig. 3 the touch panel 8071 and the display panel 8061 are two independent components for implementing the input and output functions of the mobile terminal, in some embodiments the touch panel 8071 and the display panel 8061 may be integrated to implement the input and output functions of the mobile terminal, which is not limited herein.
The interface unit 808 is an interface through which an external device is connected to the mobile terminal 800. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 808 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 800 or may be used to transmit data between the mobile terminal 800 and an external device.
The memory 809 can be used to store software programs as well as various data. The memory 809 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required for at least one function (such as a sound playing function and an image playing function), and the data storage area may store data created according to the use of the mobile phone (such as audio data and a phonebook). In addition, the memory 809 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The processor 810 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by running or executing software programs and/or modules stored in the memory 809 and calling data stored in the memory 809, thereby performing overall monitoring of the mobile terminal. The processor 810 may include one or more processing units; preferably, the processor 810 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 810.
The mobile terminal 800 may also include a power supply 811 (e.g., a battery) for powering the various components, and preferably the power supply 811 may be logically connected to the processor 810 through a power management system that can perform functions such as managing charge, discharge, and power consumption.
In addition, the mobile terminal 800 includes some functional modules, which are not shown, and will not be described herein.
Preferably, the embodiment of the present invention further provides a mobile terminal, including a processor, a memory, and a computer program stored in the memory and capable of running on the processor. When executed by the processor, the computer program implements each process of the above embodiment of the voice output method and can achieve the same technical effects; to avoid repetition, a detailed description is omitted here.
Further, the embodiment of the present invention also provides a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above voice output method embodiment and can achieve the same technical effects; to avoid repetition, no further description is given here. The computer readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) that includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods of the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, which are merely illustrative and not restrictive. Inspired by the present invention, those of ordinary skill in the art may devise many further forms without departing from the spirit of the present invention and the scope of the claims, all of which fall within the protection of the present invention.

Claims (11)

1. A voice output method, applied to a mobile terminal, wherein the mobile terminal is in communication connection with an earphone; the mobile terminal is a voice input end and the earphone is a voice output end, or the earphone is a voice input end and the mobile terminal is a voice output end; the method comprises the following steps:
acquiring first voice data of a user from a voice input end, and determining a target translation language corresponding to the first voice data;
translating the first voice data according to the target translation language to obtain second voice data;
playing the second voice data through a voice output end;
the method further comprises the steps of:
determining a first distance between the mobile terminal and a user, and determining a second distance between the earphone and the user;
and if the first distance is greater than the second distance, taking the earphone as the voice input end and the mobile terminal as the voice output end; otherwise, taking the mobile terminal as the voice input end and the earphone as the voice output end.
2. The method of claim 1, wherein translating the first speech data according to the target translation language to obtain second speech data comprises:
according to the language of the first voice data, converting the first voice data into a first text;
translating the first text according to the target translation language to obtain a second text;
and converting the second text into second voice data corresponding to the target translation language.
3. The method according to claim 1 or 2, wherein obtaining first speech data of the user from the speech input comprises:
acquiring first voice data of a user, and taking, of the mobile terminal and the earphone, the device used to input the first voice data as the voice input end;
the method further comprises: taking the other of the mobile terminal and the earphone as the voice output end.
4. The method of claim 1 or 2, wherein determining the target translation language corresponding to the first speech data comprises:
and obtaining the geographic position information of the mobile terminal, and determining a target translation language corresponding to the first voice data according to the geographic position information.
5. The method of claim 1, wherein:
when the mobile terminal is a voice input end and the earphone is a voice output end, after the first voice data of the user is obtained from the voice input end, the method further comprises the steps of: noise reduction processing is carried out on the first voice data through a first processor in the mobile terminal;
or,
when the earphone is a voice input end and the mobile terminal is a voice output end, acquiring first voice data of a user from the voice input end comprises: acquiring the first voice data of the user from the earphone, wherein the first voice data is voice data obtained by a second processor in the earphone performing noise reduction processing on original voice data input by the user.
6. A mobile terminal, in communication connection with an earphone, wherein the mobile terminal is a voice input end and the earphone is a voice output end, or the earphone is a voice input end and the mobile terminal is a voice output end; the mobile terminal further comprises:
the voice data acquisition module is used for acquiring first voice data of a user from a voice input end and determining a target translation language corresponding to the first voice data;
the target voice translation module is used for translating the first voice data according to the target translation language to obtain second voice data;
the voice data playing module is used for playing the second voice data through the voice output end;
the mobile terminal further includes:
a distance determining module, configured to determine a first distance between the mobile terminal and a user, and determine a second distance between the earphone and the user;
and the port setting module is used for, if the first distance is greater than the second distance, taking the earphone as the voice input end and the mobile terminal as the voice output end, and otherwise taking the mobile terminal as the voice input end and the earphone as the voice output end.
7. The mobile terminal of claim 6, wherein the target speech translation module is specifically configured to:
according to the language of the first voice data, converting the first voice data into a first text;
translating the first text according to the target translation language to obtain a second text;
and converting the second text into second voice data corresponding to the target translation language.
8. The mobile terminal according to any one of claims 6 and 7, wherein the voice data acquisition module is specifically configured to:
acquiring first voice data of a user, and taking equipment used for inputting the first voice data in the mobile terminal and the earphone as the voice input end;
and taking the other device of the mobile terminal and the earphone as the voice output end.
9. The mobile terminal according to any one of claims 6 and 7, wherein the voice data acquisition module is specifically configured to:
and obtaining the geographic position information of the mobile terminal, and determining a target translation language corresponding to the first voice data according to the geographic position information.
10. The mobile terminal of claim 6, wherein:
when the mobile terminal is a voice input end and the earphone is a voice output end, the mobile terminal further comprises:
a first unit for performing noise reduction processing on first voice data of a user through a first processor in the mobile terminal after the first voice data is acquired from a voice input end,
or,
when the earphone is a voice input end and the mobile terminal is a voice output end, the voice data acquisition module comprises:
and the second unit is used for acquiring first voice data of the user from the earphone, wherein the first voice data is voice data obtained by performing noise reduction processing on original voice data input by the user by a second processor in the earphone.
11. A mobile terminal, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the speech output method according to any one of claims 1 to 5.
CN201910660382.3A 2019-07-22 2019-07-22 Voice output method and mobile terminal Active CN110457716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910660382.3A CN110457716B (en) 2019-07-22 2019-07-22 Voice output method and mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910660382.3A CN110457716B (en) 2019-07-22 2019-07-22 Voice output method and mobile terminal

Publications (2)

Publication Number Publication Date
CN110457716A CN110457716A (en) 2019-11-15
CN110457716B true CN110457716B (en) 2023-06-06

Family

ID=68481621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910660382.3A Active CN110457716B (en) 2019-07-22 2019-07-22 Voice output method and mobile terminal

Country Status (1)

Country Link
CN (1) CN110457716B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111314814A (en) * 2020-01-19 2020-06-19 湖南国声声学科技股份有限公司 TWS Bluetooth headset-based translation method, mobile terminal, TWS Bluetooth headset and storage medium
CN111862940A (en) * 2020-07-15 2020-10-30 百度在线网络技术(北京)有限公司 Earphone-based translation method, device, system, equipment and storage medium
CN114095906A (en) * 2021-10-14 2022-02-25 华为技术有限公司 Short-distance communication method and system
CN114245255A (en) * 2021-11-08 2022-03-25 肇庆德庆冠旭电子有限公司 TWS earphone and real-time interpretation method, terminal and storage medium thereof
CN114245261A (en) * 2022-01-18 2022-03-25 江苏紫米电子技术有限公司 Real-time conversation translation method, system, earphone device and mobile terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528545A (en) * 2016-10-19 2017-03-22 腾讯科技(深圳)有限公司 Voice message processing method and device
CN107247711A (en) * 2017-06-28 2017-10-13 努比亚技术有限公司 A kind of two-way translation method, mobile terminal and computer-readable recording medium
JP6457706B1 (en) * 2018-03-26 2019-02-06 株式会社フォルテ Translation system, translation method, and translation apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102161554B1 (en) * 2017-06-29 2020-10-05 네이버 주식회사 Method and apparatus for function of translation using earset
CN109275057A (en) * 2018-08-31 2019-01-25 歌尔科技有限公司 A kind of translation earphone speech output method, system and translation earphone and storage medium
CN109376363A (en) * 2018-09-04 2019-02-22 出门问问信息科技有限公司 A kind of real-time voice interpretation method and device based on earphone
CN109960813A (en) * 2019-03-18 2019-07-02 维沃移动通信有限公司 A kind of interpretation method, mobile terminal and computer readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528545A (en) * 2016-10-19 2017-03-22 腾讯科技(深圳)有限公司 Voice message processing method and device
CN107247711A (en) * 2017-06-28 2017-10-13 努比亚技术有限公司 A kind of two-way translation method, mobile terminal and computer-readable recording medium
JP6457706B1 (en) * 2018-03-26 2019-02-06 株式会社フォルテ Translation system, translation method, and translation apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Advances in speech recognition research; Yi Kechu; Wen Chengyi; Du Sidan; Cheng Jun; Journal of Xidian University (04); full text *

Also Published As

Publication number Publication date
CN110457716A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN110457716B (en) Voice output method and mobile terminal
CN108540655B (en) Caller identification processing method and mobile terminal
CN108712566B (en) Voice assistant awakening method and mobile terminal
WO2019201271A1 (en) Call processing method and mobile terminal
CN109960813A (en) A kind of interpretation method, mobile terminal and computer readable storage medium
CN108668024B (en) Voice processing method and terminal
CN109639863B (en) Voice processing method and device
CN110471559B (en) False touch prevention method and mobile terminal
CN107734170B (en) Notification message processing method, mobile terminal and wearable device
CN109412932B (en) Screen capturing method and terminal
CN108521501B (en) Voice input method, mobile terminal and computer readable storage medium
CN109639999B (en) Video call data optimization method, mobile terminal and readable storage medium
WO2021103449A1 (en) Interaction method, mobile terminal and readable storage medium
CN108597495B (en) Method and device for processing voice data
CN109949809B (en) Voice control method and terminal equipment
CN111093137B (en) Volume control method, volume control equipment and computer readable storage medium
CN109639738B (en) Voice data transmission method and terminal equipment
CN108270928B (en) Voice recognition method and mobile terminal
CN110012151A (en) A kind of information display method and terminal device
CN110913070B (en) Call method and terminal equipment
CN110851106B (en) Audio output method and electronic equipment
CN109543193B (en) Translation method, translation device and terminal equipment
CN110069774B (en) Text processing method, device and terminal
CN107835310B (en) Mobile terminal setting method and mobile terminal
CN115037831B (en) Mode control method and device, electronic equipment and earphone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant