CN113286217A - Call voice translation method and device and earphone equipment - Google Patents

Call voice translation method and device and earphone equipment

Info

Publication number
CN113286217A
CN113286217A (application number CN202110443370.2A)
Authority
CN
China
Prior art keywords
voice data
translation
user
call
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110443370.2A
Other languages
Chinese (zh)
Inventor
牛红霞
张爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Intelligent Technology Co Ltd
Original Assignee
Beijing Sogou Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Intelligent Technology Co Ltd
Priority to CN202110443370.2A
Publication of CN113286217A
Legal status: Pending

Classifications

    • H04R 1/10: Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 15/005: Language recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H04R 1/1091: Details not provided for in groups H04R1/1008 - H04R1/1083
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Telephone Function (AREA)

Abstract

The embodiment of the invention provides a call voice translation method, a call voice translation apparatus and an earphone device. The method is applied to the earphone device and includes the following steps: during a call in which the earphone device is used, the earphone device acquires first call voice data of a first user, wherein the first user is another participant in the call; playing the first call voice data; acquiring a first translation text corresponding to the first call voice data and displaying the first translation text; and acquiring first translation voice data corresponding to the first translation text, and playing the first translation voice data. As a result, no dedicated translation device is needed during the call; the translation operation in a cross-language, cross-location telephone communication scenario is therefore simplified, and the efficiency of call voice translation is improved. Moreover, the call voice does not need to be played out loud, so information leakage can be avoided and the user experience is improved.

Description

Call voice translation method and device and earphone equipment
Technical Field
The invention relates to the technical field of data processing, and in particular to a call voice translation method and apparatus, and an earphone device.
Background
With the advance of globalization, business and personal exchanges between countries have become increasingly frequent, for example in international trade, international conferences, and international travel.
Because different countries and regions use different languages, language has become one of the main obstacles in such business and personal exchanges. To overcome the language barrier, dedicated translation devices such as handheld translators and translation pens have emerged.
In a cross-language, cross-location telephone communication scenario, a user has to enable the hands-free mode and then use a dedicated translation device to pick up and translate the call voice, which is very cumbersome.
Disclosure of Invention
The embodiment of the invention provides a call voice translation method, which simplifies the translation operation in a cross-language, cross-location telephone communication scenario and improves the efficiency of call voice translation.
Correspondingly, the embodiment of the invention also provides a call voice translation apparatus and an earphone device to ensure the implementation and application of the method.
In order to solve the above problem, an embodiment of the present invention discloses a call voice translation method applied to an earphone device, which specifically includes: during a call in which the earphone device is used, acquiring, by the earphone device, first call voice data of a first user, wherein the first user is another participant in the call; playing the first call voice data; acquiring a first translation text corresponding to the first call voice data and displaying the first translation text; and acquiring first translation voice data corresponding to the first translation text, and playing the first translation voice data.
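The four steps of the method (acquire call voice, play it, obtain and display translation text, obtain and play translation voice) can be sketched as a simple pipeline. The following is an illustrative sketch only, not the patent's implementation; `recognize_and_translate` and `synthesize_speech` are hypothetical stand-ins for the speech-recognition, machine-translation, and text-to-speech components the method assumes.

```python
# Illustrative sketch of the claimed call voice translation pipeline.
# All component functions are hypothetical stand-ins; a real earphone
# device would delegate them to on-device models or a server.

def recognize_and_translate(voice_data: bytes, target_lang: str) -> str:
    """Stand-in for ASR plus machine translation of the incoming audio."""
    # Pretend the incoming audio says "bonjour" and translate it.
    return "hello" if target_lang == "en" else "bonjour"

def synthesize_speech(text: str) -> bytes:
    """Stand-in for text-to-speech synthesis of the translation text."""
    return text.encode("utf-8")

def handle_incoming_call_audio(first_call_voice_data: bytes, target_lang: str):
    played = []                                       # what the earpiece outputs
    played.append(("call", first_call_voice_data))    # play the original voice
    text = recognize_and_translate(first_call_voice_data, target_lang)
    displayed = text                                  # show the translation text
    translated_audio = synthesize_speech(text)
    played.append(("translation", translated_audio))  # play the translated voice
    return displayed, played

displayed, played = handle_incoming_call_audio(b"fake-audio", "en")
print(displayed)                 # -> hello
print([k for k, _ in played])    # -> ['call', 'translation']
```

Note that the translation text is available before the synthesized voice, which is why the method can display it first.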
Optionally, the playing of the first translation voice data includes: playing the first translation voice data while playing the first call voice data; or playing the first call voice data and the first translation voice data alternately.
Optionally, the playing of the first translation voice data while playing the first call voice data includes: playing the first translation voice data at a volume greater than that of the first call voice data.
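One way to realize playing the translation louder than the call voice is to mix the two audio streams with a higher gain on the translation track. The following is a minimal sketch under the assumption of 16-bit PCM samples represented as Python integers; the gain values are arbitrary examples, not from the patent.

```python
def mix(call_samples, translation_samples, call_gain=0.4, translation_gain=1.0):
    """Mix two equal-length PCM sample lists so the translation voice is
    louder than the original call voice, clipping to the 16-bit range."""
    out = []
    for c, t in zip(call_samples, translation_samples):
        s = int(c * call_gain + t * translation_gain)
        out.append(max(-32768, min(32767, s)))  # clip to int16 range
    return out

print(mix([1000, -1000], [2000, 2000]))  # -> [2400, 1600]
```

The clipping step matters: summing two full-scale streams can overflow 16-bit samples and produce audible distortion.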
Optionally, the earphone device includes earphones, and the playing of the first call voice data includes: playing the first call voice data simultaneously in both earpieces of the earphones; the playing of the first translation voice data while playing the first call voice data includes: playing the first call voice data in one earpiece of the earphones and playing the first translation voice data in the other earpiece of the earphones.
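Routing the call voice to one earpiece and the translation to the other amounts to building interleaved stereo frames with each stream on its own channel. A minimal illustrative sketch (assuming equal-length mono sample lists; not from the patent):

```python
def route_to_ears(call_samples, translation_samples):
    """Build interleaved stereo frames: call voice in the left earpiece,
    translated voice in the right earpiece."""
    frames = []
    for c, t in zip(call_samples, translation_samples):
        frames.extend([c, t])  # one [left, right] pair per frame
    return frames

print(route_to_ears([1, 2], [9, 8]))  # -> [1, 9, 2, 8]
```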
Optionally, the method further includes: during a call in which the earphone device is used, identifying the current speaker on a display interface.
Optionally, the method further includes: displaying a first language used by the first user on a display interface, and/or displaying a second language used by a second user on the display interface; wherein the second user is the user using the earphone device.
Optionally, the method further includes: acquiring a first speech recognition text corresponding to the first call voice data and displaying the first speech recognition text on a display interface.
Optionally, the method further includes: acquiring second call voice data of a second user, and sending the second call voice data to an electronic device used by the first user; or acquiring second call voice data of a second user, acquiring a second translation text corresponding to the second call voice data, and sending the second call voice data and the second translation text to the electronic device used by the first user; or acquiring second call voice data of a second user, acquiring a second translation text corresponding to the second call voice data, synthesizing second translation voice data based on the second translation text, and sending the second call voice data, the second translation text and the second translation voice data to the electronic device used by the first user; or acquiring second call voice data of a second user, acquiring second translation voice data corresponding to the second call voice data, and sending the second call voice data and the second translation voice data to the electronic device used by the first user; wherein the second user is the user using the earphone device.
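The four alternative transmission schemes above differ only in which fields accompany the second user's call voice data. A hypothetical payload builder can cover all of them with optional fields; the dictionary keys here are invented for illustration and are not part of the patent.

```python
def build_uplink_payload(voice_data, translation_text=None, translation_voice=None):
    """Assemble the data sent to the first user's electronic device.
    Omitting both optional arguments gives scheme 1 (voice only); the
    other three schemes add the translation text and/or translation voice."""
    payload = {"call_voice": voice_data}  # always sent in every scheme
    if translation_text is not None:
        payload["translation_text"] = translation_text
    if translation_voice is not None:
        payload["translation_voice"] = translation_voice
    return payload

print(sorted(build_uplink_payload(b"audio").keys()))
# -> ['call_voice']
print(sorted(build_uplink_payload(b"audio", "hello", b"tts").keys()))
# -> ['call_voice', 'translation_text', 'translation_voice']
```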
Optionally, the method further includes: if the second translation text is acquired, displaying the second translation text on a display interface.
Optionally, the method further includes: acquiring a second speech recognition text corresponding to the second call voice data, and displaying the second speech recognition text on a display interface.
Optionally, the acquiring of the first translation text corresponding to the first call voice data includes: sending a translation request to a server based on the first call voice data, and receiving the first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or translating, by the earphone device, the first call voice data locally to obtain the first translation text.
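Because the description allows server-side and/or local translation, one natural arrangement is to try the server first and fall back to on-device translation when the request fails. The fallback ordering and the callable names below are assumptions for illustration, not the patent's design.

```python
def get_translation_text(voice_data, translate_via_server, translate_locally):
    """Try the server first; fall back to on-device translation if the
    network request fails. Both translators are injected stand-ins."""
    try:
        return translate_via_server(voice_data)
    except ConnectionError:
        return translate_locally(voice_data)

def unreachable_server(voice_data):
    # Stand-in that simulates a failed translation request.
    raise ConnectionError("no network")

print(get_translation_text(b"audio", unreachable_server, lambda v: "local result"))
# -> local result
```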
Optionally, the method further includes: displaying a call voice processing function entry on a display interface during a call in which the earphone device is used, wherein the call voice processing function entry includes a call voice translation function entry; and receiving a touch operation of a second user on the call voice translation function entry, and then performing the step of acquiring the first translation text corresponding to the first call voice data; wherein the second user is the user using the earphone device.
Optionally, the acquiring of the first translation text corresponding to the first call voice data includes: acquiring a second language used by a second user, wherein the second user is the user using the earphone device; and acquiring the first translation text corresponding to the first call voice data based on the second language used by the second user.
Optionally, the acquiring of the second language used by the second user includes at least one of: determining the second language used by the second user according to the system language of the earphone device; determining the second language used by the second user according to the language of the second call voice data of the second user; and determining the second language used by the second user according to the language selected by the second user on a display interface.
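The three ways of determining the second language can be combined as a fallback chain. The priority order in this sketch (explicit UI selection, then detection from the user's own speech, then the device's system language) is an assumption for illustration; the patent only names the three sources.

```python
def determine_second_language(ui_selection=None, detected_from_voice=None,
                              system_language=None):
    """Pick the second user's language from the three sources named in the
    description. The priority order shown here is an assumed choice."""
    for candidate in (ui_selection, detected_from_voice, system_language):
        if candidate:
            return candidate
    return "en"  # illustrative default when no source is available

print(determine_second_language(system_language="zh"))                     # -> zh
print(determine_second_language(ui_selection="fr", system_language="zh"))  # -> fr
```

Putting the explicit UI selection first means a user's manual choice always overrides automatic detection.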
Optionally, the earphone device includes an earphone and an earphone storage device connected to the earphone; the playing of the first call voice data includes: playing, by the earphone, the first call voice data; the acquiring and displaying of the first translation text corresponding to the first call voice data includes: acquiring, by the earphone storage device, the first translation text corresponding to the first call voice data and displaying the first translation text on a display interface; the acquiring of the first translation voice data corresponding to the first translation text includes: acquiring, by the earphone storage device, the first translation voice data corresponding to the first translation text; and the playing of the first translation voice data includes: playing, by the earphone, the first translation voice data.
Optionally, the earphone device includes an earphone storage device; the playing of the first call voice data includes: playing, by the earphone storage device, the first call voice data; the acquiring and displaying of the first translation text corresponding to the first call voice data includes: acquiring, by the earphone storage device, the first translation text corresponding to the first call voice data and displaying the first translation text on a display interface; the acquiring of the first translation voice data corresponding to the first translation text includes: acquiring, by the earphone storage device, the first translation voice data corresponding to the first translation text; and the playing of the first translation voice data includes: playing, by the earphone storage device, the first translation voice data.
The embodiment of the invention also discloses a call voice translation apparatus applied to an earphone device, which specifically includes: a first voice acquisition module, configured to acquire, during a call in which the earphone device is used, first call voice data of a first user, wherein the first user is another participant in the call; a first playing module, configured to play the first call voice data; a text acquisition module, configured to acquire a first translation text corresponding to the first call voice data and display the first translation text; a second voice acquisition module, configured to acquire first translation voice data corresponding to the first translation text; and a second playing module, configured to play the first translation voice data.
Optionally, the second playing module includes: a synchronous playing sub-module, configured to play the first translation voice data while the first call voice data is played; and an alternate playing sub-module, configured to play the first call voice data and the first translation voice data alternately.
Optionally, the synchronous playing sub-module is configured to play the first translation voice data at a volume greater than that of the first call voice data.
Optionally, the first playing module is configured to play the first call voice data simultaneously in both earpieces of the earphones; and the synchronous playing sub-module is configured to play the first call voice data in one earpiece of the earphones and play the first translation voice data in the other earpiece of the earphones.
Optionally, the call voice translation apparatus further includes: an identification display module, configured to identify, during a call in which the earphone device is used, the current speaker on a display interface.
Optionally, the call voice translation apparatus further includes: a language display module, configured to display a first language used by the first user on a display interface and/or display a second language used by a second user on the display interface; wherein the second user is the user using the earphone device.
Optionally, the call voice translation apparatus further includes: a first recognition text display module, configured to acquire a first speech recognition text corresponding to the first call voice data and display the first speech recognition text on a display interface.
Optionally, the call voice translation apparatus further includes a data transmission module, configured to: acquire second call voice data of a second user and send the second call voice data to an electronic device used by the first user; or acquire second call voice data of a second user, acquire a second translation text corresponding to the second call voice data, and send the second call voice data and the second translation text to the electronic device used by the first user; or acquire second call voice data of a second user, acquire a second translation text corresponding to the second call voice data, synthesize second translation voice data based on the second translation text, and send the second call voice data, the second translation text and the second translation voice data to the electronic device used by the first user; or acquire second call voice data of a second user, acquire second translation voice data corresponding to the second call voice data, and send the second call voice data and the second translation voice data to the electronic device used by the first user; wherein the second user is the user using the earphone device.
Optionally, the call voice translation apparatus further includes: a translation text display module, configured to display the second translation text on a display interface if the second translation text is acquired.
Optionally, the call voice translation apparatus further includes: a second recognition text display module, configured to acquire a second speech recognition text corresponding to the second call voice data and display the second speech recognition text on a display interface.
Optionally, the text acquisition module is configured to: send a translation request to a server based on the first call voice data, and receive the first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or translate the first call voice data locally on the earphone device to obtain the first translation text.
Optionally, the call voice translation apparatus further includes: an entry display module, configured to display a call voice processing function entry on a display interface during a call in which the earphone device is used, wherein the call voice processing function entry includes a call voice translation function entry; and the text acquisition module is configured to receive a touch operation of a second user on the call voice translation function entry and then perform the step of acquiring the first translation text corresponding to the first call voice data; wherein the second user is the user using the earphone device.
Optionally, the text acquisition module is configured to acquire a second language used by a second user, wherein the second user is the user using the earphone device, and acquire the first translation text corresponding to the first call voice data based on the second language used by the second user.
Optionally, the text acquisition module is configured to determine the second language used by the second user according to the system language of the earphone device; and/or determine the second language used by the second user according to the language of the second call voice data of the second user; and/or determine the second language used by the second user according to the language selected by the second user on a display interface.
Optionally, the earphone device includes an earphone and an earphone storage device connected to the earphone; the earphone includes the first playing module and the second playing module, and the earphone storage device includes the text acquisition module and the second voice acquisition module.
Optionally, the earphone device includes an earphone storage device; and the earphone storage device includes the first playing module, the text acquisition module, the second voice acquisition module and the second playing module.
The embodiment of the invention also discloses a readable storage medium; when instructions in the storage medium are executed by a processor of an earphone device, the earphone device is enabled to perform the call voice translation method according to any one of the embodiments of the invention.
The embodiment of the invention also discloses an earphone device, which includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and include instructions for: acquiring, during a call in which the earphone device is used, first call voice data of a first user, wherein the first user is another participant in the call; playing the first call voice data; acquiring a first translation text corresponding to the first call voice data and displaying the first translation text; and acquiring first translation voice data corresponding to the first translation text, and playing the first translation voice data.
Optionally, the playing of the first translation voice data includes: playing the first translation voice data while playing the first call voice data; or playing the first call voice data and the first translation voice data alternately.
Optionally, the playing of the first translation voice data while playing the first call voice data includes: playing the first translation voice data at a volume greater than that of the first call voice data.
Optionally, the earphone device includes earphones, and the playing of the first call voice data includes: playing the first call voice data simultaneously in both earpieces of the earphones; the playing of the first translation voice data while playing the first call voice data includes: playing the first call voice data in one earpiece of the earphones and playing the first translation voice data in the other earpiece of the earphones.
Optionally, the earphone device further includes instructions for: during a call in which the earphone device is used, identifying the current speaker on a display interface.
Optionally, the earphone device further includes instructions for: displaying a first language used by the first user on a display interface, and/or displaying a second language used by a second user on the display interface; wherein the second user is the user using the earphone device.
Optionally, the earphone device further includes instructions for: acquiring a first speech recognition text corresponding to the first call voice data and displaying the first speech recognition text on a display interface.
Optionally, the earphone device further includes instructions for: acquiring second call voice data of a second user, and sending the second call voice data to an electronic device used by the first user; or acquiring second call voice data of a second user, acquiring a second translation text corresponding to the second call voice data, and sending the second call voice data and the second translation text to the electronic device used by the first user; or acquiring second call voice data of a second user, acquiring a second translation text corresponding to the second call voice data, synthesizing second translation voice data based on the second translation text, and sending the second call voice data, the second translation text and the second translation voice data to the electronic device used by the first user; or acquiring second call voice data of a second user, acquiring second translation voice data corresponding to the second call voice data, and sending the second call voice data and the second translation voice data to the electronic device used by the first user; wherein the second user is the user using the earphone device.
Optionally, the earphone device further includes instructions for: if the second translation text is acquired, displaying the second translation text on a display interface.
Optionally, the earphone device further includes instructions for: acquiring a second speech recognition text corresponding to the second call voice data, and displaying the second speech recognition text on a display interface.
Optionally, the acquiring of the first translation text corresponding to the first call voice data includes: sending a translation request to a server based on the first call voice data, and receiving the first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or translating, by the earphone device, the first call voice data locally to obtain the first translation text.
Optionally, the earphone device further includes instructions for: displaying a call voice processing function entry on a display interface during a call in which the earphone device is used, wherein the call voice processing function entry includes a call voice translation function entry; and receiving a touch operation of a second user on the call voice translation function entry, and then performing the step of acquiring the first translation text corresponding to the first call voice data; wherein the second user is the user using the earphone device.
Optionally, the acquiring of the first translation text corresponding to the first call voice data includes: acquiring a second language used by a second user, wherein the second user is the user using the earphone device; and acquiring the first translation text corresponding to the first call voice data based on the second language used by the second user.
Optionally, the acquiring of the second language used by the second user includes at least one of: determining the second language used by the second user according to the system language of the earphone device; determining the second language used by the second user according to the language of the second call voice data of the second user; and determining the second language used by the second user according to the language selected by the second user on a display interface.
Optionally, the earphone device includes an earphone and an earphone storage device connected to the earphone; the playing of the first call voice data includes: playing, by the earphone, the first call voice data; the acquiring and displaying of the first translation text corresponding to the first call voice data includes: acquiring, by the earphone storage device, the first translation text corresponding to the first call voice data and displaying the first translation text on a display interface; the acquiring of the first translation voice data corresponding to the first translation text includes: acquiring, by the earphone storage device, the first translation voice data corresponding to the first translation text; and the playing of the first translation voice data includes: playing, by the earphone, the first translation voice data.
Optionally, the earphone device includes an earphone storage device; the playing of the first call voice data includes: playing, by the earphone storage device, the first call voice data; the acquiring and displaying of the first translation text corresponding to the first call voice data includes: acquiring, by the earphone storage device, the first translation text corresponding to the first call voice data and displaying the first translation text on a display interface; the acquiring of the first translation voice data corresponding to the first translation text includes: acquiring, by the earphone storage device, the first translation voice data corresponding to the first translation text; and the playing of the first translation voice data includes: playing, by the earphone storage device, the first translation voice data.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, during a call in which the earphone device is used, the earphone device can acquire first call voice data of a first user, wherein the first user is another participant in the call; then, on one hand, the first call voice data can be played, and on the other hand, a first translation text corresponding to the first call voice data can be acquired and displayed, and first translation voice data corresponding to the first translation text can be acquired and played. As a result, no dedicated translation device is needed during the call; the translation operation in a cross-language, cross-location telephone communication scenario is therefore simplified, and the efficiency of call voice translation is improved. Moreover, the call voice does not need to be played out loud, so information leakage can be avoided and the user experience is improved.
Secondly, obtaining the translation voice data corresponding to the first call voice data takes longer than obtaining the translation text corresponding to the first call voice data; therefore, the translation text is displayed first and the translation voice is played afterwards, so that the user can quickly learn what the other party said and important information is not missed.
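This "text before voice" ordering follows naturally if the display step is triggered as soon as the translation text arrives, without waiting for speech synthesis to finish. An illustrative sketch (the timing, function names, and hard-coded translation are invented for demonstration):

```python
import time

def translate_then_speak(voice_data, events):
    """Record the order in which the user perceives the two outputs:
    the translation text is shown before the synthesized voice plays."""
    text = "hello"                            # fast: translation text arrives
    events.append(("display_text", text))     # shown to the user immediately
    time.sleep(0.01)                          # slow: speech synthesis takes longer
    audio = text.encode("utf-8")
    events.append(("play_voice", audio))

events = []
translate_then_speak(b"audio", events)
print([name for name, _ in events])  # -> ['display_text', 'play_voice']
```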
Drawings
FIG. 1 is a flowchart illustrating steps of a method for call speech translation according to an embodiment of the present invention;
FIG. 2 is a flow chart of steps in an alternative embodiment of a call speech translation method of the present invention;
FIG. 3A is a diagram of a display interface displaying first translated text, in accordance with an embodiment of the present invention;
FIG. 3B is a diagram of a display interface displaying languages used by a first user according to an embodiment of the present invention;
FIG. 3C is a diagram of a display interface displaying a first translated text and a first speech recognized text, in accordance with an embodiment of the present invention;
FIG. 3D is a diagram illustrating a display interface displaying speech processing function entries, in accordance with an embodiment of the present invention;
FIG. 3E is a diagram of a display interface displaying second translated text, in accordance with an embodiment of the present invention;
FIG. 3F is a diagram of a display interface displaying a second speech recognition text, in accordance with an embodiment of the present invention;
FIG. 3G is a schematic diagram of a display interface displaying an opposite-end speech identifier according to an embodiment of the present invention;
FIG. 3H is a schematic diagram of a display interface displaying a local-end speech identifier according to an embodiment of the present invention;
FIG. 4 is a flow chart of steps in an alternative embodiment of a call speech translation method of the present invention;
FIG. 5 is a block diagram illustrating the structure of an embodiment of a call speech translation apparatus according to the present invention;
FIG. 6 is a block diagram illustrating the structure of an alternative embodiment of a call speech translation apparatus according to the present invention;
FIG. 7 is a block diagram illustrating a headset device for call speech translation, according to an exemplary embodiment;
FIG. 8 illustrates an electronic device for call speech translation according to another exemplary embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The call speech translation method provided by the embodiment of the invention can be applied to cross-language, cross-space telephone communication scenarios: the earphone device used by a call participant executes the call speech translation method to translate voice data during the call. No dedicated translation equipment is needed, and the call voice does not need to be played out loud; therefore, translation operations in cross-language, cross-space telephone communication scenarios are simplified, and call speech translation efficiency is improved.
In an alternative embodiment of the invention, the earphone device may have a call function. Correspondingly, one call scenario may be a call made directly between earphone devices; another call scenario may be a call between an earphone device and an electronic device, other than an earphone device, that has a call function; yet another call scenario may be: each earphone device is connected to an electronic device, other than the earphone device itself, that has a call function, and the earphone devices then communicate with each other through the electronic devices connected to them.
In an alternative embodiment of the invention, the earphone device may not have a call function. Correspondingly, a call scenario may be: each earphone device is connected to an electronic device, other than the earphone device itself, that has a call function, and the earphone devices then communicate with each other through the electronic devices connected to them.
The call may be a voice call/video call performed by dialing, or a voice call/video call performed by instant messaging software, which is not limited in this embodiment of the present invention.
The following describes the call speech translation method provided by an embodiment of the invention, taking the earphone device used by one of the users as an example. For convenience of distinction, the user of this example earphone device is referred to as the second user, and the other call participant is referred to as the first user. The electronic device used by the first user for the call may be an earphone device, or an electronic device other than an earphone device that has a call function; the embodiment of the invention is not limited in this respect. There may also be more than one first user.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a call speech translation method according to the present invention is shown, which may specifically include the following steps:
Step 102, while the earphone device is being used for a call, the earphone device acquires first call voice data of a first user, where the first user is another user participating in the call.
In the embodiment of the invention, when the earphone device is used for a call and the first user speaks, the electronic device used by the first user can collect the first call voice data of the first user and send it to the earphone device; the earphone device thereby acquires the first call voice data of the first user.
The process of using the earphone device for a call may include: placing an outgoing call, receiving an incoming call, and the call itself after it is connected.
Step 104, playing the first call voice data; and acquiring and displaying a first translation text corresponding to the first call voice data.
Step 106, acquiring first translation voice data corresponding to the first translation text, and playing the first translation voice data.
In the embodiment of the invention, after the earphone device acquires the first call voice data, on one hand, the first call voice data can be played; on the other hand, a translation result corresponding to the first call voice data can be obtained.
The translation result may include the first translation text and the first translation voice data. The time required to obtain translated voice data corresponding to the first call voice data is longer than the time required to determine the corresponding translation text; therefore, so that the user can quickly learn what the other party said and avoid missing important information, in the embodiment of the invention the earphone device first acquires the first translation text corresponding to the first call voice data. The first call voice data may be translated locally by the earphone device to obtain the first translation text; alternatively, the earphone device communicates with a server, which translates the first call voice data to obtain the first translation text and returns it to the earphone device. The earphone device can then present the first translation text: the earphone device is provided with a display module, which displays the first translation text on a corresponding display interface.
The earphone device then acquires the first translation voice data corresponding to the first translation text. The first translation voice data may be synthesized locally by the earphone device based on the first translation text; alternatively, the earphone device communicates with the server, which synthesizes the first translation voice data based on the first translation text and returns it to the earphone device. After acquiring the first translation voice data, the earphone device can play it.
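The two-stage flow described above — display the translation text as soon as it is available, then play the translated voice once text-to-speech finishes — can be sketched as follows. This is a minimal illustration, not the patented implementation; the function names (`translate_text`, `synthesize_speech`, `handle_incoming`) and the simulated latencies are hypothetical stand-ins.

```python
import time

def translate_text(call_audio):
    """Stand-in for speech recognition + machine translation (the faster stage)."""
    time.sleep(0.01)  # simulated ASR + MT latency
    return "Hello, I am glad to have the opportunity to interview you in person."

def synthesize_speech(translated_text):
    """Stand-in for text-to-speech synthesis (the slower stage)."""
    time.sleep(0.03)  # simulated TTS latency
    return b"PCM:" + translated_text.encode()

def handle_incoming(call_audio, display, player):
    # Stage 1: obtain and show the translation text as soon as it is ready.
    text = translate_text(call_audio)
    display.append(text)
    # Stage 2: only afterwards synthesize and queue the translated voice.
    audio = synthesize_speech(text)
    player.append(audio)
    return text, audio

display, player = [], []
text, audio = handle_incoming(b"raw-call-audio", display, player)
```

Because stage 1 completes before stage 2 even begins, the user reads the translation before the synthesized voice is played.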
In summary, in the embodiment of the invention, while the earphone device is being used for a call, the earphone device can acquire first call voice data of a first user, where the first user is another user participating in the call. Then, on one hand, the first call voice data can be played; on the other hand, a first translation text corresponding to the first call voice data can be acquired and displayed, and first translation voice data corresponding to the first translation text can be acquired and played. Consequently, no dedicated translation equipment is needed during the call; translation operations in cross-language, cross-space telephone communication scenarios are simplified, and call speech translation efficiency is improved. Moreover, the call voice does not need to be played out loud, so information leakage can be avoided and user experience is improved.
Secondly, the time required to obtain translated voice data corresponding to the first call voice data is longer than the time required to obtain the corresponding translation text; by displaying the translation text first and then playing the translated voice, the user can quickly learn what the other party said, and omission of important information is avoided.
In an optional embodiment of the invention, the earphone device may play the translated voice simultaneously while the call voice data is playing. Correspondingly, playing the first translation voice data may include: playing the first translation voice data while playing the first call voice data; that is, the first translation voice data and the first call voice data are played in a synchronized manner. By playing the first user's original voice and the translated voice at the same time, the user can perceive the first user's emotion and other cues while understanding what the first user means, improving the call experience.
In an optional embodiment of the invention, the earphone device may play the call voice data and the translated voice alternately. Correspondingly, playing the first translation voice data includes: alternately playing the first call voice data and the first translation voice data; that is, after a segment of first call voice data is played, the corresponding segment of first translation voice data is played. The user can thereby clearly follow the content of each sentence spoken by the other party.
In an optional embodiment of the invention, acquiring the first translation text corresponding to the first call voice data includes: sending a translation request to a server based on the first call voice data, and receiving the first translation text returned by the server, where the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or the earphone device translating the first call voice data locally to obtain the first translation text.
In one example, it may first be determined whether the earphone device can determine the first translation text through local processing. If it cannot, it is determined whether the earphone device is connected to a network; upon determining that it is, the first translation text is acquired from the server.
In another example, it may instead be determined whether the earphone device is connected to a network: when it is, the first translation text is acquired from the server; when it is not, the earphone device translates locally and determines the first translation text. This is illustrated in the following example.
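The network-dependent fallback just described can be sketched as a simple dispatch: use the server engine when connected, the on-device engine otherwise. The function and parameter names below are illustrative assumptions, with the two translation engines passed in as stand-in callables.

```python
def get_first_translation_text(call_audio, is_connected, server_translate, local_translate):
    """Return the translation text, preferring the server when a network is available."""
    if is_connected():
        return server_translate(call_audio)   # typically more accurate
    return local_translate(call_audio)        # offline fallback on the device

# Illustrative stand-in engines:
online_result = get_first_translation_text(
    b"audio", lambda: True,
    lambda a: "server translation", lambda a: "local translation")
offline_result = get_first_translation_text(
    b"audio", lambda: False,
    lambda a: "server translation", lambda a: "local translation")
```

Keeping the connectivity check and the two engines behind one function mirrors the branch between steps 208 and 212 below.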
Referring to fig. 2, a flow chart of steps of an alternative embodiment of a call speech translation method of the present invention is shown.
Step 202, while the earphone device is being used for a call, the earphone device acquires first call voice data of a first user, where the first user is another user participating in the call.
And step 204, playing the first call voice data.
Step 206, judging whether the earphone device is connected to the network.
In the embodiment of the present invention, the headset device has a networking function, and can be connected to a server through a network.
After the earphone device acquires the first call voice data, it can determine whether it is connected to a network; if so, step 208 may be performed; if not, step 212 may be performed.
In one example of the invention, the earphone device may include an earphone with a networking function, and/or an earphone storage device that has a networking function and is connected to the earphone. Determining whether the earphone device is connected to a network may then include: determining whether the earphone is connected to a network, or whether the earphone storage device is connected to a network. When either the earphone or the earphone storage device is determined to be connected to a network, the earphone device may be determined to be connected to a network; otherwise, the earphone device is determined not to be connected to a network.
In another example of the invention, the earphone device may include an earphone storage device with a networking function, and determining whether the earphone device is connected to a network may include: determining whether the earphone storage device is connected to a network. When the earphone storage device is determined to be connected to a network, the earphone device may be determined to be connected to a network; otherwise, the earphone device is determined not to be connected to a network.
In an example of the present invention, the headset and the headset receiving device may be wirelessly connected, for example, connected via bluetooth, and the embodiment of the present invention is not limited thereto.
Step 208, if it is determined that the earphone device is connected to a network, sending a translation request to the server based on the first call voice data.
Step 210, receiving a first translation text returned by the server, where the first translation text is obtained by the server by translating the first call voice data in response to the translation request.
In the embodiment of the present invention, if it is determined that the headset device is connected to the network, the headset device may be connected to the server. The earphone device can further generate a translation request based on the first call voice data and send the translation request to the server; wherein the translation request comprises a request to translate the first call voice data into a first translation text.
After receiving the translation request, the server may parse the translation request; then perform speech recognition on the first call voice data according to the parsing result to obtain a first speech recognition text; and translate based on the first speech recognition text to obtain the first translation text, which is returned to the earphone device. The earphone device thereby acquires the first translation text corresponding to the first call voice data and displays it on a display interface, as shown in fig. 3A; the first translation text displayed on the display interface of the earphone storage device in fig. 3A is "Hello, I am glad to have the opportunity to interview you in person. Let's get started!"
Step 212, if it is determined that the headset device is not connected to the network, the headset device locally translates the first call voice data to obtain a first translation text.
In the embodiment of the present invention, the earphone device has a data processing capability, and if it is determined that the earphone device is not connected to the network, the earphone device may locally translate the first call voice data to obtain the first translation text.
Of course, in the embodiment of the invention, the earphone device may both acquire a first translation text from the server and translate locally to obtain a first translation text; the first translation text acquired from the server and the locally determined first translation text may then be fused, with the fusion result used as the final first translation text.
Step 214, displaying the first translation text.
The earphone storage device is provided with a display module, through which the first translation text can be displayed on a corresponding display interface.
Step 216, acquiring the first translation voice data corresponding to the first translation text.
In one example of the invention, if it is determined that the earphone device is connected to a network, the server may determine the first translation voice data corresponding to the first translation text. In this case the translation request may further include: a request to translate the first call voice data into first translation voice data. The server can then, according to the parsing result of the translation request, synthesize the first translation voice data based on the first translation text and return it to the earphone device, whereby the earphone device acquires the first translation voice data. Of course, even if the earphone device is connected to a network, it may instead synthesize the first translation voice data locally based on the first translation text; the embodiment of the invention is not limited in this respect.
If it is determined that the headset device is not connected to the network, first translated speech data may be synthesized locally by the headset device based on the first translated text.
Step 218, playing the first translation voice data while playing the first call voice data.
In the embodiment of the present invention, after obtaining the first translation voice data, the earphone device may continue to play the first communication voice data, and simultaneously play the first translation voice data.
In an optional embodiment of the invention, playing the first translation voice data while playing the first call voice data may include: playing the first translation voice data at a volume greater than that of the first call voice data. The first call voice data is thus played as background sound, so that the second user can both experience the first user's tone of voice and understand the content of the first user's speech. In practice, the volume of the first call voice data being played may be turned down, and the first translation voice data then played at a volume greater than that of the first call voice data.
In an optional embodiment of the invention, the earphone device may comprise a pair of earphones. Correspondingly, playing the first call voice data includes: playing the first call voice data in both earphones simultaneously. Further, playing the first translation voice data while playing the first call voice data may include: playing the first call voice data in one earphone and the first translation voice data in the other earphone. For example, the first call voice data may be played in the left earphone and the first translation voice data in the right earphone, or vice versa. This likewise lets the second user experience the first user's tone of voice while understanding the content of the first user's speech.
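The two playback strategies above can be sketched on plain sample lists: (a) duck the original call voice and overlay the translation at full volume, or (b) route the two streams to different earbuds. This is an illustrative sketch only; real playback code would go through an audio API, and the `duck` factor is an assumed value.

```python
def mix_ducked(original, translation, duck=0.25):
    """Strategy (a): duck the original call voice and overlay the translation at full volume."""
    n = max(len(original), len(translation))
    original = original + [0.0] * (n - len(original))       # pad the shorter stream
    translation = translation + [0.0] * (n - len(translation))
    return [duck * o + t for o, t in zip(original, translation)]

def split_channels(original, translation):
    """Strategy (b): original call voice in one earbud, translated voice in the other."""
    return {"left": original, "right": translation}

mixed = mix_ducked([1.0, 1.0], [0.5, 0.5])       # original quiet, translation on top
stereo = split_channels([1.0, 1.0], [0.5, 0.5])
```

Either way the original voice remains audible, so the listener keeps the speaker's tone while hearing the translation.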
Of course, in the embodiment of the invention, if it is determined that the earphone device is not connected to a network, a networking prompt may be issued; for example, a not-connected text or icon may be presented on the display interface to prompt the second user to connect the earphone device to a network. After the user connects the earphone device to a network, the above steps 208 to 210 and steps 214 to 218 may be performed.
In summary, in the embodiment of the invention, it may be determined whether the earphone device is connected to a network; if it is, a translation request is sent to the server based on the first call voice data, and the first translation text returned by the server is received, the first translation text being obtained by the server by translating the first call voice data in response to the translation request. This improves the translation accuracy of the earphone device and improves user experience.
Secondly, in the embodiment of the invention, if it is determined that the earphone device is not connected to a network, the earphone device translates the first call voice data locally to obtain the first translation text; the earphone device can thus still translate call voice even when it is not connected to a network.
In an optional embodiment of the invention, the method further comprises: displaying the first language used by the first user on a display interface. When the first translation text corresponding to the first call voice data is acquired, the language corresponding to the first call voice data — that is, the first language used by the first user, as determined when the earphone device/server translates the first call voice data — can also be acquired. Of course, the first language used by the first user may instead be determined from the second user's language settings for call speech translation in the earphone device. The first language used by the first user can then be displayed on the display interface of the earphone storage device, so that the second user always knows, during the call, which language the first user is using. Referring to fig. 3B, the display interface of the earphone storage device in fig. 3B indicates that the first language used by the first user is English.
In an optional embodiment of the invention, the method further comprises: acquiring a first speech recognition text corresponding to the first call voice data and displaying it on a display interface. When the first translation text corresponding to the first call voice data is acquired, the first speech recognition text corresponding to the first call voice data, obtained by the earphone device/server, can also be acquired; the first speech recognition text may then be presented on the display interface of the earphone storage device. By combining the first speech recognition text with the first translation text, the second user can better understand the content of the first user's speech. Reference may be made to fig. 3C, where the display interface of the earphone storage device shows the first speech recognition text "Hello, I am glad to have the opportunity to interview you in person. Let's get started!" together with the corresponding first translation text.
In an optional embodiment of the invention, the method further comprises: displaying a call voice processing function entry on a display interface, the call voice processing function entry including a call speech translation function entry; and receiving a touch operation of the second user on the call speech translation function entry, and performing the step of acquiring the first translation text corresponding to the first call voice data.
When the earphone device is used for a call, a call voice processing function entry may be displayed on the display interface of the earphone storage device. There may be several such entries, for example a call speech translation function entry, a voice-change function entry, and a noise-reduction function entry, as shown in fig. 3D. When the user triggers any call voice processing function entry, the earphone device performs the corresponding voice processing on the call voice data. When the user triggers the call speech translation function entry on the display interface of the earphone storage device, the earphone storage device receives the second user's touch operation on that entry, after which the above steps 104 to 106 may be performed.
In one example of the embodiment of the invention, several function entries of the earphone storage device, such as a translation entry, a recording entry, a shorthand entry, and a voice-change entry, may also be displayed on its display interface. When the user needs to use the earphone device for telephone translation, the translation entry can be triggered, and the earphone device then displays translation function entries on the display interface; there may be several translation function entries, such as a conversation translation function entry and a telephone translation function entry. The user can perform a touch operation on the telephone translation function entry; upon receiving it, the earphone storage device displays the call voice processing function entry on its display interface, receives the second user's touch operation on the call speech translation function entry, and performs the step of acquiring the first translation text corresponding to the first call voice data.
Of course, upon receiving the second user's touch operation on the call speech translation function entry, the step of determining whether the earphone device is connected to a network may also be performed; that is, once it is determined that the user needs the call voice translated, it is determined whether the earphone device is connected to a network.
In an optional embodiment of the invention, acquiring the first translation text corresponding to the first call voice data includes: acquiring a second language used by the second user; and acquiring the first translation text corresponding to the first call voice data based on the second language used by the second user. Specifically, speech recognition can be performed on the first call voice data to obtain the corresponding first speech recognition text, which is then translated into the second language used by the second user to obtain the first translation text.
In an optional embodiment of the invention, acquiring the second language used by the second user includes at least one of the following:
(1) determining the second language used by the second user according to the system language of the earphone device. When the user first uses the earphone device, the system language of the earphone device can be set to the user's usual language; the system language of the earphone device can therefore be obtained and used as the language used by the second user.
(2) determining the second language used by the second user according to the language corresponding to second call voice data of the second user. When the earphone device is used for a call, the earphone device can also acquire second call voice data of the second user while the second user is speaking; the language corresponding to the second call voice data can then be recognized and used as the language used by the second user.
(3) determining the second language used by the second user according to the language selected by the second user on the display interface. In one example, after the second user performs a touch operation on the call speech translation function entry, several language options may be displayed on the display interface of the earphone storage device; the second user can select the language option corresponding to the language usually used at the local end, and the earphone device uses the language corresponding to the selected option as the second language used by the second user.
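The three sources above can be combined into one resolution step. The priority order used here (explicit selection over detected language over system language) is an assumption for illustration — the patent lists the sources as alternatives without fixing an order — and the function name and language codes are likewise hypothetical.

```python
def resolve_second_language(selected=None, detected=None, system_language="zh"):
    """Resolve the second user's language from the three sources named above."""
    if selected is not None:       # (3) language chosen on the display interface
        return selected
    if detected is not None:       # (2) language recognized from the user's own speech
        return detected
    return system_language         # (1) fall back to the device system language
```

An explicit user choice is given precedence because it is the most direct signal of intent; the system language serves only as a default.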
In an optional embodiment of the present invention, the method may further include: and displaying the second language used by the second user on the display interface. That is, the second language used by the second user who uses the earphone device is displayed in the display interface of the earphone storage device, so that the second user can conveniently confirm whether the language identification of the earphone device is accurate.
In an optional embodiment of the invention, when the first user also uses the headset device, then the method further comprises: acquiring second communication voice data of the second user; and sending the second communication voice data to the electronic equipment used by the first user. The headset device used by the first user may then perform the call speech translation method of embodiments of the present invention.
When the first user does not use an earphone device, the electronic device used by the first user may be unable to translate the second user's second call voice data after receiving it. In an alternative embodiment of the invention, therefore, the method further comprises: acquiring second call voice data of the second user and acquiring a second translation text corresponding to the second call voice data; and sending the second call voice data and the second translation text to the electronic device used by the first user. After receiving the second call voice data and the second translation text, the electronic device used by the first user can, on one hand, play the second call voice data and, on the other hand, present the second translation text. Of course, the electronic device used by the first user may also synthesize second translation voice data based on the second translation text and then play the second translation voice data while playing the second call voice data.
When the first user does not use an earphone device, the electronic device used by the first user may, after receiving the second call voice data, be unable not only to translate it but also to perform speech synthesis. In an optional embodiment of the invention, therefore, the method may further include: acquiring second call voice data of the second user, acquiring a second translation text corresponding to the second call voice data, and synthesizing second translation voice data based on the second translation text; and sending the second call voice data, the second translation text, and the second translation voice data to the electronic device used by the first user. Correspondingly, after receiving them, the electronic device used by the first user can, on one hand, play the second call voice data and the second translation voice data simultaneously and, on the other hand, present the second translation text.
Alternatively, in an optional embodiment of the invention, the method may further include: acquiring second call voice data of the second user and acquiring second translation voice data corresponding to the second call voice data; and sending the second call voice data and the second translation voice data to the electronic device used by the first user. Correspondingly, after receiving them, the electronic device used by the first user can play the second call voice data and the second translation voice data simultaneously.
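The variants above differ only in which artifacts the local side sends, depending on what the peer device can do itself. A plausible sketch of that decision is below; the capability flags and payload keys are illustrative assumptions, not part of the patent.

```python
def build_outgoing_payload(voice, translated_text, translated_speech,
                           peer_can_translate, peer_can_synthesize):
    """Assemble what the local side must send, given the peer's capabilities."""
    payload = {"voice": voice}                      # the raw call voice is always sent
    if not peer_can_translate:
        payload["translated_text"] = translated_text
        if not peer_can_synthesize:
            payload["translated_speech"] = translated_speech
    return payload

full_peer = build_outgoing_payload(b"v", "text", b"tts", True, True)    # peer does everything
text_peer = build_outgoing_payload(b"v", "text", b"tts", False, True)   # peer can synthesize
plain_peer = build_outgoing_payload(b"v", "text", b"tts", False, False) # peer plays only
```

The less capable the peer, the more of the translation pipeline the local earphone device runs on its behalf.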
In an optional embodiment of the present invention, the method further comprises: if the second translation text is obtained, displaying the second translation text on a display interface. For example, if the second call voice data is the voice data "my name is x, and I graduated from university", the corresponding second translation text may be "My name is x, I graduated from university", which may be shown as the content displayed on the display interface of the earphone storage device in fig. 3E.
Of course, in the embodiment of the present invention, the method may further include: acquiring a second speech recognition text corresponding to the second call voice data; and displaying the second speech recognition text on a display interface, so that the second user can check whether the speech recognition of the earphone device is accurate. If the second call voice data is the voice data "my name is x, and I graduated from university", the corresponding second speech recognition text may be as shown in the content displayed on the display interface of the earphone storage device in fig. 3F. Of course, the second translation text and the second speech recognition text may also be displayed simultaneously; the embodiments of the present invention are not limited in this regard.
Of course, in the embodiment of the present invention, the earphone device may further send the second speech recognition text to the electronic device used by the first user. Correspondingly, the electronic device used by the first user may also display the second speech recognition text; the embodiments of the present invention are not limited in this regard.
In an optional embodiment of the present invention, the method may further include: in a process that the earphone device is used for a call, identifying the current speaker in the display interface. The identification mode may be text identification, image identification or animation identification; the embodiments of the present invention are not limited in this regard. For example, while the first user is speaking, text information such as "the opposite end is speaking" may be displayed; reference may be made to the content displayed on the display interface of the earphone storage device in fig. 3G. For another example, while the second user is speaking, text information such as "you are speaking" may be displayed; reference may be made to the content displayed on the display interface of the earphone storage device in fig. 3H. When the display of the first translation text and the playing of the first translation voice are delayed, the second user can thus know which party is currently speaking from the displayed identification, without interrupting the first user's speech.
The following description takes, as an example, an earphone device including an earphone and an earphone storage device.
The earphone storage device can be used for accommodating the earphone, charging the earphone, providing services such as data processing, transmission and storage, and providing an information display service. Furthermore, in the embodiment of the present invention, the earphone storage device may acquire the first translation text corresponding to the first call voice data and display the first translation text, and acquire the first translation voice data corresponding to the first translation text; the earphone then plays the first call voice data and the first translation voice data.
Referring to fig. 4, a flow chart of steps of an alternative embodiment of a call speech translation method of the present invention is shown.
Step 402, in a process that the earphone device is used for a call, the earphone device acquires first call voice data of a first user, where the first user is another user participating in the call, and the earphone device includes an earphone and an earphone storage device connected with the earphone.
Step 404, the earphone plays the first call voice data.
When the call scene is a direct call between earphone devices, the call may be carried out directly between the earphone storage devices; in this case, the earphone storage device of the second user may directly receive the first call voice data sent by the earphone storage device of the first user.
When the call scene is a call between the earphone device and an electronic device with a call function other than an earphone device, the earphone storage device may carry out the call with that electronic device; in this case, the earphone storage device of the second user may directly receive the first call voice data sent by the electronic device of the first user other than an earphone device.
When the call scene is that each earphone device is connected to an electronic device with a call function other than the earphone device, so that the earphone devices carry out the call through the electronic devices connected to them, the connection mode of the earphone device and the electronic device may be: the earphone is connected with the electronic device. In one example, the earphone and the electronic device other than the earphone device may be connected wirelessly via Bluetooth; the earphone and the earphone storage device may also be connected via Bluetooth.
In this case, the electronic device of the second user other than the earphone device may receive the first call voice data sent by the electronic device of the first user other than the earphone device, and then send the first call voice data to the earphone of the second user. The earphone of the second user may, on one hand, play the first call voice data, and on the other hand, send the first call voice data to the earphone storage device of the second user.
Step 406, the earphone storage device acquires a first translation text corresponding to the first call voice data and displays the first translation text.
In the embodiment of the invention, a display module is arranged in the earphone storage device; after the earphone storage device acquires the first translation text, the first translation text can be displayed in the display interface corresponding to the display module.
In one example, when the number of words in the first translation text is large, the first translation text may be displayed screen by screen in the display interface of the earphone storage device.
In another example, when the number of words in the first translation text is large, the first translation text may be displayed in a scrolling manner in the display interface of the earphone storage device.
When the first translation text is displayed screen by screen in the display interface, the earphone storage device can turn the screen automatically, or display the next screen of the first translation text according to a screen-turning operation or a sliding operation of the user.
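The screen-by-screen display described above can be sketched as splitting the translation text into fixed-size pages and advancing one page per automatic or user-triggered screen turn. The page size and the example text are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of screen-by-screen display of a long translation text.
# A real device would size pages by rendered width, not character count.

def paginate(text: str, chars_per_screen: int) -> list[str]:
    """Split the translation text into screens of at most chars_per_screen."""
    return [text[i:i + chars_per_screen]
            for i in range(0, len(text), chars_per_screen)]

pages = paginate("My name is x, I graduated from university.", 16)
# Each entry is one screenful; an auto-turn or a user swipe advances the index.
```

A scrolling display, by contrast, would shift a window over the same text continuously instead of jumping page by page.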
Step 408, the earphone storage device acquires first translation voice data corresponding to the first translation text.
Step 410, the earphone storage device sends the first translation voice data to the earphone.
Step 412, the earphone plays the first translation voice data.
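Steps 402 to 412 above can be sketched as a cooperation between two roles: the earphone plays audio, while the earphone storage device handles translation and display. All class and method names below are hypothetical, and real devices would exchange these messages over Bluetooth rather than through in-process calls; the `translate` stub stands in for ASR plus machine translation plus TTS.

```python
# Hypothetical sketch of the earphone / earphone-storage-device cooperation
# in steps 402-412. Names and message passing are illustrative only.

class Earphone:
    def __init__(self):
        self.played = []          # audio buffers played so far

    def play(self, audio: bytes):
        self.played.append(audio)

class StorageDevice:
    def __init__(self, earphone: Earphone):
        self.earphone = earphone
        self.displayed = []       # texts shown on the display interface

    def translate(self, audio: bytes) -> str:
        # Placeholder for ASR + machine translation.
        return f"<translation of {len(audio)} bytes>"

    def handle_call_voice(self, first_call_voice: bytes):
        # Step 404: the earphone plays the first call voice data.
        self.earphone.play(first_call_voice)
        # Step 406: acquire and display the first translation text.
        text = self.translate(first_call_voice)
        self.displayed.append(text)
        # Steps 408-412: obtain the translation voice and play it in the earphone.
        self.earphone.play(text.encode("utf-8"))

dev = StorageDevice(Earphone())
dev.handle_call_voice(b"hello")
```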
In summary, in the embodiment of the present invention, the earphone device may include an earphone and an earphone storage device connected to the earphone. The earphone plays the first call voice data while the earphone storage device acquires the first translation text corresponding to the first call voice data, displays the first translation text, and acquires the first translation voice data corresponding to the first translation text; the earphone then plays the first translation voice data. Translation of the call voice is thus realized through cooperation between the earphone and the earphone storage device.
Of course, the earphone device may also include only an earphone storage device, and the method of the embodiment of the present invention may be executed by the earphone storage device. In other words, in a process that the earphone device is used for a call, the earphone storage device acquires first call voice data of a first user, where the first user is another user participating in the call; the earphone storage device plays the first call voice data, acquires a first translation text corresponding to the first call voice data, and displays the first translation text; and acquires first translation voice data corresponding to the first translation text and plays the first translation voice data.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
The embodiment of the invention also provides a device for translating the call voice, which is applied to the earphone equipment.
Referring to fig. 5, a block diagram of a structure of an embodiment of a call speech translation apparatus of the present invention is shown, which may specifically include the following modules:
a first voice obtaining module 502, configured to, in a process that the earphone device is used for a call, obtain first call voice data of a first user, where the first user is another user participating in the call;
a first playing module 504, configured to play the first call voice data;
a text obtaining module 506, configured to obtain a first translation text corresponding to the first call voice data and display the first translation text;
a second speech obtaining module 508, configured to obtain first translation speech data corresponding to the first translation text;
a second playing module 510, configured to play the first translated speech data.
Referring to fig. 6, a block diagram of an alternative embodiment of a call speech translation apparatus of the present invention is shown.
In an optional embodiment of the present invention, the second playing module 510 includes:
the synchronous playing submodule 5102 is configured to play the first translation voice data while playing the first call voice data;
the alternate playing sub-module 5104 is configured to play the first call voice data and the first translation voice data alternately.
In an alternative embodiment of the present invention, the synchronous playing sub-module 5102 is configured to play the first translation voice data at a volume greater than that of the first call voice data.
In an optional embodiment of the present invention, the first playing module 504 is configured to play the first call voice data in both of the two earphones of the earphone at the same time;
the synchronous playing sub-module 5102 is configured to play the first call voice data in one earphone of the earphones and play the first translation voice data in the other earphone of the earphones.
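The two synchronous-playing strategies above can be sketched with a simple per-sample model: either mix the original call voice and the translation into both earbuds with the original attenuated, or route the original to one earbud and the translation to the other. The gain value and sample lists are illustrative assumptions only.

```python
# Hypothetical sketch of the two playback strategies of the synchronous
# playing sub-module. Audio is modeled as lists of float samples.

def mix_louder_translation(original, translation, original_gain=0.4):
    """Play both streams together, with the translation voice louder than
    the original call voice (the original is attenuated by original_gain)."""
    n = max(len(original), len(translation))
    original = original + [0.0] * (n - len(original))
    translation = translation + [0.0] * (n - len(translation))
    return [original_gain * o + t for o, t in zip(original, translation)]

def split_between_earbuds(original, translation):
    """Original call voice in one earbud, translation voice in the other."""
    return {"left": original, "right": translation}

mixed = mix_louder_translation([1.0, 1.0], [0.5, 0.5])
channels = split_between_earbuds([1.0], [0.5])
```

Real earphone firmware would of course apply gains in its DSP path rather than in Python, but the routing logic is the same.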
In an optional embodiment of the present invention, the call speech translation apparatus further includes:
and an identifier display module 512, configured to identify a speaker in a call process in a display interface when the earphone device is used for a call.
In an optional embodiment of the present invention, the call speech translation apparatus further includes:
a language display module 514, configured to display, on a display interface, a first language used by the first user, and/or display, on a display interface, a second language used by a second user; wherein the second user is a user using the headset device.
In an optional embodiment of the present invention, the call speech translation apparatus further includes:
and a first recognition text display module 516, configured to obtain a first speech recognition text corresponding to the first call speech data and display the first speech recognition text on a display interface.
In an optional embodiment of the present invention, the call speech translation apparatus further includes:
a data sending module 518, configured to obtain second communication voice data of a second user; sending the second communication voice data to an electronic device used by a first user; or acquiring second communication voice data of a second user and acquiring a second translation text corresponding to the second communication voice data; sending the second communication voice data and the second translation text to the electronic equipment used by the first user; or acquiring second communication voice data of a second user, acquiring a second translation text corresponding to the second communication voice data, and synthesizing second translation voice data based on the second translation text; sending the second communication voice data, the second translation text and the second translation voice data to an electronic device used by a first user; or acquiring second communication voice data of a second user and acquiring second translation voice data corresponding to the second communication voice data; sending the second communication voice data and the second translation voice data to an electronic device used by a first user; wherein the second user is a user using the headset device.
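The four sending modes of the data sending module 518 differ only in which pieces the earphone device produced locally before transmission. Under that reading, they can be sketched as building one payload with optional fields; the field names here are illustrative assumptions, not a wire format from the patent.

```python
# Hypothetical sketch of the four payload variants built by the data
# sending module: voice only; voice + text; voice + text + synthesized
# voice; voice + synthesized voice.

def build_payload(voice, translated_text=None, translated_voice=None):
    """Include only the pieces that were produced on the earphone device."""
    payload = {"second_call_voice": voice}
    if translated_text is not None:
        payload["second_translated_text"] = translated_text
    if translated_voice is not None:
        payload["second_translated_voice"] = translated_voice
    return payload

p_voice_only = build_payload(b"audio")
p_full = build_payload(b"audio", translated_text="Hello",
                       translated_voice=b"tts-audio")
```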
In an optional embodiment of the present invention, the call speech translation apparatus further includes:
and the translation text display module 520 is configured to display the second translation text on a display interface if the second translation text is acquired.
In an optional embodiment of the present invention, the call speech translation apparatus further includes:
a second recognition text display module 522, configured to obtain a second speech recognition text corresponding to the second communication speech data; and displaying the second voice recognition text on a display interface.
In an optional embodiment of the present invention, the text obtaining module 506 is configured to send a translation request to a server based on the first call voice data; receiving a first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or the earphone equipment translates the first call voice data locally to obtain a first translation text.
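The server-and/or-local behavior of the text obtaining module 506 can be sketched as trying a translation server first and falling back to on-device translation when the request fails. Both the server call and the local model below are stubs under that assumption; the patent itself does not fix a priority order between the two.

```python
# Hypothetical sketch of the text obtaining module: server translation
# with an on-device fallback. server_translate / local_translate are stubs.

def server_translate(voice_data: bytes) -> str:
    # Placeholder for an HTTP translation request to a server;
    # here it always fails to exercise the fallback path.
    raise ConnectionError("server unreachable")

def local_translate(voice_data: bytes) -> str:
    # Placeholder for an on-device ASR + translation model.
    return f"local translation ({len(voice_data)} bytes)"

def get_first_translation_text(voice_data: bytes) -> str:
    try:
        return server_translate(voice_data)
    except ConnectionError:
        return local_translate(voice_data)

text = get_first_translation_text(b"12345")
```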
In an optional embodiment of the present invention, the call speech translation apparatus further includes:
a portal presentation module 524, configured to present, when the earphone device is used for a call, a call voice processing function portal on a display interface, where the call voice processing function portal includes a call voice translation function portal;
the text obtaining module 506 is configured to receive a touch operation of a second user on the call voice translation function entry, and execute the step of obtaining a first translation text corresponding to the first call voice data; wherein the second user is a user using the headset device.
In an optional embodiment of the present invention, the text obtaining module 506 is configured to obtain a second language used by a second user, where the second user is a user using the headset device; and acquiring a first translation text corresponding to the first call voice data based on the second language used by the second user.
In an optional embodiment of the present invention, the text obtaining module 506 is configured to determine a second language used by the second user according to a system language of the headset device; and/or determining a second language used by the second user according to the language corresponding to the second communication voice data of the second user; and/or determining a second language used by the second user according to the language selected by the second user on the display interface.
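The three ways of determining the second language above can be combined into one lookup. The priority order below (explicit selection on the display interface, then the language detected from the second user's own speech, then the device's system language) is an assumption for illustration; the patent lists the sources with "and/or" without fixing an order.

```python
# Hypothetical sketch of determining the second user's language from the
# three sources named in the embodiment, in an assumed priority order.

def determine_second_language(selected=None, detected=None, system_lang="zh"):
    if selected:        # language the second user chose on the display interface
        return selected
    if detected:        # language detected from the second call voice data
        return detected
    return system_lang  # fall back to the earphone device's system language

lang = determine_second_language(detected="en")
```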
In an optional embodiment of the present invention, the earphone device includes an earphone and an earphone receiving device connected to the earphone; the headset comprises the first playing module 504 and the second playing module 510, and the headset storage device comprises the text obtaining module 506 and the second voice obtaining module 508.
In an alternative embodiment of the present invention, the earphone device includes an earphone receiving means; the headset storage device includes the first playing module 504, the text obtaining module 506, a second voice obtaining module 508, and the second playing module 510.
In summary, in the embodiment of the present invention, in a process that the earphone device is used for a call, the earphone device may obtain first call voice data of a first user, where the first user is another user participating in the call; the earphone device may then play the first call voice data, acquire and display a first translation text corresponding to the first call voice data, and acquire and play first translation voice data corresponding to the first translation text. Moreover, no dedicated translation device is needed during the call; translation operations in cross-language, cross-space telephone call scenes are thereby simplified, and the efficiency of call voice translation is improved. In addition, the call voice does not need to be played out loud, so information leakage can be avoided and user experience is improved.
Secondly, obtaining the translation voice data corresponding to the first call voice data takes longer than obtaining the translation text corresponding to the first call voice data; by displaying the translation text first and then playing the translation voice, the second user can therefore quickly learn what the other party said and avoid missing important information.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
Fig. 7 is a block diagram illustrating a structure of an earphone device 700 for call speech translation according to an exemplary embodiment.
Referring to fig. 7, the headset device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls the overall operation of the headset device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing element 702 may include one or more processors 720 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 can include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operations at the headset device 700. Examples of such data include instructions for any application or method operating on the headset device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 706 provides power to the various components of the headset device 700. The power components 706 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the headset device 700.
The multimedia component 708 comprises a screen providing an output interface between the earphone device 700 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. When the headset device 700 is in an operation mode, such as a photographing mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a Microphone (MIC) configured to receive external audio signals when the headset device 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 714 includes one or more sensors for providing various aspects of state assessment for the earphone device 700. For example, the sensor assembly 714 may detect an open/closed state of the earphone device 700 and the relative positioning of components, such as a display and keypad of the earphone device 700; the sensor assembly 714 may also detect a change in position of the earphone device 700 or a component of the earphone device 700, the presence or absence of user contact with the earphone device 700, the orientation or acceleration/deceleration of the earphone device 700, and a change in temperature of the earphone device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the earphone device 700 and other devices. The earphone device 700 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the earphone device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 704 comprising instructions, executable by the processor 720 of the headset device 700 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of a headset device, enable the headset device to perform a call speech translation method, the method comprising: the method comprises the steps that in the process that the earphone equipment is used for communication, the earphone equipment obtains first communication voice data of a first user, wherein the first user is other communication participating users; playing the first call voice data; acquiring a first translation text corresponding to the first call voice data and displaying the first translation text; and acquiring first translation voice data corresponding to the first translation text, and playing the first translation voice data.
Optionally, the playing the first translation speech data includes: playing the first translation voice data while playing the first call voice data; or alternatively playing the first call voice data and the first translation voice data.
Optionally, the playing the first translation voice data while playing the first call voice data includes: and playing the first translation voice data at a volume larger than that of the first call voice data.
Optionally, the earphone device includes an earphone, and the playing the first call voice data includes: simultaneously playing the first call voice data in two earphones in the earphone; the playing the first translation voice data while playing the first call voice data includes: and playing the first conversation voice data in one earphone of the earphones, and playing the first translation voice data in the other earphone of the earphones.
Optionally, the method further comprises: and in the process that the earphone equipment is used for communication, identifying a speaker in the communication process in a display interface.
Optionally, the method further comprises: displaying a first language used by the first user on a display interface, and/or displaying a second language used by a second user on the display interface; wherein the second user is a user using the headset device.
Optionally, the method further comprises: and acquiring a first voice recognition text corresponding to the first call voice data and displaying the first voice recognition text on a display interface.
Optionally, the method further comprises: acquiring second communication voice data of a second user; sending the second communication voice data to an electronic device used by a first user; or acquiring second communication voice data of a second user and acquiring a second translation text corresponding to the second communication voice data; sending the second communication voice data and the second translation text to the electronic equipment used by the first user; or acquiring second communication voice data of a second user, acquiring a second translation text corresponding to the second communication voice data, and synthesizing second translation voice data based on the second translation text; sending the second communication voice data, the second translation text and the second translation voice data to an electronic device used by a first user; or acquiring second communication voice data of a second user and acquiring second translation voice data corresponding to the second communication voice data; sending the second communication voice data and the second translation voice data to an electronic device used by a first user; wherein the second user is a user using the headset device.
Optionally, the method further comprises: and if the second translation text is obtained, displaying the second translation text on a display interface.
Optionally, the method further comprises: acquiring a second voice recognition text corresponding to the second communication voice data; and displaying the second voice recognition text on a display interface.
Optionally, the obtaining of the first translation text corresponding to the first call voice data includes: sending a translation request to a server based on the first call voice data; receiving a first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or the earphone equipment translates the first call voice data locally to obtain a first translation text.
Optionally, the method further comprises: when the earphone equipment is used for communication, a communication voice processing function inlet is displayed on a display interface, wherein the communication voice processing function inlet comprises a communication voice translation function inlet; receiving a touch operation of a second user for the call voice translation function entry, and executing the step of acquiring a first translation text corresponding to the first call voice data; wherein the second user is a user using the headset device.
Optionally, the obtaining of the first translation text corresponding to the first call voice data includes: acquiring a second language used by a second user, wherein the second user is a user using the earphone device; and acquiring a first translation text corresponding to the first call voice data based on the second language used by the second user.
Optionally, the obtaining of the second language used by the second user includes at least one of: determining a second language used by the second user according to the system language of the earphone equipment; determining a second language used by the second user according to the language corresponding to the second communication voice data of the second user; and determining a second language used by the second user according to the language selected by the second user on the display interface.
Optionally, the earphone device comprises an earphone and an earphone receiving device connected with the earphone; the playing the first call voice data includes: the earphone plays the first call voice data; the obtaining a first translation text corresponding to the first call voice data and displaying the first translation text includes: the earphone storage device acquires a first translation text corresponding to the first call voice data and displays the first translation text on a display interface; the obtaining of the first translation voice data corresponding to the first translation text includes: the earphone storage device acquires first translation voice data corresponding to the first translation text; the playing the first translated speech data includes: the earphone plays the first translation voice data.
Optionally, the headset device comprises a headset storage device. The playing of the first call voice data comprises: playing, by the headset storage device, the first call voice data. The acquiring and displaying of the first translation text corresponding to the first call voice data comprises: acquiring, by the headset storage device, the first translation text corresponding to the first call voice data, and displaying the first translation text on a display interface. The acquiring of the first translation voice data corresponding to the first translation text comprises: acquiring, by the headset storage device, the first translation voice data corresponding to the first translation text. The playing of the first translation voice data comprises: playing, by the headset storage device, the first translation voice data.
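In both of the embodiments above the processing order is the same: play the original audio, fetch and display the translation text, synthesize it, and play the result; only the component doing the playback differs. A minimal sketch of that shared pipeline follows; the class and function names are hypothetical, and the translation and text-to-speech helpers are trivial stand-ins, not a real API.

```python
class HeadsetStorageDevice:
    """Stand-in for the headset storage device's translation duties."""

    def fetch_translation_text(self, call_voice_data):
        # Placeholder for local or server-side translation.
        return f"translated({call_voice_data})"

    def fetch_translation_voice(self, translation_text):
        # Placeholder for text-to-speech synthesis.
        return f"tts({translation_text})"

    def display(self, text):
        print(f"display: {text}")


def translate_and_play(storage_device, play, first_call_voice_data):
    """Play the original audio, show the translation, then play it back."""
    play(first_call_voice_data)                     # first call voice data
    text = storage_device.fetch_translation_text(first_call_voice_data)
    storage_device.display(text)                    # first translation text
    voice = storage_device.fetch_translation_voice(text)
    play(voice)                                     # first translation voice data


# `play` may be backed by the earphones or by the storage device itself.
translate_and_play(HeadsetStorageDevice(), print, "hello")
```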
Fig. 8 is a schematic structural diagram of an electronic device 800 for call voice translation according to another exemplary embodiment of the present invention. The electronic device 800 may be a server, which may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 822 (e.g., one or more processors), memory 832, and one or more storage media 830 (e.g., one or more mass storage devices) storing applications 842 or data 844. The memory 832 and the storage media 830 may provide transient or persistent storage. The programs stored in the storage media 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processing unit 822 may be configured to communicate with the storage media 830 to execute, on the server, the series of instruction operations stored in the storage media 830.
The server may also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input/output interfaces 858, one or more keyboards 856, and/or one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In an exemplary embodiment, the one or more programs on the server are configured to be executed by the one or more central processing units 822, the one or more programs including instructions for: receiving a translation request sent by the headset device, translating the first call voice data based on the translation request to obtain a first translation text, and returning the first translation text to the headset device.
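A minimal sketch of that server-side handling, assuming the request carries the audio payload and a target language; the field names and the recognition and translation helpers are placeholders, not a real speech-recognition or machine-translation API.

```python
def handle_translation_request(request):
    """Translate the first call voice data and return the first translation text."""
    audio = request["voice_data"]           # first call voice data (assumed field)
    target = request["target_language"]     # second user's language (assumed field)
    recognized = recognize_speech(audio)    # speech recognition placeholder
    return translate_text(recognized, target)  # machine translation placeholder


def recognize_speech(audio):
    # Trivial stand-in: treat the payload as already-recognized text.
    return audio.decode("utf-8")


def translate_text(text, target_language):
    # Trivial stand-in: tag the text with the target language.
    return f"[{target_language}] {text}"


handle_translation_request(
    {"voice_data": b"ni hao", "target_language": "en"})  # → "[en] ni hao"
```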
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, those skilled in the art may make additional variations and modifications to these embodiments once they have learned the basic inventive concepts. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element preceded by the phrase "comprising a/an …" does not exclude the presence of additional identical elements in the process, method, article, or terminal that comprises the element.
The call voice translation method, call voice translation apparatus and headset device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (49)

1. A call voice translation method, applied to a headset device, the method comprising:
acquiring, by the headset device during a call in which the headset device is used, first call voice data of a first user, wherein the first user is another user participating in the call;
playing the first call voice data; acquiring a first translation text corresponding to the first call voice data and displaying the first translation text;
and acquiring first translation voice data corresponding to the first translation text, and playing the first translation voice data.
2. The method of claim 1, wherein the playing the first translated speech data comprises:
playing the first translation voice data while playing the first call voice data; or, alternatively,
playing the first call voice data and the first translation voice data alternately.
3. The method of claim 2, wherein playing the first translated speech data while playing the first call speech data comprises:
playing the first translation voice data at a volume greater than that of the first call voice data.
4. The method of claim 2, wherein the headset device comprises a pair of earpieces, and wherein the playing of the first call voice data comprises:
playing the first call voice data simultaneously in both earpieces;
and the playing of the first translation voice data while playing the first call voice data comprises:
playing the first call voice data in one of the earpieces, and playing the first translation voice data in the other earpiece.
5. The method of claim 1, further comprising:
identifying, on a display interface, the current speaker during the call while the headset device is used for the call.
6. The method of claim 1, further comprising:
displaying a first language used by the first user on a display interface, and/or displaying a second language used by a second user on the display interface;
wherein the second user is a user using the headset device.
7. The method of claim 1, further comprising:
acquiring a first voice recognition text corresponding to the first call voice data, and displaying the first voice recognition text on a display interface.
8. The method of claim 1, further comprising:
acquiring second call voice data of a second user, and sending the second call voice data to an electronic device used by the first user; or, alternatively,
acquiring second call voice data of a second user and a second translation text corresponding to the second call voice data, and sending the second call voice data and the second translation text to the electronic device used by the first user; or, alternatively,
acquiring second call voice data of a second user, acquiring a second translation text corresponding to the second call voice data, and synthesizing second translation voice data based on the second translation text; and sending the second call voice data, the second translation text and the second translation voice data to the electronic device used by the first user; or, alternatively,
acquiring second call voice data of a second user and second translation voice data corresponding to the second call voice data, and sending the second call voice data and the second translation voice data to the electronic device used by the first user;
wherein the second user is a user using the headset device.
9. The method of claim 8, further comprising:
and if the second translation text is obtained, displaying the second translation text on a display interface.
10. The method of claim 8, further comprising:
acquiring a second voice recognition text corresponding to the second call voice data;
and displaying the second voice recognition text on a display interface.
11. The method of claim 1, wherein
the acquiring of the first translation text corresponding to the first call voice data comprises:
sending a translation request to a server based on the first call voice data, and receiving a first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or translating, by the headset device, the first call voice data locally to obtain the first translation text.
12. The method of claim 1, further comprising:
when the headset device is used for a call, displaying a call voice processing function entry on a display interface, wherein the call voice processing function entry comprises a call voice translation function entry;
and receiving a touch operation of a second user on the call voice translation function entry, and executing the step of acquiring the first translation text corresponding to the first call voice data;
wherein the second user is a user using the headset device.
13. The method of claim 1, wherein the acquiring of the first translation text corresponding to the first call voice data comprises:
acquiring a second language used by a second user, wherein the second user is a user using the headset device;
and acquiring the first translation text corresponding to the first call voice data based on the second language used by the second user.
14. The method of claim 13, wherein the acquiring of the second language used by the second user comprises at least one of:
determining the second language used by the second user according to a system language of the headset device;
determining the second language used by the second user according to a language corresponding to second call voice data of the second user;
and determining the second language used by the second user according to a language selected by the second user on a display interface.
15. The method of claim 1, wherein the headset device comprises earphones and a headset storage device connected with the earphones;
the playing of the first call voice data comprises:
playing, by the earphones, the first call voice data;
the acquiring of the first translation text corresponding to the first call voice data and displaying the first translation text comprises:
acquiring, by the headset storage device, the first translation text corresponding to the first call voice data, and displaying the first translation text on a display interface;
the acquiring of the first translation voice data corresponding to the first translation text comprises:
acquiring, by the headset storage device, the first translation voice data corresponding to the first translation text;
and the playing of the first translation voice data comprises:
playing, by the earphones, the first translation voice data.
16. The method of claim 1, wherein the headset device comprises a headset storage device;
the playing of the first call voice data comprises:
playing, by the headset storage device, the first call voice data;
the acquiring of the first translation text corresponding to the first call voice data and displaying the first translation text comprises:
acquiring, by the headset storage device, the first translation text corresponding to the first call voice data, and displaying the first translation text on a display interface;
the acquiring of the first translation voice data corresponding to the first translation text comprises:
acquiring, by the headset storage device, the first translation voice data corresponding to the first translation text;
and the playing of the first translation voice data comprises:
playing, by the headset storage device, the first translation voice data.
17. A call voice translation apparatus, applied to a headset device, the apparatus comprising:
a first voice acquisition module, configured to acquire, by the headset device during a call in which the headset device is used, first call voice data of a first user, wherein the first user is another user participating in the call;
a first playing module, configured to play the first call voice data;
a text acquisition module, configured to acquire a first translation text corresponding to the first call voice data and display the first translation text;
a second voice acquisition module, configured to acquire first translation voice data corresponding to the first translation text;
and a second playing module, configured to play the first translation voice data.
18. The call voice translation apparatus of claim 17, wherein the second playing module comprises:
a synchronous playing sub-module, configured to play the first translation voice data while playing the first call voice data;
and an alternate playing sub-module, configured to play the first call voice data and the first translation voice data alternately.
19. The call voice translation apparatus of claim 18, wherein
the synchronous playing sub-module is configured to play the first translation voice data at a volume greater than that of the first call voice data.
20. The call voice translation apparatus of claim 18, wherein
the first playing module is configured to play the first call voice data simultaneously in both earpieces of the headset device;
and the synchronous playing sub-module is configured to play the first call voice data in one of the earpieces, and play the first translation voice data in the other earpiece.
21. The call voice translation apparatus of claim 17, further comprising:
an identification display module, configured to identify, on a display interface, the current speaker during the call while the headset device is used for the call.
22. The call voice translation apparatus of claim 17, further comprising:
a language display module, configured to display, on a display interface, a first language used by the first user and/or a second language used by a second user, wherein the second user is a user using the headset device.
23. The call voice translation apparatus of claim 17, further comprising:
a first recognition text display module, configured to acquire a first voice recognition text corresponding to the first call voice data and display the first voice recognition text on a display interface.
24. The call voice translation apparatus of claim 17, further comprising:
a data transmission module, configured to: acquire second call voice data of a second user, and send the second call voice data to an electronic device used by the first user; or acquire second call voice data of a second user and a second translation text corresponding to the second call voice data, and send the second call voice data and the second translation text to the electronic device used by the first user; or acquire second call voice data of a second user, acquire a second translation text corresponding to the second call voice data, synthesize second translation voice data based on the second translation text, and send the second call voice data, the second translation text and the second translation voice data to the electronic device used by the first user; or acquire second call voice data of a second user and second translation voice data corresponding to the second call voice data, and send the second call voice data and the second translation voice data to the electronic device used by the first user; wherein the second user is a user using the headset device.
25. The call voice translation apparatus of claim 24, further comprising:
a translation text display module, configured to display the second translation text on a display interface if the second translation text is acquired.
26. The call voice translation apparatus of claim 24, further comprising:
a second recognition text display module, configured to acquire a second voice recognition text corresponding to the second call voice data and display the second voice recognition text on a display interface.
27. The call voice translation apparatus of claim 17, wherein
the text acquisition module is configured to send a translation request to a server based on the first call voice data and receive a first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or the headset device translates the first call voice data locally to obtain the first translation text.
28. The call voice translation apparatus of claim 17, further comprising:
an entry display module, configured to display a call voice processing function entry on a display interface when the headset device is used for a call, wherein the call voice processing function entry comprises a call voice translation function entry;
wherein the text acquisition module is configured to receive a touch operation of a second user on the call voice translation function entry and execute the step of acquiring the first translation text corresponding to the first call voice data, the second user being a user using the headset device.
29. The call voice translation apparatus of claim 17, wherein
the text acquisition module is configured to acquire a second language used by a second user, the second user being a user using the headset device, and acquire the first translation text corresponding to the first call voice data based on the second language used by the second user.
30. The call voice translation apparatus of claim 29, wherein
the text acquisition module is configured to determine the second language used by the second user according to a system language of the headset device; and/or determine the second language used by the second user according to a language corresponding to second call voice data of the second user; and/or determine the second language used by the second user according to a language selected by the second user on a display interface.
31. The call voice translation apparatus of claim 17, wherein the headset device comprises earphones and a headset storage device connected with the earphones; the earphones comprise the first playing module and the second playing module, and the headset storage device comprises the text acquisition module and the second voice acquisition module.
32. The call voice translation apparatus of claim 17, wherein the headset device comprises a headset storage device; the headset storage device comprises the first playing module, the text acquisition module, the second voice acquisition module and the second playing module.
33. A headset device comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
acquiring, by the headset device during a call in which the headset device is used, first call voice data of a first user, wherein the first user is another user participating in the call;
playing the first call voice data; acquiring a first translation text corresponding to the first call voice data and displaying the first translation text;
and acquiring first translation voice data corresponding to the first translation text, and playing the first translation voice data.
34. The headset device of claim 33, wherein the playing the first translated speech data comprises:
playing the first translation voice data while playing the first call voice data; or, alternatively,
playing the first call voice data and the first translation voice data alternately.
35. The headset device of claim 34, wherein playing the first translated speech data concurrently with playing the first call speech data comprises:
playing the first translation voice data at a volume greater than that of the first call voice data.
36. The headset device of claim 34, wherein the headset device comprises a pair of earpieces, and wherein the playing of the first call voice data comprises:
playing the first call voice data simultaneously in both earpieces;
and the playing of the first translation voice data while playing the first call voice data comprises:
playing the first call voice data in one of the earpieces, and playing the first translation voice data in the other earpiece.
37. The headset device of claim 33, further comprising instructions for:
identifying, on a display interface, the current speaker during the call while the headset device is used for the call.
38. The headset device of claim 33, further comprising instructions for:
displaying a first language used by the first user on a display interface, and/or displaying a second language used by a second user on the display interface;
wherein the second user is a user using the headset device.
39. The headset device of claim 33, further comprising instructions for:
acquiring a first voice recognition text corresponding to the first call voice data, and displaying the first voice recognition text on a display interface.
40. The headset device of claim 33, further comprising instructions for:
acquiring second call voice data of a second user, and sending the second call voice data to an electronic device used by the first user; or, alternatively,
acquiring second call voice data of a second user and a second translation text corresponding to the second call voice data, and sending the second call voice data and the second translation text to the electronic device used by the first user; or, alternatively,
acquiring second call voice data of a second user, acquiring a second translation text corresponding to the second call voice data, and synthesizing second translation voice data based on the second translation text; and sending the second call voice data, the second translation text and the second translation voice data to the electronic device used by the first user; or, alternatively,
acquiring second call voice data of a second user and second translation voice data corresponding to the second call voice data, and sending the second call voice data and the second translation voice data to the electronic device used by the first user;
wherein the second user is a user using the headset device.
41. The headset device of claim 40, further comprising instructions for:
and if the second translation text is obtained, displaying the second translation text on a display interface.
42. The headset device of claim 40, further comprising instructions for:
acquiring a second voice recognition text corresponding to the second call voice data;
and displaying the second voice recognition text on a display interface.
43. The headset device of claim 33, wherein
the acquiring of the first translation text corresponding to the first call voice data comprises:
sending a translation request to a server based on the first call voice data, and receiving a first translation text returned by the server, wherein the first translation text is obtained by the server by translating the first call voice data in response to the translation request; and/or translating, by the headset device, the first call voice data locally to obtain the first translation text.
44. The headset device of claim 33, further comprising instructions for:
when the headset device is used for a call, displaying a call voice processing function entry on a display interface, wherein the call voice processing function entry comprises a call voice translation function entry;
and receiving a touch operation of a second user on the call voice translation function entry, and executing the step of acquiring the first translation text corresponding to the first call voice data;
wherein the second user is a user using the headset device.
45. The headset device of claim 33, wherein the acquiring of the first translation text corresponding to the first call voice data comprises:
acquiring a second language used by a second user, wherein the second user is a user using the headset device;
and acquiring the first translation text corresponding to the first call voice data based on the second language used by the second user.
46. The headset device of claim 45, wherein the acquiring of the second language used by the second user comprises at least one of:
determining the second language used by the second user according to a system language of the headset device;
determining the second language used by the second user according to a language corresponding to second call voice data of the second user;
and determining the second language used by the second user according to a language selected by the second user on a display interface.
47. The headset device of claim 33, wherein the headset device comprises earphones and a headset storage device connected with the earphones;
the playing of the first call voice data comprises:
playing, by the earphones, the first call voice data;
the acquiring of the first translation text corresponding to the first call voice data and displaying the first translation text comprises:
acquiring, by the headset storage device, the first translation text corresponding to the first call voice data, and displaying the first translation text on a display interface;
the acquiring of the first translation voice data corresponding to the first translation text comprises:
acquiring, by the headset storage device, the first translation voice data corresponding to the first translation text;
and the playing of the first translation voice data comprises:
playing, by the earphones, the first translation voice data.
48. The headset device of claim 33, wherein the headset device comprises a headset storage device;
the playing of the first call voice data comprises:
playing, by the headset storage device, the first call voice data;
the acquiring of the first translation text corresponding to the first call voice data and displaying the first translation text comprises:
acquiring, by the headset storage device, the first translation text corresponding to the first call voice data, and displaying the first translation text on a display interface;
the acquiring of the first translation voice data corresponding to the first translation text comprises:
acquiring, by the headset storage device, the first translation voice data corresponding to the first translation text;
and the playing of the first translation voice data comprises:
playing, by the headset storage device, the first translation voice data.
49. A readable storage medium, wherein instructions in the storage medium, when executed by a processor of a headset device, enable the headset device to perform the call voice translation method according to any one of claims 1 to 16.
CN202110443370.2A 2021-04-23 2021-04-23 Call voice translation method and device and earphone equipment Pending CN113286217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110443370.2A CN113286217A (en) 2021-04-23 2021-04-23 Call voice translation method and device and earphone equipment


Publications (1)

Publication Number Publication Date
CN113286217A true CN113286217A (en) 2021-08-20

Family

ID=77277274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110443370.2A Pending CN113286217A (en) 2021-04-23 2021-04-23 Call voice translation method and device and earphone equipment

Country Status (1)

Country Link
CN (1) CN113286217A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4447045A1 (en) * 2023-04-10 2024-10-16 Meta Platforms Technologies, LLC Translation with audio spatialization

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109274831A (en) * 2018-11-01 2019-01-25 科大讯飞股份有限公司 A kind of audio communication method, device, equipment and readable storage medium storing program for executing
CN110111770A (en) * 2019-05-10 2019-08-09 濮阳市顶峰网络科技有限公司 A kind of multilingual social interpretation method of network, system, equipment and medium
US20190347331A1 (en) * 2018-05-09 2019-11-14 Shenzhen Zhiyuan Technology Co., Ltd. Smartphone-based telephone translation system
CN111046680A (en) * 2018-10-15 2020-04-21 华为技术有限公司 Translation method and electronic equipment
CN111739538A (en) * 2020-06-05 2020-10-02 北京搜狗科技发展有限公司 Translation method and device, earphone and server
CN111862940A (en) * 2020-07-15 2020-10-30 百度在线网络技术(北京)有限公司 Earphone-based translation method, device, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
JP6121621B2 (en) Voice call method, apparatus, program, and recording medium
CN107644646B (en) Voice processing method and device for voice processing
CN111696553B (en) Voice processing method, device and readable medium
CN104394265A (en) Automatic session method and device based on mobile intelligent terminal
CN104219644A (en) Emergency communication method and device
CN105898573B (en) Multimedia file playing method and device
CN104539789A (en) Method and device for prompting call request
CN111696554B (en) Translation method and device, earphone and earphone storage device
US20210089726A1 (en) Data processing method, device and apparatus for data processing
CN111510556B (en) Call information processing method and device and computer storage medium
CN112291631A (en) Information acquisition method, device, terminal and storage medium
CN106384586A (en) Method and device for reading text information
CN106453982A (en) Method and apparatus for dialing telephone number, fixed telephone and mobile terminal
CN111739538B (en) Translation method and device, earphone and server
CN105516457A (en) Communication message processing method and apparatus
WO2021244135A1 (en) Translation method and apparatus, and headset
CN114513571A (en) Device connection method and device, electronic device and readable storage medium
CN113286217A (en) Call voice translation method and device and earphone equipment
CN112511686A (en) Recording method and earphone equipment
CN111694539B (en) Method, device and medium for switching between earphone and loudspeaker
CN113726952B (en) Simultaneous interpretation method and device in call process, electronic equipment and storage medium
CN113286218B (en) Translation method and device and earphone equipment
CN107315590B (en) Notification message processing method and device
CN108173802B (en) Communication processing method, device and terminal
CN105700878B (en) The treating method and apparatus of message editing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination