CN110267309B

CN110267309B - Method and equipment for translating call voice in real time

Info

Publication number: CN110267309B
Application number: CN201910559064.8A
Authority: CN
Inventors: 陈景郁; 成荣飞
Original assignee: Samsung Guangzhou Mobile R&D Center; Samsung Electronics Co Ltd
Current assignee: Samsung Guangzhou Mobile R&D Center; Samsung Electronics Co Ltd
Priority date: 2019-06-26
Filing date: 2019-06-26
Publication date: 2022-09-23
Anticipated expiration: 2039-06-26
Also published as: CN110267309A

Abstract

A method and device for real-time translation of call voice are provided. The method comprises the following steps: when the electronic terminal needs to translate call voice in real time, transmitting the call voice data received in real time from the base station to a server for translating the call voice at a first encoding rate, and receiving a corresponding translation result from the server; determining whether the time taken to receive the translation result corresponding to the call voice data is greater than a first time threshold; and when it is determined that the time is greater than the first time threshold, transmitting the call voice data received in real time from the base station to the server at a second encoding rate, and receiving a corresponding translation result from the server. According to the method and the device, the real-time performance of the call voice translation function can be improved.

Description

Method and equipment for translating call voice in real time

Technical Field

The present invention relates generally to the field of electronic terminals, and more particularly, to a method and apparatus for real-time translation of call speech.

Background

With globalization, cross-regional communication has become more frequent. In cross-regional communication, people can use translation software to overcome language barriers. In a voice call, a function that translates the call voice in real time lets the two parties communicate by voice without obstruction even if they speak different languages. However, the translation delay of current call voice translation functions is large, so real-time performance is poor and the user experience suffers.

Disclosure of Invention

Exemplary embodiments of the present invention provide a method and an apparatus for translating call voice in real time, which can solve the problem of poor real-time performance of call voice translation.

According to an exemplary embodiment of the present invention, a method for translating call voice in real time is provided, wherein the method comprises: when the electronic terminal needs to translate call voice in real time, transmitting the call voice data received in real time from the base station to a server for translating the call voice at a first encoding rate, and receiving a corresponding translation result from the server; determining whether the time taken to receive the translation result corresponding to the call voice data is greater than a first time threshold; and when it is determined that the time is greater than the first time threshold, transmitting the call voice data received in real time from the base station to the server at a second encoding rate, and receiving a corresponding translation result from the server.

Optionally, the method further comprises: outputting the translation result received from the server.

Optionally, the second encoding rate is lower than the first encoding rate.

Optionally, the method further comprises: after transmitting the call voice data to the server at the second encoding rate, when it is determined that the time taken to receive a translation result corresponding to the call voice data is less than a second time threshold, transmitting the call voice data received in real time from the base station to the server at the first encoding rate, and receiving the corresponding translation result from the server.

Optionally, the first encoding rate is an encoding rate of call voice data determined by the base station when the current voice call is initiated.

Optionally, the step of transmitting the voice data received in real time from the base station to the server at the second encoding rate comprises: negotiating with a base station to reduce the encoding rate of the call voice data of the current voice call; when the negotiation is completed, receiving call voice data with a second encoding rate from the base station in real time and transmitting the call voice data to the server; or, converting the call voice data received in real time from the base station into call voice data having the second encoding rate; and transmitting the converted call voice data to the server.

Optionally, the negotiating with the base station to reduce the encoding rate of the call voice data of the current voice call includes: sending, to the base station, a message requesting that the call voice data of the current voice call be encoded in the encoding mode corresponding to the second encoding rate, and receiving a response message returned by the base station; or, sending, to the base station, a message requesting that the call voice data of the current voice call be encoded at an encoding rate lower than the first encoding rate, and receiving a message returned by the base station instructing that the call voice data of the current voice call be encoded in the encoding mode corresponding to the second encoding rate.

Optionally, the step of converting the call voice data received in real time from the base station into call voice data having the second encoding rate includes: decoding the call voice data received in real time from the base station, and encoding the decoded data in the encoding mode corresponding to the second encoding rate to obtain the call voice data having the second encoding rate.

Optionally, the coding scheme corresponding to the first coding rate is the adaptive multi-rate wideband coding scheme, and the coding scheme corresponding to the second coding rate is the adaptive multi-rate narrowband coding scheme.

According to another exemplary embodiment of the present invention, there is provided an apparatus for translating call voice in real time, wherein the apparatus includes: a communication unit which, when the electronic terminal needs to translate call voice in real time, transmits the call voice data received in real time from the base station to a server for translating the call voice at a first encoding rate and receives a corresponding translation result from the server; and a determination unit that determines whether the time taken to receive a translation result corresponding to the call voice data is greater than a first time threshold, wherein when the determination unit determines that the time is greater than the first time threshold, the communication unit transmits the call voice data received in real time from the base station to the server at a second encoding rate, and receives the corresponding translation result from the server.

Optionally, the apparatus further comprises: a result output unit which outputs the translation result received from the server.

Optionally, the second encoding rate is lower than the first encoding rate.

Optionally, after transmitting the call voice data to the server at the second encoding rate, when the determination unit determines that the time taken to receive the translation result corresponding to the call voice data is less than a second time threshold, the communication unit transmits the call voice data received in real time from the base station to the server at the first encoding rate and receives the corresponding translation result from the server.

Optionally, the first encoding rate is an encoding rate of the call voice data determined by the base station when the current voice call is initiated.

Optionally, the communication unit negotiates with the base station to reduce the encoding rate of the call voice data of the current voice call; when the negotiation is completed, receiving call voice data with a second encoding rate from the base station in real time and transmitting the call voice data to the server; alternatively, the communication unit converts call voice data received in real time from the base station into call voice data having the second encoding rate, and transmits the converted call voice data to the server.

Optionally, the communication unit sends a message requesting to encode the call voice data of the current voice call according to the encoding mode corresponding to the second encoding rate to the base station, and receives a response message returned by the base station; alternatively, the communication unit transmits a message requesting encoding of call voice data of the current voice call at an encoding rate lower than the first encoding rate to the base station, and receives a message returned from the base station instructing encoding of call voice data of the current voice call in an encoding manner corresponding to the second encoding rate.

Optionally, the communication unit decodes the call voice data received in real time from the base station, and encodes the decoded data according to an encoding method corresponding to the second encoding rate to obtain the call voice data with the second encoding rate.

According to another exemplary embodiment of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, implements the method for real-time translation of call speech as described above.

According to another exemplary embodiment of the present invention, there is provided an electronic terminal, wherein the electronic terminal includes: a processor; a memory storing a computer program which, when executed by the processor, implements the method of real-time translation of call speech as described above.

According to the method and the device for translating call voice in real time of the exemplary embodiments of the present invention, when a large translation delay is detected, the encoding rate of the call voice data sent to the translation server is reduced, which reduces the network transmission load and the amount of data the translation server must process. This effectively shortens the call voice translation process and the time needed to obtain the translation result, thereby improving the real-time performance of the call voice translation function and the user experience.

Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.

Drawings

The above and other objects and features of exemplary embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate exemplary embodiments, wherein:

Fig. 1 illustrates a flowchart of a method of translating call speech in real time according to an exemplary embodiment of the present invention;

Fig. 2 illustrates a block diagram of an apparatus for real-time translation of call voice according to an exemplary embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.

Fig. 1 illustrates a flowchart of a method of translating call speech in real time according to an exemplary embodiment of the present invention. The method may be implemented by a computer program. For example, the method may be performed by a call voice translation application installed in the electronic terminal or by a function program implemented in an operating system of the electronic terminal. As an example, the electronic terminal may be a mobile communication terminal (e.g., a smartphone), a smart wearable device (e.g., a smart watch), or the like capable of voice call.

Referring to Fig. 1, at step S10, when the electronic terminal needs to translate call voice in real time, the electronic terminal transmits the call voice data received in real time from the base station to a server for translating call voice at a first encoding rate, and receives a corresponding translation result from the server.

As an example, when the electronic terminal is in a voice call state and a call voice real-time translation function is turned on, it may be determined that the electronic terminal needs to translate the call voice in real time.

As an example, the first encoding rate may be an encoding rate of call voice data determined by the base station when the current voice call is initiated. For example, the first encoding rate may be the encoding rate of the call voice data that the electronic terminal negotiated with the base station when the current voice call was initiated.

In step S20, it is determined whether the time taken to receive the translation result corresponding to the call voice data is greater than a first time threshold.

Specifically, the time taken for receiving the translation result corresponding to the call voice data is: time taken from the start of transmission of the call voice data to the server to the reception of a translation result corresponding to the call voice data from the server.
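The time defined above can be illustrated with a minimal Python sketch. It assumes a hypothetical blocking callable `send_and_receive` that sends one chunk of call voice data to the server and returns the translation result; neither that callable nor the injectable `clock` parameter appears in the original text.

```python
import time
from typing import Callable, Tuple


def timed_translation(
    send_and_receive: Callable[[bytes], str],
    voice_chunk: bytes,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, float]:
    """Return (translation result, elapsed seconds).

    The elapsed time runs from the start of transmission of the chunk
    to the moment the corresponding translation result is received.
    `send_and_receive` is a hypothetical stand-in for the server round trip.
    """
    started = clock()                       # transmission starts here
    result = send_and_receive(voice_chunk)  # blocks until the result arrives
    elapsed = clock() - started             # result received here
    return result, elapsed
```

A monotonic clock is used by default so that wall-clock adjustments during the call do not distort the measurement.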

As an example, it may be periodically determined whether a time taken to receive a translation result corresponding to call voice data transmitted to the server at a first encoding rate is greater than a first time threshold.

When it is determined at step S20 that the time is greater than the first time threshold, step S30 is performed: the call voice data received in real time from the base station is transmitted to the server at the second encoding rate, and the corresponding translation result is received from the server. Specifically, once it is determined that the time is greater than the first time threshold, the call voice data subsequently received in real time from the base station is transmitted to the server at the second encoding rate.

When it is determined at step S20 that the time is not greater than the first time threshold, the method returns to step S10.
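The flow of steps S10, S20 and S30 can be sketched as a simple loop. This is only an illustration under stated assumptions: `translate` is a hypothetical callable that sends one chunk at the given rate and returns the seconds taken to receive its translation, and the rate labels are placeholders rather than real codec settings.

```python
from typing import Callable, Iterable, List

FIRST_RATE = "first"    # placeholder label for the first encoding rate
SECOND_RATE = "second"  # placeholder label for the second encoding rate


def run_steps(
    chunks: Iterable[bytes],
    translate: Callable[[bytes, str], float],
    first_time_threshold: float,
) -> List[str]:
    """Return the rate label used for each chunk, in order."""
    rate = FIRST_RATE
    used: List[str] = []
    for chunk in chunks:
        used.append(rate)
        elapsed = translate(chunk, rate)  # S10, or S30 once switched
        # S20: compare the measured time with the first time threshold.
        if rate == FIRST_RATE and elapsed > first_time_threshold:
            rate = SECOND_RATE            # S30 applies to later chunks
        # Otherwise keep the current rate (return to S10).
    return used
```

With simulated delays of 0.5 s, 2.0 s and 0.4 s and a threshold of 1.0 s, the third chunk is the first one sent at the second rate, because the switch affects only data received after the slow result.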

As an example, the second encoding rate may be lower than the first encoding rate. As an example, the coding rate of the call voice data transmitted to the server may be reduced by switching the coding scheme of the call voice data when the quality of the call voice is not affected much (for example, it is ensured that the signal-to-noise ratio of the call voice is not lower than a certain threshold).

As an example, the encoding rate of the call voice data for the current voice call may be negotiated down with the base station; when the negotiation is completed, the call voice data having the second encoding rate is received in real time from the base station and transmitted to the server.

As an example, a message for requesting to encode call voice data of the current voice call in an encoding manner corresponding to the second encoding rate may be sent to the base station, and a response message returned by the base station may be received; alternatively, a message requesting encoding of call voice data of the current voice call at an encoding rate lower than the first encoding rate may be transmitted to the base station, and a message returned by the base station to instruct encoding of call voice data of the current voice call in an encoding manner corresponding to the second encoding rate may be received.
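The two negotiation variants can be sketched as follows. The message classes, the `FakeBaseStation` stand-in, and the bit rates in the usage note are all hypothetical illustrations; the original text does not specify any message format or signalling protocol.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RateRequest:
    target_mode: Optional[str] = None        # variant 1: name the desired mode
    lower_than_kbps: Optional[float] = None  # variant 2: ask for any lower rate


@dataclass(frozen=True)
class RateResponse:
    accepted: bool
    mode: Optional[str]  # mode the base station instructs, if accepted


class FakeBaseStation:
    """Hypothetical network side; `modes` maps mode name to kbit/s."""

    def __init__(self, modes: Dict[str, float]) -> None:
        self.modes = modes

    def handle(self, req: RateRequest) -> RateResponse:
        if req.target_mode is not None:
            # Variant 1: acknowledge the requested mode if it is supported.
            ok = req.target_mode in self.modes
            return RateResponse(ok, req.target_mode if ok else None)
        if req.lower_than_kbps is None:
            return RateResponse(False, None)
        # Variant 2: the base station picks a mode below the given rate.
        lower = {m: r for m, r in self.modes.items() if r < req.lower_than_kbps}
        if not lower:
            return RateResponse(False, None)
        return RateResponse(True, max(lower, key=lower.get))
```

For instance, with `FakeBaseStation({"AMR-WB": 23.85, "AMR-NB": 12.2})` (illustrative rates), `RateRequest(target_mode="AMR-NB")` and `RateRequest(lower_than_kbps=23.85)` both yield an accepted response naming `"AMR-NB"`. In the first variant the terminal chooses the mode; in the second the base station does.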

As another example, call voice data received in real time from the base station may be converted into call voice data having the second encoding rate, and the converted call voice data may be transmitted to the server.

As an example, the call voice data with the second encoding rate may be obtained by decoding call voice data received in real time from the base station and encoding the decoded data in an encoding manner corresponding to the second encoding rate. For example, it is possible to convert call voice data having a first encoding rate received in real time from a base station into call voice data having a second encoding rate and transmit the converted call voice data to the server.
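The decode-then-re-encode conversion can be sketched with toy codecs. The functions below are assumptions made purely for illustration and are not AMR codecs: the "decoder" reads 16-bit little-endian samples and the "encoder" keeps every second sample. A real implementation would call actual codec libraries and low-pass filter before reducing the sampling rate.

```python
import struct
from typing import Callable, List


def toy_decode(frame: bytes) -> List[int]:
    """Toy decoder: interpret the frame as 16-bit little-endian samples."""
    n = len(frame) // 2
    return list(struct.unpack(f"<{n}h", frame[: 2 * n]))


def toy_encode_half_rate(samples: List[int]) -> bytes:
    """Toy encoder: keep every second sample (no anti-alias filtering)."""
    kept = samples[::2]
    return struct.pack(f"<{len(kept)}h", *kept)


def transcode(
    frame: bytes,
    decode: Callable[[bytes], List[int]] = toy_decode,
    encode: Callable[[List[int]], bytes] = toy_encode_half_rate,
) -> bytes:
    """Decode a frame received at the first rate, then re-encode it
    in the mode corresponding to the second rate."""
    return encode(decode(frame))
```

Passing the codecs in as parameters keeps the conversion step independent of any particular pair of coding schemes.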

As an example, the coding scheme corresponding to the first coding rate may be the Adaptive Multi-Rate Wideband (AMR-WB) coding scheme, and the coding scheme corresponding to the second coding rate may be the Adaptive Multi-Rate Narrowband (AMR-NB) coding scheme. It should be understood that the coding scheme corresponding to the first coding rate and the coding scheme corresponding to the second coding rate may be other suitable coding schemes, and the present invention is not limited thereto.

For example, the adaptive multi-rate wideband coding scheme has a voice bandwidth of 50-7000 Hz, a sampling rate of 16 kHz, and a bit depth of 16 bits; the adaptive multi-rate narrowband coding scheme has a voice bandwidth of 300-3400 Hz, a sampling rate of 8 kHz, and a bit depth of 16 bits. Taking 5 seconds of call voice as an example, the data amount computed from these parameters for the AMR-WB coding scheme is: 16000 (sampling rate) × 16 (bit depth) × 5 (time) / 8 (bits per byte) = 160 KB; the data amount for the AMR-NB coding scheme is: 8000 (sampling rate) × 16 (bit depth) × 5 (time) / 8 (bits per byte) = 80 KB. It can be seen that the amount of voice data can be reduced by half by switching the coding scheme from AMR-WB to AMR-NB.
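The expression sampling rate × bit depth × time / 8 can be evaluated directly. The short sketch below (the function name is an assumption) computes it for 5 seconds at 16 kHz and at 8 kHz with 16-bit samples. Note that it counts sampled data only, as the formula in the text does; it does not model the compressed bit rates of the codecs themselves.

```python
def sampled_bytes(sample_rate_hz: int, bit_depth: int, seconds: float) -> float:
    """Bytes of sampled audio: rate x depth x duration, divided by 8 bits per byte."""
    return sample_rate_hz * bit_depth * seconds / 8


wideband = sampled_bytes(16_000, 16, 5)   # 16 kHz sampling
narrowband = sampled_bytes(8_000, 16, 5)  # 8 kHz sampling

print(wideband, narrowband, narrowband / wideband)
# prints: 160000.0 80000.0 0.5
```

Halving the sampling rate at the same bit depth and duration halves the data amount, which is the ratio the paragraph relies on.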

In the prior art, when performing real-time translation of call voice, extracted downlink call voice data is usually directly packaged and sent to a translation server for processing, and when the network transmission quality is poor or the translation task of the translation server is heavy, a situation of relatively large translation delay occurs. According to the exemplary embodiment of the invention, when the translation delay is detected to be larger, the data volume of the call voice data is reduced by reducing the encoding rate of the call voice data sent to the translation server, so as to reduce the network transmission load and the data processing amount of the translation server, thereby realizing the reduction of the delay of the real-time translation of the call voice and improving the user experience.

As an example, the method for translating call voice in real time according to an exemplary embodiment of the present invention may further include: outputting the translation result received from the server. The translation result received from the translation server may be output in any suitable way, for example in voice form and/or text form.

As an example, the method for translating call voice in real time according to an exemplary embodiment of the present invention may further include: after transmitting the call voice data to the server at the second encoding rate, when it is determined that the time taken to receive a translation result corresponding to the call voice data is less than a second time threshold, transmitting the call voice data received in real time from the base station to the server at the first encoding rate, and receiving the corresponding translation result from the server. As an example, the encoding rate of the call voice data of the current voice call may be negotiated up with the base station; when the negotiation is completed, the call voice data having the first encoding rate is received in real time from the base station and transmitted to the server.

In other words, after the call voice data has been transmitted to the server at the second encoding rate, if it is determined that the translation delay has become small, the terminal may resume transmitting the call voice data received in real time from the base station to the server at the higher, first encoding rate.

As an example, the second time threshold may be less than the first time threshold.

As an example, it may be periodically determined whether a time taken to receive a translation result corresponding to call voice data transmitted to the server at a second encoding rate is less than a second time threshold.
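The two thresholds together form a hysteresis rule, which can be sketched as a pure function evaluated at each periodic check. The function name and the rate labels are assumptions for illustration only.

```python
def next_rate(
    current: str,
    elapsed: float,
    first_threshold: float,
    second_threshold: float,
) -> str:
    """Return the rate label to use after one periodic check.

    "first" switches to "second" when the delay exceeds the first threshold;
    "second" switches back to "first" when the delay drops below the
    second threshold; otherwise the current rate is kept.
    """
    if current == "first" and elapsed > first_threshold:
        return "second"
    if current == "second" and elapsed < second_threshold:
        return "first"
    return current
```

When the second threshold is smaller than the first, delays that fall between the two thresholds leave the rate unchanged, which avoids switching back and forth on small fluctuations.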

As shown in fig. 2, an apparatus for translating call voice in real time according to an exemplary embodiment of the present invention includes: a communication unit 10 and a determination unit 20.

Specifically, the communication unit 10 is configured to, when the electronic terminal needs to translate call voice in real time, transmit the call voice data received in real time from the base station to a server for translating the call voice at the first encoding rate, and receive the corresponding translation result from the server.

The determination unit 20 is configured to determine whether the time taken to receive a translation result corresponding to the call voice data is greater than a first time threshold, wherein when the determination unit 20 determines that the time is greater than the first time threshold, the communication unit 10 transmits the call voice data received in real time from the base station to the server at a second encoding rate, and receives the corresponding translation result from the server.

As an example, the second encoding rate may be lower than the first encoding rate.

As an example, the first encoding rate may be an encoding rate of call voice data determined by the base station when the current voice call is initiated.

As an example, the communication unit 10 may negotiate with the base station to reduce the encoding rate of the call voice data of the current voice call; when the negotiation is completed, the call voice data having the second encoding rate is received in real time from the base station and transmitted to the server.

As an example, the communication unit 10 may transmit a message for requesting encoding of call voice data of the current voice call in an encoding manner corresponding to the second encoding rate to the base station, and receive a response message returned by the base station; alternatively, the communication unit 10 may transmit a message requesting encoding of call voice data of the current voice call at a coding rate lower than the first coding rate to the base station, and receive a message returned by the base station to instruct encoding of call voice data of the current voice call in a coding manner corresponding to the second coding rate.

As another example, the communication unit 10 may convert call voice data received in real time from a base station into call voice data having the second encoding rate, and transmit the converted call voice data to the server.

As an example, the communication unit 10 may decode call voice data received in real time from the base station and encode the decoded data in an encoding manner corresponding to the second encoding rate to obtain the call voice data having the second encoding rate.

As an example, the coding scheme corresponding to the first coding rate may be the adaptive multi-rate wideband coding scheme, and the coding scheme corresponding to the second coding rate may be the adaptive multi-rate narrowband coding scheme.

As an example, after transmitting the call voice data to the server at the second encoding rate, when the determination unit 20 determines that the time taken to receive the translation result corresponding to the call voice data is less than the second time threshold, the communication unit 10 may transmit the call voice data received in real time from the base station to the server at the first encoding rate and receive the corresponding translation result from the server.

As an example, the apparatus for translating call voice in real time according to an exemplary embodiment of the present invention may further include: a result output unit (not shown) for outputting the translation result received from the server.

It should be understood that, according to the embodiment of the present invention, the specific implementation manner of the device for translating the call speech in real time may be implemented by referring to the related specific implementation manner described in conjunction with fig. 1, and details are not described herein again.

Further, it should be understood that each unit in the apparatus for real-time translation of call voice according to an exemplary embodiment of the present invention may be implemented as a hardware component and/or a software component. The individual units may be implemented, for example, using Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), depending on the processing performed by the individual units as defined by the skilled person.

A computer-readable storage medium according to an exemplary embodiment of the present invention stores a computer program that, when executed by a processor, causes the processor to perform the method of real-time translation of call voice of the above-described exemplary embodiment. The computer readable storage medium is any data storage device that can store data which can be read by a computer system. Examples of computer-readable storage media include: read-only memory, random access memory, compact disc read-only memory, magnetic tape, floppy disk, optical data storage device, and carrier wave (such as data transmission through the internet via a wired or wireless transmission path).

An electronic terminal according to an exemplary embodiment of the present invention includes: a processor (not shown) and a memory (not shown), wherein the memory stores a computer program which, when executed by the processor, implements the method of real-time translation of call speech as in the above exemplary embodiments.

Although a few exemplary embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

Claims

1. A method for real-time translation of call speech, wherein the method comprises:

when the electronic terminal needs to translate call voice in real time, transmitting the call voice data received in real time from the base station to a server for translating the call voice at a first encoding rate, and receiving a corresponding translation result from the server;

determining whether the time for receiving the translation result corresponding to the call voice data is greater than a first time threshold;

transmitting the call voice data received in real time from the base station to the server at a second encoding rate and receiving a corresponding translation result from the server when it is determined that the time is greater than the first time threshold,

wherein, the time for receiving the translation result corresponding to the call voice data is as follows: time taken from the start of transmission of the call voice data to the server to the reception of a translation result corresponding to the call voice data from the server.

2. The method of claim 1, wherein the method further comprises: outputting the translation result received from the server;

and/or the second encoding rate is lower than the first encoding rate;

and/or, the method further comprises: after transmitting the voice data to the server at the second encoding rate, when determining that the time taken to receive the translation result corresponding to the call voice data is less than the second time threshold, transmitting the voice data received in real time from the base station to the server at the first encoding rate, and receiving the corresponding translation result from the server;

and/or the first coding rate is the coding rate of the call voice data determined by the base station when the current voice call is initiated;

and/or, the step of transmitting the voice data received in real time from the base station to the server at the second encoding rate comprises:

negotiating with a base station to reduce the encoding rate of the call voice data of the current voice call; when the negotiation is completed, receiving call voice data with a second encoding rate from the base station in real time and transmitting the call voice data to the server;

or, converting the call voice data received in real time from the base station into call voice data having the second encoding rate; and transmitting the converted call voice data to the server.

3. The method of claim 2, wherein negotiating with the base station to reduce the coding rate of the call voice data for the current voice call comprises: sending a message for requesting to encode the call voice data of the current voice call according to the encoding mode corresponding to the second encoding rate to the base station, and receiving a response message returned by the base station; or sending a message for requesting to encode the call voice data of the current voice call at a coding rate lower than the first coding rate to the base station, and receiving a message returned by the base station for instructing to encode the call voice data of the current voice call in a coding mode corresponding to the second coding rate;

and/or the step of converting the call voice data received in real time from the base station into the call voice data having the second encoding rate includes: decoding the call voice data received in real time from the base station, and encoding the decoded data in a coding mode corresponding to the second coding rate to obtain the call voice data having the second coding rate.

4. The method of claim 3, wherein the coding scheme corresponding to the first coding rate is: the adaptive multi-rate wideband coding scheme, the coding scheme corresponding to the second coding rate is: adaptive multi-rate narrowband coding.

5. An apparatus for real-time translation of call speech, wherein the apparatus comprises:

a communication unit which, when the electronic terminal needs to translate call voice in real time, transmits the call voice data received in real time from the base station to a server for translating the call voice at a first encoding rate and receives a corresponding translation result from the server;

a determination unit that determines whether a time taken to receive a translation result corresponding to the call voice data is greater than a first time threshold,

wherein, when the determination unit determines that the time is greater than the first time threshold, the communication unit transmits the call voice data received in real time from the base station to the server at a second encoding rate and receives a corresponding translation result from the server,

6. The apparatus of claim 5, wherein the apparatus further comprises: a result output unit that outputs the translation result received from the server;

and/or the second encoding rate is lower than the first encoding rate;

and/or, after transmitting the call voice data to the server at the second encoding rate, when the determination unit determines that the time taken to receive the translation result corresponding to the call voice data is less than the second time threshold, the communication unit transmits the call voice data received in real time from the base station to the server at the first encoding rate and receives the corresponding translation result from the server;

and/or the communication unit and the base station negotiate to reduce the coding rate of the call voice data of the current voice call; when the negotiation is completed, receiving call voice data with a second encoding rate from the base station in real time and transmitting the call voice data to the server; alternatively, the communication unit converts call voice data received in real time from the base station into call voice data having the second encoding rate, and transmits the converted call voice data to the server.

7. The apparatus of claim 6, wherein the communication unit transmits a message requesting encoding of call voice data for the current voice call in an encoding manner corresponding to the second encoding rate to the base station, and receives a response message returned from the base station; or, the communication unit sends a message for requesting to encode the call voice data of the current voice call at a coding rate lower than the first coding rate to the base station, and receives a message returned by the base station for instructing to encode the call voice data of the current voice call in a coding mode corresponding to the second coding rate;

and/or the communication unit decodes the call voice data received from the base station in real time and codes the decoded data according to a coding mode corresponding to the second coding rate to obtain the call voice data with the second coding rate.

8. The apparatus of claim 7, wherein the coding scheme corresponding to the first coding rate is the adaptive multi-rate wideband coding scheme, and the coding scheme corresponding to the second coding rate is the adaptive multi-rate narrowband coding scheme.

9. A computer-readable storage medium storing a computer program which, when executed by a processor, implements a method of real-time translation of call speech according to any one of claims 1 to 4.

10. An electronic terminal, wherein the electronic terminal comprises:

a processor;

a memory storing a computer program which, when executed by the processor, implements a method of real-time translation of call speech according to any one of claims 1 to 4.