CN110265061B - Method and equipment for translating call voice in real time - Google Patents

Method and equipment for translating call voice in real time

Publication number
CN110265061B
Authority
CN
China
Prior art keywords
translation
voice
call voice
call
met
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910559564.1A
Other languages
Chinese (zh)
Other versions
CN110265061A (en
Inventor
陈景郁
成荣飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Guangzhou Mobile R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Guangzhou Mobile R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Guangzhou Mobile R&D Center and Samsung Electronics Co Ltd
Priority to CN201910559564.1A
Publication of CN110265061A
Application granted
Publication of CN110265061B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

A method and device for real-time translation of call voice are provided. The method comprises: when an electronic terminal needs to translate call voice in real time, detecting whether a preset condition is met; when the preset condition is met, sending the collected call voice to a translation server that translates the call voice; when the preset condition is not met, performing sound quality preprocessing on the collected call voice and sending the processed call voice to the translation server; and receiving, from the translation server, a translation result corresponding to the sent call voice. The method and device can improve the real-time performance of the call voice translation function.

Description

Method and equipment for translating call voice in real time
Technical Field
The present invention relates generally to the field of electronic technology, and more particularly, to a method and apparatus for real-time translation of call speech.
Background
With the advance of globalization, cross-regional communication has become more frequent. In cross-regional communication, people can use translation software to overcome language barriers. During a voice call, a function that translates the call voice in real time lets the two parties communicate without barriers even if they speak different languages. However, current call voice translation functions have a large translation delay, which results in poor real-time performance and a degraded user experience.
Disclosure of Invention
Exemplary embodiments of the present invention provide a method and an apparatus for translating call voice in real time, which can solve the problem of poor real-time performance in call voice translation.
According to an exemplary embodiment of the present invention, a method for real-time translation of call voice is provided, wherein the method comprises: when an electronic terminal needs to translate call voice in real time, detecting whether a preset condition is met; when the preset condition is met, sending the collected call voice to a translation server for translating the call voice; when the preset condition is not met, performing sound quality preprocessing on the collected call voice and sending the processed call voice to the translation server; and receiving, from the translation server, a translation result corresponding to the sent call voice.
Optionally, the method further comprises: sending the translation result received from the translation server to a base station, to be forwarded by the base station to another electronic terminal that is in a voice call with the electronic terminal.
Optionally, the step of detecting whether the preset condition is met includes: periodically detecting whether the preset condition is met; or detecting in real time whether the preset condition is met.
Optionally, the preset condition includes: the voice quality of the collected call voice meets a specific condition, and/or the translation server performs sound quality preprocessing on the received call voice to be translated.
Optionally, the sound quality preprocessing comprises: noise reduction processing and/or echo cancellation processing.
Optionally, the step of sending the translation result received from the translation server to the base station includes: performing sound quality post-processing on the translation result received from the translation server, and sending the processed translation result to the base station, wherein the translation result is in voice form.
Optionally, the specific condition is: the signal-to-noise ratio is higher than a preset threshold.
Optionally, the sound quality post-processing includes: filtering and/or gain setting.
According to another exemplary embodiment of the present invention, a method for real-time translation of call voice is provided, wherein the method comprises: when an electronic terminal needs to translate call voice in real time, detecting whether a preset condition is met; when the preset condition is met, sending the call voice received from a base station to a translation server for translating the call voice; when the preset condition is not met, performing sound quality preprocessing on the call voice received from the base station and sending the processed call voice to the translation server; and receiving, from the translation server, a translation result corresponding to the sent call voice.
Optionally, the method further comprises: outputting the translation result received from the translation server.
Optionally, the step of detecting whether the preset condition is met includes: periodically detecting whether the preset condition is met; or detecting in real time whether the preset condition is met.
Optionally, the preset condition includes: the voice quality of the call voice received from the base station meets a specific condition, and/or the translation server performs sound quality preprocessing on the received call voice to be translated.
Optionally, the sound quality preprocessing comprises: noise reduction processing and/or echo cancellation processing.
Optionally, the step of outputting the translation result received from the translation server includes: performing sound quality post-processing on the translation result received from the translation server, and outputting the processed translation result, wherein the translation result is in voice form.
Optionally, the specific condition is: the signal-to-noise ratio is higher than a preset threshold.
Optionally, the sound quality post-processing includes: filtering and/or gain setting.
According to another exemplary embodiment of the present invention, there is provided an apparatus for translating call voice in real time, wherein the apparatus includes: a sound quality detection unit that detects whether a preset condition is met when an electronic terminal needs to translate call voice in real time; a sound quality processing unit that performs sound quality preprocessing on the collected call voice when the preset condition is not met; a sending unit that sends the collected call voice to a translation server for translating the call voice when the preset condition is met, and sends the call voice processed by the sound quality processing unit to the translation server when the preset condition is not met; and a translation result receiving unit that receives, from the translation server, a translation result corresponding to the sent call voice.
Optionally, the sending unit further sends the translation result received from the translation server to a base station, to be forwarded by the base station to another electronic terminal that is in a voice call with the electronic terminal.
Optionally, the sound quality detection unit periodically detects whether the preset condition is met; or the sound quality detection unit detects in real time whether the preset condition is met.
Optionally, the preset condition includes: the voice quality of the collected call voice meets a specific condition, and/or the translation server performs sound quality preprocessing on the received call voice to be translated.
Optionally, the sound quality preprocessing comprises: noise reduction processing and/or echo cancellation processing.
Optionally, the sound quality processing unit performs sound quality post-processing on the translation result received from the translation server, and the sending unit sends the translation result processed by the sound quality processing unit to the base station, wherein the translation result is in speech form.
Optionally, the specific condition is: the signal-to-noise ratio is higher than a preset threshold.
Optionally, the sound quality post-processing includes: filtering and/or gain setting.
According to another exemplary embodiment of the present invention, there is provided an apparatus for translating call voice in real time, wherein the apparatus includes: a sound quality detection unit that detects whether a preset condition is met when an electronic terminal needs to translate call voice in real time; a sound quality processing unit that performs sound quality preprocessing on the call voice received from a base station when the preset condition is not met; a sending unit that sends the call voice received from the base station to a translation server for translating the call voice when the preset condition is met, and sends the call voice processed by the sound quality processing unit to the translation server when the preset condition is not met; and a translation result receiving unit that receives, from the translation server, a translation result corresponding to the sent call voice.
Optionally, the apparatus further comprises: an output unit that outputs the translation result received from the translation server.
Optionally, the sound quality detection unit periodically detects whether the preset condition is met; or the sound quality detection unit detects in real time whether the preset condition is met.
Optionally, the preset condition includes: the voice quality of the call voice received from the base station meets a specific condition, and/or the translation server performs sound quality preprocessing on the received call voice to be translated.
Optionally, the sound quality preprocessing comprises: noise reduction processing and/or echo cancellation processing.
Optionally, the sound quality processing unit performs sound quality post-processing on the translation result received from the translation server, and the output unit outputs the processed translation result, wherein the translation result is in speech form.
Optionally, the specific condition is: the signal-to-noise ratio is higher than a preset threshold.
Optionally, the sound quality post-processing includes: filtering and/or gain setting.
According to another exemplary embodiment of the present invention, a computer-readable storage medium is provided, in which a computer program is stored that, when executed by a processor, implements the method for real-time translation of call speech as described above.
According to another exemplary embodiment of the present invention, there is provided an electronic terminal, wherein the electronic terminal includes: a processor; a memory storing a computer program which, when executed by the processor, implements the method of real-time translation of call speech as described above.
According to the method and the device for translating the call voice in real time, the time consumption of the call voice translation process can be effectively reduced, and the time for acquiring the translation result of the call voice is shortened, so that the real-time performance of the call voice translation function is improved, and the user experience is improved.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The above and other objects and features of exemplary embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate exemplary embodiments, wherein:
Fig. 1 shows a flowchart of a method of real-time translation of call speech according to a first exemplary embodiment of the present invention;
Fig. 2 shows a flowchart of a method of real-time translation of call speech according to a second exemplary embodiment of the present invention;
Fig. 3 is a block diagram illustrating an apparatus for real-time translation of call voice according to a first exemplary embodiment of the present invention;
Fig. 4 is a block diagram illustrating an apparatus for real-time translation of call voice according to a second exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below with reference to the figures in order to explain the present invention.
Fig. 1 shows a flowchart of a method for real-time translation of call speech according to a first exemplary embodiment of the present invention. The method may be implemented by a computer program. For example, the method may be performed by a call voice translation application installed in the electronic terminal or by a function program implemented in an operating system of the electronic terminal. As an example, the electronic terminal may be a mobile communication terminal (e.g., a smartphone), a smart wearable device (e.g., a smart watch), or the like capable of voice call.
Referring to fig. 1, in step S10, when the electronic terminal needs to translate the call voice in real time, it is detected whether a first preset condition is satisfied.
As an example, when the electronic terminal is in a voice call state and a call voice real-time translation function is turned on, it may be determined that the electronic terminal needs to translate the call voice in real time.
As an example, it may be periodically detected whether a first preset condition is satisfied.
As another example, whether the first preset condition is satisfied may be detected in real time.
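The difference between periodic and real-time detection can be illustrated with a minimal Python sketch. It assumes the call voice arrives as a sequence of frames and that the condition is an arbitrary callable; the function name `detect_condition` and the `period` parameter are hypothetical and do not come from the patent text. With `period=1` the condition is re-evaluated on every frame (real-time detection); with a larger period the most recent result is held between evaluations (periodic detection).

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = List[float]  # one block of audio samples (assumed representation)


def detect_condition(
    frames: Iterable[Frame],
    condition: Callable[[Frame], bool],
    period: int = 1,
) -> Iterator[Tuple[Frame, bool]]:
    """Yield (frame, condition_met) pairs.

    The condition is re-evaluated every `period` frames; in between,
    the most recent result is reused. period=1 models real-time detection.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    met = False
    for index, frame in enumerate(frames):
        if index % period == 0:
            met = condition(frame)
        yield frame, met
```

A longer period lowers the cost of detection on the terminal, at the price of reacting more slowly when the call voice quality changes.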
As an example, the first preset condition may include: the voice quality of the collected call voice meets a first specific condition, and/or the translation server for translating the call voice performs sound quality preprocessing on the received call voice to be translated.
As an example, the collected call voice may be a call voice collected through a microphone of the electronic terminal.
As an example, the first specific condition may be: the signal to noise ratio is higher than a preset threshold. For example, the smaller the ambient noise, the higher the signal-to-noise ratio; the smaller the echo signal, the higher the signal-to-noise ratio.
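The patent does not say how the signal-to-noise ratio is estimated or what the threshold value is. As a rough sketch only, the check could look like the following Python, which assumes that a speech segment and a noise-only segment have already been separated and uses the standard definition SNR(dB) = 10 * log10(P_signal / P_noise). The 20 dB default threshold and all function names are hypothetical.

```python
import math
from typing import Sequence

SNR_THRESHOLD_DB = 20.0  # hypothetical value; the patent only says "preset threshold"


def mean_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of samples (0.0 for an empty block)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    p_noise = mean_power(noise)
    p_signal = mean_power(speech)
    if p_noise == 0.0:
        return math.inf   # no measurable noise
    if p_signal == 0.0:
        return -math.inf  # no measurable speech
    return 10.0 * math.log10(p_signal / p_noise)


def meets_specific_condition(
    speech: Sequence[float],
    noise: Sequence[float],
    threshold_db: float = SNR_THRESHOLD_DB,
) -> bool:
    """True when the SNR is higher than the preset threshold."""
    return snr_db(speech, noise) > threshold_db
```

In this sketch, residual echo would simply count toward the noise segment, which matches the statement that a smaller echo signal gives a higher signal-to-noise ratio.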
It should be understood that the first specific condition may also be another condition for determining whether the voice quality of the call voice is good enough that sound quality preprocessing is unnecessary.
It should be appreciated that whether the translation server performs sound quality preprocessing on the received call voice before translating it may be determined in any suitable manner. For example, the translation server that is to translate the current call voice may be queried as to whether it performs sound quality preprocessing on the received call voice to be translated; alternatively, this may be confirmed through a corresponding database that records whether each of a number of translation servers performs sound quality preprocessing on call voice to be translated.
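The two ways of finding out whether a translation server preprocesses incoming call voice (querying the server, or looking it up in a database) might be combined as in the following Python sketch. Everything here is an assumption made for illustration: the dictionary standing in for the database, the server identifiers, the `query` callable, and the choice to fall back to "does not preprocess" when nothing is known.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the database that records, per translation
# server, whether it performs sound quality preprocessing.
SERVER_PREPROCESSING_DB: Dict[str, bool] = {
    "server-a.example": True,
    "server-b.example": False,
}


def server_preprocesses(
    server_id: str,
    query: Optional[Callable[[str], bool]] = None,
    db: Optional[Dict[str, bool]] = None,
) -> bool:
    """Return True if the given server is known to preprocess call voice.

    Lookup order (an assumption of this sketch): database first, then a
    direct query to the server, whose answer is recorded for later calls.
    """
    records = SERVER_PREPROCESSING_DB if db is None else db
    if server_id in records:
        return records[server_id]
    if query is not None:
        answer = query(server_id)
        records[server_id] = answer
        return answer
    # Unknown server: assume it does not preprocess, so the terminal
    # will not skip preprocessing on the strength of a guess.
    return False
```

Defaulting to `False` for unknown servers is a conservative design choice: it can cause duplicated preprocessing, but never sends poor-quality voice that nobody cleans up.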
In step S20, when it is detected that the first preset condition is satisfied, the collected call voice is sent directly to the translation server for translating the call voice, without sound quality preprocessing.
In step S30, when it is detected that the first preset condition is not satisfied, sound quality preprocessing is performed on the collected call voice, and the processed call voice is sent to the translation server.
It should be understood that the sound quality preprocessing may include various suitable sound quality processing approaches. As an example, the sound quality preprocessing may include: noise reduction processing and/or echo cancellation processing.
In step S40, a translation result corresponding to the transmitted call voice is received from the translation server.
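Steps S10 to S40 can be summarized in a short Python sketch. The condition check, the preprocessing routine, and the exchange with the translation server are passed in as callables because the patent leaves their implementation open; the function and parameter names are hypothetical.

```python
from typing import Callable, List

Audio = List[float]  # assumed representation of a block of call voice


def translate_call_voice(
    voice: Audio,
    first_preset_condition: Callable[[Audio], bool],
    preprocess: Callable[[Audio], Audio],
    send_to_server: Callable[[Audio], str],
) -> str:
    """Run one pass of the first embodiment's flow and return the result."""
    # S10: detect whether the first preset condition is satisfied.
    condition_met = first_preset_condition(voice)

    if condition_met:
        # S20: send the collected call voice directly, without preprocessing.
        outgoing = voice
    else:
        # S30: preprocess first (e.g. noise reduction, echo cancellation).
        outgoing = preprocess(voice)

    # S40: send the voice and receive the corresponding translation result.
    return send_to_server(outgoing)
```

For testing, `send_to_server` can be a stub that records what it was given, which makes it easy to confirm that preprocessing is skipped whenever the condition holds.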
In the prior art, call voice is always sent to the translation server only after sound quality preprocessing. A large translation delay can result, particularly when network transmission quality is poor or the translation server is heavily loaded. When the translation server also performs sound quality preprocessing on the received call voice to be translated, the preprocessing is performed twice, which wastes time and computing resources. According to the exemplary embodiment of the present invention, three approaches are possible. In the first, the voice quality of the call voice is detected, and the call voice is preprocessed before being sent to the translation server only when its voice quality is poor. In the second, it is determined whether the translation server preprocesses the received call voice before translating it; the terminal preprocesses the call voice only when the translation server does not, and otherwise sends the unprocessed call voice directly to the translation server, so that preprocessing is not performed on both sides. In the third, the terminal preprocesses the call voice only when the translation server does not preprocess the received call voice and the voice quality of the call voice is poor; otherwise, the unprocessed call voice is sent directly to the translation server.
According to the embodiment of the invention, the translation server can acquire and process the call voice to be translated as early as possible, which speeds up the call voice translation process and improves its real-time performance; the computational load that sound quality preprocessing places on the electronic terminal can also be reduced.
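The three cases described above correspond to three readings of the "and/or" in the preset condition. The following Python sketch spells them out; the enum and function names are hypothetical. In the combined case the preset condition is a logical OR, so the terminal preprocesses only when the voice quality is poor and the server does not preprocess.

```python
from enum import Enum


class ConditionMode(Enum):
    QUALITY_ONLY = "quality"            # only the voice quality is checked
    SERVER_ONLY = "server"              # only the server's behavior is checked
    QUALITY_OR_SERVER = "quality_or_server"  # both are checked


def preset_condition_met(
    mode: ConditionMode, quality_good: bool, server_preprocesses: bool
) -> bool:
    """True when the call voice may be sent without terminal preprocessing."""
    if mode is ConditionMode.QUALITY_ONLY:
        return quality_good
    if mode is ConditionMode.SERVER_ONLY:
        return server_preprocesses
    return quality_good or server_preprocesses


def terminal_must_preprocess(
    mode: ConditionMode, quality_good: bool, server_preprocesses: bool
) -> bool:
    """True when the terminal has to preprocess before sending."""
    return not preset_condition_met(mode, quality_good, server_preprocesses)
```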
As an example, the method for translating call voice in real time according to the first exemplary embodiment of the present invention may further include: sending the translation result received from the translation server to the base station, to be forwarded by the base station to another electronic terminal that is in a voice call with the electronic terminal.
As an example, the translation result may be a translation result in a speech form and/or a text form.
As an example, sound quality post-processing may be performed on the translation result received from the translation server, and the processed translation result may be sent to the base station, wherein the translation result is in speech form.
It should be understood that the sound quality post-processing may include various suitable sound quality processing approaches. As an example, the sound quality post-processing may include: filtering and/or gain setting.
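The patent names filtering and gain setting as post-processing examples but does not specify either. Purely as an illustration, the Python sketch below uses a one-pole low-pass filter and a decibel gain with clipping to the range [-1.0, 1.0]; the filter type, the `alpha` coefficient, the sample range, and all names are assumptions, not the patented processing.

```python
from typing import List


def low_pass(samples: List[float], alpha: float = 0.5) -> List[float]:
    """One-pole IIR low-pass filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    output: List[float] = []
    previous = 0.0
    for x in samples:
        previous = alpha * x + (1.0 - alpha) * previous
        output.append(previous)
    return output


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale samples by a gain given in decibels, clipping to [-1.0, 1.0]."""
    factor = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def post_process(
    samples: List[float], alpha: float = 0.5, gain_db: float = 0.0
) -> List[float]:
    """Filter the translated speech, then apply the gain setting."""
    return apply_gain(low_pass(samples, alpha), gain_db)
```

Post-processing of this kind applies only when the translation result is in speech form; a text result would bypass it.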
Fig. 2 shows a flowchart of a method for real-time translation of call speech according to a second exemplary embodiment of the present invention.
Referring to fig. 2, in step S50, when the electronic terminal needs to translate the call voice in real time, it is detected whether a second preset condition is satisfied.
As an example, when the electronic terminal is in a voice call state and a call voice real-time translation function is turned on, it may be determined that the electronic terminal needs to translate the call voice in real time.
As an example, it may be periodically detected whether the second preset condition is satisfied.
As another example, whether the second preset condition is satisfied may be detected in real time.
As an example, the second preset condition may include: the voice quality of the call voice received from the base station satisfies a second specific condition, and/or the translation server for translating the call voice performs sound quality preprocessing on the received call voice to be translated.
As an example, the second specific condition may be: the signal-to-noise ratio is higher than a preset threshold.
It should be understood that the second specific condition may also be another condition for determining whether the voice quality of the call voice is good enough that sound quality preprocessing is unnecessary.
It should be appreciated that whether the translation server performs sound quality preprocessing on the received call voice before translating it may be determined in any suitable manner. For example, the translation server that is to translate the current call voice may be queried as to whether it performs sound quality preprocessing on the received call voice to be translated; alternatively, this may be confirmed through a corresponding database that records whether each of a number of translation servers performs sound quality preprocessing on call voice to be translated.
In step S60, when it is detected that the second preset condition is satisfied, the call voice received from the base station is sent directly to the translation server for translating the call voice, without sound quality preprocessing.
In step S70, when it is detected that the second preset condition is not satisfied, sound quality preprocessing is performed on the call voice received from the base station, and the processed call voice is sent to the translation server.
It should be understood that the sound quality preprocessing may include various suitable sound quality processing approaches. As an example, the sound quality preprocessing may include: noise reduction processing and/or echo cancellation processing.
In step S80, a translation result corresponding to the transmitted call voice is received from the translation server.
As an example, the method for real-time translation of call voice according to the second exemplary embodiment of the present invention may further include: outputting the translation result received from the translation server.
As an example, the translation result may be a translation result in a speech form and/or a text form.
By way of example, the translation results received from the translation server may be output in a variety of suitable ways. For example, the translation result may be output in the form of voice and/or text.
As an example, sound quality post-processing may be performed on the translation result received from the translation server, and the processed translation result may be output, wherein the translation result is in speech form.
It should be understood that the sound quality post-processing may include various suitable sound quality processing approaches. As an example, the sound quality post-processing may include: filtering and/or gain setting.
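The first and second embodiments share one decision flow and differ in where the call voice comes from (the terminal's microphone, or the base station) and where the translation result goes (the base station, or the terminal's own output). The Python sketch below makes that structure explicit; the class `TranslationPath` and its field names are hypothetical, and every callable is a placeholder for functionality the patent does not detail.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[float]  # assumed representation of a block of call voice


@dataclass
class TranslationPath:
    source: Callable[[], Audio]          # microphone (1st) or base station (2nd)
    condition: Callable[[Audio], bool]   # first or second preset condition
    preprocess: Callable[[Audio], Audio] # sound quality preprocessing
    translate: Callable[[Audio], str]    # round trip to the translation server
    sink: Callable[[str], None]          # base station (1st) or local output (2nd)

    def run_once(self) -> str:
        """Process one block of call voice and deliver its translation."""
        voice = self.source()
        if not self.condition(voice):
            voice = self.preprocess(voice)
        result = self.translate(voice)
        self.sink(result)
        return result
```

Under this reading, a terminal that translates both directions of a call would hold two `TranslationPath` instances, one per embodiment.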
Fig. 3 illustrates a block diagram of an apparatus for real-time translation of call voice according to a first exemplary embodiment of the present invention.
As shown in Fig. 3, the apparatus for translating call voice in real time according to the first exemplary embodiment of the present invention includes: a sound quality detection unit 10, a sound quality processing unit 20, a sending unit 30, and a translation result receiving unit 40.
Specifically, the sound quality detection unit 10 is configured to detect whether a first preset condition is satisfied when the electronic terminal needs to translate the call voice in real time.
As an example, the sound quality detection unit 10 may determine that the electronic terminal needs to translate the call voice in real time when the electronic terminal is in a voice call state and the call voice real-time translation function is turned on.
As an example, the sound quality detection unit 10 may periodically detect whether the first preset condition is satisfied.
As another example, the sound quality detection unit 10 may detect whether the first preset condition is satisfied in real time.
As an example, the first preset condition may include: the voice quality of the collected call voice meets a first specific condition, and/or the translation server for translating the call voice performs sound quality preprocessing on the received call voice to be translated.
As an example, the first specific condition may be: the signal to noise ratio is higher than a preset threshold. For example, the smaller the ambient noise, the higher the signal-to-noise ratio; the smaller the echo signal, the higher the signal-to-noise ratio.
As an example, the collected call voice may be a call voice collected through a microphone of the electronic terminal.
It should be understood that the first specific condition may also be another condition for determining whether the voice quality of the call voice is good enough that sound quality preprocessing is unnecessary.
The sound quality processing unit 20 is configured to perform sound quality preprocessing on the collected call voice when it is detected that the first preset condition is not satisfied.
The sending unit 30 is configured to send the collected call voice to a translation server for translating the call voice when it is detected that the first preset condition is satisfied, and to send the call voice processed by the sound quality processing unit 20 to the translation server when it is detected that the first preset condition is not satisfied.
Specifically, when it is detected that the first preset condition is satisfied, the sound quality processing unit 20 does not perform sound quality preprocessing on the collected call voice, and the sending unit 30 sends the collected call voice directly to the translation server for translating the call voice; when it is detected that the first preset condition is not satisfied, the sound quality processing unit 20 performs sound quality preprocessing on the collected call voice, and the sending unit 30 sends the processed call voice to the translation server.
It should be understood that the sound quality preprocessing may include various suitable sound quality processing approaches. As an example, the sound quality preprocessing may include: noise reduction processing and/or echo cancellation processing.
The translation result receiving unit 40 is configured to receive a translation result corresponding to the transmitted call voice from the translation server.
As an example, the transmitting unit 30 may also transmit the translation result received from the translation server to the base station to be forwarded by the base station to another electronic terminal that performs a voice call with the electronic terminal.
As an example, the translation result may be a translation result in a speech form and/or a text form.
As an example, the voice quality processing unit 20 may perform voice quality post-processing on the translation result received from the translation server, and the transmitting unit 30 may transmit the processed translation result to the base station, wherein the translation result is a translation result in a speech form.
It should be understood that the voice quality post-processing may include various suitable voice quality processing methods. As an example, the voice quality post-processing may include: filtering processing and/or gain setting.
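As an illustration only, filtering and gain setting on a translation result in speech form might be sketched as follows. The choice of a moving-average low-pass filter, the window length, and the 16-bit PCM sample range are assumptions made for the sketch; the embodiment does not name a specific filter or sample format.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit PCM range


def moving_average(samples: Sequence[float], window: int = 3) -> List[float]:
    """Causal moving-average filter; early samples average over fewer points."""
    out: List[float] = []
    acc = 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= window:
            acc -= samples[i - window]  # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


def apply_gain(samples: Sequence[float], gain: float) -> List[int]:
    """Scale each sample and clip to the 16-bit range to avoid overflow."""
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]


def post_process(samples: Sequence[float], window: int = 3, gain: float = 1.0) -> List[int]:
    """Filter first, then apply gain."""
    return apply_gain(moving_average(samples, window), gain)
```

Clipping after the gain stage is one simple way to keep amplified speech within the representable range; a real terminal might use a limiter or automatic gain control instead.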
Fig. 4 is a block diagram illustrating an apparatus for real-time translation of call voice according to a second exemplary embodiment of the present invention.
As shown in Fig. 4, an apparatus for translating call voice in real time according to the second exemplary embodiment of the present invention includes: a sound quality detection unit 50, a voice quality processing unit 60, a transmitting unit 70, and a translation result receiving unit 80.
Specifically, the sound quality detection unit 50 is configured to detect whether the second preset condition is satisfied when the electronic terminal needs to translate the call voice in real time.
As an example, the sound quality detection unit 50 may determine that the electronic terminal needs to translate the call voice in real time when the electronic terminal is in a voice call state and the call voice real-time translation function is turned on.
As an example, the sound quality detection unit 50 may periodically detect whether the second preset condition is satisfied.
As another example, the sound quality detection unit 50 may detect whether the second preset condition is satisfied in real time.
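As an illustration only, the difference between periodic detection and real-time detection might be sketched as follows. Measuring the period in frames, and reusing the most recent result between checks, are assumptions of the sketch; the embodiment does not specify how the detection period is scheduled.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def detect_per_frame(
    frames: Iterable[T],
    check: Callable[[T], bool],
    period: int = 1,
) -> Iterator[Tuple[T, bool]]:
    """Yield (frame, condition_met) pairs.

    period=1 evaluates the condition on every frame ("real time");
    period=N evaluates it on every N-th frame and reuses the last result
    for the frames in between ("periodic").
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    met = False
    for index, frame in enumerate(frames):
        if index % period == 0:
            met = check(frame)
        yield frame, met
```

A longer period reduces how often the condition is evaluated, at the cost of reacting more slowly when the call conditions change.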
As an example, the second preset condition may include: the voice quality of the call voice received from the base station meets a second specific condition, and/or the translation server for translating the call voice is able to perform voice quality preprocessing on the received call voice to be translated.
As an example, the second specific condition may be: the signal-to-noise ratio is higher than a preset threshold.
It should be understood that the second specific condition may also be any other condition for determining whether the voice quality of the call voice is good enough that voice quality preprocessing is unnecessary.
The voice quality processing unit 60 is configured to perform voice quality preprocessing on the call voice received from the base station when it is detected that the second preset condition is not satisfied.
The transmitting unit 70 is configured to transmit the call voice received from the base station to a translation server for translating the call voice when it is detected that the second preset condition is satisfied, and to transmit the call voice processed by the voice quality processing unit 60 to the translation server when it is detected that the second preset condition is not satisfied.
Specifically, when it is detected that the second preset condition is satisfied, the voice quality processing unit 60 does not perform voice quality preprocessing on the call voice received from the base station, and the transmitting unit 70 directly transmits the call voice received from the base station to a translation server for translating the call voice; when it is detected that the second preset condition is not satisfied, the voice quality processing unit 60 performs voice quality preprocessing on the call voice received from the base station, and the transmitting unit 70 transmits the processed call voice to the translation server.
It should be understood that the voice quality preprocessing may include various suitable voice quality processing methods. As an example, the voice quality preprocessing may include: noise reduction processing and/or echo cancellation processing.
The translation result receiving unit 80 is configured to receive a translation result corresponding to the transmitted call voice from the translation server.
As an example, the apparatus for translating call voice in real time according to the second exemplary embodiment of the present invention may further include: an output unit (not shown) for outputting the translation result received from the translation server.
As an example, the translation result may be a translation result in a speech form and/or a text form.
As an example, the output unit may output the translation result received from the translation server in various appropriate manners. For example, the output unit may output the translation result in the form of voice and/or text.
As an example, the voice quality processing unit 60 may perform voice quality post-processing on the translation result received from the translation server, and the output unit outputs the processed translation result, wherein the translation result is a translation result in a speech form.
It should be understood that the voice quality post-processing may include various suitable voice quality processing methods. As an example, the voice quality post-processing may include: filtering processing and/or gain setting.
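As an illustration only, the data flow of the second exemplary embodiment (receive call voice from the base station, check the condition, optionally preprocess, obtain the translation, post-process, output) might be composed as follows. The class name, the field names, and the modelling of each unit as a plain callable are hypothetical; in particular, `translate` stands in for the round trip through the transmitting unit 70 and the translation result receiving unit 80.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[int]  # hypothetical representation of a block of speech samples


@dataclass
class DownlinkTranslator:
    condition_met: Callable[[Audio], bool]  # role of sound quality detection unit 50
    preprocess: Callable[[Audio], Audio]    # role of voice quality processing unit 60
    translate: Callable[[Audio], Audio]     # send via unit 70, receive via unit 80
    post_process: Callable[[Audio], Audio]  # unit 60 again, applied to the result
    output: Callable[[Audio], None]         # role of the output unit

    def handle(self, received: Audio) -> None:
        """Process one block of call voice received from the base station."""
        if self.condition_met(received):
            to_send = received  # forwarded without terminal-side preprocessing
        else:
            to_send = self.preprocess(received)
        result = self.translate(to_send)
        self.output(self.post_process(result))
```

Each field can be replaced independently, for example by swapping in a pass-through `post_process` when the translation result is in text form only.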
It should be understood that the device for real-time translation of call voice according to the first exemplary embodiment of the present invention may perform the method described with reference to fig. 1, and thus, in order to avoid repetition, detailed description thereof is omitted. The device for translating call voice in real time according to the second exemplary embodiment of the present invention may perform the method described with reference to fig. 2, and thus, in order to avoid repetition, details are not repeated herein.
Further, it should be understood that each unit in the apparatus for real-time translation of call voice according to the first exemplary embodiment of the present invention may be implemented as a hardware component and/or a software component. Depending on the processing each unit is defined to perform, those skilled in the art may implement the units using, for example, Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs).
A computer-readable storage medium according to an exemplary embodiment of the present invention stores a computer program that, when executed by a processor, causes the processor to perform the method of real-time translation of call voice of the first exemplary embodiment. The computer readable storage medium is any data storage device that can store data which can be read by a computer system. Examples of computer-readable storage media include: read-only memory, random access memory, read-only optical disks, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the internet via wired or wireless transmission paths).
An electronic terminal according to an exemplary embodiment of the present invention includes: a processor (not shown) and a memory (not shown), wherein the memory stores a computer program which, when executed by the processor, implements the method of real-time translation of call speech as in the first exemplary embodiment.
Further, it should be understood that each unit in the apparatus for real-time translation of call voice according to the second exemplary embodiment of the present invention may be implemented as a hardware component and/or a software component. Depending on the processing each unit is defined to perform, those skilled in the art may implement the units using, for example, Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs).
A computer-readable storage medium according to an exemplary embodiment of the present invention stores a computer program that, when executed by a processor, causes the processor to perform the method of real-time translation of call voice of the second exemplary embodiment. The computer readable storage medium is any data storage device that can store data which can be read by a computer system. Examples of computer-readable storage media include: read-only memory, random access memory, read-only optical disks, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the internet via wired or wireless transmission paths).
An electronic terminal according to an exemplary embodiment of the present invention includes: a processor (not shown) and a memory (not shown), wherein the memory stores a computer program which, when executed by the processor, implements the method of real-time translation of call speech as in the second exemplary embodiment.
Although a few exemplary embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

Claims (18)

1. A method for real-time translation of call voice, wherein the method comprises:
when an electronic terminal needs to translate call voice in real time, detecting whether a preset condition is met;
when it is detected that the preset condition is met, sending the collected call voice to a translation server for translating the call voice;
when it is detected that the preset condition is not met, performing voice quality preprocessing on the collected call voice, and sending the processed call voice to the translation server; and
receiving, from the translation server, a translation result corresponding to the sent call voice,
wherein the preset condition includes: the translation server is able to perform voice quality preprocessing on the received call voice to be translated.
2. The method of claim 1, wherein the method further comprises: transmitting the translation result received from the translation server to a base station, to be forwarded by the base station to another electronic terminal having a voice call with the electronic terminal;
and/or, the step of detecting whether the preset condition is met comprises: periodically detecting whether the preset condition is met; or detecting in real time whether the preset condition is met;
and/or, the voice quality preprocessing comprises: noise reduction processing or echo cancellation processing.
3. The method of claim 2, wherein the transmitting of the translation result received from the translation server to the base station comprises: performing voice quality post-processing on the translation result received from the translation server, and sending the processed translation result to the base station, wherein the translation result is a translation result in speech form.
4. The method of claim 3, wherein the voice quality post-processing comprises: filtering processing and/or gain setting.
5. A method for real-time translation of call voice, wherein the method comprises:
when an electronic terminal needs to translate call voice in real time, detecting whether a preset condition is met;
when it is detected that the preset condition is met, sending the call voice received from a base station to a translation server for translating the call voice;
when it is detected that the preset condition is not met, performing voice quality preprocessing on the call voice received from the base station, and sending the processed call voice to the translation server; and
receiving, from the translation server, a translation result corresponding to the sent call voice,
wherein the preset condition includes: the translation server is able to perform voice quality preprocessing on the received call voice to be translated.
6. The method of claim 5, wherein the method further comprises: outputting the translation result received from the translation server;
and/or, the step of detecting whether the preset condition is met comprises: periodically detecting whether the preset condition is met; or detecting in real time whether the preset condition is met;
and/or, the voice quality preprocessing comprises: noise reduction processing or echo cancellation processing.
7. The method of claim 6, wherein the outputting of the translation result received from the translation server comprises: performing voice quality post-processing on the translation result received from the translation server, and outputting the processed translation result, wherein the translation result is a translation result in speech form.
8. The method of claim 7, wherein the voice quality post-processing comprises: filtering processing and/or gain setting.
9. An apparatus for real-time translation of call voice, wherein the apparatus comprises:
a sound quality detection unit configured to detect whether a preset condition is met when an electronic terminal needs to translate call voice in real time;
a voice quality processing unit configured to perform voice quality preprocessing on the collected call voice when it is detected that the preset condition is not met;
a transmitting unit configured to transmit the collected call voice to a translation server for translating the call voice when it is detected that the preset condition is met, and to transmit the call voice processed by the voice quality processing unit to the translation server when it is detected that the preset condition is not met; and
a translation result receiving unit configured to receive, from the translation server, a translation result corresponding to the transmitted call voice,
wherein the preset condition includes: the translation server is able to perform voice quality preprocessing on the received call voice to be translated.
10. The apparatus of claim 9, wherein the transmitting unit further transmits the translation result received from the translation server to a base station, to be forwarded by the base station to another electronic terminal having a voice call with the electronic terminal;
and/or, the sound quality detection unit periodically detects whether the preset condition is met; or, the sound quality detection unit detects in real time whether the preset condition is met;
and/or, the voice quality preprocessing comprises: noise reduction processing or echo cancellation processing.
11. The apparatus of claim 10, wherein the voice quality processing unit performs voice quality post-processing on the translation result received from the translation server, and the transmitting unit transmits the translation result processed by the voice quality processing unit to the base station, wherein the translation result is a translation result in speech form.
12. The apparatus of claim 11, wherein the voice quality post-processing comprises: filtering processing and/or gain setting.
13. An apparatus for real-time translation of call voice, wherein the apparatus comprises:
a sound quality detection unit configured to detect whether a preset condition is met when an electronic terminal needs to translate call voice in real time;
a voice quality processing unit configured to perform voice quality preprocessing on the call voice received from a base station when it is detected that the preset condition is not met;
a transmitting unit configured to transmit the call voice received from the base station to a translation server for translating the call voice when it is detected that the preset condition is met, and to transmit the call voice processed by the voice quality processing unit to the translation server when it is detected that the preset condition is not met; and
a translation result receiving unit configured to receive, from the translation server, a translation result corresponding to the transmitted call voice,
wherein the preset condition includes: the translation server is able to perform voice quality preprocessing on the received call voice to be translated.
14. The apparatus of claim 13, wherein the apparatus further comprises: an output unit that outputs the translation result received from the translation server;
and/or, the sound quality detection unit periodically detects whether the preset condition is met; or, the sound quality detection unit detects in real time whether the preset condition is met;
and/or, the voice quality preprocessing comprises: noise reduction processing or echo cancellation processing.
15. The apparatus of claim 14, wherein the voice quality processing unit performs voice quality post-processing on the translation result received from the translation server, and the output unit outputs the processed translation result, wherein the translation result is a translation result in speech form.
16. The apparatus of claim 15, wherein the voice quality post-processing comprises: filtering processing and/or gain setting.
17. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements a method for real-time translation of call speech according to any one of claims 1 to 4 and/or a method for real-time translation of call speech according to any one of claims 5 to 8.
18. An electronic terminal, wherein the electronic terminal comprises:
a processor;
a memory storing a computer program which, when executed by the processor, implements the method of real-time translation of call speech according to any one of claims 1 to 4 and/or the method of real-time translation of call speech according to any one of claims 5 to 8.
CN201910559564.1A 2019-06-26 2019-06-26 Method and equipment for translating call voice in real time Active CN110265061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910559564.1A CN110265061B (en) 2019-06-26 2019-06-26 Method and equipment for translating call voice in real time

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910559564.1A CN110265061B (en) 2019-06-26 2019-06-26 Method and equipment for translating call voice in real time

Publications (2)

Publication Number Publication Date
CN110265061A CN110265061A (en) 2019-09-20
CN110265061B CN110265061B (en) 2021-08-20

Family

ID=67921604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910559564.1A Active CN110265061B (en) 2019-06-26 2019-06-26 Method and equipment for translating call voice in real time

Country Status (1)

Country Link
CN (1) CN110265061B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325039B (en) * 2020-01-21 2020-12-01 陈刚 Language translation method, system, program and handheld terminal based on real-time call

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101867632A (en) * 2009-06-12 2010-10-20 刘越 Mobile phone speech instant translation system and method
JP3216276U (en) * 2017-06-30 2018-05-24 丁紹傑 Improved microphone device for instant mobile phone translation
CN108280067A (en) * 2018-02-26 2018-07-13 深圳市百泰实业股份有限公司 Earphone interpretation method and system
CN108965614A (en) * 2018-07-13 2018-12-07 深圳市简能网络技术有限公司 A kind of call interpretation method and system
CN109285563A (en) * 2018-10-15 2019-01-29 华为技术有限公司 Voice data processing method and device during translation on line
CN109473095A (en) * 2017-09-08 2019-03-15 北京君林科技股份有限公司 A kind of intelligent home control system and control method
EP3467822B1 (en) * 2017-10-09 2020-06-24 Ricoh Company, Ltd. Speech-to-text conversion for interactive whiteboard appliances in multi-language electronic meetings

Also Published As

Publication number Publication date
CN110265061A (en) 2019-09-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant