WO2016101571A1

WO2016101571A1 - Voice translation method, communication method and related device

Info

Publication number: WO2016101571A1
Application number: PCT/CN2015/082390
Authority: WO
Inventors: 尚国强
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-12-22
Filing date: 2015-06-25
Publication date: 2016-06-30
Also published as: CN105786801A

Abstract

The present invention relates to the field of terminal applications and disclosed is a voice translation method, communication method and related device, the method comprising: acquiring a first voice; extracting a voice characteristic of the first voice; converting the first voice and acquiring translated voice data; performing voice fitting with respect to the translated voice data according to the voice characteristic to acquire a second voice. In the solution of the present invention, the second voice after translation can retain the voice characteristic of the speaker, therefore when applied to the terminal communication, the solution can provide a more realistic experience for a listening party.

Description

Voice translation method, communication method and related device

Technical field

This document relates to the field of speech translation technology, and in particular to a speech translation method, a communication method and related devices.

Background art

With advances in hardware and software, including the rapid development of cloud computing, pattern recognition algorithms from artificial intelligence have found a wide range of applications. Collected data can be processed quickly on a cloud computing platform to obtain better training results, which makes speech feature libraries better suited to real usage environments.

The popularity of Apple's Siri application has stimulated the adoption of various voice technologies. The development of voice technology has further freed the hands of smart terminal users, which also gives a considerable boost to social productivity.

Current mobile phones do not have an instant translation function, so communication barriers arise when the two parties speak different languages or dialects. There is therefore a need for a voice technology based on a cloud computing platform that realizes instant translation during communication.

Summary of the invention

The technical problem to be solved by the present invention is to provide a speech translation method, a communication method and a related device, which can retain the speaker's voice features in the translated speech and improve the listener's experience.

In order to solve the above technical problems, the following technical solutions are adopted:

A speech translation method comprising:

Obtaining the first voice;

Extracting a voice feature of the first voice;

Converting the first voice to obtain translated voice data;

Performing voice fitting on the translated voice data according to the voice feature to obtain a second voice.

Optionally, the step of converting the first voice to obtain translated voice data includes:

The first voice is converted based on a language database to obtain translated voice data.

Optionally, the voice feature comprises: a pitch of the first voice, or a pitch and an overtone of the first voice.

Optionally, the step of acquiring the first voice includes:

After the terminal starts the communication application, the first voice to be translated is obtained based on the communication application.

Optionally, after the step of performing the voice fitting on the translated voice data according to the voice feature to obtain the second voice, the method further includes:

The second voice is output.

Optionally, the step of acquiring the first voice of the language to be converted based on the communication application includes:

Acquiring, based on the communication application, the first voice of the language to be converted sent by the peer user.

Optionally, the step of acquiring the first voice of the language to be converted based on the communication application further includes:

Acquiring, based on the communication application, the first voice of the language to be converted input by the local user.

Optionally, the step of outputting the second voice includes:

The second voice is output to the local user.

Optionally, the step of outputting the second voice further includes:

The second voice is output to the peer user based on the communication application.

A speech translation apparatus includes a first obtaining module, an extracting module, a first conversion module, and a first fitting module, wherein:

The first obtaining module is configured to: acquire a first voice;

The extracting module is configured to: extract a voice feature of the first voice;

The first conversion module is configured to: convert the first voice to obtain translated voice data;

The first fitting module is configured to perform a voice fitting on the translated voice data according to the voice feature to obtain a second voice.

Optionally, the first obtaining module is configured to acquire the first voice as follows:

After the terminal starts the communication application, the first voice of the language to be converted is obtained based on the communication application.

Optionally, the device further includes an output module, wherein:

The output module is configured to: output the second voice.

Optionally, the first obtaining module includes a first obtaining sub-module, wherein:

The first obtaining sub-module is configured to: acquire, based on the communication application, the first voice of the language to be converted sent by the peer user.

Optionally, the first obtaining module further includes a second obtaining sub-module, wherein:

The second obtaining sub-module is configured to: acquire, based on the communication application, the first voice to be translated input by the local user.

The beneficial effects of the above technical solution of the present invention are as follows:

The solution of the invention can translate the voice transmitted by the communication application, thereby facilitating communication between users. Since the translated second voice can retain the speaker's voice feature, it brings a more realistic experience to the listening party when applied to the terminal communication.

Brief description of the drawings

FIG. 1 is a schematic diagram of steps of a voice translation method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of steps of a communication method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a communication method applied to a voice call according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of a communication method applied to communication software according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a voice translation apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Preferred embodiment of the invention

A detailed description is given below with reference to the accompanying drawings and specific embodiments.

It is an object of the present invention to provide a solution that enables instant translation during communication. Related speech translation technology cannot retain the speaker's voice characteristics, so the translated voice sounds incongruous in tone, which makes it harder for the listener to understand. To solve this problem, an embodiment of the present invention provides a voice translation method, as shown in FIG. 1, including:

Step 11: Acquire a first voice;

Step 12: Extract a voice feature of the first voice;

Step 13: Convert the first voice to obtain translated voice data;

Step 14: Perform voice fitting on the translated voice data according to the voice feature to obtain a second voice.

From steps 11 to 14 above, it can be seen that this embodiment extracts the voice features of the original voice before translating and, after translation, restores the translated voice data to the speaker's tone according to the extracted features, so that the listener gets a more realistic experience and understands more easily.
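The ordering of steps 12 to 14 can be illustrated with a minimal Python sketch. It is not the implementation of this embodiment: the names `VoiceFeature`, `translate_voice`, `extract`, `convert` and `fit` are hypothetical, and representing a voice as a list of float samples is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed representation: a voice is a list of mono PCM samples.
Samples = List[float]


@dataclass
class VoiceFeature:
    """Hypothetical container for the feature extracted in step 12."""
    pitch_hz: float                          # pitch of the first voice
    overtones: Optional[List[float]] = None  # optional overtone strengths


def translate_voice(
    first_voice: Samples,
    extract: Callable[[Samples], VoiceFeature],
    convert: Callable[[Samples], Samples],
    fit: Callable[[Samples, VoiceFeature], Samples],
) -> Samples:
    """Run steps 12 to 14 on a first voice already acquired in step 11."""
    feature = extract(first_voice)     # step 12: taken from the ORIGINAL voice
    translated = convert(first_voice)  # step 13: translated voice data
    return fit(translated, feature)    # step 14: second voice
```

The three stages are passed in as callables so that the sketch only fixes the order of operations: the feature is taken from the original voice before conversion and is applied to the translated data afterwards.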

Specifically, in the above step 13, the first voice is converted based on a language database to obtain translated voice data.

For example, the language database may be stored locally; after the first voice is acquired, speech recognition and translation are performed on it according to the local language database. Alternatively, the language database may be placed on the server side to provide online translation. Note that translation in this embodiment may be a conversion between languages or a conversion between dialects of the same language.
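As a rough illustration of the local and server-side options, the Python sketch below reduces a language database to a phrase table. This is a strong simplification and an assumption: the text does not describe the database format, and the "local first, then server" order, the `convert_phrase` function and the `remote_lookup` callable are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in: a "language database" reduced to a phrase table.
LanguageDatabase = Dict[str, str]


def convert_phrase(
    phrase: str,
    local_db: Optional[LanguageDatabase] = None,
    remote_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Translate one recognized phrase, trying the local database first."""
    # Local database stored on the terminal.
    if local_db is not None and phrase in local_db:
        return local_db[phrase]
    # Server-side database, reached through a caller-supplied function.
    if remote_lookup is not None:
        result = remote_lookup(phrase)
        if result is not None:
            return result
    raise LookupError(f"no translation available for {phrase!r}")
```

The same interface covers a conversion between languages and a conversion between dialects, since only the contents of the table differ.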

Specifically, the voice feature in this embodiment includes: a pitch of the first voice, or a pitch and an overtone of the first voice.

The fundamental tone (referred to here as the pitch feature) is the sound produced by the vibration of the sounding body as a whole, and it determines the perceived pitch. The sound produced by the partial vibration of the sounding body is called an overtone, and the overtones determine the timbre. This embodiment can use the pitch feature to restore the translated voice data to the speaker's original pitch. As a preferred solution, the overtone feature can also be combined when restoring the translated voice data to achieve better results.
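The text does not say how the pitch and overtones are measured. As one common possibility, and only as an assumption, the Python sketch below estimates the fundamental frequency from the autocorrelation peak and then reads the amplitude at each multiple of that frequency with a single-frequency DFT. The function names, the 50 to 400 Hz search range and the sample-list representation are hypothetical.

```python
import cmath
import math
from typing import List


def estimate_pitch_hz(samples: List[float], sample_rate: int,
                      min_hz: float = 50.0, max_hz: float = 400.0) -> float:
    """Estimate the fundamental frequency from the autocorrelation peak."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy delayed by `lag` samples.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def harmonic_amplitudes(samples: List[float], sample_rate: int,
                        pitch_hz: float, count: int = 3) -> List[float]:
    """Amplitude at 1x, 2x, ... `count`x the pitch.

    Index 0 is the fundamental; later entries are the overtones.
    """
    total = len(samples)
    amplitudes = []
    for k in range(1, count + 1):
        omega = 2 * math.pi * k * pitch_hz / sample_rate
        acc = sum(s * cmath.exp(-1j * omega * n)
                  for n, s in enumerate(samples))
        amplitudes.append(2 * abs(acc) / total)
    return amplitudes
```

A real voice would be analyzed in short frames and would need voicing detection, which this sketch leaves out.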

In addition, another embodiment of the present invention provides a communication method applied to a terminal, which can translate one party's voice for the other party in real time. As shown in FIG. 2, the communication method includes:

Step 21: After the terminal starts the communication application, acquire the first voice to be translated based on the communication application;

Step 22: Extract a voice feature of the first voice;

Step 23: Convert the first voice to obtain translated voice data;

Step 24: Perform voice fitting on the translated voice data according to the voice feature to obtain a second voice;

Step 25: Output the second voice.

Specifically, the translation process may be performed on the terminal on the receiving side. That is, in the foregoing step 21, the first voice to be translated sent by the peer user is obtained based on the communication application run by the terminal; in the foregoing step 25, the second voice is output to the local user.

In practical applications, the user can translate the received voice sent from the peer end on his own terminal.

In addition, the translation process can be performed on the terminal on the transmitting side. That is, in the above step 21, the first voice to be translated input by the local user is acquired based on the communication application run by the terminal; in the above step 25, the second voice is output to the peer user based on the communication application run by the terminal.

In practical applications, the user can translate the spoken voice on his own terminal and then send it to the peer. Even if the peer device does not adopt the scheme of the embodiment, the translated voice can be received, thereby achieving normal communication.
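The two placements differ only in where the first voice comes from and where the second voice goes. The Python sketch below makes that explicit; the `Side` enum, the function name and every callable parameter are hypothetical names introduced for illustration, and steps 22 to 24 are collapsed into a single `translate` callable.

```python
from enum import Enum
from typing import Callable, List

Samples = List[float]  # assumed representation of a voice


class Side(Enum):
    RECEIVING = "receiving"  # first voice is sent by the peer user
    SENDING = "sending"      # first voice is input by the local user


def run_steps_21_to_25(
    side: Side,
    voice_from_peer: Callable[[], Samples],
    voice_from_local_user: Callable[[], Samples],
    translate: Callable[[Samples], Samples],
    play_locally: Callable[[Samples], None],
    send_to_peer: Callable[[Samples], None],
) -> None:
    """Pick the source (step 21) and the sink (step 25) by terminal side."""
    if side is Side.RECEIVING:
        first_voice = voice_from_peer()
        output = play_locally
    else:
        first_voice = voice_from_local_user()
        output = send_to_peer
    # Steps 22 to 24 are hidden behind `translate` in this sketch.
    output(translate(first_voice))
```

In the sending case the peer only ever receives the second voice, which is why the peer device needs no support for the scheme.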

Application scenarios of the communication method in the embodiment of the present invention are introduced below.

In application scenario 1, the two parties are in a voice call, and the calling terminal is configured to convert the Cantonese spoken by the calling user into Shanghainese and then send it to the called terminal, as shown in FIG. 3. The specific process includes:

A1: A translation system is set up on the calling terminal, and the language feature library of the translation system is configured, for example with a mixed feature library of Cantonese and Shanghainese;

A2: The calling terminal establishes a voice call with the called terminal, and obtains the first voice input by the calling user through the microphone of the calling terminal;

A3: The calling terminal extracts the pitch of the first voice (and may also extract overtones);

A4: The calling terminal converts the first voice based on the mixed feature library to obtain the translated voice data;

A5: The calling terminal performs voice fitting on the translated voice data according to the extracted pitch, and obtains a second voice that matches the speaking voice of the calling user;

A6: The calling terminal performs voice processing and modulation on the second voice;

A7: The calling terminal sends the modulated signal to the called terminal, and the called terminal receives the modulated signal and demodulates it to obtain and play the second voice. At this point, the second voice played at the called terminal is already the translated Shanghainese.

In the application scenario 1, the called terminal does not need to perform additional configuration, so the solution has high practicability. In addition, the calling terminal may only send the translated second voice to the called terminal, so as to prevent the first voice from causing interference to the called user.
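Step A5 does not specify a fitting algorithm. Purely as an illustration, the Python sketch below moves the pitch of the translated voice toward the extracted pitch by linear-interpolation resampling. This is a deliberately naive technique chosen for brevity, not the method of this embodiment: it also changes the duration by the same ratio, which a practical system would have to compensate for. The function name and parameters are hypothetical.

```python
from typing import List


def fit_pitch(samples: List[float], source_pitch_hz: float,
              target_pitch_hz: float) -> List[float]:
    """Shift pitch from source to target by resampling (duration changes too)."""
    if source_pitch_hz <= 0 or target_pitch_hz <= 0:
        raise ValueError("pitches must be positive")
    if not samples:
        return []
    # Reading the input faster (ratio > 1) raises the pitch.
    ratio = target_pitch_hz / source_pitch_hz
    out_len = int((len(samples) - 1) / ratio) + 1
    last = len(samples) - 1
    out = []
    for m in range(out_len):
        pos = m * ratio
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        # Linear interpolation between the two neighbouring input samples.
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

Here `source_pitch_hz` would be the pitch measured on the translated voice data and `target_pitch_hz` the pitch extracted from the calling user's first voice in step A3.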

In application scenario 2, the two parties communicate based on the instant messaging software. After receiving the Japanese voice file sent by the calling user, the called terminal translates it into Chinese and plays it to the called user. The specific process includes:

B1, setting Japanese translation software on the called terminal, and allowing the instant messaging software to call the Japanese translation software;

B2, the called user obtains and saves the Japanese voice file sent by the calling user through the instant messaging software;

B3, the instant messaging software extracts the pitch of the Japanese voice file, and invokes the Japanese translation software to translate the Japanese voice file to obtain a Chinese voice file;

B4, the instant messaging software performs speech fitting on the Chinese voice file through the extracted pitch, and restores the Chinese voice file to the pitch of the calling user;

B5, the instant messaging software can, but does not necessarily, save the fitted Chinese voice file in place of the Japanese voice file that was translated, and plays the saved Chinese voice file to the called user either in response to an operation by the called user or automatically.

In application scenario 2, the translation step can be performed by a voice translation software provided by a third party, and the instant messaging software only needs to invoke the voice translation software to perform real-time voice translation. In practical applications, the called user can download and install the corresponding translation APP according to their own translation requirements, and then associate the instant messaging software with the translation APP.
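The association between the instant messaging software and an installed translation APP can be pictured as a small registry keyed by language pair. The Python sketch below is a hypothetical illustration: `TranslatorRegistry`, its methods and the language codes are assumed names, and a real platform would use its own inter-application mechanism.

```python
from typing import Callable, Dict, List, Tuple

Samples = List[float]  # assumed representation of a voice file's audio
Translator = Callable[[Samples], Samples]


class TranslatorRegistry:
    """Maps a (source, target) language pair to an associated translation app."""

    def __init__(self) -> None:
        self._apps: Dict[Tuple[str, str], Translator] = {}

    def associate(self, source: str, target: str, app: Translator) -> None:
        """Record that `app` handles translation from `source` to `target`."""
        self._apps[(source, target)] = app

    def translate(self, source: str, target: str, voice: Samples) -> Samples:
        """Invoke the associated app, as the messaging software does in B3."""
        app = self._apps.get((source, target))
        if app is None:
            raise LookupError(f"no translator associated for {source}->{target}")
        return app(voice)
```

With this shape, the messaging software needs no translation logic of its own; the user installs an app for the language pair they need and associates it once.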

In addition, another embodiment of the present invention further provides a voice translation apparatus, as shown in FIG. 5, including:

The first obtaining module 501 is configured to: acquire the first voice;

The extracting module 502 is configured to: extract a voice feature of the first voice;

The first conversion module 503 is configured to: convert the first voice to obtain translated voice data;

The first fitting module 504 is configured to perform voice fitting on the translated voice data according to the voice feature to obtain a second voice.

The voice feature includes: a pitch of the first voice, or a pitch and an overtone of the first voice.

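Optionally, the first obtaining module 501 is configured to acquire the first voice as follows:

After the terminal starts the communication application, the first voice of the language to be converted is obtained based on the communication application.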

Optionally, the device further includes an output module, wherein

The output module is configured to: output the second voice.

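Optionally, the first obtaining module 501 includes a first obtaining sub-module, wherein:

The first obtaining sub-module is configured to: acquire, based on the communication application, the first voice of the language to be converted sent by the peer user.

Optionally, the first obtaining module 501 further includes a second obtaining sub-module, wherein:

The second obtaining sub-module is configured to: acquire, based on the communication application, the first voice to be translated input by the local user.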

Optionally, the output module includes a first output submodule, wherein:

The first output submodule is configured to output the second voice to the local user.

Optionally, the output module further includes a second output submodule, wherein:

The second output submodule is configured to output the second voice to a peer user based on the communication application.

The speech translation apparatus of the present embodiment extracts the speech features of the original speech before performing the translation, and after the translation, restores the translated speech data to the speaker's tone according to the extracted speech features, so that the listener can more easily understand.

Obviously, the speech translation apparatus of the present embodiment can achieve the same technical effects as the speech translation method described above.

In addition, an embodiment of the present invention further provides a terminal, as shown in FIG. 6, including:

The second obtaining module 601 is configured to: after the terminal starts the communication application, acquire the first voice of the language to be converted based on the communication application;

The second extraction module 602 is configured to: extract a voice feature of the first voice;

The second conversion module 603 is configured to: convert the first voice to obtain translated voice data;

The second fitting module 604 is configured to perform a voice fitting on the translated voice data according to the voice feature to obtain a second voice;

The output module 605 is configured to: output the second voice.

The second obtaining module 601 includes:

The first obtaining sub-module is configured to: obtain, according to the communication application, a first voice of a language to be converted sent by the peer user;

The output module 605 includes:

The first output submodule is configured to: output the second voice to the local user.

In addition, the second obtaining module 601 further includes:

The second obtaining sub-module is configured to: obtain, according to the communication application, the first voice to be translated input by the local user;

The output module 605 further includes:

The second output submodule is configured to: output the second voice to the peer user based on the communication application.

The embodiment of the invention further discloses a computer program comprising program instructions which, when executed by a terminal, enable the terminal to perform any of the methods described above.

The embodiment of the invention also discloses a carrier carrying the computer program.

Obviously, the terminal of this embodiment can achieve the same technical effect corresponding to the communication method described above.

The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make further improvements and refinements without departing from the principles of the present invention, and these should also be considered within the scope of protection of the present invention.

Industrial applicability

With the solution of the invention, the translated second voice can retain the speaker's voice feature, so that when applied to the terminal communication, a more realistic experience is brought to the listening party. Therefore, the present invention has strong industrial applicability.

Claims

1. A speech translation method comprising:

Obtaining a first voice;

Extracting a voice feature of the first voice;

Converting the first voice to obtain translated voice data;

Performing voice fitting on the translated voice data according to the voice feature to obtain a second voice.

2. The speech translation method according to claim 1, wherein the step of converting the first voice to obtain translated voice data comprises:

The first voice is converted based on a language database to obtain translated voice data.

3. The speech translation method according to claim 1, wherein

The voice feature includes: a pitch of the first voice, or a pitch and an overtone of the first voice.

4. The speech translation method according to claim 1, wherein the step of acquiring the first voice comprises:

After the terminal starts the communication application, the first voice to be translated is obtained based on the communication application.

5. The speech translation method according to claim 1, wherein after the step of performing voice fitting on the translated voice data according to the voice feature to obtain a second voice, the method further comprises:

The second voice is output.

6. The speech translation method according to claim 4, wherein the step of acquiring the first voice of the language to be converted based on the communication application comprises:

Acquiring, based on the communication application, the first voice of the language to be converted sent by the peer user.

7. The speech translation method according to claim 6, wherein the step of acquiring the first voice of the language to be converted based on the communication application further comprises:

Acquiring, based on the communication application, the first voice of the language to be converted input by the local user.

8. The speech translation method according to claim 4, wherein the step of outputting the second voice comprises:

The second voice is output to the local user.

9. The speech translation method according to claim 8, wherein the step of outputting the second voice further comprises:

The second voice is output to the peer user based on the communication application.

10. A speech translation apparatus including a first obtaining module, an extracting module, a first conversion module, and a first fitting module, wherein:

The first obtaining module is configured to: acquire a first voice;

The extracting module is configured to: extract a voice feature of the first voice;

The first conversion module is configured to: convert the first voice to obtain translated voice data;

The first fitting module is configured to: perform voice fitting on the translated voice data according to the voice feature to obtain a second voice.

11. The speech translation apparatus according to claim 10, wherein

The voice feature includes: a pitch of the first voice, or a pitch and an overtone of the first voice.

12. The speech translation apparatus according to claim 10, wherein the first obtaining module is configured to acquire the first voice as follows:

After the terminal starts the communication application, the first voice of the language to be converted is obtained based on the communication application.

13. The speech translation apparatus according to claim 10, wherein the apparatus further comprises an output module, wherein

The output module is configured to: output the second voice.

14. The speech translation apparatus according to claim 10, wherein the first obtaining module comprises a first obtaining sub-module, wherein

The first obtaining sub-module is configured to: acquire, based on the communication application, the first voice of the language to be converted sent by the peer user.

15. The speech translation apparatus according to claim 14, wherein the first obtaining module further comprises a second obtaining sub-module, wherein:

The second obtaining sub-module is configured to: acquire, based on the communication application, the first voice to be translated input by the local user.