WO2019071541A1

WO2019071541A1 - Voice translation method, apparatus, and terminal device

Info

Publication number: WO2019071541A1
Application number: PCT/CN2017/105915
Authority: WO
Inventors: 熊宽; 杨汉丹; 王广新; 郑勇
Original assignee: 深圳市沃特沃德股份有限公司
Priority date: 2017-10-12
Filing date: 2017-10-12
Publication date: 2019-04-18

Abstract

Disclosed by the present invention are a voice translation method, apparatus, and a terminal device, said method comprising the following steps: extracting an original voiceprint from original voice information; performing translation processing on said original voice information to obtain translation information; combining said translation information and said original voiceprint into final voice information such that the final voice information is identical to the voiceprint of the original voice information, thus achieving the effect of translating the original audio and increasing the vividness and realness of a translation voice.

Description

Speech translation method, device and terminal device

[0001] The present invention relates to the field of communications technologies, and in particular, to a voice translation method, apparatus, and terminal device.

[0002] A translator can translate voice information in one language into voice information in another language, so people using different languages can use the translator to achieve barrier-free communication and communication. The specific process of the speech translation by the translator is: receiving the original voice information of the user, and transmitting the original voice information to the server, and the server obtains the target voice information by performing a series of translation processing on the original voice information, such as voice recognition, character translation, and speech synthesis. Returned to the translator, the translator outputs the target voice information.

[0003] The voiceprint of the target voice information generated by the server translation is preset, so all the translated voices sound the same person's voice, monotonous, and make people feel that they are talking to the robot, not to the real person. Dialogue, lack of realism and human touch, can easily cause hearing fatigue, and the user experience is not good.

technical problem

[0004] A main object of the present invention is to provide a speech translation method, apparatus and terminal device, which aim to improve the authenticity and vividness of translated speech and enhance the user experience. Problem solution

Technical solution

[0005] In order to achieve the above objective, an embodiment of the present invention provides a voice translation method, where the method includes the following steps.

Extracting the original voiceprint from the original voice information;

[0007] performing translation processing on the original voice information to obtain translation information;

[0008] synthesizing the translation information and the original voiceprint into final voice information.

[0009] Optionally, the translation information is target voice information, and the step of synthesizing the translation information and the original voiceprint into final voice information includes:

[0010] culling the preset voiceprint in the target voice information to obtain target voice information of the voiceless pattern; [0011] synthesizing the original voiceprint into the target voice information of the voiceless tone to generate final voice information.

[0012] Optionally, the step of culling the preset voiceprint in the target voice information comprises:

[0013] extracting a preset voiceprint from the target voice information;

[0014] performing signal subtraction on the target voice information and the preset voiceprint to obtain a target voice f π information of the voiceless pattern.

[0015] Optionally, the step of synthesizing the original voiceprint into the target voice information of the voiceless voice, and generating the final voice information includes:

[0016] performing signal addition on the original voiceprint and the target voice information of the voiceless voice to obtain a final voice f π.

[0017] Optionally, the step of performing translation processing on the original voice information to obtain translation information includes: sending the original voice information to a first server, so that the first server The original voice information is translated into target voice information;

[0019] receiving the target voice information returned by the first server.

[0020] Optionally, the translation information is a target language string, and the step of synthesizing the translation information and the original voiceprint into final voice information includes:

[0021] performing speech synthesis on the target language string using the original voiceprint to generate final voice information.

[0022] Optionally, the step of performing translation processing on the original voice information to obtain translation information includes: sending the original voice information to a second server, so that the second server Translating the original voice information into a target language string;

[0024] receiving the target language string returned by the second server.

[0025] Optionally, the step of performing translation processing on the original voice information to obtain translation information includes: [0026] performing voice recognition on the original voice information to generate an original language string;

[0027] translating the original language string into a target language string.

[0028] Optionally, after the step of synthesizing the translation information and the original voiceprint into final voice information, the method further includes:

[0029] outputting the final voice information.

[0030] Optionally, after the step of synthesizing the translation information and the original voiceprint into final voice information, the method further includes: [0031] transmitting the final voice information outward.

[0032] Embodiments of the present invention also provide a voice translation apparatus, where the apparatus includes:

[0033] an extraction module, configured to extract an original voiceprint from the original voice information;

[0034] a processing module, configured to perform translation processing on the original voice information, to obtain translation information;

[0035] a synthesizing module, configured to synthesize the translation information and the original voiceprint into final voice information.

[0036] Optionally, the translation information is target voice information, and the synthesizing module includes:

[0037] a voiceprint culling unit, configured to remove a preset voiceprint in the target voice information, to obtain a target utterance of the voiceless tone;

And a voiceprint synthesizing unit, configured to synthesize the original voiceprint into the target voice information of the voiceless tone to generate final voice information.

[0039] Optionally, the voiceprint culling unit comprises:

[0040] a voiceprint extraction subunit, configured to extract a preset voiceprint from the target voice information;

[0041] a subtraction subunit, configured to perform signal subtraction on the target voice information and the preset voiceprint to obtain target voice information of the voiceless pattern.

[0042] Optionally, the voiceprint synthesis unit is configured to: perform signal addition on the original voiceprint and the voiceless target voice information to obtain final voice information.

[0043] Optionally, the processing module includes:

[0044] a first sending unit, configured to send the original voice information to the first server, so that the first server translates the original voice information into target voice information;

[0045] The first receiving unit is configured to receive the target voice information returned by the first server.

[0046] Optionally, the translation information is a target language string, and the synthesizing module is configured to: perform speech synthesis on the target language string by using the original voiceprint to generate final speech information.

[0047] Optionally, the processing module includes:

[0048] a second sending unit, configured to send the original voice information to the second server, so that the second server translates the original voice information into a target language string;

[0049] a second receiving unit, configured to receive the target language string returned by the second server.

[0050] Optionally, the processing module includes:

[0051] a voice recognition unit, configured to perform voice recognition on the original voice information, and generate a original language string [0052] a character translation unit, configured to translate the original language string into a target language string.

[0053] Optionally, the device further includes an output module, configured to output the final voice information.

[0054] Optionally, the device further includes a sending module, configured to send the final voice information outward.

[0055] Embodiments of the present invention further provide a terminal device, where the terminal device includes a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application The program is configured to perform the aforementioned speech translation method.

Advantageous effects of the invention

Beneficial effect

[0056] A speech translation method provided by an embodiment of the present invention, the original voiceprint is extracted from the original voice information, and the translated information and the original voiceprint are synthesized into the final voice information, so that the final voice information and the original voice information are obtained. The voiceprints are the same, it sounds like the other users have spoken the translated language, realizing the effect of the original sound translation, and promoting the human-machine dialogue as a direct dialogue between people, improving the vividness and authenticity of the translated voice, and improving The user experience.

Brief description of the drawing

DRAWINGS

1 is a flow chart of an embodiment of a speech translation method of the present invention;

2 is a block diagram showing an embodiment of a speech translation apparatus of the present invention;

3 is a block diagram of the processing module of FIG. 2;

4 is a block diagram of still another module of the processing module of FIG. 2;

[0061] FIG. 5 is another block diagram of the processing module of FIG. 2;

6 is a block diagram of the synthesis module of FIG. 2;

7 is a block diagram of the voiceprint culling unit of FIG. 6.

[0064] The implementation, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings.

BEST MODE FOR CARRYING OUT THE INVENTION

[0065] It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention. Bright.

The embodiments of the present invention are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are intended to be illustrative of the invention and are not to be construed as limiting.

[0067] The singular forms "a", "the", "the" It will be further understood that the phrase "comprising", used in the <RTI ID=0.0> </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> is intended to mean the presence of the features, integers, steps, operations, components and/or components, but does not exclude the presence or addition of one or more other features, Integers, steps, operations, components, components, and/or their groups. It will be understood that when we refer to an element being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or an intermediate element can be present. Further, "connected" or "coupled" as used herein may include either a wireless connection or a wireless coupling. The phrase "and/or" used herein includes all or any of the elements and all combinations of one or more of the associated listed.

[0068] Those skilled in the art will appreciate that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention belongs, unless otherwise defined. It should also be understood that terms such as those defined in a general dictionary should be understood to have meaning consistent with the meaning in the context of the prior art, and will not be idealized or excessive unless specifically defined as here. The formal meaning is explained.

[0069] Those skilled in the art can understand that the "terminal" and "terminal device" used herein include both a device of a wireless signal receiver, a device having only a wireless signal receiver without a transmitting capability, and a receiving and receiving device. A device that transmits hardware having a receiving and transmitting hardware capable of performing two-way communication over a two-way communication link. Such a device may comprise: a cellular or other communication device having a single line display or a multi-line display or a cellular or other communication device without a multi-line display; PCS (Persona 1 Communications Service), which may combine voice, Data processing, fax and/or data communication capabilities; PDA (Personal Digital Assistant), which can include radio frequency receivers, pagers, Internet/Intranet access, web browsers, notepads, calendars and/or GPS ( Global Positioning System, receiver; conventional laptop and/or palmtop computer or other device having conventional laptop and/or palm type with and/or including a radio frequency receiver Computer or other device. As used herein, "terminal", "terminal device" may be portable, transportable, installed in a vehicle (aviation, sea and/or land), or adapted and/or configured to operate locally, and/or Run in any other location on the Earth and/or space in a distributed fashion. The "terminal" and "terminal device" used herein may also be a communication terminal, an internet terminal, a music/video playback terminal, and may be, for example, a PDA, a MID (Mobile Internet Device), and/or have a music/video playback. Functional mobile phones can also be smart TVs, set-top boxes and other devices.

[0070] Those skilled in the art can understand that the server used herein includes, but is not limited to, a computer, a network host, a single network server, a plurality of network server sets, or a cloud composed of a plurality of servers. Here, the cloud consists of a large number of computers or network servers based on Cloud Computing, which is a kind of distributed computing, a super virtual computer composed of a group of loosely coupled computers. In the embodiment of the present invention, communication may be implemented by any communication means between the server, the terminal device and the WNS server, including but not limited to, mobile communication based on 3GPP, LTE, WIMAX, and computer network communication based on TCP/IP and UDP protocols. And short-range wireless transmission based on Bluetooth and infrared transmission standards.

The speech translation method of the embodiment of the present invention can be applied to a translation device, a mobile terminal (such as a mobile phone, a tablet, etc.), a personal computer, and the like, and can also be applied to a server. The following is a detailed description of the application to the terminal device.

Referring to FIG. 1, an embodiment of a speech translation method according to the present invention is provided. The method includes the following steps: [0073] Sl l extracts an original voiceprint from the original voice information.

In the embodiment of the present invention, the original voice information may be the voice information of the user that is collected by the terminal device through the microphone on the spot, or may be the voice information to be translated obtained from the outside (such as the peer device). After the terminal device collects the original voice information, it preferably collects the original voice information through a microphone array composed of multiple microphones, and uses the beamforming and noise reduction processing methods of the microphone array to reduce the influence of environmental noise on the later processing and improve the voice quality.

[0075] After acquiring the original voice information, the terminal device immediately extracts the original voiceprint from the original voiceprint, and stores the original voiceprint. The terminal device can perform voiceprint extraction on the original voice information by using a wavelet transform algorithm in the prior art, and extract feature information of the original voiceprint in the domain and the frequency domain. The specific extraction method is the same as the prior art, and is not described here. [0076] In other embodiments, when applied to the server, the original voice information is from the terminal device, and the server receives the original voice information sent by the terminal device, and extracts the original voiceprint therefrom.

[0077] S12: Perform translation processing on the original voice information to obtain translation information.

[0078] The terminal device may perform translation processing on the original voice information locally, or may perform translation processing on the original voice information through the server. The translation information obtained by the terminal device may be the target voice information or the target language string.

[0079] Optionally, the terminal device sends the original voice information to the first server, so that the first server translates the original voice information into the target voice information. After receiving the original voice information, the first server performs voice recognition on the original voice information, generates a original language string, and then translates the original language string into a target language string, and finally uses the preset voiceprint to perform the target language string. Speech synthesis, generating target speech information, and returning the target speech information to the terminal device. The terminal device receives the target voice information returned by the first server.

[0080] Optionally, the terminal device sends the original voice information to the second server, so that the second server translates the original voice information into a target language string. After receiving the original voice information, the second server performs voice recognition on the original voice information, generates a original language string, and then translates the original language string into a target language string, and returns the target language string to the terminal device. The terminal device receives the target language string returned by the second server.

[0081] Optionally, the terminal device directly performs voice recognition on the original voice information, generates an original language string, and then translates the original language string into a target language string.

[0082] In other embodiments, when applied to a server, the server performs speech recognition on the original speech information, generates a raw language string, and then translates the original language string into a target language string.

[0083] S13. Synthesize the translation information and the original voiceprint into final voice information.

[0084] Optionally, when the translation information is the target voice information, the terminal device first rejects the preset voiceprint in the target voice information to obtain the target voice information of the voiceless pattern; and then synthesizes the original voiceprint into the target voice of the voiceless pattern. In the message, the final voice message is generated.

[0085] After the preset voiceprint is removed, the terminal device may first extract the preset voiceprint from the target voice information, for example, using the wavelet transform algorithm in the prior art to extract the voiceprint of the target voice information, and extract the preset. Characteristic information of the voice field and the frequency domain; then performing signal subtraction on the target voice information and the preset voiceprint, You can get the target voice information without voice. It can be understood by those skilled in the art that, besides this, the voiceprint culling can also be performed by other methods in the prior art, and the present invention will not be described again.

[0086] After the voiceprint synthesis, the terminal device can perform signal addition on the original voiceprint and the voiceless target voice information to obtain the final voice information, so that the final voice information sounds like the user's original sound, and the original sound is realized. translation. It can be understood by those skilled in the art that, besides this, the voiceprint synthesis can also be performed by other means in the prior art, and the present invention will not be repeated here.

[0087] Optionally, when the translation information is the target language string, the terminal device directly synthesizes the target language string by using the original voiceprint to generate the final voice information. The terminal device can perform speech synthesis using the existing speech synthesis technology, and will not be described here.

[0088] After the final voice information is generated, the terminal device may directly output the final voice information, such as outputting the final voice information through a sounding device such as an earpiece or a speaker; or may send the final voice information to the outside device, for example, to the peer device.

[0089] In other embodiments, when applied to a server, the server directly synthesizes the target language string using the original voiceprint to generate final voice information. And send the final voice message to the terminal device

[0090] For example:

[0091] The translation machine (terminal device) collects the original voice information, and proposes the original voiceprint from the original voice information to be stored locally, and transmits the original voice information to the server. The server translates the original voice information into target voice information and returns it to the translator. The translation machine receives the target voice information returned by the server, rejects the preset voiceprint in the target voice information, synthesizes the original voiceprint into the target voice information of the voiceless voice, generates the final voice information, and outputs the final voice information. Therefore, two users using different languages can use the translator to conduct face-to-face conversation, and the translated final voice information output by the translator is the same as the voiceprint of the user, which is equivalent to the user speaking the translated language and realizing the original sound. The effect of translation.

[0092] The mobile terminal (terminal device) collects the original voice information, and proposes the original voiceprint from the original voice information to be stored locally, and sends the original voice information to the server. The server translates the original voice information into the target voice information and returns it to the mobile terminal. The mobile terminal receives the target voice information returned by the server, removes the preset voiceprint in the target voice information, synthesizes the original voiceprint into the target voice information of the voiceless voice, generates the final voice information, and sends the final voice information to the opposite end. Thus two different languages The user of the speech can use the mobile terminal to conduct a remote conversation, and the final voice information after translation is the same as the voiceprint of the user, which is equivalent to the user speaking the translated language and realizing the effect of the original sound translation.

[0093] The server receives the original voice information sent by the terminal device, proposes the original voiceprint from the original voice information, performs voice recognition on the original voice information, generates a target language string, and performs voice synthesis on the target language string by using the original voiceprint. The final voice information is generated, and the final voice information is returned to the terminal device or the peer device of the terminal device (ie, the device that establishes a communication connection with the terminal device). Since the final speech information after translation is the same as the user's voiceprint, it is equivalent to the user's own spoken language and the effect of the original sound translation.

[0094] The speech translation method of the embodiment of the present invention extracts the original voiceprint from the original voice information, and then synthesizes the translation information and the original voiceprint into the final voice information, so that the final voice information is the same as the voiceprint of the original voice information. It sounds like the other user has spoken the translated language, realized the effect of the original sound translation, and promoted the human-machine dialogue as a direct dialogue between people, which improved the vividness and authenticity of the translated voice and improved the user experience.

[0095] Referring to FIG. 2, an embodiment of a speech translation apparatus of the present invention is provided. The apparatus includes an extraction module 10, a processing module 20, and a synthesis module 30, wherein: an extraction module 10 is configured to extract originals from original voice information. a voiceprint; a processing module 20, configured to perform translation processing on the original voice information to obtain translation information; and a synthesis module 30, configured to synthesize the translation information and the original voiceprint into final voice information.

[0096] The extraction module 10 may perform voiceprint extraction on the original voice information by using a wavelet transform algorithm in the prior art, and extract feature information of the original voiceprint in the domain and the frequency domain. The specific extraction method is the same as the prior art, and is not described here.

[0097] The translation information obtained by the processing module 20 may be the target voice information, or may be the target language string.

[0098] Optionally, as shown in FIG. 3, the processing module 20 includes a first sending unit 21 and a first receiving unit 22, where: the first sending unit 21 is configured to send original voice information to the first server, so that The first server translates the original voice information into the target voice information. The first receiving unit 22 is configured to receive the target voice information returned by the first server.

[0099] Optionally, as shown in FIG. 4, the processing module 20 includes a second sending unit 23 and a second receiving unit 24, The second sending unit 23 is configured to send the original voice information to the second server, so that the second server translates the original voice information into the target language string; the second receiving unit 24 is configured to receive the second server return. The target language string.

[0100] Optionally, as shown in FIG. 5, the processing module 20 includes a voice recognition unit 25 and a character translation unit 26, where: a voice recognition unit 25 is configured to perform voice recognition on the original voice information to generate an original language string; The character translation unit _{26 is} configured to translate the original language string into a target language string.

[0101] After the processing module 20 obtains the translation information, the synthesizing module 30 synthesizes the translation information and the original voiceprint into a final speech.

[0102] Optionally, when the translation information is the target voice information, the synthesizing module 30 includes a voiceprint culling unit 31 and a voiceprint compositing unit 32, as shown in FIG. 6, wherein: the voiceprint culling unit 31 is configured to remove the target. The preset voiceprint in the voice information obtains the target voice information of the voiceless voice; the voiceprint synthesis unit 32 is configured to synthesize the original voiceprint into the target voice information of the voiceless voice to generate the final voice information.

[0103] In the embodiment of the present invention, the voiceprint culling unit 31 includes a voiceprint extraction sub-unit 311 and a subtraction sub-unit 312, as shown in FIG. 7, wherein: a voiceprint extraction sub-unit 311 is used for the target voice information. Extracting the preset voiceprint, for example, using the wavelet transform algorithm in the prior art to perform voiceprint extraction on the target voice information, extracting feature information of the preset voiceprint in the 吋 domain and the frequency domain; and a subtraction sub-unit 312 for Signal subtraction is performed on the target voice information and the preset voiceprint to obtain target voice information of the voiceless pattern.

[0104] It can be understood by those skilled in the art that, besides this, voiceprint culling can also be performed by other methods in the prior art, and the present invention will not be described again.

[0105] After performing voiceprint synthesis, the voiceprint synthesizing unit 32 may perform signal addition on the original voiceprint and the voiceless target voice information to obtain final voice information, so that the final voice information sounds like the user's original sound. Realized the original sound translation. It can be understood by those skilled in the art that, besides this, the voiceprint synthesis can also be performed by other means in the prior art, and the present invention will not be described again.

[0106] Optionally, when the translation information is the target language string, the synthesizing module 30 directly synthesizes the target language string by using the original voiceprint to generate final speech information. The synthesis module 30 can perform speech synthesis using the existing speech synthesis technology, and will not be described here.

[0107] Further, the apparatus may further include an output module for outputting final voice information. For example, the output module outputs the final voice information through a sounding device such as an earpiece or a speaker. [0108] Further, the apparatus further includes a sending module, configured to send the final voice information outward, such as to the terminal device.

The voice translation device of the embodiment of the present invention can be applied to a terminal device such as a translation machine, a mobile terminal (such as a mobile phone, a tablet, etc.), a personal computer, or the like, and can also be applied to a server, which is not limited by the present invention.

[0110] The speech translation apparatus of the embodiment of the present invention extracts the original voiceprint from the original voice information, and then synthesizes the translation information and the original voiceprint into the final voice information, so that the final voice information is the same as the voiceprint of the original voice information. It sounds like the other user has spoken the translated language, realized the effect of the original sound translation, and promoted the human-machine dialogue as a direct dialogue between people, which improved the vividness and authenticity of the translated voice and improved the user experience.

[0111] The present invention also provides a terminal device including a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to Used to perform a speech translation method. The speech translation method comprises the steps of: extracting an original voiceprint from the original voice information; performing translation processing on the original voice information to obtain translation information; and synthesizing the translation information and the original voiceprint into final voice information. The speech translation method described in this embodiment is the speech translation method involved in the above embodiment of the present invention, and details are not described herein again.

Those skilled in the art will appreciate that the present invention includes apparatus related to performing one or more of the operations described herein. These devices may be specially designed and manufactured for the required purposes, or may also include known devices in a general purpose computer. These devices have computer programs stored therein that are selectively activated or reconfigured. Such computer programs may be stored in a device (eg, computer) readable medium or in any type of medium suitable for storing electronic instructions and respectively coupled to a bus, including but not limited to any Types of disks (including floppy disks, hard disks, CDs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only)

Memory, rewritable programmable read only memory), EEPROM (Electrically Erasable

Programmable Read-Only Memory, Flash, Magnetic Card or Light Card. That is, a readable medium includes any medium that is stored or transmitted by a device (eg, a computer) in a form that is readable. [0113] Those skilled in the art will appreciate that each block of the block diagrams and/or block diagrams and/or flow diagrams can be implemented by computer program instructions, and/or in the block diagrams and/or block diagrams and/or flow diagrams. The combination of boxes. Those skilled in the art will appreciate that these computer program instructions can be implemented by a general purpose computer, a professional computer, or a processor of other programmable data processing methods, such that the processor is executed by a computer or other programmable data processing method. The block diagrams and/or block diagrams of the invention and/or the schemes specified in the blocks or blocks of the flow diagram are invented.

[0114] Those skilled in the art can understand that the various operations, methods, and steps, measures, and solutions in the present invention may be alternated, changed, combined, or deleted. Further, various operations, methods, and other steps, measures, and arrangements in the process of the present invention may be alternated, changed, rearranged, decomposed, combined, or deleted. Further, the steps, measures, and solutions in the various operations, methods, and processes disclosed in the prior art may be alternated, changed, rearranged, decomposed, combined, or deleted.

The above is only a preferred embodiment of the present invention, and is not intended to limit the scope of the invention, and the equivalent structure or equivalent process transformations made by the description of the invention and the drawings are used directly or indirectly. Other related technical fields are equally included in the scope of patent protection of the present invention.

Claims

Claim

A speech translation method, comprising the steps of:

Extracting the original voiceprint from the original voice information;

Translating the original voice information to obtain translation information;

The translation information and the original voiceprint are synthesized into final voice information.

The speech translation method according to claim 1, wherein the translation information is target speech information, and the step of synthesizing the translation information and the original voiceprint into final speech information comprises:

The preset voiceprint in the target voice information is eliminated, and the target voice information of the voiceless pattern is obtained; the original voiceprint is synthesized into the target voice information of the voiceless tone to generate a final voice.

The speech translation method according to claim 2, wherein the step of culling the preset voiceprint in the target voice information comprises:

Extracting a preset voiceprint from the target voice information;

Performing signal subtraction on the target voice information and the preset voiceprint to obtain voiceless target voice information.

The speech translation method according to claim 2, wherein the step of synthesizing the original voiceprint into the target voice information of the voiceless tone to generate final voice information comprises:

A signal addition operation is performed on the original voiceprint and the target voice information of the voiceless voice to obtain final voice information.

The speech translation method according to claim 1, wherein the step of performing translation processing on the original speech information to obtain translation information comprises:

Transmitting the original voice information to the first server, so that the first server translates the original voice information into target voice information;

Receiving the target voice information returned by the first server.

The speech translation method according to claim 1, wherein the translation information is a target language string, and the translation information and the original voiceprint are synthesized into a final language The steps of the audio information include:

The target speech string is synthesized by the original voiceprint to generate final speech information.

The speech translation method according to claim 6, wherein the step of performing translation processing on the original speech information to obtain translation information comprises:

Transmitting the original voice information to a second server, so that the second server translates the original voice information into a target language string;

Receiving the target language string returned by the second server.

Performing speech recognition on the original speech information to generate an original language string; translating the original language string into a target language string.

The speech translation method according to claim 1, wherein the step of synthesizing the translation information and the original voiceprint into final speech information further comprises: outputting the final speech information.

The speech translation method according to claim 1, wherein the step of synthesizing the translation information and the original voiceprint into final speech information further comprises: transmitting the final speech information outward.

A speech translation device, comprising:

An extraction module, configured to extract an original voiceprint from the original voice information;

a processing module, configured to perform translation processing on the original voice information to obtain translation information; a synthesis module, configured to synthesize the translation information and the original voiceprint into final voice information, according to the voice translation device of claim 11. The translation information is the target voice information, and the synthesizing module includes:

a voiceprint culling unit, configured to remove a preset voiceprint in the target voice information, to obtain target voice information of the voiceless pattern;

a voiceprint synthesis unit, configured to synthesize the original voiceprint into the voiceless target voice letter In the message, the final voice message is generated.

The speech translation apparatus according to claim 12, wherein said voiceprint culling unit comprises:

a voiceprint extraction subunit, configured to extract a preset voiceprint from the target voice information; a subtraction subunit, configured to perform signal subtraction on the target voice information and the preset voiceprint to obtain a voiceless pattern Target voice information.

The speech translation apparatus according to claim 12, wherein the voiceprint synthesizing unit is configured to: perform signal addition on the original voiceprint and the target voice information of the voiceless tone to obtain final voice information.

The speech translation apparatus according to claim 12, wherein the processing module comprises:

a first sending unit, configured to send the original voice information to the first server, so that the first server translates the original voice information into target voice information; and the first receiving unit is configured to receive the first The target voice information returned by the server. The speech translation apparatus according to claim 11, wherein the translation information is a target language string, and the synthesizing module is configured to: perform speech synthesis on the target language string by using the original voiceprint, and generate Final voice message.

The speech translation apparatus according to claim 16, wherein the processing module comprises:

a second sending unit, configured to send the original voice information to the second server, so that the second server translates the original voice information into a target language string; and the second receiving unit is configured to receive the The target language string returned by the second server is the voice translation device according to claim 16, wherein the processing module comprises:

a voice recognition unit, configured to perform voice recognition on the original voice information, and generate an original language string;

a character translation unit, configured to translate the original language string into a target language string. [Claim 19] The speech translation apparatus according to claim 11, wherein the apparatus further comprises an output module for outputting the final speech information.

[Claim 20] A terminal device comprising a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, wherein the application is configured to A speech translation method for performing the method of claim 1.