CN111310481A - Speech translation method, device, computer equipment and storage medium - Google Patents

Speech translation method, device, computer equipment and storage medium

Info

Publication number
CN111310481A
Authority
CN
China
Prior art keywords
character string
ith
semantic unit
source language
ith character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010062844.4A
Other languages
Chinese (zh)
Other versions
CN111310481B (en)
Inventor
张睿卿
张传强
熊皓
何中军
李芝
吴华
王海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010062844.4A priority Critical patent/CN111310481B/en
Publication of CN111310481A publication Critical patent/CN111310481A/en
Application granted granted Critical
Publication of CN111310481B publication Critical patent/CN111310481B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/1822 Parsing for meaning understanding
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a speech translation method, a speech translation apparatus, a computer device, and a storage medium, relating to speech technology within the field of computer technology. The specific implementation scheme is as follows: obtain the ith character string of the source speech, where i is a positive integer; input the ith character string into a trained segmentation model, and judge whether the ith character string is an unambiguous semantic unit; if the ith character string is determined to be an unambiguous semantic unit, translate the ith character string to generate an ith target character string; and if the ith character string is determined not to be an unambiguous semantic unit, do not translate it until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then translate that concatenation, where n is a positive integer. This solves the problem that mistranslation of character strings with multiple paraphrases in the source speech affects the translation accuracy of the whole sentence, and improves the accuracy of speech translation.

Description

Speech translation method, device, computer equipment and storage medium
Technical Field
The present application relates to speech technology within the field of computer technology, and in particular to a speech translation method and apparatus, a computer device, and a storage medium.
Background
Speech translation is the process of converting speech in one natural language (the source language) into another natural language (the target language). Unlike traditional machine translation, speech translation takes speech directly as input and produces text as output. Speech translation is becoming increasingly popular, and current speech translation technology performs translation using a single word as the translation unit.
When a translation machine is used for actual speech translation, polysemous words in the source speech are prone to being mistranslated, so the translation accuracy of the whole sentence is low.
Disclosure of Invention
The application provides a speech translation method that solves the problem of low speech translation accuracy in the related art.
An embodiment of a first aspect of the present application provides a speech translation method, including:
acquiring an ith character string of source speech, where i is a positive integer;
inputting the ith character string into a trained segmentation model, and judging whether the ith character string is an unambiguous semantic unit;
if the ith character string is determined to be an unambiguous semantic unit, translating the ith character string to generate an ith target character string; and
if the ith character string is determined not to be an unambiguous semantic unit, not translating the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then translating that concatenation, where n is a positive integer.
As a first possible implementation manner of the embodiment of the present application, before inputting the ith character string into the trained segmentation model and judging whether it is an unambiguous semantic unit, the method includes:
obtaining a training sample, where the training sample comprises source language samples and target language samples, and each semantic unit contained in the source language samples is an unambiguous semantic unit;
and training the segmentation model with the training sample.
As a second possible implementation manner of the embodiment of the present application, obtaining the training sample includes:
aligning each character string in the source language sample with the corresponding character string in the target language sample to obtain alignment information, and recording the alignment information in a phrase table;
screening out polysemous words in the source language sample according to the alignment information recorded in the phrase table;
judging whether the ith character string in the source language sample is an unambiguous semantic unit according to the polysemous words in the source language sample;
if the ith character string in the source language sample is determined to be an unambiguous semantic unit, segmenting at the position corresponding to the ith character string;
and if the ith character string in the source language sample is determined not to be an unambiguous semantic unit, not segmenting at the position corresponding to the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then segmenting at the position corresponding to that concatenation, where n is a positive integer.
As a third possible implementation manner of the embodiment of the present application, screening out the polysemous words in the source language sample according to the alignment information recorded in the phrase table includes:
according to the alignment information, counting the character strings in the source language sample that correspond to at least two paraphrases;
and determining the character strings corresponding to at least two paraphrases as polysemous words in the source language sample.
As a fourth possible implementation manner of the embodiment of the present application, the translating the ith character string to generate the ith target character string includes:
and inputting the ith character string into a translation model to obtain the ith target character string.
An embodiment of a second aspect of the present application provides a speech translation apparatus, including:
the first obtaining module is used for obtaining an ith character string of source speech, wherein i is a positive integer;
the judging module is used for inputting the ith character string into the trained segmentation model and judging whether the ith character string is an unambiguous semantic unit;
the first translation module is used for translating the ith character string to generate an ith target character string if the ith character string is determined to be an unambiguous semantic unit; and
and the second translation module is used for, if the ith character string is determined not to be an unambiguous semantic unit, not translating the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then translating that concatenation, where n is a positive integer.
An embodiment of a third aspect of the present application provides a computer device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the speech translation method of the first aspect.
A fourth aspect of the present application is directed to a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the speech translation method according to the first aspect.
One embodiment in the above application has the following advantages or benefits: the ith character string of the source speech is obtained, where i is a positive integer; the ith character string is input into a trained segmentation model to judge whether it is an unambiguous semantic unit; if so, the ith character string is translated to generate an ith target character string; if not, it is not translated until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and that concatenation is then translated, where n is a positive integer. With this method, after a character string of the source speech is acquired, it is translated into a target character string only once the segmentation model determines it to be an unambiguous semantic unit. This solves the problem that mistranslation of character strings with multiple paraphrases in the source speech affects the translation accuracy of the whole sentence, and improves the accuracy of speech translation.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of a speech translation method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a speech translation method according to a second embodiment of the present application;
fig. 3 is a schematic flowchart of a speech translation method according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of a speech translation apparatus according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of a computer device of a method of speech translation according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
A speech translation method, apparatus, computer device, and storage medium according to embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a speech translation method according to an embodiment of the present application.
The embodiments of the present application are described taking the example of the speech translation method being configured in a speech translation apparatus, which can be applied to any computer device so that the computer device can perform the speech translation function.
The computer device may be a personal computer (PC), a cloud device, a mobile device, and the like; the mobile device may be a hardware device with an operating system, such as a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or an in-vehicle device.
As shown in fig. 1, the speech translation method may include the steps of:
step 101, an ith character string of a source speech is obtained, wherein i is a positive integer.
In the embodiment of the application, the source speech input by a user by voice is first acquired, the source speech is recognized to obtain the corresponding source speech text, and the ith character string of the source speech is acquired from the source speech text, where i is a positive integer.
In the embodiment of the application, the source speech text is the speech text to be translated; for example, it may be an English text. Of course, it may also be a text in another language, which is not limited herein.
And 102, inputting the ith character string into the trained segmentation model, and judging whether the ith character string is an unambiguous semantic unit.
The unambiguous semantic unit refers to a semantic unit with only one paraphrase.
For example, "close" may have multiple paraphrases, such as "shut" and "tight", while "close collaboration" can only be translated as tight cooperation. Thus "close" is not an unambiguous semantic unit, whereas "close collaboration" is an unambiguous semantic unit.
In the embodiment of the application, after the ith character string of the source language is acquired, the ith character string can be input into the trained segmentation model, so that whether the ith character string is an unambiguous semantic unit or not is judged according to the output of the segmentation model.
In the related art, when translating acquired source speech text, after the first 3 character strings have been acquired, the acquired source language character strings are translated every time one further character string is acquired. That is, after 3 character strings are acquired, translation proceeds in units of single character strings until the sentence is complete, and the translation contents corresponding to all the character strings are output based on the existing translation history.
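The related-art behavior described above can be sketched as a wait-k style loop. This is a hedched reconstruction: the text only loosely specifies the policy, and `translate` is a hypothetical stand-in for the translation model.

```python
def waitk_translate(strings, translate, k=3):
    """Related-art baseline (wait-k style): once k source strings have been
    acquired, each new arrival triggers translation of the next untranslated
    string, one string at a time; the tail is flushed at sentence end."""
    outputs = []
    for i in range(len(strings)):
        if i >= k - 1:  # wait until k strings have been acquired
            outputs.append(translate(strings[i - (k - 1)]))
    # sentence complete: translate the remaining buffered strings
    for s in strings[max(0, len(strings) - (k - 1)):]:
        outputs.append(translate(s))
    return outputs
```

Because each string is translated in isolation, a polysemous string such as "close" is committed to one paraphrase before its disambiguating context arrives, which is the failure mode the present method targets.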
In the method of the present application, after each character string of the source speech is acquired, the character string is input into the trained segmentation model to judge whether it is an unambiguous semantic unit. This avoids generating a wrong translation when the character string has multiple paraphrases.
Step 103, if the ith character string is determined to be an unambiguous semantic unit, translate the ith character string to generate an ith target character string.
In the embodiment of the application, after the ith character string is input into the trained segmentation model, if the ith character string is determined to be an unambiguous semantic unit, it is input into a trained translation model for translation to generate the ith target character string.
It should be explained that the translation model may be a neural network model that is trained using a large number of training samples.
Step 104, if the ith character string is determined not to be an unambiguous semantic unit, do not translate the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then translate that concatenation, where n is a positive integer.
In the embodiment of the application, after the ith character string is input into the trained segmentation model, if it is determined not to be an unambiguous semantic unit, the ith character string is not translated, thereby avoiding an erroneous translation result that would cause the sentence to be mistranslated.
In the embodiment of the application, after the ith character string is determined not to be an unambiguous semantic unit, the (i+1)th character string of the source speech is further acquired, the ith and (i+1)th character strings are input into the trained segmentation model, and it is judged whether their concatenation is an unambiguous semantic unit; if so, the concatenation of the ith and (i+1)th character strings is translated. Otherwise, character strings of the source speech continue to be acquired, and when the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, that concatenation is translated, where n is a positive integer.
For example, assume that the source speech to be translated is "This is made possible through close collaboration". After the 1st character string "This" is acquired, "This" is determined to be an unambiguous semantic unit and is translated. The 2nd character string "is" is then acquired, determined to be an unambiguous semantic unit, and translated. The 3rd character string "made" is acquired and input into the segmentation model; "made" is determined not to be an unambiguous semantic unit, so it is not translated. The 4th character string is acquired, and "made possible" is input into the segmentation model and determined not to be an unambiguous semantic unit. The 5th character string "through" is then acquired, "made possible through" is determined to be an unambiguous semantic unit, and "made possible through" is translated into a target character string.
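The incremental decision loop of steps 101-104 can be sketched as follows. This is a minimal illustration: `is_unambiguous` and `translate` are stand-ins for the trained segmentation and translation models, and the toy `UNAMBIGUOUS` set is a hypothetical substitute for the segmentation model's learned judgment.

```python
def incremental_translate(strings, is_unambiguous, translate):
    """Translate a stream of source character strings, emitting a target
    string only once the buffered source span (the ith through (i+n)th
    strings) forms an unambiguous semantic unit."""
    outputs = []
    buffer = []  # holds strings not yet covered by an unambiguous unit
    for s in strings:
        buffer.append(s)
        candidate = " ".join(buffer)
        if is_unambiguous(candidate):
            outputs.append(translate(candidate))
            buffer = []
    return outputs

# Toy stand-ins for the trained models (illustrative only)
UNAMBIGUOUS = {"This", "is", "made possible through", "close collaboration"}

def is_unambiguous(span):
    return span in UNAMBIGUOUS

def translate(span):
    return f"<{span}>"  # placeholder for the target character string

result = incremental_translate(
    "This is made possible through close collaboration".split(),
    is_unambiguous, translate)
```

Running this on the example sentence groups "made possible through" and "close collaboration" into single translation units, mirroring the walkthrough above.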
With the speech translation method of the embodiment of the application, the ith character string of the source speech is obtained, where i is a positive integer; the ith character string is input into a trained segmentation model to judge whether it is an unambiguous semantic unit; if so, it is translated to generate an ith target character string; if not, it is not translated until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and that concatenation is then translated, where n is a positive integer. With this method, after a character string of the source speech is acquired, it is translated into a target character string only once the segmentation model determines it to be an unambiguous semantic unit. This solves the problem that mistranslation of character strings with multiple paraphrases in the source speech affects the translation accuracy of the whole sentence, and improves both the accuracy and the efficiency of speech translation.
On the basis of the above embodiment, before the ith character string is input into the trained segmentation model in step 102 to judge whether it is an unambiguous semantic unit, the segmentation model needs to be trained with training samples, so that the trained segmentation model can accurately judge whether a character string of the source speech is an unambiguous semantic unit. This process is described in detail with reference to fig. 2, which is a flowchart of a speech translation method according to a second embodiment of the present application.
As shown in fig. 2, the speech translation method may further include the following steps:
step 201, a training sample is obtained.
The training samples comprise source language samples and target language samples, and each semantic unit contained in the source language samples is an unambiguous semantic unit.
It should be noted that the training samples include source language samples and corresponding target language samples, and the source language samples have all been segmented into unambiguous semantic units.
In the embodiment of the application, training samples containing source language samples and target language samples can be obtained from a parallel corpus, and the source language samples are then segmented into a plurality of unambiguous semantic units.
And 202, training the segmentation model by using the training samples.
In the embodiment of the application, the obtained training samples are input into the segmentation model to train it, and the unambiguous semantic units contained in the source language samples are determined according to the output of the model.
Therefore, the trained segmentation model can accurately judge the unambiguous semantic unit in the source language to segment the source language, and the accuracy of voice translation is improved.
According to the speech translation method, training samples comprising source language samples and target language samples are obtained, and the segmentation model is trained with the training samples. Because the segmentation model is trained on segmented source language samples and their target language samples, the trained model can accurately identify the unambiguous semantic units in the source speech, which improves the accuracy of speech translation.
In step 201 of the foregoing embodiment, each semantic unit contained in the obtained source language sample is an unambiguous semantic unit, so the source language sample needs to be segmented in advance into a plurality of unambiguous semantic units. This process is described in detail with reference to fig. 3, which is a flowchart of a speech translation method according to a third embodiment of the present application.
As shown in fig. 3, the speech translation method may further include the following steps:
step 301, aligning each character string in the source language sample with the corresponding character string in the target language sample to obtain alignment information, and recording the alignment information in the phrase table.
In the embodiment of the application, after the source language sample and the target language sample are obtained, the character string of the source language sample and the target character string of the target language sample can be aligned to obtain the alignment information, and the alignment information is recorded in the phrase table. As a possible implementation manner, an existing alignment tool may be used to perform alignment processing on the character string of the source language sample and the target character string of the target language sample to obtain alignment information.
As another possible implementation manner, an iterative optimization strategy, such as an EM Algorithm (Expectation Maximization Algorithm), may also be used to align each character string in the source language sample with a corresponding target character string in the target language sample, so as to obtain alignment information.
As an example, assume that the source language sample is "This is made possible through close collaboration" and the target language sample is its translation (rendered here in English as "This is achieved through close collaboration."). Aligning the character strings of the source language sample with the target character strings of the target language sample determines that "This" is aligned with "this", "is" with "is", "made possible" with "achieved", "through" with "through", "close" with "close", and "collaboration" with "collaboration" (the target-language character strings are shown by their English glosses).
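Steps 301's alignment-and-phrase-table bookkeeping can be sketched as follows. Here the alignments are supplied by hand and the target words are capitalized English glosses; in practice the alignment would come from an alignment tool or the EM procedure mentioned above, so all names and data below are illustrative assumptions.

```python
from collections import defaultdict

def build_phrase_table(pairs):
    """Record alignment information: for each source character string, the
    set of target character strings it has been aligned to across samples.
    `pairs` is a list of (source_tokens, target_tokens, alignment) triples,
    where `alignment` maps a source token index to a target token index."""
    table = defaultdict(set)
    for src, tgt, align in pairs:
        for s_idx, t_idx in align.items():
            table[src[s_idx]].add(tgt[t_idx])
    return table

# Two toy sample pairs with hand-written alignments (glosses, not real data)
pairs = [
    (["close", "collaboration"], ["TIGHT", "COOPERATION"], {0: 0, 1: 1}),
    (["close", "the", "window"], ["SHUT", "WINDOW"], {0: 0, 2: 1}),
]
table = build_phrase_table(pairs)
```

After both pairs are processed, the phrase-table entry for "close" contains two distinct target strings, which is exactly the signal the next step uses to screen polysemous words.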
Step 302, screening out the polysemous words in the source language sample according to the alignment information recorded in the phrase table.
Wherein, the polysemous word refers to the character string corresponding to two or more paraphrases in the source language sample.
In the embodiment of the present application, the alignment information recorded in the phrase table includes, for each character string in the source language sample, the corresponding target character string(s) in the target language sample and their number. Therefore, the polysemous words in the source language sample can be screened out according to the alignment information recorded in the phrase table.
As an example, assume that the source language samples are "This is made possible through close collaboration" and "If you are cold, close the window", with target language samples being their translations (rendered in English as "This is achieved through close collaboration." and "If you are cold, close the window."). After each character string in the source language samples is aligned with the corresponding character string in the target language samples, the character string "close" is found to be aligned to two different target character strings: one meaning "close/tight" in the first sample and one meaning "closed/shut" in the second. In this case, in the alignment information recorded in the phrase table, the character string "close" has two paraphrases, and "close" can therefore be determined to be a polysemous word.
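The screening rule of steps 302-303 reduces to a set query over the phrase table: any source string aligned to at least two distinct target strings is polysemous. A minimal sketch, with the phrase-table contents written out as a hypothetical literal:

```python
def find_polysemous(table):
    """Screen out polysemous words: source character strings whose
    phrase-table entry contains at least two distinct paraphrases."""
    return {src for src, tgts in table.items() if len(tgts) >= 2}

# A phrase table like the one built above (target words are glosses)
table = {"close": {"TIGHT", "SHUT"}, "collaboration": {"COOPERATION"}}
polysemous = find_polysemous(table)
```

With this table, only "close" qualifies; every other string has a single paraphrase and is treated as an unambiguous semantic unit in step 303.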
Step 303, judging whether the ith character string in the source language sample is an unambiguous semantic unit according to the ambiguous words in the source language sample.
In the embodiment of the application, after the polysemous words in the source language sample are screened out according to the alignment information recorded in the phrase table, whether the ith character string in the source language sample is an unambiguous semantic unit or not can be determined according to the polysemous words in the source language sample.
It can be understood that each character string in the source language sample is compared with the screened ambiguous word, and the character string in the source language sample is determined not to be the ambiguous word, so that the character string is an unambiguous semantic unit.
Continuing the above example, after the character string "close" is determined to be a polysemous word, it can be determined that in the source language sample "This is made possible through close collaboration" the character string "close" is a polysemous word, while the remaining character strings are unambiguous semantic units.
And step 304, if the ith character string in the source language sample is determined to be an unambiguous semantic unit, segmenting at the corresponding position of the ith character string.
As a possible case, if the ith character string in the source language sample is determined to be an unambiguous semantic unit, segmentation can be performed at a position corresponding to the ith character string in the source language sample.
As an example, assuming that the source language sample is "This is made possible through close collaboration" and the character string "This" is determined to be an unambiguous semantic unit, the source language sample can be segmented after the character string "This".
Step 305, if the ith character string in the source language sample is determined not to be an unambiguous semantic unit, do not segment at the position corresponding to the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then segment at the position corresponding to that concatenation, where n is a positive integer.
As a possible case, if the ith character string in the source language sample is one of the polysemous words screened from the source language sample, it can be determined that the ith character string is not an unambiguous semantic unit. In this case, no segmentation is performed at the position corresponding to the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and segmentation is then performed at the position corresponding to that concatenation, where n is a positive integer.
As an example, assuming that the source language sample is "This is made possible through close collaboration" and the character string "close" is not an unambiguous semantic unit, no segmentation is performed at the position corresponding to "close"; the character string "close collaboration" is determined to be an unambiguous semantic unit, and "close collaboration" is segmented off as one semantic unit.
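Steps 304-305 amount to a greedy left-to-right segmentation of the source sample. The sketch below is a simplification under two stated assumptions: only the screened polysemous words block a cut, and any span extended past one token is treated as unambiguous (the patent's actual check judges the whole concatenation, which a real implementation would substitute here).

```python
def segment_sample(tokens, polysemous):
    """Greedily segment a source language sample into unambiguous semantic
    units: cut after any token that is not a polysemous word; otherwise keep
    extending the current span with following tokens before cutting."""
    units, current = [], []
    for tok in tokens:
        current.append(tok)
        # Simplified rule: a single non-polysemous token, or any extended
        # multi-token span, counts as an unambiguous semantic unit here.
        if tok not in polysemous or len(current) > 1:
            units.append(" ".join(current))
            current = []
    if current:  # flush any trailing unresolved span
        units.append(" ".join(current))
    return units

sample = "This is made possible through close collaboration".split()
units = segment_sample(sample, polysemous={"close"})
```

On the example sentence this keeps "close collaboration" together as one unit while cutting after every other token, matching the segmentation described in steps 304-305.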
With the speech translation method of the embodiment of the application, alignment information is obtained by aligning each character string in the source language sample with the corresponding character string in the target language sample and is recorded in a phrase table; polysemous words in the source language sample are screened out according to the alignment information recorded in the phrase table; and whether the ith character string in the source language sample is an unambiguous semantic unit is judged according to those polysemous words, so as to determine whether to segment at the position corresponding to the ith character string. The source language sample is thereby segmented into unambiguous semantic units, and the segmentation model is trained on the segmented source language sample, which improves the accuracy of speech translation.
In order to implement the above embodiments, the present application provides a speech translation apparatus.
Fig. 4 is a schematic structural diagram of a speech translation apparatus according to a fourth embodiment of the present application.
As shown in fig. 4, the speech translation apparatus 400 may include: a first obtaining module 410, a judging module 420, a first translating module 430 and a second translating module 440.
The first obtaining module 410 is configured to obtain an ith character string of a source speech, where i is a positive integer.
The judging module 420 is configured to input the ith character string into the trained segmentation model, and judge whether the ith character string is an unambiguous semantic unit.
The first translation module 430 is configured to, if it is determined that the ith character string is an unambiguous semantic unit, translate the ith character string to generate an ith target character string; and
the second translation module 440 is configured to, if it is determined that the ith character string is not an unambiguous semantic unit, refrain from translating the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then translate that concatenation, where n is a positive integer.
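The cooperation of the four modules at inference time can be sketched as below. This is an illustrative sketch only: `is_unambiguous` and `translate` are placeholders standing in for the trained segmentation model and translation model, and all names are assumptions, not APIs defined by the patent.

```python
def translate_stream(source_strings, is_unambiguous, translate):
    """Emit a target string as soon as the accumulated source strings
    form an unambiguous semantic unit."""
    targets, pending = [], []
    for chunk in source_strings:             # ith character string (module 410)
        pending.append(chunk)
        unit = " ".join(pending)
        if is_unambiguous(unit):             # judging module 420
            targets.append(translate(unit))  # translation modules 430/440
            pending = []                     # start accumulating the next unit
    return targets

# Toy stand-ins: only "close" by itself is ambiguous; "translation" uppercases.
result = translate_stream(
    ["this", "close", "interaction"],
    is_unambiguous=lambda s: s != "close",
    translate=lambda s: s.upper())
print(result)  # → ['THIS', 'CLOSE INTERACTION']
```

Note that "close" produces no output on its own; translation is deferred until the concatenation "close interaction" is judged unambiguous, matching the behavior of the second translation module 440.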
As a possible scenario, the speech translation apparatus 400 may further include:
the second acquisition module is used for acquiring a training sample; the training samples comprise source language samples and target language samples, and each semantic unit contained in the source language samples is an unambiguous semantic unit.
And the training module is used for training the segmentation model by adopting the training samples.
As another possible case, the second obtaining module is further configured to:
aligning each character string in the source language sample with the corresponding character string in the target language sample to obtain alignment information, and recording the alignment information in a phrase table;
screening out polysemous words in a source language sample according to the alignment information recorded in the phrase table;
judging whether the ith character string in the source language sample is an unambiguous semantic unit or not according to the polysemous words in the source language sample;
segmenting at the position corresponding to the ith character string if the ith character string in the source language sample is determined to be an unambiguous semantic unit;
and, if the ith character string in the source language sample is determined not to be an unambiguous semantic unit, not segmenting at the position corresponding to the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then segmenting at the position corresponding to that concatenation, wherein n is a positive integer.
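The alignment bookkeeping performed by the second obtaining module can be sketched as follows: for every source string, record the set of target strings it aligns to across the parallel corpus. The function name and the toy alignment pairs are invented for illustration; the patent does not specify a data structure for the phrase table.

```python
from collections import defaultdict

def build_phrase_table(aligned_pairs):
    """aligned_pairs: iterable of (source_string, target_string) alignments
    extracted from the source and target language samples."""
    table = defaultdict(set)
    for src, tgt in aligned_pairs:
        table[src].add(tgt)  # alignment information recorded in the phrase table
    return table

table = build_phrase_table([
    ("close", "关闭"), ("close", "亲密"), ("interaction", "互动")])
print(sorted(table["close"]))
```

A source string that accumulates two or more distinct target strings in this table ("close" above) is a candidate polysemous word for the screening step.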
As another possible case, the second obtaining module may further be configured to:
according to the alignment information, counting character strings corresponding to at least two paraphrases in the source language sample;
determining character strings corresponding to the at least two paraphrases as ambiguous words in the source language sample.
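The screening step itself reduces to a filter over the phrase table: any source string aligned to at least two paraphrases is counted as a polysemous word. A minimal sketch, with the function name and input shape assumed for illustration:

```python
def screen_polysemous_words(phrase_table):
    """phrase_table maps each source string to the set of target strings
    it aligns to; strings with >= 2 paraphrases are polysemous."""
    return {src for src, targets in phrase_table.items() if len(targets) >= 2}

print(screen_polysemous_words(
    {"close": {"关闭", "亲密"}, "interaction": {"互动"}}))
# → {'close'}
```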
As another possible scenario, the first translation module is further configured to:
and inputting the ith character string into a translation model to obtain an ith target character string.
The speech translation device of the embodiment of the present application obtains the ith character string of the source speech, where i is a positive integer; inputs the ith character string into a trained segmentation model to judge whether it is an unambiguous semantic unit; if the ith character string is determined to be an unambiguous semantic unit, translates it to generate an ith target character string; and if the ith character string is determined not to be an unambiguous semantic unit, defers translation until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then translates that concatenation, where n is a positive integer. In this way, once a character string of the source speech is determined by the segmentation model to be an unambiguous semantic unit, it is translated into a target character string. This avoids the problem that mistranslating a character string with multiple paraphrases degrades the translation of the whole sentence, and improves the accuracy of speech translation.
According to an embodiment of the present application, a computer device and a readable storage medium are also provided.
As shown in fig. 5, fig. 5 is a block diagram of a computer device of a method of speech translation according to an embodiment of the present application. Computer devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The computer device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the computer apparatus includes: one or more processors 501, memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple computer devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of speech translation provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of speech translation provided herein.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for speech translation in the embodiment of the present application (for example, the first obtaining module 410, the determining module 420, the first translating module 430, and the second translating module 440 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing, i.e., a method of implementing speech translation in the above-described method embodiments, by executing non-transitory software programs, instructions, and modules stored in the memory 502.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the computer device for speech translation, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory located remotely from processor 501, which may be connected to a speech translation computer device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The computer device of the method of speech translation may further comprise: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus for speech translation; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiment of the present application, the ith character string of the source speech is obtained, where i is a positive integer; the ith character string is input into a trained segmentation model to judge whether it is an unambiguous semantic unit; if the ith character string is determined to be an unambiguous semantic unit, it is translated to generate an ith target character string; and if the ith character string is determined not to be an unambiguous semantic unit, translation is deferred until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and that concatenation is then translated, where n is a positive integer. In this way, once a character string of the source language is determined by the segmentation model to be an unambiguous semantic unit, it is translated into a target character string. This avoids the problem that mistranslating a character string with multiple paraphrases degrades the translation of the whole sentence, and improves the accuracy of speech translation.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method of speech translation, the method comprising:
acquiring an ith character string of source speech, wherein i is a positive integer;
inputting the ith character string into a trained segmentation model, and judging whether the ith character string is an unambiguous semantic unit;
if the ith character string is determined to be an unambiguous semantic unit, translating the ith character string to generate an ith target character string; and
and if the ith character string is determined not to be an unambiguous semantic unit, not translating the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and translating that concatenation, wherein n is a positive integer.
2. The speech translation method according to claim 1, wherein the inputting the ith character string into the trained segmentation model and judging whether the ith character string is an unambiguous semantic unit comprises:
obtaining a training sample; the training samples comprise source language samples and target language samples, and each semantic unit contained in the source language samples is an unambiguous semantic unit;
and training the segmentation model with the training samples.
3. The translation method according to claim 2, wherein said obtaining training samples comprises:
aligning each character string in the source language sample with a corresponding character string in the target language sample to obtain alignment information, and recording the alignment information in a phrase table;
screening out polysemous words in the source language sample according to the alignment information recorded in the phrase table;
judging whether the ith character string in the source language sample is an unambiguous semantic unit or not according to the polysemous words in the source language sample;
segmenting at the position corresponding to the ith character string if the ith character string in the source language sample is determined to be an unambiguous semantic unit;
and if the ith character string in the source language sample is determined not to be an unambiguous semantic unit, not segmenting at the position corresponding to the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and segmenting at the position corresponding to that concatenation, wherein n is a positive integer.
4. The translation method according to claim 3, wherein said screening out ambiguous words in said source language sample according to said alignment information recorded in said phrase table comprises:
according to the alignment information, counting character strings corresponding to at least two paraphrases in the source language sample;
and determining the character strings corresponding to the at least two paraphrases as polysemous words in the source language sample.
5. The translation method according to any one of claims 1 to 4, wherein translating the ith character string to generate the ith target character string comprises:
and inputting the ith character string into a translation model to obtain the ith target character string.
6. A speech translation apparatus, characterized in that the apparatus comprises:
the first obtaining module is used for obtaining an ith character string of source speech, wherein i is a positive integer;
the judging module is used for inputting the ith character string into the trained segmentation model and judging whether the ith character string is an unambiguous semantic unit;
the first translation module is used for translating the ith character string to generate an ith target character string if the ith character string is determined to be an unambiguous semantic unit; and
and the second translation module is configured to, if it is determined that the ith character string is not an unambiguous semantic unit, not translate the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and then translate that concatenation, wherein n is a positive integer.
7. The speech translation apparatus of claim 6, wherein the apparatus further comprises:
the second acquisition module is used for acquiring a training sample; the training samples comprise source language samples and target language samples, and each semantic unit contained in the source language samples is an unambiguous semantic unit;
and the training module is configured to train the segmentation model with the training samples.
8. The speech translation device of claim 7, wherein the second obtaining module is further configured to:
aligning each character string in the source language sample with a corresponding character string in the target language sample to obtain alignment information, and recording the alignment information in a phrase table;
screening out polysemous words in the source language sample according to the alignment information recorded in the phrase table;
judging whether the ith character string in the source language sample is an unambiguous semantic unit according to the polysemous words in the source language sample;
segmenting at the position corresponding to the ith character string if the ith character string in the source language sample is determined to be an unambiguous semantic unit;
and if the ith character string in the source language sample is determined not to be an unambiguous semantic unit, not segmenting at the position corresponding to the ith character string until the concatenation of the ith through (i+n)th character strings is an unambiguous semantic unit, and segmenting at the position corresponding to that concatenation, wherein n is a positive integer.
9. The speech translation device of claim 8, wherein the second obtaining module is further configured to:
according to the alignment information, counting character strings corresponding to at least two paraphrases in the source language sample;
and determining the character strings corresponding to the at least two paraphrases as polysemous words in the source language sample.
10. The speech translation apparatus according to any one of claims 6-9, wherein the first translation module is further configured to:
and inputting the ith character string into a translation model to obtain the ith target character string.
11. A computer device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the speech translation method of any of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the speech translation method of any one of claims 1-5.
CN202010062844.4A 2020-01-19 2020-01-19 Speech translation method, device, computer equipment and storage medium Active CN111310481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010062844.4A CN111310481B (en) 2020-01-19 2020-01-19 Speech translation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010062844.4A CN111310481B (en) 2020-01-19 2020-01-19 Speech translation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111310481A true CN111310481A (en) 2020-06-19
CN111310481B CN111310481B (en) 2021-05-18

Family

ID=71148989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010062844.4A Active CN111310481B (en) 2020-01-19 2020-01-19 Speech translation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111310481B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591495A (en) * 2021-06-30 2021-11-02 北京搜狗智能科技有限公司 Speech translation method, device and storage medium
CN113642333A (en) * 2021-08-18 2021-11-12 北京百度网讯科技有限公司 Display method and device, and training method and device of semantic unit detection model

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1152737A (en) * 1995-12-23 1997-06-25 徐火辉 Full and double phoneticizing combined type Chinese input method
CN1308748A (en) * 1998-05-04 2001-08-15 特雷道斯股份有限公司 Machine-assisted translation tools
CN102236637A (en) * 2010-04-22 2011-11-09 北京金山软件有限公司 Method and system for determining collocation degree of collocations with central word
CN102789461A (en) * 2011-05-19 2012-11-21 富士通株式会社 Establishing device and method for multilingual dictionary
CN103885608A (en) * 2014-03-19 2014-06-25 百度在线网络技术(北京)有限公司 Input method and system
CN104572633A (en) * 2014-12-25 2015-04-29 语联网(武汉)信息技术有限公司 Method for determining meanings of polysemous word
CN104750678A (en) * 2015-04-19 2015-07-01 王学庆 Image text recognizing translation glasses and method
CN104899230A (en) * 2014-03-07 2015-09-09 上海市玻森数据科技有限公司 Public opinion hotspot automatic monitoring system
CN105138515A (en) * 2015-09-02 2015-12-09 百度在线网络技术(北京)有限公司 Named entity recognition method and device
CN107102989A (en) * 2017-05-24 2017-08-29 南京大学 A kind of entity disambiguation method based on term vector, convolutional neural networks
CN107168958A (en) * 2017-05-15 2017-09-15 北京搜狗科技发展有限公司 A kind of interpretation method and device
CN107608968A (en) * 2017-09-22 2018-01-19 深圳市易图资讯股份有限公司 Chinese word cutting method, the device of text-oriented big data
CN108446269A (en) * 2018-03-05 2018-08-24 昆明理工大学 A kind of Word sense disambiguation method and device based on term vector
CN108846094A (en) * 2018-06-15 2018-11-20 江苏中威科技软件系统有限公司 A method of based on index in classification interaction
CN109726385A (en) * 2017-10-31 2019-05-07 株式会社Ntt都科摩 Word sense disambiguation method and equipment, meaning of a word extended method and device
CN109753569A (en) * 2018-12-29 2019-05-14 上海智臻智能网络科技股份有限公司 A kind of method and device of polysemant discovery


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
WILLIAM A. GALE et al.: "Using Bilingual Materials to Develop Word Sense Disambiguation Methods", The 4th Int'l Conf. on Theoretical and Methodological Issues in Machine Translation, Montreal, Canada *
ZOOHUA: "Ambiguity Types in Chinese Automatic Word Segmentation", CSDN *
万海旭: "Research on Word Sense Disambiguation Based on WordNet", China Masters' Theses Full-text Database, Information Science and Technology *
冯志伟: "Phrase-based and Syntax-based Statistical Machine Translation", Journal of Yanshan University *
唐霄: "Research on a Chinese POI Word Segmentation System Based on the N-Shortest-Paths Algorithm and Hidden Markov Models", China Masters' Theses Full-text Database, Information Science and Technology *
邓凡: "Research and Implementation of a Metasearch-based Specialized Search Engine", China Masters' Theses Full-text Database, Information Science and Technology *
闫蓉: "Research on Semantics-based Chinese Word Sense Disambiguation", China Masters' Theses Full-text Database, Information Science and Technology *


Also Published As

Publication number Publication date
CN111310481B (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN111325020B (en) Event argument extraction method and device and electronic equipment
CN111709247A (en) Data set processing method and device, electronic equipment and storage medium
EP3926513A1 (en) Method and apparatus for training models in machine translation, electronic device and storage medium
CN111611468B (en) Page interaction method and device and electronic equipment
CN110797005B (en) Prosody prediction method, apparatus, device, and medium
CN112001169B (en) Text error correction method and device, electronic equipment and readable storage medium
CN111079945B (en) End-to-end model training method and device
EP3929768A1 (en) Method and apparatus for generating triple sample, electronic device and computer storage medium
CN111737965A (en) Document comparison method and device, electronic equipment and readable storage medium
US11468236B2 (en) Method and apparatus for performing word segmentation on text, device, and medium
US20220027575A1 (en) Method of predicting emotional style of dialogue, electronic device, and storage medium
CN111708800A (en) Query method and device and electronic equipment
CN111831814A (en) Pre-training method and device of abstract generation model, electronic equipment and storage medium
CN111310481B (en) Speech translation method, device, computer equipment and storage medium
US20180165275A1 (en) Identification and Translation of Idioms
CN111127191A (en) Risk assessment method and device
CN110728156B (en) Translation method and device, electronic equipment and readable storage medium
CN111241302B (en) Position information map generation method, device, equipment and medium
CN111738015A (en) Method and device for analyzing emotion polarity of article, electronic equipment and storage medium
US20180165277A1 (en) Dynamic Translation of Idioms
CN111858880A (en) Method and device for obtaining query result, electronic equipment and readable storage medium
CN112397050B (en) Prosody prediction method, training device, electronic equipment and medium
CN111931524A (en) Method, apparatus, device and storage medium for outputting information
CN111522863A (en) Topic concept mining method, device, equipment and storage medium
CN111339314A (en) Method and device for generating triple-group data and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant