CN113591491B - Speech translation text correction system, method, device and equipment - Google Patents

Speech translation text correction system, method, device and equipment

Info

Publication number: CN113591491B (application number CN202010366777.5A)
Authority: CN (China)
Prior art keywords: text, clause, determining, display device, client
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN113591491A
Inventor: 曹宇
Current assignee: Alibaba Group Holding Ltd (the listed assignees may be inaccurate)
Original assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority: CN202010366777.5A
Publication of CN113591491A, application granted, publication of CN113591491B

Classifications

    • G06F 40/51 Translation evaluation (G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing; G06F 40/00 Handling natural language data; G06F 40/40 Processing or translation of natural language)
    • G06F 40/194 Calculation of difference between files (G06F 40/10 Text processing)
    • G06F 40/55 Rule-based translation; G06F 40/56 Natural language generation

Abstract

The application discloses a system, a method, an apparatus and related equipment for correcting speech translation text. In the system, the server determines a source-language text segment corresponding to voice stream data collected by the client in real time and sends the text segment to the client; it then receives the manually corrected first clause text sent by the client, determines a second clause text in the target language corresponding to the first clause text, and sends the second clause text to the client. The client collects voice stream data in real time and sends it to the server; displays the text segment, determines the first clause text, and sends it; and displays the second clause text. With this processing mode, the source-language clause text is manually corrected as real-time speech recognition progresses, and the corrected clause text is translated before recognition of the full sentence is complete, realizing translation-text correction at clause granularity; the correction efficiency and correction quality can therefore be effectively improved.

Description

Speech translation text correction system, method, device and equipment
Technical Field
The application relates to the technical field of machine speech translation, and in particular to a speech translation text correction system, method, apparatus and electronic device.
Background
With the advent of the age of information internationalization and growing demand across society, research on automatic speech translation technology has received increasing attention. Speech translation, also commonly referred to as spoken language translation (SLT), is the process by which a computer translates speech in one language into another language.
Speech translation is a technique that combines speech recognition with machine translation. In scenes combining real-time speech recognition with machine translation (simultaneous interpretation), factors such as technology, environment and human error make the speech recognition result inaccurate, which in turn makes the translation result wrong, so the recognition and translation results need real-time manual intervention and correction. In a real-time recognition scene, the pace is fast and the time is short, so fully manual intervention is difficult. At present, a typical correction mode for speech translation results is: first recognize and translate the speech through a real-time speech translation model and display the translation result on screen in real time; after a whole sentence has been translated, perform automatic intervention processing on that sentence's translation result, followed by manual intervention processing.
However, in carrying out the present invention, the inventor found that the prior art has at least the following problems: because automatic intervention is only performed on the speech translation result after translation of the whole sentence is complete, and because it offers only a simple translation-result intervention capability, automatic intervention on the speech translation result is slow, manual intervention therefore begins late, erroneous translation subtitles stay on screen for a long time, and the correction quality of the translated text is poor. How to improve the overall correction efficiency and correction quality of speech translation results, so as to shorten the on-screen residence time of erroneous translation subtitles and improve translation quality, is therefore a problem urgently needing to be solved by those skilled in the art.
Disclosure of Invention
The application provides a voice translation text correction system to solve the problem that voice translation text correction quality and correction efficiency are low in the prior art. The application additionally provides a voice translation text correction method and device and electronic equipment.
The application provides a speech translation text correction system, comprising:
the server side is used for determining a text fragment of a source language corresponding to voice stream data acquired by the client side in real time and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client;
The client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text.
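The division of labour above can be sketched as a small set of typed messages exchanged between client and server. All type and function names below are illustrative assumptions for the sketch; the application does not prescribe any wire format.

```typescript
// Illustrative message types for the clause-level correction flow (names are assumptions).

interface SourceTextSegment {   // server -> client: recognized source-language text
  segmentId: number;
  text: string;                 // partial recognition result, updated in real time
}

interface CorrectedClause {     // client -> server: manually corrected first clause text
  clauseId: number;
  sourceText: string;           // e.g. a comma-delimited half-sentence after editing
}

interface TranslatedClause {    // server -> client: second clause text in the target language
  clauseId: number;
  targetText: string;
}

// A corrected clause can be translated before the full sentence finishes recognition:
function translateClause(
  c: CorrectedClause,
  translate: (s: string) => string,
): TranslatedClause {
  return { clauseId: c.clauseId, targetText: translate(c.sourceText) };
}
```

The key point the types make visible is that translation is driven by clause ids, not by whole sentences, so a translated clause can be displayed as soon as its corrected source text arrives.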
The application also provides a voice translation text correction method, which comprises the following steps:
determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client;
receiving a first clause text after manual correction sent by a client;
and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
Optionally, the method further comprises:
performing correction processing on the first clause text to obtain a corrected third clause text;
the determining the second clause text of the target language corresponding to the first clause text includes:
and determining the second clause text corresponding to the third clause text.
Optionally, the performing correction processing on the first clause text includes:
and executing dialect correction processing on the first clause text.
Optionally, the performing correction processing on the first clause text includes:
and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
Optionally, the entity word replacement rule includes: name replacement rules, business entity name replacement rules.
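A minimal sketch of how such entity-word replacement rules (name rules, business entity name rules) might be applied to a clause; the rule shape and the plain-substring matching strategy are assumptions, not taken from the application.

```typescript
// Hypothetical rule table: each rule maps a misrecognized form to the correct entity word.
type ReplacementRule = { from: string; to: string };

function applyEntityRules(clause: string, rules: ReplacementRule[]): string {
  // Apply every rule in order; split/join replaces all occurrences without regex escaping.
  return rules.reduce((text, r) => text.split(r.from).join(r.to), clause);
}
```

In practice rules would be added through the correction UI (the "add entity word replacement rule" shortcut described later) rather than hard-coded.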
Optionally, the performing correction processing on the first clause text includes:
and executing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
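The blacklist filtering step could look like the following sketch. Masking matched terms with asterisks is an assumption; the application only says blacklisted content is filtered according to rule information.

```typescript
// Mask every blacklisted term in the clause before it is displayed or translated.
function filterBlacklist(clause: string, blacklist: string[]): string {
  return blacklist.reduce(
    (text, term) => text.split(term).join("*".repeat(term.length)),
    clause,
  );
}
```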
Optionally, the method further comprises:
and performing correction processing on the second clause text.
Optionally, the method further comprises:
and optimizing the voice recognition model and/or the voice translation model according to the hotword information.
Optionally, the method further comprises:
determining a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word;
and sending the word with the uncertain translation and the candidate translation word to the client so that a user of the client can modify the word with the uncertain translation according to the candidate translation word.
Optionally, the term with the uncertain translation and the candidate translation terms are determined according to a similar vocabulary.
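One plausible way to flag a translation-uncertain word is a score-gap heuristic over the candidate translations drawn from the similar-word vocabulary: when the model's top candidate beats the runner-up by less than some margin, all near-tied candidates are shown to the corrector. The scores and the 0.1 margin below are assumptions; the application only states that uncertain words and their candidates are determined from a similar vocabulary.

```typescript
// Hypothetical heuristic: a target word is "translation-uncertain" when the top
// candidate's score is within `margin` of the runner-up.
type Candidate = { word: string; score: number };

function uncertainCandidates(cands: Candidate[], margin = 0.1): Candidate[] | null {
  const sorted = [...cands].sort((a, b) => b.score - a.score);
  if (sorted.length < 2 || sorted[0].score - sorted[1].score >= margin) {
    return null;                    // confident: no intervention prompt needed
  }
  return sorted;                    // near-tied: offer all candidates to the corrector
}
```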
The application also provides a voice translation text correction method, which comprises the following steps:
collecting voice stream data in real time, and sending the voice stream data to a server;
displaying a text segment of a source language corresponding to the voice stream data and returned by the server;
Determining a first clause text after manual correction, and sending the first clause text to a server;
and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
Optionally, the method further comprises:
determining a manually corrected third clause text corresponding to the second clause text;
and updating the displayed second clause text into a third clause text.
Optionally, the manual original-text correction processing is performed through a first display device;
and the source language text and target language text corresponding to the voice progress are displayed through a second display device.
Optionally, the determining the manually corrected first clause text includes:
determining history text of the source language that the second display device has finished displaying;
and adjusting that history text of the source language, as displayed on the first display device, to a third display attribute.
Optionally, the determining the manually corrected first clause text includes:
and adjusting the first clause text according to the adjusted punctuation marks.
Optionally, the determining the manually corrected first clause text includes:
and restoring, according to a single-step rollback instruction, the text as it was before the last single-step modification.
Optionally, the determining the manually corrected first clause text includes:
and restoring, according to a sentence rollback instruction, the sentence text as it was before modification.
Optionally, the determining the manually corrected first clause text includes:
each sentence text is displayed in a sentence-isolated manner.
Optionally, the determining the manually corrected first clause text includes:
and displaying the sentence text focused by the cursor with the first display attribute, and displaying the sentence text focused by the non-cursor with the second display attribute.
Optionally, the determining the manually corrected first clause text includes:
and if the text selection operation is executed, displaying a text processing shortcut operation option.
Optionally, the text processing shortcut options include: an add-hotword option, an add-entity-word-replacement-rule option, a person-name/pronoun quick-switch option, a punctuation quick-switch option, a selected-text-region deletion option, and a whole-sentence deletion option.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
each sentence text is displayed in a sentence-isolated manner.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
The sentence text of the target language focused by the cursor is displayed with the first display attribute, and the sentence text of the target language focused by the non-cursor is displayed with the second display attribute.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
determining sentence text in a source language corresponding to sentence text in a target language focused by a cursor;
and displaying sentence text of the source language with a first display attribute.
Optionally, the first display attribute includes: highlighting;
the second display attribute includes: non-highlighting.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
and deleting the sentence text according to the sentence deleting instruction.
Optionally, performing, by the first display device, a manual correction translation process;
displaying a source language text and a target language text corresponding to the voice progress through a second display device;
the determining the manually corrected third clause text corresponding to the second clause text includes:
determining history text of the target language that the second display device has finished displaying;
and adjusting that history text of the target language, as displayed on the first display device, to a third display attribute.
Optionally, the method further comprises:
determining a volume gain of the voice stream data;
and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
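The volume-gain adjustment could be sketched as below: if the measured peak of the incoming stream falls below the gain threshold, samples are scaled up toward a target level. The threshold and target values, and peak measurement as the "gain" metric, are illustrative assumptions.

```typescript
// Boost quiet voice streams toward a target peak; leave sufficiently loud streams untouched.
function adjustGain(samples: number[], gainThreshold: number, targetPeak: number): number[] {
  const peak = Math.max(...samples.map(Math.abs));
  if (peak === 0 || peak >= gainThreshold) return samples; // loud enough (or silent)
  const factor = targetPeak / peak;
  return samples.map(s => s * factor);
}
```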
Optionally, the method further comprises:
the first clause text of the source language and the second clause text of the target language are displayed in sentence alignment.
Optionally, the method further comprises:
receiving a word with uncertain translation and a plurality of candidate translation words of the word with uncertain translation, which are included in the second clause text and sent by a server;
and modifying the words with uncertain translations according to the candidate translation words.
The application also provides a voice translation text correction device, comprising:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
and the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text and sending the second clause text to the client.
The application also provides an electronic device comprising:
a processor; and
a memory for storing a program implementing the speech translation text correction method; after the device is powered on and the program is run by the processor, the following steps are performed: determining a text segment of a source language corresponding to voice stream data collected by a client in real time, and sending the text segment to the client; receiving the manually corrected first clause text sent by the client; and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
The application also provides a voice translation text correction device, comprising:
the voice data acquisition and transmission unit is used for acquiring voice stream data in real time and transmitting the voice stream data to the server;
the original text display unit is used for displaying text fragments of a source language corresponding to the voice stream data, which are returned by the server side;
the original text correction unit is used for determining a first clause text after manual correction and sending the first clause text to the server;
and the translation display unit is used for displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
The application also provides an electronic device comprising:
a processor; and
a memory for storing a program implementing the speech translation text correction method; after the device is powered on and the program is run by the processor, the following steps are performed: collecting voice stream data in real time, and sending the voice stream data to a server; displaying a text segment of the source language corresponding to the voice stream data returned by the server; determining a manually corrected first clause text, and sending the first clause text to the server; and displaying a second clause text of the target language corresponding to the first clause text returned by the server.
The application also provides a system for correcting the text of the voice translation, which comprises:
the server side is used for determining a text fragment of the source language corresponding to the voice data playing progress and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text;
and the client is used for playing the voice data, displaying the text fragment, determining the first clause text and sending the first clause text.
The application also provides a voice translation text correction method, which comprises the following steps:
determining a text segment of a source language corresponding to the voice data playing progress, and sending the text segment to a client;
receiving a first clause text after manual correction sent by a client;
a second clause text of the target language corresponding to the first clause text is determined.
Optionally, the method further comprises:
and sending the second clause text to the client so as to perform manual correction processing on the second clause text.
The application also provides a voice translation text correction method, which comprises the following steps:
playing the voice data;
displaying a text fragment of a source language corresponding to the voice data playing progress sent by a server;
And determining a first clause text, and sending the first clause text to the server side so that the server side determines a second clause text of the target language corresponding to the first clause text.
Optionally, the method further comprises:
displaying a second clause text sent by the server;
and determining a manually corrected third clause text corresponding to the second clause text.
The present application also provides a computer-readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the application has the following advantages:
according to the voice translation text correction system provided by the embodiment of the application, a source language text segment corresponding to voice stream data acquired by a client in real time is determined through a server, and the text segment is sent to the client; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client; the client acquires voice stream data in real time and sends the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text; according to the processing mode, along with the real-time voice recognition progress, the source language clause text (such as comma separated half-sentence) is manually corrected, and before one sentence recognition is completed, the manually corrected source language clause text is translated, so that the translation text correction processing of clause granularity is realized, and the wrong translation text is prevented from staying on a screen for a longer time; therefore, the correction efficiency of the voice translation text can be effectively improved, and the display time of the error translation text can be effectively shortened. In addition, because the source language clause text based on manual correction is translated, the correction quality of the voice translation text can be effectively improved. In addition, the difficulty of judging the text error and intervening is smaller than that of judging the translation error and intervening, so that the correction efficiency and the correction quality can be further improved.
In the speech translation text correction system provided by the embodiments of the application, the server determines a source-language text segment corresponding to the playing progress of voice data and sends the text segment to the client; receives the manually corrected first clause text sent by the client, and determines a second clause text in the target language corresponding to the first clause text. The client plays the voice data, displays the text segment, determines the first clause text, and sends it. With this processing mode, the source-language clause text (for example, a comma-separated half-sentence) is manually corrected as voice playback progresses, and the corrected clause text is translated before recognition of the full sentence is complete, realizing translation-text correction at clause granularity; the correction quality and correction efficiency of the speech translation text can therefore be effectively improved.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a speech translation text correction system provided herein;
FIG. 2 is a schematic diagram of an application scenario of an embodiment of a speech translation text correction system provided herein;
FIG. 3 is a schematic diagram of device interactions of an embodiment of a speech translation text correction system provided herein;
FIG. 4 is a schematic diagram of a manual correction interface for an embodiment of a speech translation text correction system provided herein.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application can be embodied in many ways other than those described herein, and similar generalizations can be made by those skilled in the art without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
In the application, a system, a method and a device for correcting a speech translation text are provided, and an electronic device. The various schemes are described in detail one by one in the examples below.
First embodiment
Referring to fig. 1, a block diagram of an embodiment of a speech translation text correction system of the present application is shown. The system comprises: server 1, client 2.
The server 1 may be a server deployed in the cloud, or a server dedicated to speech translation and text intervention processing deployed in a data center. It may be a cluster server or a single server.
The client 2 includes, but is not limited to, mobile communication devices such as mobile phones and smartphones, as well as terminal devices such as personal computers and tablets (PAD, iPad).
Please refer to fig. 2, which is a schematic diagram of an application scenario of the speech translation text correction system of the present application. The server and the client can be connected through a network; for example, the client can connect via WIFI or similar means. In scenes such as court trials and multi-person conferences, the client collects the live voice stream data in real time and sends it to the server; the server determines the source-language text corresponding to the voice stream data through a speech recognition model, determines the target-language text through a speech translation model, and sends both texts to the client; the client projects them onto a large on-site screen for the live audience to watch. Meanwhile, the text-correction user manually corrects the two texts through the client, and the correction results are synchronously updated to the on-site large screen.
Please refer to fig. 3, which is a schematic diagram illustrating device interaction of an embodiment of the speech translation text correction system of the present application. In this embodiment, the server is configured to determine a text segment of a source language corresponding to voice stream data collected by the client in real time, and send the text segment to the client; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client; the client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text.
In the process of implementing the invention, the inventor found that since the translation result (the translation) is obtained by translating the speech recognition result (the original text), translation errors are in most cases caused by errors in the speech recognition result; at the same time, judging and intervening on original-text errors is easier than judging and intervening on translation errors, so the system intervenes on (corrects) the original text first. After the original text has been intervened on, the corrected original-text clauses are combined with the idea of "fast translation of streaming results" to achieve fast intervention on the translated text: manual intervention can be carried out without waiting for recognition of the whole sentence to complete, shortening the time an error stays visible and, in real-time on-screen subtitle scenes, preventing an erroneous subtitle from staying on screen for a long time.
In this embodiment, the client deploys a speech recognition result editing module, which may also be called the original-text editing module; it is the core module of the system and is responsible for editing the speech recognition result (the original text). The module is implemented based on editor principles; a specific embodiment may set the contenteditable="true" attribute on the text label so that it becomes editable.
In the specific implementation, the original text is corrected manually, and at least one of the following modes can be adopted:
determining the source-language history text that the second display device has finished displaying; and adjusting that history text, as displayed on the first display device, to a third display attribute.
In the present embodiment, the first user (the text corrector) performs manual original-text correction through the first display device (the correction processing screen) of the client; the source language text and target language text corresponding to the voice progress are displayed to the second user (the live audience) through the second display device (the live presentation screen).
In a subtitle screen-casting scene, by judging which sentences have left the display area of the on-site display screen and then modifying their css styles, those sentences can be grayed out on the correction screen to prompt on-site control personnel that they are no longer displayed on the on-site screen, so that no excessive effort is spent intervening on them, avoiding wasted time and effort.
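The graying logic can be sketched independently of the css details: given the set of sentence ids still inside the live screen's display area, every other sentence is flagged, and the correction screen maps the flag to a gray css class. The data shape is an assumption; the application only describes modifying css styles.

```typescript
// Flag sentences that have scrolled off the live screen so the correction UI can gray them.
type Sentence = { id: number; text: string; grayed: boolean };

function grayOffscreen(sentences: Sentence[], visibleIds: Set<number>): Sentence[] {
  return sentences.map(s => ({ ...s, grayed: !visibleIds.has(s.id) }));
}
```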
And in a second mode, adjusting the text of the first clause according to the adjusted punctuation mark.
This allows one sentence to be split into two by inserting sentence-ending punctuation such as a period into the original text, or a new merged sentence to be formed by deleting the ending punctuation or changing it into an in-sentence punctuation mark. Because the translation service is called in units of single sentences (which may be clauses), reasonable sentence-splitting and sentence-merging operations can effectively improve the quality of the translation result, thereby improving the correction quality of the translated text.
In implementation, before recognition of a sentence is complete, if a comma appears, the clause content before the comma is sent in advance to the background optimization service and the translation service, without waiting for whole-sentence recognition to finish. The advantage is that, in scenes where the original text and the translation must be displayed on screen in real time, the more accurate recognition result after background optimization can be shown to the audience sooner, and likewise the translation result of partial clauses can be shown to the audience in advance.
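The comma-triggered early dispatch described above can be sketched as a small splitter over the growing recognition string: everything up to the latest comma is "ready" for the optimization and translation services, while the tail remains pending. Restricting the punctuation set to ASCII and fullwidth commas is an illustrative simplification.

```typescript
// Split a partial recognition result into clauses ready for early translation
// (ending in a comma) and the still-growing pending tail.
function splitReadyClauses(partial: string): { ready: string[]; pending: string } {
  const parts = partial.split(/(?<=[,，])/);  // zero-width split keeps the comma with its clause
  const last = parts[parts.length - 1];
  if (/[,，]$/.test(last)) return { ready: parts, pending: "" };
  return { ready: parts.slice(0, -1), pending: last };
}
```

Each entry of `ready` corresponds to a "first clause text" that can be corrected and translated before the period arrives.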
And thirdly, restoring the text before modification in the last step according to the single-step rollback instruction.
In specific implementation, a listener for the command+z shortcut under macOS (the ctrl+z shortcut under Windows) can perform a rollback operation on the content being edited, so that on-site control personnel can quickly roll back one step when an intervention error occurs.
And fourthly, restoring the sentence text before modification according to the sentence rollback instruction.
In particular, the method can support listening for esc key events and performing a one-key restore operation on the sentence being edited. For example, during manual intervention it may turn out that the sentence does not need intervention; pressing the esc key then forces loss of focus. A specific implementation may execute the blur() method on the editor and replace the edited content with the original content from before editing, so that on-site control personnel can quickly perform a complete rollback of the whole sentence when an intervention error occurs.
And fifthly, displaying each sentence text in a sentence isolation mode.
In specific implementation, CSS style processing can be applied to each complete single sentence. For example, as shown in fig. 4, a closed border is added around each sentence, and a certain spacing is added between the borders of adjacent sentences, so that field control personnel can clearly distinguish each sentence in the recognition result.
And in a sixth mode, displaying the sentence text focused by the cursor with a first display attribute, and displaying sentence text not focused by the cursor with a second display attribute.
In specific implementation, when the cursor focuses on the content of a sentence, the sentence is highlighted by modifying its CSS style upon detection of the focus event, so that field control personnel can accurately locate the content being edited and the complete content of the sentence.
And seventh, if the text selection operation is executed, displaying a text processing shortcut operation option.
The text processing shortcut options include: an add-hotword option, an add-entity-word-replacement-rule option, a personal-pronoun quick-switch option, a punctuation quick-switch option, a selected-region delete option, and a whole-sentence delete option.
In specific implementation, text can be selected with the mouse, and a shortcut operation bar is automatically displayed near the selection area for one-key intervention. The shortcut operations include: adding a hotword, adding an entity word replacement rule, quickly switching personal pronouns, quickly switching punctuation marks, deleting the selected region, and deleting the whole sentence.
In one example, the system may also correct the second clause text (the translation) through the client: the client determines a manually corrected third clause text corresponding to the second clause text, and updates the second clause text shown on the on-site large screen to the third clause text. This processing mode supports direct editing of the translation result and applies to the case where the original text is recognized correctly but the translation result is inaccurate, allowing the translated content to be edited directly.
In this embodiment, the client deploys a machine translation result editing module, which may also be called a translation editing module. This module is a secondary module in the system: since the translation result is obtained by translating the speech recognition result, translation errors are in most cases caused by speech recognition errors; moreover, judging and intervening on original-text errors is easier than judging and intervening on translation errors. Translation intervention therefore accounts for a smaller proportion than original-text intervention, and the translation intervention function is correspondingly smaller.
In specific implementation, the translation is corrected manually, and at least one of the following modes can be adopted:
In a first mode, each sentence text is displayed in a sentence-isolated manner.
In specific implementation, the translation result can be displayed sentence by sentence, with sentences isolated through CSS styles, so that the content of the current sentence can be clearly identified when it is being edited or deleted.
And in the second mode, displaying the sentence text of the target language focused by the cursor with a first display attribute, and displaying sentence text of the target language not focused by the cursor with a second display attribute.
In specific implementation, when the mouse focuses on a sentence, the current sentence is highlighted by modifying its CSS style.
In specific implementation, the method can also determine the sentence text of the source language corresponding to the sentence text of the target language focused by the cursor, and display that source-language sentence text with the first display attribute. With this processing mode, the corresponding original-text content is matched through an id, and the corresponding original-text sentence in the original-text editing area is then highlighted, so that the original content can be consulted for reference when the translation is modified.
And thirdly, deleting the sentence text according to the sentence deleting instruction.
In specific implementation, a translation result can be deleted quickly through the F1–F10 keys. When a translation result is very poor, it can be deleted with one key, preventing the wrong translation from staying on screen for a long time; this suits scenarios where simultaneous interpretation results are displayed on a real-time screen.
And fourthly, determining historical text of the target language that the second display device has finished displaying; and adjusting that historical text of the target language, as displayed by the first display device, to a third display attribute.
In specific implementation, in the scenario of translated subtitles projected to a screen, the system judges which translation results have left the screen display area and then modifies their CSS style to gray them out, prompting field control personnel to devote their attention to the translation results still shown on screen rather than intervening in sentences that have already left it.
In one example, the client is further configured to determine the volume gain of the voice stream data, and to adjust the volume gain of the voice stream data according to the volume gain and a volume gain threshold. This processing mode supports audio stream gain adjustment and improves original-text recognition accuracy at the source.
In this embodiment, the client deploys a gain adjustment module. This module can be understood as volume adjustment: the client can dynamically draw an audio waveform from the audio stream data, judge from the waveform whether the incoming voice is too loud or too quiet, and then change the volume gain by adjusting a volume gain control. The reason is that an audio stream at a reasonable volume positively helps improve the quality of algorithmic recognition. Reasonably adjusting the volume gain therefore improves speech recognition quality and further reduces the workload of manual intervention.
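A minimal sketch of such gain adjustment, under stated assumptions: loudness is estimated as the RMS of an audio frame, and the gain is nudged toward a target band. The threshold values and scaling factors below are illustrative, not taken from the patent.

```python
def rms(samples: list[float]) -> float:
    """Root-mean-square loudness of one audio frame (samples in [-1.0, 1.0])."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def adjust_gain(gain: float, samples: list[float],
                low: float = 0.05, high: float = 0.5) -> float:
    """Boost quiet audio and attenuate loud audio; thresholds are assumptions."""
    level = rms(samples) * gain
    if level < low:          # too quiet: recognition quality suffers, so boost
        return gain * 1.25
    if level > high:         # too loud (clipping risk): attenuate
        return gain * 0.8
    return gain              # within the reasonable band: leave unchanged

quiet_frame = [0.01, -0.01, 0.02, -0.02]
g = adjust_gain(1.0, quiet_frame)     # gain is raised above 1.0
```

In the real client this decision would feed the "volume gain control" the operator sees next to the waveform display.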
In one example, after receiving the manually corrected first clause text sent by the client, the server may further perform correction processing on the first clause text and send the corrected third clause text to the client. Correspondingly, the server determines the second clause text corresponding to the third clause text, and the client may also display the corrected third clause text, corresponding to the first clause text, returned by the server. With this processing mode the original text is optimized by machine, and translating the optimized original text can improve the quality of the translation, thereby improving the correction quality of the translated text.
In specific implementation, the machine correction of the original text may be text optimization, such as inverse text normalization (Inverse Text Normalization, ITN). ITN presents objects such as dates, times, addresses, and amounts in standard written formats.
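As a toy illustration of ITN (real ITN systems typically use weighted grammars or sequence models; this rule table and its entries are assumptions for demonstration only), spoken-form phrases are rewritten into their standard written forms:

```python
# Illustrative spoken-form -> written-form rules; not from the patent.
SPOKEN_FORMS = {
    "one hundred and twenty three": "123",   # number
    "three thirty pm": "3:30 PM",            # time
    "twenty dollars": "$20",                 # amount
}

def itn(text: str) -> str:
    """Apply simple inverse-text-normalization rewrite rules."""
    for spoken, written in SPOKEN_FORMS.items():
        text = text.replace(spoken, written)
    return text

itn("the meeting is at three thirty pm")   # -> "the meeting is at 3:30 PM"
```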
In specific implementation, the machine correction of the original text may perform entity word replacement processing on the first clause text according to entity word replacement rule information. The entity word replacement rules include, but are not limited to: person-name replacement rules and business entity name replacement rules. Entity word replacement module: this supports automatically replacing a certain entity word A with an entity word B, and is used to fix frequently occurring, fixed errors and to reduce the manual correction cost in certain scenarios, such as "Hema Mr" => "Mr. Hema". With this entity word replacement processing mode, the quality of the original text can be improved by automatic means.
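A sketch of the entity-word replacement rule described above. The rule table contents are invented examples (the patent's own example pair is garbled in translation), but the mechanism — a fixed A-to-B substitution applied to every clause — is as described:

```python
# entity word A -> entity word B; rule contents are illustrative assumptions
ENTITY_RULES = {
    "Hema Mr": "Mr. Hema",   # person-name replacement rule (assumed example)
    "ali baba": "Alibaba",   # business-entity-name replacement rule (assumed example)
}

def replace_entities(clause: str) -> str:
    """Apply every fixed entity replacement rule to one clause."""
    for wrong, right in ENTITY_RULES.items():
        clause = clause.replace(wrong, right)
    return clause
```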
In specific implementation, the machine correction may perform blacklist filtering on the first clause text according to blacklist filtering rule information. Blacklist filtering module: in real-time speech recognition and machine translation scenarios, before subtitles are displayed to the audience, all speech recognition results and machine translation results are filtered against a blacklist vocabulary to remove illegal words, such as pornography-related or violence-related terms. The blacklist vocabulary is large, ranging from a few hundred to tens of thousands of entries; technically, blacklist terms are matched with an Aho-Corasick (AC) automaton algorithm, and each match is then replaced with an empty string using string replacement.
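The AC-automaton-plus-empty-string-replacement approach can be sketched compactly. This is a minimal textbook Aho-Corasick implementation written for illustration (the patent names the algorithm but shows no code); a production blacklist with tens of thousands of entries would build the automaton once and reuse it for every clause:

```python
from collections import deque

class ACFilter:
    """Aho-Corasick multi-pattern matcher that censors hits with empty strings."""

    def __init__(self, words: list[str]):
        self.goto = [{}]   # trie transitions per node
        self.fail = [0]    # failure links
        self.out = [[]]    # lengths of blacklist words ending at each node
        for w in words:                      # build the trie
            s = 0
            for ch in w:
                if ch not in self.goto[s]:
                    self.goto[s][ch] = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                s = self.goto[s][ch]
            self.out[s].append(len(w))
        q = deque(self.goto[0].values())     # BFS to set failure links
        while q:
            s = q.popleft()
            for ch, t in self.goto[s].items():
                q.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def censor(self, text: str) -> str:
        """Mark the span of every blacklist hit, then drop those characters."""
        drop = [False] * len(text)
        s = 0
        for i, ch in enumerate(text):
            while s and ch not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(ch, 0)
            for length in self.out[s]:
                for j in range(i - length + 1, i + 1):
                    drop[j] = True
        return "".join(c for c, d in zip(text, drop) if not d)
```

Building the automaton is linear in the total length of the blacklist, and filtering is linear in the clause length, which is what makes this viable at tens of thousands of entries.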
In specific implementation, the machine correction may also perform dialect correction processing on the first clause text. A dialect is a regional variety of a language with its own special vocabulary and pronunciation, also called vernacular or local speech, such as the Beijing dialect. With this processing mode, dialect in the voice stream data is converted into the standard language so that the correct translation can be determined; the correction quality of the translated text can thus be effectively improved.
In one example, the server is further configured to perform correction processing on the second clause text, such as translation entity word replacement, blacklist filtering, and the like.
In one example, the server is further configured to optimize a speech recognition model and/or a speech translation model according to hotword information. Hotword management module: a hotword differs from entity word replacement in that it can be used to optimize the algorithm model; it is sent to the server to increase the probability that the hotword appears. When a user configures a hotword through the client, the user can also configure a weight value for the hotword: the higher the weight, the higher the probability that the hotword appears.
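One simple way to realize weighted hotwords is to rescore recognition hypotheses: hypotheses containing a configured hotword have their score boosted in proportion to the hotword's weight. This rescoring scheme and the table contents are assumptions for illustration; the patent only states that higher weight means higher probability of the hotword appearing.

```python
# hotword -> weight; configured by the user through the client (illustrative)
HOTWORDS = {"Alibaba": 2.0, "DAMO": 1.5}

def boost(hypothesis: str, base_score: float) -> float:
    """Scale a hypothesis score by the weight of each hotword it contains."""
    score = base_score
    for word, weight in HOTWORDS.items():
        if word in hypothesis:
            score *= weight
    return score

boost("DAMO academy", 4.0)   # -> 6.0: the hotword-bearing hypothesis wins ties
```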
In one example, the client is further configured to display the first clause text in the source language and the second clause text in the target language in a sentence-aligned manner. With this processing mode, the original text and the translation are displayed side by side in the same area, so that the user can consult each for reference during correction; the correction quality and the user experience are thus effectively improved.
In one example, the server may be further configured to determine a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word; sending the word with uncertain translation and the candidate translation word to a client so that a client user modifies the word with uncertain translation according to the candidate translation word; correspondingly, the client is further used for receiving the words with uncertain translations and a plurality of candidate translation words of the words with uncertain translations, which are included in the second clause text sent by the server; and modifying the words with uncertain translations according to the candidate translation words.
A translation-uncertain word is a translation whose original text has multiple meanings, where the machine still cannot determine from the context information which translation is more accurate. In this case, the server can attach a flag (i.e., multiple candidate words) indicating that the word's translation may be inaccurate and prompting manual correction or confirmation; there may be several identical or similar candidate words. In specific implementation, the translation-uncertain word and the multiple candidate translation words may be determined based on a similar-word list. This processing mode effectively improves the correction quality of the translated text.
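A sketch of similar-word-list flagging, with invented list contents (the patent specifies only that uncertain words and their candidates come from a similar-word list): any translated word that appears in the list with more than one near-synonymous alternative is flagged, and its candidates are what the client shows the operator for confirmation.

```python
# similar-word list: translation -> near-synonymous alternatives (illustrative)
SIMILAR_WORDS = {
    "bank": ["bank", "shore", "riverside"],
    "interest": ["interest", "hobby"],
}

def flag_uncertain(translation: str) -> dict[str, list[str]]:
    """Return {uncertain word: candidate translations} for manual confirmation."""
    flags: dict[str, list[str]] = {}
    for word in translation.split():
        candidates = SIMILAR_WORDS.get(word)
        if candidates and len(candidates) > 1:
            flags[word] = candidates   # sent to the client as choices
    return flags

flag_uncertain("the bank of the river")   # flags "bank" with three candidates
```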
As can be seen from the above embodiments, in the speech translation text correction system provided in the embodiments of the present application, the server determines a text segment in the source language corresponding to voice stream data collected by the client in real time and sends the text segment to the client; it then receives the manually corrected first clause text sent by the client, determines a second clause text of the target language corresponding to the first clause text, and sends the second clause text to the client. The client collects voice stream data in real time and sends it; displays the text segment, determines the first clause text, and sends it; and displays the second clause text. With this processing mode, the source-language clause text (such as a comma-separated half-sentence) is manually corrected as real-time speech recognition progresses, and the manually corrected clause text is translated before recognition of the whole sentence is complete, realizing translation text correction at clause granularity and preventing an erroneous translation from staying on screen for a long time. The correction efficiency of the speech translation text can therefore be effectively improved, and the display time of erroneous translations effectively shortened. In addition, because translation is based on the manually corrected source-language clause text, the correction quality of the speech translation text can be effectively improved. Moreover, judging and intervening on original-text errors is easier than judging and intervening on translation errors, so correction efficiency and correction quality can be further improved.
Second embodiment
Corresponding to the above speech translation text correction system, the present application also provides a speech translation text correction method; the execution subject of the method includes, but is not limited to, a server. Parts of this embodiment that are the same as the first embodiment will not be described again; please refer to the corresponding parts of the first embodiment.
In this embodiment, the method includes the steps of:
step 1: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client;
step 2: receiving a first clause text after manual correction sent by a client;
step 3: and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
In one example, the method may further comprise the steps of: performing correction processing on the first clause text to obtain a corrected third clause text; accordingly, the determining of the second clause text of the target language corresponding to the first clause text may be performed as follows: determining the second clause text corresponding to the third clause text.
In one example, the correction processing performed on the first clause text may be as follows: and executing dialect correction processing on the first clause text.
In one example, the correction processing performed on the first clause text may be as follows: and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
The entity word replacement rule includes: name replacement rules, business entity name replacement rules.
In one example, the correction processing performed on the first clause text may be as follows: and executing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
In one example, the method may further comprise the steps of: and performing correction processing on the second clause text.
In one example, the method may further comprise the steps of: and optimizing the voice recognition model and/or the voice translation model according to the hotword information.
In one example, the method may further comprise the steps of: determining a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word; and sending the word with the uncertain translation and the candidate translation word to the client so that a user of the client can modify the word with the uncertain translation according to the candidate translation word.
In particular implementations, the term of uncertainty of the translation and the plurality of candidate translation terms may be determined based on a list of similar terms.
Third embodiment
In the above embodiment, a method for correcting a speech translation text is provided, and correspondingly, the present application also provides a device for correcting a speech translation text. The device corresponds to the embodiment of the method described above.
The same parts of the present embodiment as those of the first embodiment will not be described again, please refer to the corresponding parts in the first embodiment. The device for correcting the text of the voice translation provided by the application comprises:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
and the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text and sending the second clause text to the client.
Fourth embodiment
The application also provides electronic equipment. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
An electronic device of the present embodiment includes: a processor and a memory; the memory is used for storing a program implementing the speech translation text correction method. After the device is powered on and the program for the method is run by the processor, the following steps are performed: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client; receiving a manually corrected first clause text sent by the client; and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
Fifth embodiment
Corresponding to the above speech translation text correction system, the present application also provides a speech translation text correction method; the execution subject of the method includes, but is not limited to, a client, or any device capable of implementing the method. Parts of this embodiment that are the same as the first embodiment will not be described again; please refer to the corresponding parts of the first embodiment.
In this embodiment, the method includes the steps of:
step 1: collecting voice stream data in real time, and sending the voice stream data to a server;
step 2: displaying a text segment of a source language corresponding to the voice stream data and returned by the server;
step 3: determining a first clause text after manual correction, and sending the first clause text to a server;
step 4: and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
In one example, the method may further comprise the steps of: determining a manually corrected third clause text corresponding to the second clause text; and updating the displayed second clause text into a third clause text.
In one example, a manual correction original text process is performed by a first display device; and displaying the source language text and the target language text corresponding to the voice progress through a second display device.
In one example, the determining the manually corrected first clause text may include the sub-steps of: determining historical text of the source language that the second display device has finished displaying; and adjusting the displayed historical text of the source language on the first display device to a third display attribute.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and adjusting the first clause text according to the adjusted punctuation marks.
In one example, the determining the manually corrected first clause text may include the sub-steps of: restoring the text to its state before the single-step modification according to a single-step rollback instruction.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and restoring the sentence text before modification according to the sentence back instruction.
In one example, the determining the manually corrected first clause text may include the sub-steps of: each sentence text is displayed in a sentence-isolated manner.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and displaying the sentence text focused by the cursor with the first display attribute, and displaying the sentence text focused by the non-cursor with the second display attribute.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and if the text selection operation is executed, displaying a text processing shortcut operation option.
The text processing shortcut options include, but are not limited to: the method comprises the steps of adding a hotword option, adding an entity word replacement rule option, a human-name pronoun quick switching option, a punctuation quick switching option, a text region deleting option and a whole sentence deleting option.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: each sentence text is displayed in a sentence-isolated manner.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: the sentence text of the target language focused by the cursor is displayed with the first display attribute, and the sentence text of the target language focused by the non-cursor is displayed with the second display attribute.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: determining sentence text in a source language corresponding to sentence text in a target language focused by a cursor; and displaying sentence text of the source language with a first display attribute.
The first display attribute includes: highlighting; the second display attribute includes: non-highlighting.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: and deleting the sentence text according to the sentence deleting instruction.
In one example, a manual correction translation process is performed by a first display device, and a source language text and a target language text corresponding to the voice progress are displayed by a second display device. The determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: determining historical text of the target language that the second display device has finished displaying; and adjusting the displayed historical text of the target language on the first display device to a third display attribute.
In one example, the method may further comprise the steps of: determining a volume gain of the voice stream data; and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
In one example, the method may further comprise the steps of: the first clause text of the source language and the second clause text of the target language are displayed in sentence alignment.
In one example, the method may further comprise the steps of: receiving a word with uncertain translation and a plurality of candidate translation words of the word with uncertain translation, which are included in the second clause text and sent by a server; and modifying the words with uncertain translations according to the candidate translation words.
Sixth embodiment
In the above embodiment, a method for correcting a speech translation text is provided, and correspondingly, the present application also provides a device for correcting a speech translation text. The device corresponds to the embodiment of the method described above.
The same parts of the present embodiment as those of the first embodiment will not be described again, please refer to the corresponding parts in the first embodiment. The device for correcting the text of the voice translation provided by the application comprises:
the voice data acquisition and transmission unit is used for acquiring voice stream data in real time and transmitting the voice stream data to the server;
The original text display unit is used for displaying text fragments of a source language corresponding to the voice stream data, which are returned by the server side;
the original text correction unit is used for determining a first clause text after manual correction and sending the first clause text to the server;
and the translation display unit is used for displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
Seventh embodiment
The application also provides an electronic device embodiment. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
An electronic device of the present embodiment includes: a processor and a memory; a memory for storing a program for implementing a method for correcting a speech translation text, the apparatus being powered on and executing the program for the method by the processor, and performing the steps of: collecting voice stream data in real time, and sending the voice stream data to a server; displaying a text segment of a source language corresponding to the voice stream data and returned by the server; determining a first clause text after manual correction, and sending the first clause text to a server; and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
Eighth embodiment
Corresponding to the above speech translation text correction system, the present application also provides another speech translation text correction system. Parts of this embodiment that are the same as the first embodiment will not be described again; please refer to the corresponding parts of the first embodiment.
In this embodiment, the system includes a server and a client. The server side is used for determining a text fragment of a source language corresponding to the voice data playing progress and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text; the client is used for playing the voice data, displaying the text fragment, determining the first clause text and sending the first clause text.
The system provided in this embodiment differs from the system provided in the first embodiment in that the voice data is different: the voice data in this embodiment may be complete voice data collected in advance, such as a complete audio file submitted by a user, rather than a voice data stream collected and uploaded in real time.
In specific implementation, the server is further configured to send the second clause text to the client so that manual correction processing is performed on the second clause text; correspondingly, the client is also used to display the second clause text sent by the server, and to determine the manually corrected text corresponding to the second clause text.
As can be seen from the above embodiments, in the speech translation text correction system provided in the embodiments of the present application, the server determines a text segment of the source language corresponding to the voice data playing progress and sends the text segment to the client; it then receives the manually corrected first clause text sent by the client and determines a second clause text of the target language corresponding to the first clause text. The client plays the voice data, displays the text segment, determines the first clause text, and sends it. With this processing mode, the source-language clause text (such as a comma-separated half-sentence) is manually corrected as voice playback progresses, and the manually corrected clause text is translated before recognition of the whole sentence is complete, realizing translation text correction at clause granularity; the correction quality and correction efficiency of the speech translation text can therefore be effectively improved.
While preferred embodiments have been described above, they are not intended to limit the invention. Any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present invention; the protection scope of the present invention shall therefore be defined by the claims of the present application.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer readable media, as defined herein, do not include transitory computer readable media (transmission media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.

Claims (38)

1. A speech translation text correction system, comprising:
the server side is used for determining a text fragment of a source language corresponding to voice stream data acquired by the client side in real time and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client;
the client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; transmitting the first clause text; and displaying the second clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
2. A method for correcting a speech translation text, comprising:
determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client;
receiving a first clause text after manual correction sent by a client;
determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client so that the client displays the second clause text;
wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
3. The method as recited in claim 2, further comprising:
performing correction processing on the first clause text to obtain a corrected third clause text;
the determining the second clause text of the target language corresponding to the first clause text includes:
and determining the second clause text corresponding to the third clause text.
4. The method of claim 3, wherein:
the performing correction processing on the first clause text includes:
and executing dialect correction processing on the first clause text.
5. The method of claim 3, wherein:
the performing correction processing on the first clause text includes:
and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
6. The method of claim 5, wherein:
the entity word replacement rule information includes: a person name replacement rule and a business entity name replacement rule.
7. The method of claim 3, wherein:
the performing correction processing on the first clause text includes:
and executing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
8. The method as recited in claim 2, further comprising:
and performing correction processing on the second clause text.
9. The method as recited in claim 2, further comprising:
and optimizing the voice recognition model and/or the voice translation model according to the hotword information.
10. The method as recited in claim 2, further comprising:
determining a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word;
and sending the word with the uncertain translation and the candidate translation word to the client so that a user of the client can modify the word with the uncertain translation according to the candidate translation word.
11. The method of claim 10, wherein:
and determining the words with uncertain translations and the candidate translation words according to the similar word list.
12. A method for correcting a speech translation text for a client, comprising:
collecting voice stream data in real time, and sending the voice stream data to a server;
displaying a text segment of a source language corresponding to the voice stream data and returned by the server;
performing manual correction original text processing through a first display device to determine a first clause text after manual correction; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying a source language text and a target language text corresponding to the voice progress;
transmitting the first clause text to the server;
and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
13. The method as recited in claim 12, further comprising:
determining a manually corrected third clause text corresponding to the second clause text;
and updating the displayed second clause text into a third clause text.
14. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and adjusting the first clause text according to the adjusted punctuation marks.
15. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and, according to a single-step rollback instruction, restoring the text to its state before the single-step modification.
16. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and, according to a sentence rollback instruction, restoring the sentence text to its state before modification.
17. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
each sentence text is displayed in a sentence-isolated manner.
18. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and displaying the sentence text focused by the cursor with a first display attribute, and displaying the sentence text not focused by the cursor with a second display attribute.
19. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and if the text selection operation is executed, displaying a text processing shortcut operation option.
20. The method of claim 19, wherein:
the text processing shortcut operation options include: a hotword adding option, an entity word replacement rule adding option, a person-pronoun quick switching option, a punctuation quick switching option, a text region deletion option, and a whole sentence deletion option.
21. The method of claim 13, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
each sentence text is displayed in a sentence-isolated manner.
22. The method of claim 13, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
displaying the sentence text of the target language focused by the cursor with the first display attribute, and displaying the sentence text of the target language not focused by the cursor with the second display attribute.
23. The method of claim 22, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
determining sentence text in a source language corresponding to sentence text in a target language focused by a cursor;
and displaying sentence text of the source language with a first display attribute.
24. The method of claim 22, wherein:
the first display attribute includes: highlighting;
the second display attribute includes: non-highlighting.
25. The method of claim 13, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
and deleting the sentence text according to the sentence deleting instruction.
26. The method of claim 13, wherein:
performing manual correction translation processing by the first display device;
displaying a source language text and a target language text corresponding to the voice progress through a second display device;
The determining the manually corrected third clause text corresponding to the second clause text includes:
determining that a second display device of the first display device has displayed a history text of the completed target language;
and adjusting the historical text of the target language displayed by the first display device to be a third display attribute.
27. The method as recited in claim 13, further comprising:
determining a volume gain of the voice stream data;
and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
28. The method as recited in claim 12, further comprising:
the first clause text of the source language and the second clause text of the target language are displayed in sentence alignment.
29. The method as recited in claim 12, further comprising:
receiving a word with uncertain translation and a plurality of candidate translation words of the word with uncertain translation, which are included in the second clause text and sent by a server;
and modifying the words with uncertain translations according to the candidate translation words.
30. A speech translation text correction apparatus, comprising:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client so that the client displays the second clause text;
wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
31. An electronic device, comprising:
a processor; and
a memory for storing a program implementing a speech translation text correction method; after the device is powered on, the program of the method is run by the processor and the following steps are performed: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client; receiving a first clause text after manual correction sent by the client; determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client so that the client displays the second clause text; wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
32. A speech translation text correction apparatus for a client, comprising:
the voice data acquisition and transmission unit is used for acquiring voice stream data in real time and transmitting the voice stream data to the server;
the original text display unit is used for displaying text fragments of a source language corresponding to the voice stream data, which are returned by the server side;
an original text correction unit for performing manual correction original text processing through the first display device to determine a manually corrected first clause text, and transmitting the first clause text to a server; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute, the second display device being used for displaying the source language text and the target language text corresponding to the voice progress;
and the translation display unit is used for displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
33. An electronic device, comprising:
a processor; and
a memory for storing a program implementing a speech translation text correction method; after the device is powered on, the program of the method is run by the processor and the following steps are performed: collecting voice stream data in real time, and sending the voice stream data to a server; displaying a text segment of a source language corresponding to the voice stream data and returned by the server; performing manual correction original text processing through a first display device to determine a first clause text after manual correction; transmitting the first clause text to the server; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute, the second display device being used for displaying the source language text and the target language text corresponding to the voice progress; and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
34. A speech translation text correction system, comprising:
the server side is used for determining a text fragment of the source language corresponding to the voice data playing progress and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text;
the client is used for playing the voice data and displaying the text fragment; performing manual correction original text processing through a first display device to determine a first clause text after manual correction; and transmitting the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the playing progress.
35. A method for correcting a speech translation text, comprising:
determining a text segment of a source language corresponding to the voice data playing progress, and sending the text segment to a client;
receiving a first clause text after manual correction sent by a client;
determining a second clause text of the target language corresponding to the first clause text;
wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the playing progress.
36. The method as recited in claim 35, further comprising:
and sending the second clause text to the client so as to perform manual correction processing on the second clause text.
37. A method for correcting a speech translation text for a client, comprising:
playing the voice data;
displaying a text fragment of a source language corresponding to the voice data playing progress sent by a server;
performing manual correction original text processing through a first display device to determine a first clause text after manual correction; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying a source language text and a target language text corresponding to the playing progress;
and sending the first clause text to the server side so that the server side determines a second clause text of the target language corresponding to the first clause text.
38. The method as recited in claim 37, further comprising:
Displaying a second clause text sent by the server;
and determining the manually corrected second clause text corresponding to the second clause text.
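Claims 5 through 7 describe rule-based correction of the first clause text: entity word replacement according to rule information, and blacklist filtering. A minimal sketch of how such rules might be applied, assuming simple substring rules (all names and rule contents here are hypothetical, not taken from the patent):

```python
# Illustrative sketch of the rule-based correction in claims 5-7:
# entity word replacement by rule information, then blacklist filtering.
def apply_entity_rules(text: str, rules: dict[str, str]) -> str:
    # Each rule maps a misrecognized form to the canonical entity name
    # (e.g. person names, business entity names).
    for wrong, right in rules.items():
        text = text.replace(wrong, right)
    return text

def apply_blacklist(text: str, blacklist: list[str]) -> str:
    # Blacklisted terms (e.g. filler words) are removed from the clause.
    for term in blacklist:
        text = text.replace(term, "")
    return text

rules = {"阿里爸爸": "阿里巴巴"}   # hypothetical entity replacement rule
blacklist = ["嗯", "呃"]           # hypothetical filler-word blacklist
first_clause = "嗯阿里爸爸发布了新产品"
third_clause = apply_blacklist(apply_entity_rules(first_clause, rules), blacklist)
print(third_clause)  # → 阿里巴巴发布了新产品
```

The corrected result plays the role of the "third clause text" of claim 3: it, rather than the raw first clause text, is what would be passed on for target-language translation.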
CN202010366777.5A 2020-04-30 2020-04-30 Speech translation text correction system, method, device and equipment Active CN113591491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366777.5A CN113591491B (en) 2020-04-30 2020-04-30 Speech translation text correction system, method, device and equipment


Publications (2)

Publication Number Publication Date
CN113591491A CN113591491A (en) 2021-11-02
CN113591491B true CN113591491B (en) 2023-12-26

Family

ID=78237626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366777.5A Active CN113591491B (en) 2020-04-30 2020-04-30 Speech translation text correction system, method, device and equipment

Country Status (1)

Country Link
CN (1) CN113591491B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117672190A (en) * 2022-09-07 2024-03-08 华为技术有限公司 Transliteration method and electronic equipment
CN115860015B (en) * 2022-12-29 2023-06-20 北京中科智加科技有限公司 Translation memory-based transcription text translation method and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000259632A (en) * 1999-03-09 2000-09-22 Toshiba Corp Automatic interpretation system, interpretation program transmission system, recording medium, and information transmission medium
CN104714943A (en) * 2015-03-26 2015-06-17 百度在线网络技术(北京)有限公司 Translation method and system
JP2016192599A (en) * 2015-03-30 2016-11-10 株式会社エヌ・ティ・ティ・データ Device and method combining video conference system and speech recognition technology
CN109408833A (en) * 2018-10-30 2019-03-01 科大讯飞股份有限公司 A kind of interpretation method, device, equipment and readable storage medium storing program for executing
CN109446532A (en) * 2018-09-11 2019-03-08 深圳市沃特沃德股份有限公司 Translate bearing calibration, device and translation calibration equipment
CN110047488A (en) * 2019-03-01 2019-07-23 北京彩云环太平洋科技有限公司 Voice translation method, device, equipment and control equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8626486B2 (en) * 2006-09-05 2014-01-07 Google Inc. Automatic spelling correction for machine translation
KR102516364B1 (en) * 2018-02-12 2023-03-31 삼성전자주식회사 Machine translation method and apparatus


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Implementation of an Artificial Intelligence Speech System; Yang Ziyi; Wangyou Shijie (06); 27-29 *

Also Published As

Publication number Publication date
CN113591491A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
WO2018157703A1 (en) Natural language semantic extraction method and device, and computer storage medium
CN110164435A (en) Audio recognition method, device, equipment and computer readable storage medium
CN105590627B (en) Image display apparatus, method for driving image display apparatus, and computer-readable recording medium
JP5478478B2 (en) Text correction apparatus and program
KR20170030297A (en) System, Apparatus and Method For Processing Natural Language, and Computer Readable Recording Medium
CN113591491B (en) Speech translation text correction system, method, device and equipment
US20170242832A1 (en) Character editing method and device for screen display device
CN110740275B (en) Nonlinear editing system
CN111885416B (en) Audio and video correction method, device, medium and computing equipment
CN103984772A (en) Method and device for generating text retrieval subtitle library and video retrieval method and device
CN104871240A (en) Information processing device, information processing method and program
EP4322029A1 (en) Method and apparatus for generating video corpus, and related device
CN109326284A (en) The method, apparatus and storage medium of phonetic search
CN109782997B (en) Data processing method, device and storage medium
JP2012181358A (en) Text display time determination device, text display system, method, and program
CN113035199A (en) Audio processing method, device, equipment and readable storage medium
KR20190083532A (en) System for learning languages using the video selected by the learners and learning contents production method thereof
CN113948066A (en) Error correction method, system, storage medium and device for real-time translation text
CN110853627B (en) Method and system for voice annotation
CN111326144A (en) Voice data processing method, device, medium and computing equipment
CN111062221A (en) Data processing method, data processing device, electronic equipment and storage medium
CN113435198A (en) Automatic correction display method and device for caption dialect words
CN111767214B (en) Automatic testing method and device for software UI
CN113923479A (en) Audio and video editing method and device
CN113221514A (en) Text processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant