CN113591491A - System, method, device and equipment for correcting voice translation text - Google Patents
- Publication number: CN113591491A
- Application number: CN202010366777.5A
- Authority
- CN
- China
- Prior art keywords
- text
- clause
- determining
- client
- clause text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F40/51 — Handling natural language data; processing or translation of natural language; translation evaluation
- G06F40/194 — Handling natural language data; text processing; calculation of difference between files
- G06F40/55 — Handling natural language data; processing or translation of natural language; rule-based translation
- G06F40/56 — Handling natural language data; processing or translation of natural language; natural language generation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The application discloses a system, method, and apparatus for correcting speech translation text, and related equipment. Through the server, the system determines a source-language text segment corresponding to voice stream data collected by a client in real time and sends the text segment to the client; it receives the manually corrected first clause text sent by the client, determines the corresponding second clause text in the target language, and sends the second clause text to the client. The client collects voice stream data in real time and sends it; displays the text segment, determines the first clause text, and sends it; and displays the second clause text. With this processing mode, source-text clauses are manually corrected as real-time speech recognition progresses, and each manually corrected clause is translated before recognition of the full sentence finishes, realizing clause-granularity correction of the translated text; correction efficiency and correction quality can therefore be effectively improved.
Description
Technical Field
The application relates to the technical field of machine speech translation, and in particular to a system, method, and apparatus for correcting speech translation text, and to an electronic device.
Background
With the arrival of the international information era and growing social demand, research on automatic speech translation technology is receiving increasingly wide attention. Speech translation, also commonly referred to as spoken language translation (SLT), is the process by which a computer converts speech in one language into speech in another language.
Speech translation is a technique that combines speech recognition with machine translation. In scenarios that combine real-time speech recognition with machine translation (simultaneous interpretation), technical, environmental, and human factors make speech recognition results inaccurate and translation results wrong, so manual intervention is needed to correct both in real time. In real-time recognition scenarios, fully manual intervention is difficult because the pace is fast and the time is short. A typical current correction method recognizes and translates speech through a real-time speech translation model, displays the translation result on screen in real time, and performs automatic and then manual intervention on the speech translation result only after a whole sentence has been translated.
However, in the course of implementing the invention, the inventors found that the prior art has at least the following problems: because automatic intervention can only be performed after translation of the whole sentence is complete, and only a simple translation-result intervention capability is provided, automatic intervention on the speech translation result is slow and manual intervention starts late, so wrong translated subtitles stay on screen for a long time and the correction quality of the translated text is poor. How to improve the overall correction efficiency and quality of speech translation results, so as to shorten the on-screen retention time of wrong translated subtitles and improve translation quality, has therefore become a problem urgently needing a solution by those skilled in the art.
Disclosure of Invention
The application provides a speech translation text correction system to solve the problem in the prior art that the correction quality and correction efficiency of speech translation text are low. The application further provides a speech translation text correction method and apparatus, and an electronic device.
The application provides a speech translation text correction system, including:
the server is used for determining a text segment of a source language corresponding to the voice stream data collected by the client in real time and sending the text segment to the client; receiving the manually corrected first clause text sent by the client, determining a second clause text of a target language corresponding to the first clause text, and sending the second clause text to the client;
the client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text.
The application also provides a method for correcting the voice translation text, which comprises the following steps:
determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and sending the text segment to the client;
receiving the manually corrected first clause text sent by the client;
and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
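The three server-side steps above can be sketched as follows. This is an illustrative Python sketch under assumed interfaces (the function names are not from the patent, and `translate` stands in for the speech translation model):

```python
# Illustrative sketch of the server-side steps: split a recognized text
# segment into clause-sized units, then translate a manually corrected
# first clause text into the target-language second clause text.
# All names here are assumptions; `translate` stands in for the model.

def split_into_clauses(text: str) -> list[str]:
    """Split a source-language text segment at clause boundaries
    (commas and sentence-final punctuation), keeping each delimiter."""
    clauses, current = [], ""
    for ch in text:
        current += ch
        if ch in ",，。？！.?!":
            clauses.append(current)
            current = ""
    if current:
        clauses.append(current)
    return clauses

def handle_corrected_clause(first_clause: str, translate) -> str:
    """Translate a manually corrected first clause text and return the
    target-language second clause text for the client to display."""
    return translate(first_clause)

# Stand-in translator for demonstration only.
fake_translate = lambda s: f"<{s.strip('，,')}-translated>"
clauses = split_into_clauses("今天天气很好，我们去公园。")
second = handle_corrected_clause(clauses[0], fake_translate)
```

Translating per clause rather than per sentence is what lets a corrected half sentence reach the screen before recognition of the whole sentence finishes.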
Optionally, the method further includes:
performing correction processing on the first clause text to obtain a corrected third clause text;
the determining of the second clause text of the target language corresponding to the first clause text comprises:
determining the second clause text corresponding to the third clause text.
Optionally, the performing a correction process on the first clause text includes:
performing dialect correction processing on the first clause text.
Optionally, the performing a correction process on the first clause text includes:
and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
Optionally, the entity word replacement rule includes: a person name replacement rule and an enterprise entity name replacement rule.
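A minimal sketch of such rule-based entity word replacement, as one possible reading of the claim; the rule table, names, and dictionary format are illustrative assumptions, not from the patent:

```python
# Entity word replacement rule information as a simple mapping from
# misrecognized text to the canonical entity name. Person-name and
# enterprise-name rules share one table here for brevity (an assumption).

ENTITY_REPLACEMENT_RULES = {
    "张伟": "章伟",              # person-name rule (illustrative)
    "阿里巴巴集团": "Alibaba Group",  # enterprise entity-name rule (illustrative)
}

def apply_entity_replacements(clause: str, rules: dict[str, str]) -> str:
    """Replace misrecognized person and enterprise names in a clause."""
    for wrong, right in rules.items():
        clause = clause.replace(wrong, right)
    return clause
```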
Optionally, the performing a correction process on the first clause text includes:
and performing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
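The patent only names "blacklist filtering rule information", so the rule format below is an assumption; this sketch masks blacklisted terms before the clause is displayed:

```python
# Hedged sketch of blacklist filtering: any term in the blacklist is
# masked in the clause text. Dropping the whole clause would be an
# equally valid policy; the patent does not specify.

def filter_blacklist(clause: str, blacklist: set[str], mask: str = "*") -> str:
    """Mask every blacklisted term in the clause with mask characters."""
    for term in blacklist:
        if term in clause:
            clause = clause.replace(term, mask * len(term))
    return clause
```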
Optionally, the method further includes:
correction processing is performed on the second clause text.
Optionally, the method further includes:
and optimizing the voice recognition model and/or the voice translation model according to the hot word information.
Optionally, the method further includes:
determining uncertain translated words included in the second clause text and a plurality of candidate translated words for each uncertain translated word;
and sending the uncertain translated words and the candidate translated words to the client, so that a user of the client can modify the uncertain translated words according to the candidate translated words.
Optionally, the uncertain word of the translated text and the plurality of candidate translated text words are determined according to a similar word list.
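One way to read the similar-word-list mechanism is sketched below; the table, threshold, and function name are illustrative assumptions, not the patent's implementation:

```python
# A similar-word list maps a translated word to near-synonym candidates.
# A word with enough plausible alternatives is flagged as "uncertain"
# and its candidates are sent to the client for manual selection.

SIMILAR_WORDS = {
    "bank": ["shore", "riverside"],  # ambiguous rendering (illustrative)
    "interest": ["benefit"],
}

def find_uncertain_words(translated_words: list[str],
                         similar: dict[str, list[str]],
                         min_candidates: int = 2):
    """Return (word, candidates) pairs for words whose similar-word
    entry offers at least `min_candidates` alternatives."""
    return [(w, similar[w]) for w in translated_words
            if len(similar.get(w, [])) >= min_candidates]
```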
The application also provides a method for correcting the voice translation text, which comprises the following steps:
collecting voice stream data in real time and sending the voice stream data to a server;
displaying a text segment of a source language corresponding to voice stream data returned by the server;
determining a first clause text after manual correction, and sending the first clause text to a server;
and displaying the second clause text of the target language corresponding to the first clause text returned by the server.
Optionally, the method further includes:
determining a manually corrected third clause text corresponding to the second clause text;
and updating the displayed second clause text to the third clause text.
Optionally, manual original-text correction processing is executed through a first display device;
and the source language text and the target language text corresponding to the voice progress are displayed through a second display device.
Optionally, the determining the manually corrected first clause text includes:
determining source-language history text that the second display device has finished displaying;
and adjusting the finished source-language history text in the first display device to a third display attribute.
Optionally, the determining the manually corrected first clause text includes:
and adjusting the first clause text according to the adjusted punctuation marks.
Optionally, the determining the manually corrected first clause text includes:
and restoring the text to its state before the last modification according to a single-step backspace instruction.
Optionally, the determining the manually corrected first clause text includes:
and restoring the sentence text before modification according to the sentence backspacing instruction.
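The two rollback behaviours above (single-step backspace and whole-sentence backspace) can be sketched with a small undo structure; the class and its method names are illustrative assumptions:

```python
# Per-edit snapshots support single-step undo; a snapshot of the
# original sentence supports whole-sentence rollback.

class ClauseEditor:
    def __init__(self, original: str):
        self.original = original  # snapshot for whole-sentence rollback
        self.history = []         # per-edit snapshots for single-step undo
        self.text = original

    def edit(self, new_text: str) -> None:
        self.history.append(self.text)
        self.text = new_text

    def undo_step(self) -> str:
        """Single-step backspace: restore the text before the last edit."""
        if self.history:
            self.text = self.history.pop()
        return self.text

    def undo_sentence(self) -> str:
        """Sentence backspace: restore the unedited original sentence."""
        self.text = self.original
        self.history = []
        return self.text
```

In the web client described later, `undo_step` would be bound to ctrl/cmd+z and `undo_sentence` to the esc key.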
Optionally, the determining the manually corrected first clause text includes:
in a sentence isolation manner, each sentence text is displayed.
Optionally, the determining the manually corrected first clause text includes:
the sentence text focused by the cursor is displayed with the first display attribute, and the sentence text not focused by the cursor is displayed with the second display attribute.
Optionally, the determining the manually corrected first clause text includes:
and if the text selection operation is executed, displaying a text processing shortcut operation option.
Optionally, the text processing shortcut operation option includes: adding hot word options, adding entity word replacement rule options, person-name pronoun fast switching options, punctuation mark fast switching options, marking text area deleting options and whole sentence deleting options.
Optionally, the determining the third manually corrected clause text corresponding to the second clause text includes:
in a sentence isolation manner, each sentence text is displayed.
Optionally, the determining the third manually corrected clause text corresponding to the second clause text includes:
and displaying the sentence text of the target language focused by the cursor by using the first display attribute, and displaying the sentence text of the target language not focused by the cursor by using the second display attribute.
Optionally, the determining the third manually corrected clause text corresponding to the second clause text includes:
determining sentence text of a source language corresponding to the sentence text of the target language focused by the cursor;
displaying the sentence text of the source language with a first display attribute.
Optionally, the first display attribute includes: highlighting;
the second display attribute includes: non-highlighting.
Optionally, the determining the third manually corrected clause text corresponding to the second clause text includes:
and deleting the sentence text according to the sentence deletion instruction.
Optionally, manual translation correction processing is executed through the first display device;
the source language text and the target language text corresponding to the voice progress are displayed through the second display device;
the determining of the manually corrected third clause text corresponding to the second clause text includes:
determining target-language history text that the second display device has finished displaying;
and adjusting the finished target-language history text in the first display device to the third display attribute.
Optionally, the method further includes:
determining a volume gain of the voice stream data;
and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
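A minimal sketch of the volume-gain step; the RMS-based measure and linear scaling are assumptions, since the claim only specifies determining a gain and adjusting it against a threshold:

```python
# Measure the level of a chunk of voice stream samples and scale it
# toward a target threshold. Samples are assumed to be floats in [-1, 1].

def rms(samples: list[float]) -> float:
    """Root-mean-square level of the sample chunk."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def adjust_gain(samples: list[float], target_rms: float) -> list[float]:
    """Scale samples so their RMS matches the target threshold."""
    current = rms(samples)
    if current == 0.0:
        return samples  # silence: nothing to scale
    factor = target_rms / current
    return [s * factor for s in samples]
```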
Optionally, the method further includes:
first clause text in the source language and second clause text in the target language are displayed in sentence-aligned fashion.
Optionally, the method further includes:
receiving, from the server, uncertain translated words included in the second clause text and a plurality of candidate translated words for each uncertain translated word;
and modifying the uncertain translated words according to the candidate translated words.
The present application also provides a speech translation text correction apparatus, including:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
and the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text and sending the second clause text to the client.
The present application further provides an electronic device, comprising:
a processor; and
a memory for storing a program for implementing a speech translation text correction method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and sending the text segment to the client; receiving a first clause text which is sent by a client and is manually corrected; and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
The present application also provides a speech translation text correction apparatus, including:
the voice data acquisition and sending unit is used for acquiring voice stream data in real time and sending the voice stream data to the server;
the original text display unit is used for displaying the text segments of the source language corresponding to the voice stream data returned by the server;
the original text correction unit is used for determining the manually corrected first clause text and sending the first clause text to the server;
and the translation display unit is used for displaying the second clause text of the target language corresponding to the first clause text returned by the server.
The present application further provides an electronic device, comprising:
a processor; and
a memory for storing a program for implementing a speech translation text correction method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: collecting voice stream data in real time and sending the voice stream data to a server; displaying a text segment of a source language corresponding to voice stream data returned by the server; determining a first clause text after manual correction, and sending the first clause text to a server; and displaying the second clause text of the target language corresponding to the first clause text returned by the server.
The present application further provides a system for correcting a speech translation text, comprising:
the server is used for determining a text segment of a source language corresponding to the voice data playing progress and sending the text segment to the client; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text;
and the client is used for playing the voice data, displaying the text segment, determining the first clause text and sending the first clause text.
The application also provides a method for correcting the voice translation text, which comprises the following steps:
determining a text segment of a source language corresponding to the playing progress of the voice data, and sending the text segment to a client;
receiving the manually corrected first clause text sent by the client;
a second clause text in the target language corresponding to the first clause text is determined.
Optionally, the method further includes:
and sending the second clause text to the client so as to perform manual correction processing on the second clause text.
The application also provides a method for correcting the voice translation text, which comprises the following steps:
playing the voice data;
displaying a text segment of a source language corresponding to the voice data playing progress sent by the server;
and determining a first clause text, and sending the first clause text to the server side, so that the server side determines a second clause text of the target language corresponding to the first clause text.
Optionally, the method further includes:
displaying a second clause text sent by the server;
and determining the manually corrected second clause text corresponding to the second clause text.
The present application also provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the method has the following advantages:
the voice translation text correction system provided by the embodiment of the application determines a source language text segment corresponding to voice stream data collected by a client in real time through a server and sends the text segment to the client; receiving the manually corrected first clause text sent by the client, determining a target language second clause text corresponding to the first clause text, and sending the second clause text to the client; the client collects voice stream data in real time and sends the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text; the processing mode can manually correct the source language clause text (such as comma-separated half clause) along with the real-time speech recognition progress, and translate the manually corrected source language clause text before one sentence recognition is finished, so that clause-granularity translation text correction processing is realized, and the phenomenon that wrong translation text stays on a screen for a longer time is avoided; therefore, the correction efficiency of the speech translation text can be effectively improved, and the display time of the wrong translation text can be effectively shortened. In addition, because the source language clause text based on manual correction is translated, the correction quality of the speech translation text can be effectively improved. In addition, the difficulty of judging the original text error and intervening is lower than that of judging the translation error and intervening, so that the correction efficiency and the correction quality can be further improved.
The speech translation text correction system provided by the embodiments of the application determines, through the server, a source-language text segment corresponding to the voice data playing progress and sends the text segment to the client; receives the manually corrected first clause text sent by the client and determines the corresponding target-language second clause text. The client plays the voice data, displays the text segment, determines the first clause text, and sends it. This processing mode allows source-language clause text (for example, a half sentence ending in a comma) to be manually corrected as voice playback progresses, and the corrected clause to be translated before recognition of the sentence finishes, realizing clause-granularity correction of the translated text; the correction quality and correction efficiency of the speech translation text can therefore be effectively improved.
Drawings
FIG. 1 is a schematic block diagram of an embodiment of a system for correcting a speech translation text provided by the present application;
FIG. 2 is a schematic diagram of an application scenario of an embodiment of a speech translation text correction system provided in the present application;
FIG. 3 is a schematic diagram of an interaction of an apparatus of an embodiment of a system for correcting a speech translation text provided by the present application;
FIG. 4 is a schematic diagram of a human correction interface of an embodiment of a system for correcting a speech translation text provided by the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
In the application, a system, a method and a device for correcting the text of the speech translation and an electronic device are provided. Each of the schemes is described in detail in the following examples.
First embodiment
Please refer to fig. 1, which is a block diagram of an embodiment of a speech translation text correction system according to the present application. The system comprises: a server 1 and a client 2.
The server 1 may be a server deployed on a cloud server, or may be a server dedicated to speech translation and translation text intervention processing, and may be deployed in a data center. The server may be a cluster server or a single server.
The client 2 includes, but is not limited to, mobile communication devices such as mobile phones and smartphones, and also terminal devices such as personal computers and tablets (e.g., an iPad).
Please refer to fig. 2, which is a schematic view illustrating an application scenario of the speech translation text correction system according to the present application. The server and the client can be connected through a network; for example, the client can connect through Wi-Fi or a similar network. In settings such as court trials and multi-person conferences, the client collects live voice stream data in real time and sends it to the server; the server determines the source-language text corresponding to the voice stream data through a speech recognition model, determines the target-language text through a speech translation model, and sends both texts to the client; the client projects them onto a large on-site screen for the audience to watch. Meanwhile, a text-correction user manually corrects the two texts through the client, and the correction results are synchronously updated on the large on-site screen.
Please refer to fig. 3, which is a schematic diagram illustrating an apparatus interaction of an embodiment of a speech translation text correction system according to the present application. In this embodiment, the server is configured to determine a text segment of the source language corresponding to voice stream data acquired by the client in real time, and send the text segment to the client; receiving the manually corrected first clause text sent by the client, determining a second clause text of a target language corresponding to the first clause text, and sending the second clause text to the client; the client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text.
In the process of implementing the present invention, the inventors found that, because the translation result (translated text) is obtained by translating the speech recognition result (original text), translation errors are in most cases caused by errors in the speech recognition result, and judging and intervening on original-text errors is easier than judging and intervening on translated-text errors; the system therefore intervenes on (corrects) the original text first. After the original text has been corrected, the corrected original-text clauses are combined with the concept of "rapid translation of streaming results" to achieve rapid intervention on the translated text: manual intervention can be performed without waiting for the whole sentence to be recognized, which shortens the time an error remains and, in real-time on-screen subtitle scenarios, prevents errors from staying on screen for a long time.
In this embodiment, the client deploys a speech recognition result editing module, which may also be called a text editing module. This module is the core module of the system and is responsible for editing the speech recognition result (text). It is implemented on the editor principle; a specific implementation may be to set a text tag's editable attribute to "true" (for example, HTML's contenteditable attribute) so that the tag becomes editable.
In specific implementation, the original text is corrected manually, and at least one of the following modes can be adopted:
determining the source-language history text that the second display device has finished displaying, and adjusting that history text in the first display device to a third display attribute.
In the present embodiment, the client executes manual original-text correction processing by a first user (a text corrector) through the first display device (the correction processing screen), and displays the source-language text and target-language text corresponding to the voice progress to second users (the live audience) through the second display device (the live presentation screen).
In the subtitle screen-casting scenario, it can be judged that a sentence has left the display area of the on-site display screen, and the sentence is then grayed out in the correction processing screen by modifying its css style. This prompts field control personnel that the sentence is no longer shown on the on-site display screen and needs no further intervention, avoiding wasted time and useless work.
And secondly, adjusting the first clause text according to the adjusted punctuation marks.
This method allows one sentence to be divided into two by inserting sentence-final punctuation marks such as a full stop into the original text, or allows a sentence's final punctuation mark to be deleted or changed into an in-sentence punctuation mark so that the sentence merges with the following one into a new sentence. Because the translation service is called per single sentence (which may be a clause), reasonable sentence-breaking and merging operations can effectively improve the quality of the translation result and thereby the correction quality of the translated text.
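The punctuation-driven re-segmentation described above can be sketched as follows; this is an illustrative implementation, not the patent's:

```python
# Re-split corrected text on sentence-final punctuation: inserting a
# full stop splits one unit into two, while demoting a final stop to a
# comma merges a sentence with the next one into a single unit.

def resegment(sentences: list[str]) -> list[str]:
    """Re-split corrected sentence strings so each returned unit ends
    at sentence-final punctuation (or is a trailing fragment)."""
    text = "".join(sentences)
    units, current = [], ""
    for ch in text:
        current += ch
        if ch in "。？！.?!":
            units.append(current)
            current = ""
    if current:
        units.append(current)
    return units
```

Each unit returned is what gets sent to the translation service as a single call.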
In a specific implementation, before a sentence finishes recognition, if a comma occurs, the background optimization service and the translation service are called in advance for the clause content before the comma, instead of waiting until recognition of the whole sentence is complete. The benefit is that, in scenarios where original text and translated text must be displayed on screen in real time, the more accurate recognition result after background optimization can be shown to the audience sooner, and likewise the translation of the partial clause can be shown as soon as possible.
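A hedged sketch of this "rapid translation of streaming results" idea: as partial recognition results grow, any newly completed comma-terminated clause is translated immediately. The function names and the stand-in translator are assumptions:

```python
# Consume a stream of growing partial recognition results; each time a
# new comma-terminated clause becomes available, translate it at once
# rather than waiting for the sentence-final punctuation.

def stream_translate(partials, translate):
    """Yield a translation for each newly completed clause."""
    done = 0  # number of characters already translated
    for partial in partials:
        # find the last comma at or after the untranslated region
        cut = max(partial.rfind("，", done), partial.rfind(",", done))
        if cut >= done:
            clause = partial[done:cut + 1]
            done = cut + 1
            yield translate(clause)

fake = lambda s: f"[{s}]"  # stand-in for the translation service
outs = list(stream_translate(["今天", "今天天气，", "今天天气，很好，"], fake))
```

A real implementation would also flush the remainder when the sentence-final punctuation arrives; that step is omitted here for brevity.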
And thirdly, restoring the text before the last step of modification according to the single step backspace instruction.
In a specific implementation, the method may listen for the command+z shortcut on macOS (ctrl+z on Windows) to perform a rollback operation on the content being edited, so that on-site control personnel can perform a fast single-step rollback when an intervention error occurs.
And fourthly, restoring the sentence text before modification according to a sentence rollback instruction.
In a specific implementation, listening for the esc keypress event can be supported to perform a one-key restore operation on the sentence being edited. For example, during manual intervention it may turn out that a sentence does not need intervention; pressing esc then forces the focus out. A specific implementation is to call the blur() method on the editor and replace the edited content with the original content from before editing, allowing on-site control personnel to quickly perform a complete rollback of the whole sentence when an intervention is made in error.
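The single-step rollback (command+z / ctrl+z) and whole-sentence restore (esc) behaviors above can be sketched as an undo stack plus a saved original. The class and method names are illustrative, not the patent's implementation.

```python
# Minimal sketch of the two rollback behaviors: each edit pushes the
# previous text onto an undo stack (single-step rollback pops it), and
# the original text is kept separately for the esc whole-sentence restore.
class SentenceEditor:
    def __init__(self, text: str):
        self.text = text
        self.original = text             # kept for whole-sentence restore
        self.undo_stack: list[str] = []

    def edit(self, new_text: str) -> None:
        self.undo_stack.append(self.text)
        self.text = new_text

    def undo(self) -> None:              # ctrl+z / command+z: one step back
        if self.undo_stack:
            self.text = self.undo_stack.pop()

    def restore(self) -> None:           # esc: full rollback of the sentence
        self.text = self.original
        self.undo_stack.clear()

e = SentenceEditor("helo world")
e.edit("hello world")
e.edit("hello word")
e.undo()
print(e.text)   # -> hello world  (one step back)
e.restore()
print(e.text)   # -> helo world   (back to the original)
```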
And fifthly, displaying each sentence text in a sentence isolation mode.
In a specific implementation, css style processing may be performed on each complete single sentence. For example, as shown in fig. 4, a closed frame is added around each sentence, and spacing is added between the frames of adjacent sentences, so that on-site control personnel can clearly distinguish each sentence in the recognition result.
And sixthly, displaying the sentence text focused by the cursor by using the first display attribute, and displaying the sentence text not focused by the cursor by using the second display attribute.
When the cursor focuses on the content of a certain sentence, the sentence is highlighted by modifying the css style when a focus event is detected, so that on-site control personnel can conveniently and accurately locate the content being edited and the complete content of the sentence.
And a seventh mode, if the text selection operation is executed, displaying the text processing shortcut operation option.
The text processing shortcut operation option includes: adding hot word options, adding entity word replacement rule options, person-name pronoun fast switching options, punctuation mark fast switching options, marking text area deleting options and whole sentence deleting options.
In a specific implementation, text can be swipe-selected with the mouse, and a shortcut operation bar is automatically displayed near the selected area for one-click intervention. The shortcut operations include: adding hot words, adding entity word replacement rules, fast switching of personal pronouns, fast switching between punctuation marks, deleting the selected area, and deleting the whole sentence.
In one example, the translation text corrector can also correct the second clause text (the translation) through the client; the client determines a manually corrected third clause text corresponding to the second clause text, and updates the second clause text cast onto the on-site large screen to the third clause text. With this processing mode, the translation result is edited directly; it applies to the situation where the original text is recognized correctly but the translation result is inaccurate, in which case the translated content can be edited directly.
In this embodiment, the client deploys a machine translation result editing module, which may also be referred to as a translation editing module. This module is of secondary importance in the system: since the translation result is obtained by translating the speech recognition result, translation errors are in most cases caused by errors in the speech recognition result, and judging and intervening on an original text error is easier than judging and intervening on a translation error. Therefore, the proportion of translation interventions is smaller than that of original text interventions, and fewer translation intervention functions are provided.
In specific implementation, the translated text is corrected manually, and at least one of the following modes can be adopted:
in the first mode, each sentence text is displayed in a sentence isolation mode.
When the method is specifically implemented, the translation result can be displayed according to sentences, sentence isolation is carried out through the css style, and the current sentence content can be clearly judged no matter whether the sentence is edited or deleted.
And secondly, displaying the sentence text of the target language focused by the cursor by using the first display attribute, and displaying the sentence text of the target language not focused by the cursor by using the second display attribute.
In a specific implementation, when the mouse focuses on a sentence, the current sentence is highlighted by modifying the css style.
In specific implementation, the sentence text of the source language corresponding to the sentence text of the target language focused by the cursor can be determined; displaying the sentence text of the source language with a first display attribute. By adopting the processing mode, the corresponding original text content is matched through the id, and then the corresponding original text sentence in the original text editing area is highlighted, so that the reference original text content can be contrasted when the translation is modified.
And thirdly, deleting the sentence text according to the sentence deleting instruction.
In a specific implementation, a translation result can be quickly deleted by pressing one of the keys F1-F10. When a translation result is very poor, it can be deleted with a single key, preventing the wrong translation from staying on the screen for a long time; this is suitable for scenes where simultaneous interpretation results are displayed on a real-time screen.
Determining the historical text of the target language that the second display device has finished displaying; and adjusting that historical text, as displayed on the first display device, to the third display attribute.
In a specific implementation, in the scene of casting translation subtitles onto a screen, the system determines which translation results have left the screen display area and then grays those sentences out by modifying the css style, prompting on-site control personnel to focus their attention on the translation results still displayed on the screen rather than intervening on sentences that have already scrolled off.
In one example, the client is further configured to determine a volume gain of the voice stream data; and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value. By adopting the processing mode, the gain adjustment of the audio stream is supported, and the accuracy rate of original text recognition is improved from the source.
In this embodiment, the client deploys a gain adjustment module. The module can be understood as volume adjustment: the client dynamically draws an audio waveform from the audio stream data, the waveform shows whether the volume of the incoming speech is too loud or too quiet, and the volume gain can then be changed by adjusting the volume gain switch. This is because an audio stream of reasonable volume positively helps improve the quality of algorithmic recognition. Therefore, reasonably adjusting the volume gain improves speech recognition quality and reduces the workload of manual intervention.
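A minimal gain-adjustment sketch is shown below: estimate the peak level of an audio buffer and scale samples toward a target peak, clamping to avoid clipping. The target value and the linear-gain approach are assumptions, not the patent's method.

```python
# Illustrative volume-gain sketch: quiet audio is boosted toward a target
# peak and loud audio is attenuated; samples are clamped to [-1.0, 1.0].
def adjust_gain(samples: list[float], target_peak: float = 0.8) -> list[float]:
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return samples[:]              # silence: nothing to scale
    gain = target_peak / peak
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

quiet = [0.1, -0.2, 0.05]              # too quiet: boosted toward the target
print(adjust_gain(quiet))
loud = [0.9, -1.0, 0.95]               # too loud: attenuated
print(adjust_gain(loud))
```

A real client would compute the gain over a sliding window of the audio stream rather than a single buffer, but the scaling principle is the same.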
In one example, after receiving the manually corrected first clause text sent by the client, the server may further perform correction processing on the first clause text, and send a corrected third clause text to the client. Correspondingly, the server determines the second clause text corresponding to the third clause text. Correspondingly, the client can also display the corrected third clause text corresponding to the first clause text and returned by the server. By adopting the processing mode, not only the original text is optimized through a machine, but also the optimized original text is translated, so that the quality of the translated text is improved, and the effect of improving the correction quality of the translated text is achieved.
In a specific implementation, the machine correction of the original text may further optimize the manually corrected original text, for example through Inverse Text Normalization (ITN). ITN presents objects such as dates, times, addresses, and amounts in standard formatting.
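A toy ITN sketch follows: spoken-form numbers and time/amount phrases are rewritten into standard written form by simple rules. Real ITN systems use weighted grammars; this rule table is purely illustrative.

```python
# Hypothetical Inverse Text Normalization sketch: digits, times, and
# amounts in spoken form are rewritten into standard written form.
import re

SPOKEN_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3",
                 "four": "4", "five": "5", "six": "6", "seven": "7",
                 "eight": "8", "nine": "9"}

def itn(text: str) -> str:
    # First map spoken digits to numerals, then apply pattern rules.
    out = " ".join(SPOKEN_DIGITS.get(w, w) for w in text.split())
    out = re.sub(r"\b(\d+) p m\b", r"\1 PM", out)
    out = re.sub(r"\b(\d+) a m\b", r"\1 AM", out)
    out = re.sub(r"\b(\d+) dollars\b", r"$\1", out)
    return out

print(itn("the meeting starts at three p m"))
# -> the meeting starts at 3 PM
print(itn("it costs five dollars"))
# -> it costs $5
```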
In a specific implementation, the machine correction of the original text may perform entity word replacement on the first clause text according to entity word replacement rule information. The entity word replacement rules include but are not limited to: person name replacement rules and enterprise entity name replacement rules. The entity word replacing module supports automatically replacing a certain entity word A with an entity word B, and is used to fix frequently occurring, fixed errors and reduce manual correction cost in some scenes, such as replacing "river horse Mr" with "box horse Mr". Adopting this entity word replacement processing mode improves original text quality by automatic means.
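The A-to-B replacement rules can be sketched as a simple mapping applied to each recognized clause. The rule contents below are placeholders echoing the example in the text, not from the patent.

```python
# Sketch of the entity-word replacement rules: a fixed mapping A -> B
# applied automatically to each recognized clause.
ENTITY_RULES = {
    "river horse Mr": "box horse Mr",   # recurring mis-recognition fix
    "acme corp": "ACME Corporation",    # hypothetical enterprise-name rule
}

def apply_entity_rules(clause: str, rules: dict[str, str] = ENTITY_RULES) -> str:
    # Longest rules first, so longer entities are not shadowed by substrings.
    for src in sorted(rules, key=len, reverse=True):
        clause = clause.replace(src, rules[src])
    return clause

print(apply_entity_rules("welcome river horse Mr to the stage"))
# -> welcome box horse Mr to the stage
```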
In a specific implementation, the machine correction may perform blacklist filtering on the first clause text according to blacklist filtering rule information. The blacklist filter module: in the scene of real-time speech recognition and machine translation, before subtitles are displayed to the audience, all speech recognition results and machine translation results are filtered against a blacklist vocabulary to remove illegal words, such as those related to pornography and violence. The blacklist vocabulary is large, ranging from hundreds to tens of thousands of entries; technically, an Aho-Corasick (AC) automaton algorithm is used to match the blacklist words, and matched words are then replaced with empty strings.
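The AC-automaton matching plus empty-string replacement described above can be sketched compactly as follows. A production filter would handle overlapping matches and Unicode normalization more carefully; this is a minimal illustration of the technique.

```python
# Compact Aho-Corasick sketch for the blacklist filter: all blacklisted
# words are matched in one pass over the text, and each matched span is
# replaced with an empty string.
from collections import deque

def build_automaton(words):
    trie, out = [{}], [set()]          # node -> {char: next}, matched words
    for w in words:
        node = 0
        for ch in w:
            if ch not in trie[node]:
                trie.append({})
                out.append(set())
                trie[node][ch] = len(trie) - 1
            node = trie[node][ch]
        out[node].add(w)
    fail = [0] * len(trie)
    queue = deque(trie[0].values())    # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, nxt in trie[node].items():
            f = fail[node]
            while f and ch not in trie[f]:
                f = fail[f]
            fail[nxt] = trie[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches from the fail node
            queue.append(nxt)
    return trie, fail, out

def filter_blacklist(text, automaton):
    trie, fail, out = automaton
    node, drop = 0, set()              # character indices to remove
    for i, ch in enumerate(text):
        while node and ch not in trie[node]:
            node = fail[node]
        node = trie[node].get(ch, 0)
        for word in out[node]:
            drop.update(range(i - len(word) + 1, i + 1))
    return "".join(ch for i, ch in enumerate(text) if i not in drop)

auto = build_automaton(["badword", "verybad"])
print(filter_blacklist("this badword and verybad stay", auto))
# -> this  and  stay
```

Building the automaton once and reusing it per subtitle is what makes the approach viable at tens of thousands of blacklist entries.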
In a specific implementation, the machine correction may perform dialect correction on the first clause text. Dialects, also called vernacular, local words, or local speech, may carry special or unusual meanings. With this processing mode, dialect in the speech stream data is converted into the standard language so that a correct translation can be determined; therefore, the correction quality of the translated text can be effectively improved.
In one example, the server is further configured to perform a correction process on the second clause text, such as translation entity word replacement, blacklist filtering, and the like.
In one example, the server is further configured to optimize the speech recognition model and/or the speech translation model based on hot word information. The hot word management module: hot words differ from entity word replacement in that hot words are used to optimize the algorithm model; the hot word list is sent to the server side to increase the occurrence probability of the hot words in recognition results. When a user configures a hot word through the client, a weight value can be configured for it correspondingly; the higher the weight value, the higher the probability that the hot word appears.
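The effect of hot-word weights on recognition can be sketched as a rescoring step over candidate hypotheses. The additive scoring formula below is an assumption for illustration only; real systems bias the decoder itself.

```python
# Hypothetical hot-word weighting sketch: hot words configured on the
# client carry weights; a candidate hypothesis containing a hot word has
# its score boosted by that weight, raising the hot word's probability.
def rescore(candidates, hotwords):
    """candidates: {text: base_score}; hotwords: {word: weight}."""
    rescored = {}
    for text, score in candidates.items():
        boost = sum(w for word, w in hotwords.items() if word in text)
        rescored[text] = score + boost
    return rescored

candidates = {"the hema store": 0.4, "the hammer store": 0.5}
print(rescore(candidates, {"hema": 0.3}))
# the hot word "hema" now makes the first candidate rank highest
```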
In one example, the client is further configured to display the first clause text in the source language and the second clause text in the target language in sentence-aligned fashion. By adopting the processing mode, the original text and the translated text are contrasted and displayed in the same area, so that the user can conveniently perform correction by referring to each other; therefore, the correction quality can be effectively improved, and the user experience is improved.
In one example, the server may be further configured to determine words with uncertain translations included in the second clause text and a plurality of candidate translation words for each such word, and to send the uncertain words and the candidate translation words to the client, so that a client user can modify the uncertain words according to the candidate translation words. Correspondingly, the client is also configured to receive the words with uncertain translations included in the second clause text sent by the server, together with the candidate translation words, and to modify the uncertain words according to the candidate translation words.
A word with an uncertain translation is a translation whose original word may have multiple meanings, where the machine cannot determine from the context which translation is more accurate. In this case, the server can present a flag (i.e., multiple candidate words) indicating that the word's translation may be inaccurate, prompting manual correction or confirmation; there may be multiple identical or similar candidate words. In a specific implementation, the uncertain word and its candidate translation words can be determined according to a similar word list. Adopting this processing mode can effectively improve the correction quality of the translated text.
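The similar-word-list lookup described above can be sketched as follows; the word groups are illustrative placeholders, not the patent's data.

```python
# Sketch of flagging uncertain translations via a similar-word list: if a
# translated word belongs to a group of near-synonyms, the rest of the
# group is offered to the corrector as candidate replacements.
SIMILAR_WORDS = [
    {"interest", "interests", "benefit"},
    {"bank", "shore"},
]

def flag_uncertain(translation_words):
    """Return {word: sorted candidate list} for each uncertain word."""
    flags = {}
    for w in translation_words:
        for group in SIMILAR_WORDS:
            if w in group:
                flags[w] = sorted(group - {w})
    return flags

print(flag_uncertain(["the", "river", "bank"]))
# -> {'bank': ['shore']}  (client shows 'shore' as a candidate correction)
```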
As can be seen from the foregoing embodiments, the speech translation text correction system provided in the embodiments of the present application determines, by the server, a source language text segment corresponding to speech stream data collected by the client in real time, and sends the text segment to the client; receiving the manually corrected first clause text sent by the client, determining a target language second clause text corresponding to the first clause text, and sending the second clause text to the client; the client collects voice stream data in real time and sends the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text; the processing mode can manually correct the source language clause text (such as comma-separated half clause) along with the real-time speech recognition progress, and translate the manually corrected source language clause text before one sentence recognition is finished, so that clause-granularity translation text correction processing is realized, and the phenomenon that wrong translation text stays on a screen for a longer time is avoided; therefore, the correction efficiency of the speech translation text can be effectively improved, and the display time of the wrong translation text can be effectively shortened. In addition, because the source language clause text based on manual correction is translated, the correction quality of the speech translation text can be effectively improved. In addition, the difficulty of judging the original text error and intervening is lower than that of judging the translation error and intervening, so that the correction efficiency and the correction quality can be further improved.
Second embodiment
Corresponding to the voice translation text correction system, the application also provides a voice translation text correction method, and the execution subject of the method includes but is not limited to a server. Parts of this embodiment that are the same as the first embodiment are not described again, please refer to corresponding parts in the first embodiment.
In this embodiment, the method includes the steps of:
step 1: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and sending the text segment to the client;
step 2: receiving a first clause text which is sent by a client and is manually corrected;
and step 3: and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
In one example, the method may further comprise the steps of: performing correction processing on the first clause text as a corrected third clause text; correspondingly, the determining of the second clause text of the target language corresponding to the first clause text may be performed as follows: determining the second clause text corresponding to the third clause text.
In one example, the correction processing performed on the first clause text may be performed as follows: dialect correction processing is performed on the first clause text.
In one example, the correction processing performed on the first clause text may be performed as follows: and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
The entity word replacement rule comprises: a person name replacement rule and an enterprise entity name replacement rule.
In one example, the correction processing performed on the first clause text may be performed as follows: and performing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
In one example, the method may further comprise the steps of: correction processing is performed on the second clause text.
In one example, the method may further comprise the steps of: and optimizing the voice recognition model and/or the voice translation model according to the hot word information.
In one example, the method may further comprise the steps of: determining a plurality of candidate translated words of the words of which the second clause text comprises uncertain words of the translated text and uncertain words of the translated text; and sending the uncertain words of the translated text and the candidate translated text words to the client, so that a user of the client modifies the uncertain words of the translated text according to the candidate translated text words.
In specific implementation, the uncertain word of the translated text and the plurality of candidate translated text words can be determined according to a similar word list.
Third embodiment
In the foregoing embodiment, a method for correcting a speech translation text is provided, and correspondingly, a device for correcting a speech translation text is also provided. The apparatus corresponds to an embodiment of the method described above.
Parts of this embodiment that are the same as the first embodiment are not described again, please refer to corresponding parts in the first embodiment. The application provides a speech translation text correction device includes:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
and the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text and sending the second clause text to the client.
Fourth embodiment
The application also provides an electronic device. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
An electronic device of the present embodiment includes: a processor and a memory; a memory for storing a program for implementing a speech translation text correction method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and sending the text segment to the client; receiving a first clause text which is sent by a client and is manually corrected; and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
Fifth embodiment
Corresponding to the above-mentioned speech translation text correction system, the present application also provides a speech translation text correction method, and the execution subject of the method includes but is not limited to a server, and may also be any device capable of implementing the method. Parts of this embodiment that are the same as the first embodiment are not described again, please refer to corresponding parts in the first embodiment.
In this embodiment, the method includes the steps of:
step 1: collecting voice stream data in real time and sending the voice stream data to a server;
step 2: displaying a text segment of a source language corresponding to voice stream data returned by the server;
and step 3: determining a first clause text after manual correction, and sending the first clause text to a server;
and 4, step 4: and displaying the second clause text of the target language corresponding to the first clause text returned by the server.
In one example, the method may further comprise the steps of: determining a third clause text after manual correction corresponding to the second clause text; and updating the displayed second clause text into a third clause text.
In one example, a manual original-text correction process is performed through a first display device; and the source language text and the target language text corresponding to the voice progress are displayed through a second display device.
In one example, the determining the manually corrected first clause text may include the sub-steps of: determining the historical text of the source language that the second display device has finished displaying; and adjusting that historical text, as displayed on the first display device, to a third display attribute.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and adjusting the first clause text according to the adjusted punctuation marks.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and restoring the text before the single step modification according to the single step backspace instruction.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and restoring the sentence text before modification according to the sentence backspacing instruction.
In one example, the determining the manually corrected first clause text may include the sub-steps of: in a sentence isolation manner, each sentence text is displayed.
In one example, the determining the manually corrected first clause text may include the sub-steps of: the sentence text focused by the cursor is displayed with the first display attribute, and the sentence text not focused by the cursor is displayed with the second display attribute.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and if the text selection operation is executed, displaying a text processing shortcut operation option.
The text processing shortcut operation options include, but are not limited to: adding hot word options, adding entity word replacement rule options, person-name pronoun fast switching options, punctuation mark fast switching options, marking text area deleting options and whole sentence deleting options.
In one example, the determining of the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: in a sentence isolation manner, each sentence text is displayed.
In one example, the determining of the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: and displaying the sentence text of the target language focused by the cursor by using the first display attribute, and displaying the sentence text of the target language not focused by the cursor by using the second display attribute.
In one example, the determining of the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: determining sentence text of a source language corresponding to the sentence text of the target language focused by the cursor; displaying the sentence text of the source language with a first display attribute.
The first display attribute includes: highlighting; the second display attribute includes: and (4) non-highlighting.
In one example, the determining of the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: and deleting the sentence text according to the sentence deletion instruction.
In one example, a manual translation correction process is performed through a first display device; the source language text and the target language text corresponding to the voice progress are displayed through a second display device; and the determining of the manually corrected third clause text corresponding to the second clause text may comprise the sub-steps of: determining the historical text of the target language that the second display device has finished displaying; and adjusting that historical text, as displayed on the first display device, to the third display attribute.
In one example, the method may further comprise the steps of: determining a volume gain of the voice stream data; and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
In one example, the method may further comprise the steps of: first clause text in the source language and second clause text in the target language are displayed in sentence-aligned fashion.
In one example, the method may further comprise the steps of: receiving words with uncertain translations included in the second clause text sent by a server and a plurality of candidate translation words of the words with uncertain translations; and modifying the uncertain words of the translated text according to the candidate translated words.
Sixth embodiment
In the foregoing embodiment, a method for correcting a speech translation text is provided, and correspondingly, a device for correcting a speech translation text is also provided. The apparatus corresponds to an embodiment of the method described above.
Parts of this embodiment that are the same as the first embodiment are not described again, please refer to corresponding parts in the first embodiment. The application provides a speech translation text correction device includes:
the voice data acquisition and sending unit is used for acquiring voice stream data in real time and sending the voice stream data to the server;
the original text display unit is used for displaying the text segments of the source language corresponding to the voice stream data returned by the server;
the original text correction unit is used for determining the manually corrected first clause text and sending the first clause text to the server;
and the translation display unit is used for displaying the second clause text of the target language corresponding to the first clause text returned by the server.
Seventh embodiment
The application also provides an electronic device embodiment. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
An electronic device of the present embodiment includes: a processor and a memory; a memory for storing a program for implementing a speech translation text correction method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: collecting voice stream data in real time and sending the voice stream data to a server; displaying a text segment of a source language corresponding to voice stream data returned by the server; determining a first clause text after manual correction, and sending the first clause text to a server; and displaying the second clause text of the target language corresponding to the first clause text returned by the server.
Eighth embodiment
Corresponding to the voice translation text correction system, the application also provides a voice translation text correction system. Parts of this embodiment that are the same as the first embodiment are not described again, please refer to corresponding parts in the first embodiment.
In this embodiment, the system includes a server and a client. The server is used for determining a text segment of a source language corresponding to the voice data playing progress and sending the text segment to the client; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text; the client is used for playing the voice data, displaying the text segment, determining the first clause text and sending the first clause text.
The difference between the system provided by the present embodiment and the system of the first embodiment includes: the voice data is different. The voice data described in this embodiment may be complete voice data collected in advance, such as a complete audio file submitted by a user, rather than a voice data stream collected and uploaded in real time.
In specific implementation, the server is further configured to send the second clause text to the client, so as to perform manual correction processing on the second clause text; correspondingly, the client is also used for displaying a second clause text sent by the server; and determining the manually corrected second clause text corresponding to the second clause text.
As can be seen from the foregoing embodiments, the speech translation text correction system provided in the embodiments of the present application determines, by the server, a text segment of a source language corresponding to a speech data playing progress, and sends the text segment to the client; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text; the client plays the voice data, displays text segments, determines a first clause text and sends the first clause text; the processing mode can manually correct the source language clause text (such as comma-separated half clause) along with the voice playing progress, and translate the manually corrected source language clause text before one sentence is recognized, so as to realize the translation text correction processing of clause granularity; therefore, the correction quality and correction efficiency of the speech translation text can be effectively improved.
Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application, and those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application, therefore, the scope of the present application should be determined by the claims that follow.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Claims (40)
1. A speech translation text correction system, comprising:
the server is used for determining a text segment of a source language corresponding to the voice stream data collected by the client in real time and sending the text segment to the client; receiving the manually corrected first clause text sent by the client, determining a second clause text of a target language corresponding to the first clause text, and sending the second clause text to the client;
the client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text.
2. A method for correcting a text translated from speech, comprising:
determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and sending the text segment to the client;
receiving a first clause text which is sent by a client and is manually corrected;
and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
3. The method of claim 2, further comprising:
performing correction processing on the first clause text to obtain a corrected third clause text;
the determining of the second clause text of the target language corresponding to the first clause text comprises:
determining the second clause text corresponding to the third clause text.
4. The method of claim 3,
the performing of the correction process on the first clause text includes:
performing dialect correction processing on the first clause text.
5. The method of claim 3,
the performing of the correction process on the first clause text includes:
and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
6. The method of claim 5,
the entity word replacement rule comprises: a person name replacement rule and an enterprise entity name replacement rule.
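Purely as an illustration of the entity-word replacement described in claims 5 and 6 (the function name and rule format are assumptions, not the patented implementation), the replacement step over a corrected clause might look like:

```python
def replace_entity_words(text: str, rules: dict) -> str:
    """Apply entity-word replacement rules, e.g. person names and
    enterprise entity names. Longer keys are applied first so that a
    longer entity name is not shadowed by one of its substrings."""
    for wrong, right in sorted(rules.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(wrong, right)
    return text
```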
7. The method of claim 3,
the performing of the correction process on the first clause text includes:
and performing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
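The blacklist filtering of claim 7 admits a similarly small sketch; masking with asterisks is one assumed policy, and the real rule information could instead delete or substitute terms.

```python
def filter_blacklist(text: str, blacklist: list, mask: str = "*") -> str:
    """Mask each blacklisted term with mask characters of the same length."""
    for term in blacklist:
        text = text.replace(term, mask * len(term))
    return text
```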
8. The method of claim 2, further comprising:
correction processing is performed on the second clause text.
9. The method of claim 2, further comprising:
and optimizing the voice recognition model and/or the voice translation model according to the hot word information.
10. The method of claim 2, further comprising:
determining words with uncertain translations included in the second clause text, and a plurality of candidate translation words for each such word;
and sending the words with uncertain translations and the candidate translation words to the client, so that a user of the client can modify the words with uncertain translations according to the candidate translation words.
11. The method of claim 10,
wherein the words with uncertain translations and the plurality of candidate translation words are determined according to a similar-word list.
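One hypothetical reading of the similar-word-list mechanism in claims 10-11 (the data shapes are assumptions) is a lookup that flags uncertain words and attaches their candidates:

```python
def candidate_translations(translated_words, similar_words: dict) -> dict:
    """Flag words whose translation is uncertain (i.e. they have entries
    in the similar-word list) and pair each with its candidates, which
    the client can offer to the user for a one-click fix."""
    return {w: similar_words[w] for w in translated_words if w in similar_words}
```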
12. A method for correcting a text translated from speech, comprising:
collecting voice stream data in real time and sending the voice stream data to a server;
displaying a text segment of a source language corresponding to voice stream data returned by the server;
determining a first clause text after manual correction, and sending the first clause text to a server;
and displaying the second clause text of the target language corresponding to the first clause text returned by the server.
13. The method of claim 12, further comprising:
determining a third clause text after manual correction corresponding to the second clause text;
and updating the displayed second clause text to the third clause text.
14. The method of claim 12,
performing manual original-text correction processing through a first display device;
and displaying the source language text and the target language text corresponding to the voice progress through a second display device.
15. The method of claim 12,
the determining the manually corrected first clause text comprises:
determining source-language history text whose display has been completed on the second display device;
and adjusting that completed source-language history text, as displayed by the first display device, to a third display attribute.
16. The method of claim 12,
the determining the manually corrected first clause text comprises:
and adjusting the first clause text according to the adjusted punctuation marks.
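The punctuation-driven clause adjustment of claim 16 could, purely as a hedged sketch with an assumed set of clause marks, be modeled as a re-split of the corrected text:

```python
import re

def reclause(text: str) -> list:
    """Re-split corrected text into clause units after the operator has
    adjusted punctuation; both ASCII and CJK clause marks count as
    boundaries here (an assumption, not the claimed rule set)."""
    return [p.strip() for p in re.split(r"[,;.!?，。；！？]", text) if p.strip()]
```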
17. The method of claim 12,
the determining the manually corrected first clause text comprises:
and restoring the text to its state before the most recent single-step modification according to a single-step rollback instruction.
18. The method of claim 12,
the determining the manually corrected first clause text comprises:
and restoring the sentence text to its state before modification according to a sentence rollback instruction.
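The two rollback instructions of claims 17 and 18 can be pictured as a per-sentence edit history; the class below is a toy model under that assumption, not the claimed mechanism.

```python
class ClauseEditor:
    """Toy model of the two rollback instructions: a single-step rollback
    that restores the text before the most recent edit, and a sentence
    rollback that restores the original clause."""

    def __init__(self, original: str):
        self.original = original
        self.history = [original]

    def edit(self, new_text: str) -> None:
        self.history.append(new_text)

    def rollback_step(self) -> str:
        """Undo only the most recent modification."""
        if len(self.history) > 1:
            self.history.pop()
        return self.history[-1]

    def rollback_sentence(self) -> str:
        """Discard all modifications to this sentence."""
        self.history = [self.original]
        return self.original
```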
19. The method of claim 12,
the determining the manually corrected first clause text comprises:
displaying each sentence text in a sentence-isolated manner.
20. The method of claim 12,
the determining the manually corrected first clause text comprises:
displaying the sentence text focused by the cursor with a first display attribute, and displaying sentence text not focused by the cursor with a second display attribute.
21. The method of claim 12,
the determining the manually corrected first clause text comprises:
and if the text selection operation is executed, displaying a text processing shortcut operation option.
22. The method of claim 21,
the text processing shortcut operation options include: an add-hot-word option, an add-entity-word-replacement-rule option, a person-pronoun quick-switch option, a punctuation-mark quick-switch option, a marked-text-region deletion option, and a whole-sentence deletion option.
23. The method of claim 13,
the determining of the manually corrected third clause text corresponding to the second clause text includes:
displaying each sentence text in a sentence-isolated manner.
24. The method of claim 13,
the determining of the manually corrected third clause text corresponding to the second clause text includes:
and displaying the sentence text of the target language focused by the cursor by using the first display attribute, and displaying the sentence text of the target language not focused by the cursor by using the second display attribute.
25. The method of claim 24,
the determining of the manually corrected third clause text corresponding to the second clause text includes:
determining sentence text of a source language corresponding to the sentence text of the target language focused by the cursor;
displaying the sentence text of the source language with a first display attribute.
26. The method of claim 24,
the first display attribute includes: highlighting;
the second display attribute includes: non-highlighting.
27. The method of claim 13,
the determining of the manually corrected third clause text corresponding to the second clause text includes:
and deleting the sentence text according to the sentence deletion instruction.
28. The method of claim 13,
performing manual translation correction processing through a first display device;
displaying, by a second display device, a source language text and a target language text corresponding to the voice progress;
the determining of the manually corrected third clause text corresponding to the second clause text includes:
determining target-language history text whose display has been completed on the second display device;
and adjusting that completed target-language history text, as displayed by the first display device, to a third display attribute.
29. The method of claim 13, further comprising:
determining a volume gain of the voice stream data;
and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
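A minimal reading of the gain adjustment in claim 29 (the clamp-to-threshold policy is an assumption; the claim only says the adjustment uses the gain and a threshold) is:

```python
def adjust_gain(samples, measured_gain: float, gain_threshold: float):
    """If the measured volume gain exceeds the threshold, scale the
    samples so the effective gain equals the threshold; otherwise pass
    them through unchanged."""
    if measured_gain <= gain_threshold:
        return list(samples)
    scale = gain_threshold / measured_gain
    return [s * scale for s in samples]
```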
30. The method of claim 12, further comprising:
first clause text in the source language and second clause text in the target language are displayed in sentence-aligned fashion.
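Sentence-aligned display, as in claim 30, amounts to pairing source and target clauses by position; the helper below is an assumed sketch of that pairing, not the claimed layout logic.

```python
from itertools import zip_longest

def align_sentences(source_clauses, target_clauses):
    """Pair each source-language clause with its target-language clause
    for side-by-side display; unmatched positions get an empty cell."""
    return list(zip_longest(source_clauses, target_clauses, fillvalue=""))
```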
31. The method of claim 2, further comprising:
receiving words with uncertain translations included in the second clause text sent by a server and a plurality of candidate translation words of the words with uncertain translations;
and modifying the uncertain words of the translated text according to the candidate translated words.
32. A speech translation text correction apparatus characterized by comprising:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
and the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text and sending the second clause text to the client.
33. An electronic device, comprising:
a processor; and
a memory for storing a program for implementing a speech translation text correction method; wherein, after the device is powered on and the processor runs the program of the method, the device performs the following steps: determining a text segment of a source language corresponding to voice stream data collected by a client in real time, and sending the text segment to the client; receiving the manually corrected first clause text sent by the client; and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
34. A speech translation text correction apparatus characterized by comprising:
the voice data acquisition and sending unit is used for acquiring voice stream data in real time and sending the voice stream data to the server;
the original text display unit is used for displaying the text segments of the source language corresponding to the voice stream data returned by the server;
the original text correction unit is used for determining the manually corrected first clause text and sending the first clause text to the server;
and the translation display unit is used for displaying the second clause text of the target language corresponding to the first clause text returned by the server.
35. An electronic device, comprising:
a processor; and
a memory for storing a program for implementing a speech translation text correction method; wherein, after the device is powered on and the processor runs the program of the method, the device performs the following steps: collecting voice stream data in real time and sending it to a server; displaying a text segment of the source language corresponding to the voice stream data returned by the server; determining the manually corrected first clause text and sending it to the server; and displaying the second clause text of the target language corresponding to the first clause text returned by the server.
36. A speech translation text correction system, comprising:
the server is used for determining a text segment of a source language corresponding to the voice data playing progress and sending the text segment to the client; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text;
and the client is used for playing the voice data, displaying the text segment, determining the first clause text and sending the first clause text.
37. A method for correcting a text translated from speech, comprising:
determining a text segment of a source language corresponding to the playing progress of the voice data, and sending the text segment to a client;
receiving a first clause text which is sent by a client and is manually corrected;
and determining a second clause text of the target language corresponding to the first clause text.
38. The method of claim 37, further comprising:
and sending the second clause text to the client so as to perform manual correction processing on the second clause text.
39. A method for correcting a text translated from speech, comprising:
playing voice data;
displaying a text segment of a source language corresponding to the voice data playing progress sent by the server;
and determining a first clause text, and sending the first clause text to the server side, so that the server side determines a second clause text of the target language corresponding to the first clause text.
40. The method of claim 39, further comprising:
displaying a second clause text sent by the server;
and determining the manually corrected second clause text corresponding to the second clause text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010366777.5A CN113591491B (en) | 2020-04-30 | 2020-04-30 | Speech translation text correction system, method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113591491A true CN113591491A (en) | 2021-11-02 |
CN113591491B CN113591491B (en) | 2023-12-26 |
Family
ID=78237626
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115860015A (en) * | 2022-12-29 | 2023-03-28 | 北京中科智加科技有限公司 | Translation memory-based transcribed text translation method and computer equipment |
WO2024051729A1 (en) * | 2022-09-07 | 2024-03-14 | 华为技术有限公司 | Transliteration method and electronic device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000259632A (en) * | 1999-03-09 | 2000-09-22 | Toshiba Corp | Automatic interpretation system, interpretation program transmission system, recording medium, and information transmission medium |
US20130144592A1 (en) * | 2006-09-05 | 2013-06-06 | Google Inc. | Automatic Spelling Correction for Machine Translation |
CN104714943A (en) * | 2015-03-26 | 2015-06-17 | 百度在线网络技术(北京)有限公司 | Translation method and system |
JP2016192599A (en) * | 2015-03-30 | 2016-11-10 | 株式会社エヌ・ティ・ティ・データ | Device and method combining video conference system and speech recognition technology |
CN109408833A (en) * | 2018-10-30 | 2019-03-01 | 科大讯飞股份有限公司 | A kind of interpretation method, device, equipment and readable storage medium storing program for executing |
CN109446532A (en) * | 2018-09-11 | 2019-03-08 | 深圳市沃特沃德股份有限公司 | Translate bearing calibration, device and translation calibration equipment |
CN110047488A (en) * | 2019-03-01 | 2019-07-23 | 北京彩云环太平洋科技有限公司 | Voice translation method, device, equipment and control equipment |
US20190251174A1 (en) * | 2018-02-12 | 2019-08-15 | Samsung Electronics Co., Ltd. | Machine translation method and apparatus |
Non-Patent Citations (1)
Title |
---|
YANG Ziyi, "Implementation of an Artificial Intelligence Speech System", Netizen World (网友世界), no. 06, pp. 27-29 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105590627B (en) | Image display apparatus, method for driving image display apparatus, and computer-readable recording medium | |
CN105245917A (en) | System and method for generating multimedia voice caption | |
CN110740275B (en) | Nonlinear editing system | |
CN111885416B (en) | Audio and video correction method, device, medium and computing equipment | |
CN113035199B (en) | Audio processing method, device, equipment and readable storage medium | |
EP4322029A1 (en) | Method and apparatus for generating video corpus, and related device | |
CN103984772A (en) | Method and device for generating text retrieval subtitle library and video retrieval method and device | |
US12099815B2 (en) | Providing subtitle for video content in spoken language | |
CN111885313A (en) | Audio and video correction method, device, medium and computing equipment | |
CN113591491B (en) | Speech translation text correction system, method, device and equipment | |
JP6495792B2 (en) | Speech recognition apparatus, speech recognition method, and program | |
JP2012181358A (en) | Text display time determination device, text display system, method, and program | |
WO2016134043A1 (en) | Use of a program schedule to facilitate modifying closed-captioning text | |
CN113450774A (en) | Training data acquisition method and device | |
CN104994404A (en) | Method and device for obtaining keywords for video | |
CN114120969A (en) | Method and system for testing voice recognition function of intelligent terminal and electronic equipment | |
CN114398952B (en) | Training text generation method and device, electronic equipment and storage medium | |
CN110853627A (en) | Method and system for voice annotation | |
CN113923479A (en) | Audio and video editing method and device | |
CN113435198A (en) | Automatic correction display method and device for caption dialect words | |
CN112837674B (en) | Voice recognition method, device, related system and equipment | |
US20230036891A1 (en) | Video-based learning assistance method and apparatus | |
CN111767214B (en) | Automatic testing method and device for software UI | |
CN114842828A (en) | Volume control method and device for synthesized voice | |
CN113033357A (en) | Subtitle adjusting method and device based on mouth shape features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||