CN113591491B - Speech translation text correction system, method, device and equipment - Google Patents

Speech translation text correction system, method, device and equipment

Info

Publication number: CN113591491B (application number CN202010366777.5A)
Authority: CN (China)
Prior art keywords: text, clause, determining, display device, client
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN113591491A
Inventor: 曹宇
Current assignee: Alibaba Group Holding Ltd (the listed assignees may be inaccurate)
Original assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority: CN202010366777.5A
Publication of CN113591491A, application granted, publication of CN113591491B

Classifications

    • G06F 40/51 Translation evaluation (G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing; G06F 40/00 Handling natural language data; G06F 40/40 Processing or translation of natural language)
    • G06F 40/194 Calculation of difference between files (G06F 40/10 Text processing)
    • G06F 40/55 Rule-based translation; G06F 40/56 Natural language generation

Abstract

The application discloses a system, a method, an apparatus and related equipment for correcting speech translation text. In the system, the server determines a source-language text segment corresponding to voice stream data collected by the client in real time and sends the text segment to the client; it then receives the manually corrected first clause text sent by the client, determines a second clause text in the target language corresponding to the first clause text, and sends the second clause text to the client. The client collects voice stream data in real time and sends it to the server; displays the text segment, determines the first clause text, and sends it; and displays the second clause text. With this processing mode, the source-language clause text is manually corrected as real-time speech recognition progresses, and the corrected clause text is translated before recognition of the full sentence is complete, realizing translation-text correction at clause granularity; the correction efficiency and correction quality can therefore be effectively improved.

Description

Speech translation text correction system, method, device and equipment
Technical Field
The application relates to the technical field of machine speech translation, and in particular to a speech translation text correction system, method, apparatus and electronic device.
Background
With the advent of the age of information internationalization and growing demand across society, research on automatic speech translation technology has received increasing attention. Speech translation, also commonly referred to as spoken language translation (SLT), is the process by which a computer translates speech in one language into another language.
Speech translation is a technique that combines speech recognition with machine translation. In scenes combining real-time speech recognition with machine translation (simultaneous interpretation), factors such as technology, environment and human error make the speech recognition result inaccurate, which in turn makes the translation result wrong, so the recognition and translation results need real-time manual intervention and correction. In a real-time recognition scene, the pace is fast and the time is short, so fully manual intervention is difficult. At present, a typical correction mode for speech translation results is: first recognize and translate the speech through a real-time speech translation model and display the translation result on screen in real time; after a whole sentence has been translated, perform automatic intervention processing on that sentence's translation result, followed by manual intervention processing.
However, in carrying out the present invention, the inventor found that the prior art has at least the following problems: because automatic intervention is only performed on the speech translation result after translation of the whole sentence is complete, and because it offers only a simple translation-result intervention capability, automatic intervention on the speech translation result is slow, manual intervention therefore begins late, erroneous translation subtitles stay on screen for a long time, and the correction quality of the translated text is poor. How to improve the overall correction efficiency and correction quality of speech translation results, so as to shorten the on-screen residence time of erroneous translation subtitles and improve translation quality, is therefore a problem urgently needing to be solved by those skilled in the art.
Disclosure of Invention
The application provides a voice translation text correction system to solve the problem that voice translation text correction quality and correction efficiency are low in the prior art. The application additionally provides a voice translation text correction method and device and electronic equipment.
The application provides a speech translation text correction system, comprising:
the server side is used for determining a text fragment of a source language corresponding to voice stream data acquired by the client side in real time and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client;
The client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text.
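The division of labour above can be sketched as a small set of typed messages exchanged between client and server. All type and function names below are illustrative assumptions for the sketch; the application does not prescribe any wire format.

```typescript
// Illustrative message types for the clause-level correction flow (names are assumptions).

interface SourceTextSegment {   // server -> client: recognized source-language text
  segmentId: number;
  text: string;                 // partial recognition result, updated in real time
}

interface CorrectedClause {     // client -> server: manually corrected first clause text
  clauseId: number;
  sourceText: string;           // e.g. a comma-delimited half-sentence after editing
}

interface TranslatedClause {    // server -> client: second clause text in the target language
  clauseId: number;
  targetText: string;
}

// A corrected clause can be translated before the full sentence finishes recognition:
function translateClause(
  c: CorrectedClause,
  translate: (s: string) => string,
): TranslatedClause {
  return { clauseId: c.clauseId, targetText: translate(c.sourceText) };
}
```

The key point the types make visible is that translation is driven by clause ids, not by whole sentences, so a translated clause can be displayed as soon as its corrected source text arrives.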
The application also provides a voice translation text correction method, which comprises the following steps:
determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client;
receiving a first clause text after manual correction sent by a client;
and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
Optionally, the method further comprises:
performing correction processing on the first clause text to obtain a corrected third clause text;
the determining the second clause text of the target language corresponding to the first clause text includes:
and determining the second clause text corresponding to the third clause text.
Optionally, the performing correction processing on the first clause text includes:
and executing dialect correction processing on the first clause text.
Optionally, the performing correction processing on the first clause text includes:
and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
Optionally, the entity word replacement rule includes: name replacement rules, business entity name replacement rules.
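A minimal sketch of how such entity-word replacement rules (name rules, business entity name rules) might be applied to a clause; the rule shape and the plain-substring matching strategy are assumptions, not taken from the application.

```typescript
// Hypothetical rule table: each rule maps a misrecognized form to the correct entity word.
type ReplacementRule = { from: string; to: string };

function applyEntityRules(clause: string, rules: ReplacementRule[]): string {
  // Apply every rule in order; split/join replaces all occurrences without regex escaping.
  return rules.reduce((text, r) => text.split(r.from).join(r.to), clause);
}
```

In practice rules would be added through the correction UI (the "add entity word replacement rule" shortcut described later) rather than hard-coded.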
Optionally, the performing correction processing on the first clause text includes:
and executing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
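The blacklist filtering step could look like the following sketch. Masking matched terms with asterisks is an assumption; the application only says blacklisted content is filtered according to rule information.

```typescript
// Mask every blacklisted term in the clause before it is displayed or translated.
function filterBlacklist(clause: string, blacklist: string[]): string {
  return blacklist.reduce(
    (text, term) => text.split(term).join("*".repeat(term.length)),
    clause,
  );
}
```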
Optionally, the method further comprises:
and performing correction processing on the second clause text.
Optionally, the method further comprises:
and optimizing the voice recognition model and/or the voice translation model according to the hotword information.
Optionally, the method further comprises:
determining a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word;
and sending the word with the uncertain translation and the candidate translation word to the client so that a user of the client can modify the word with the uncertain translation according to the candidate translation word.
Optionally, the term with the uncertain translation and the candidate translation terms are determined according to a similar vocabulary.
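One plausible way to flag a translation-uncertain word is a score-gap heuristic over the candidate translations drawn from the similar-word vocabulary: when the model's top candidate beats the runner-up by less than some margin, all near-tied candidates are shown to the corrector. The scores and the 0.1 margin below are assumptions; the application only states that uncertain words and their candidates are determined from a similar vocabulary.

```typescript
// Hypothetical heuristic: a target word is "translation-uncertain" when the top
// candidate's score is within `margin` of the runner-up.
type Candidate = { word: string; score: number };

function uncertainCandidates(cands: Candidate[], margin = 0.1): Candidate[] | null {
  const sorted = [...cands].sort((a, b) => b.score - a.score);
  if (sorted.length < 2 || sorted[0].score - sorted[1].score >= margin) {
    return null;                    // confident: no intervention prompt needed
  }
  return sorted;                    // near-tied: offer all candidates to the corrector
}
```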
The application also provides a voice translation text correction method, which comprises the following steps:
collecting voice stream data in real time, and sending the voice stream data to a server;
displaying a text segment of a source language corresponding to the voice stream data and returned by the server;
Determining a first clause text after manual correction, and sending the first clause text to a server;
and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
Optionally, the method further comprises:
determining a manually corrected third clause text corresponding to the second clause text;
and updating the displayed second clause text into a third clause text.
Optionally, the manual original-text correction processing is performed through a first display device;
and the source language text and target language text corresponding to the voice progress are displayed through a second display device.
Optionally, the determining the manually corrected first clause text includes:
determining history text of the source language that the second display device has finished displaying;
and adjusting that history text of the source language, as displayed on the first display device, to a third display attribute.
Optionally, the determining the manually corrected first clause text includes:
and adjusting the first clause text according to the adjusted punctuation marks.
Optionally, the determining the manually corrected first clause text includes:
and restoring, according to a single-step rollback instruction, the text as it was before the last single-step modification.
Optionally, the determining the manually corrected first clause text includes:
and restoring, according to a sentence rollback instruction, the sentence text as it was before modification.
Optionally, the determining the manually corrected first clause text includes:
each sentence text is displayed in a sentence-isolated manner.
Optionally, the determining the manually corrected first clause text includes:
and displaying the sentence text focused by the cursor with the first display attribute, and displaying the sentence text focused by the non-cursor with the second display attribute.
Optionally, the determining the manually corrected first clause text includes:
and if the text selection operation is executed, displaying a text processing shortcut operation option.
Optionally, the text processing shortcut options include: an add-hotword option, an add-entity-word-replacement-rule option, a person-name/pronoun quick-switch option, a punctuation quick-switch option, a selected-text-region deletion option, and a whole-sentence deletion option.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
each sentence text is displayed in a sentence-isolated manner.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
The sentence text of the target language focused by the cursor is displayed with the first display attribute, and the sentence text of the target language focused by the non-cursor is displayed with the second display attribute.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
determining sentence text in a source language corresponding to sentence text in a target language focused by a cursor;
and displaying sentence text of the source language with a first display attribute.
Optionally, the first display attribute includes: highlighting;
the second display attribute includes: non-highlighting.
Optionally, the determining the manually corrected third clause text corresponding to the second clause text includes:
and deleting the sentence text according to the sentence deleting instruction.
Optionally, performing, by the first display device, a manual correction translation process;
displaying a source language text and a target language text corresponding to the voice progress through a second display device;
the determining the manually corrected third clause text corresponding to the second clause text includes:
determining history text of the target language that the second display device has finished displaying;
and adjusting that history text of the target language, as displayed on the first display device, to a third display attribute.
Optionally, the method further comprises:
determining a volume gain of the voice stream data;
and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
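The volume-gain adjustment could be sketched as below: if the measured peak of the incoming stream falls below the gain threshold, samples are scaled up toward a target level. The threshold and target values, and peak measurement as the "gain" metric, are illustrative assumptions.

```typescript
// Boost quiet voice streams toward a target peak; leave sufficiently loud streams untouched.
function adjustGain(samples: number[], gainThreshold: number, targetPeak: number): number[] {
  const peak = Math.max(...samples.map(Math.abs));
  if (peak === 0 || peak >= gainThreshold) return samples; // loud enough (or silent)
  const factor = targetPeak / peak;
  return samples.map(s => s * factor);
}
```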
Optionally, the method further comprises:
the first clause text of the source language and the second clause text of the target language are displayed in sentence alignment.
Optionally, the method further comprises:
receiving a word with uncertain translation and a plurality of candidate translation words of the word with uncertain translation, which are included in the second clause text and sent by a server;
and modifying the words with uncertain translations according to the candidate translation words.
The application also provides a voice translation text correction device, comprising:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
and the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text and sending the second clause text to the client.
The application also provides an electronic device comprising:
a processor; and
a memory for storing a program implementing the speech translation text correction method; after the device is powered on and the program is run by the processor, the following steps are performed: determining a text segment of a source language corresponding to voice stream data collected by a client in real time, and sending the text segment to the client; receiving the manually corrected first clause text sent by the client; and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
The application also provides a voice translation text correction device, comprising:
the voice data acquisition and transmission unit is used for acquiring voice stream data in real time and transmitting the voice stream data to the server;
the original text display unit is used for displaying text fragments of a source language corresponding to the voice stream data, which are returned by the server side;
the original text correction unit is used for determining a first clause text after manual correction and sending the first clause text to the server;
and the translation display unit is used for displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
The application also provides an electronic device comprising:
a processor; and
a memory for storing a program implementing the speech translation text correction method; after the device is powered on and the program is run by the processor, the following steps are performed: collecting voice stream data in real time, and sending the voice stream data to a server; displaying a text segment of the source language corresponding to the voice stream data returned by the server; determining a manually corrected first clause text, and sending the first clause text to the server; and displaying a second clause text of the target language corresponding to the first clause text returned by the server.
The application also provides a system for correcting the text of the voice translation, which comprises:
the server side is used for determining a text fragment of the source language corresponding to the voice data playing progress and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text;
and the client is used for playing the voice data, displaying the text fragment, determining the first clause text and sending the first clause text.
The application also provides a voice translation text correction method, which comprises the following steps:
determining a text segment of a source language corresponding to the voice data playing progress, and sending the text segment to a client;
receiving a first clause text after manual correction sent by a client;
a second clause text of the target language corresponding to the first clause text is determined.
Optionally, the method further comprises:
and sending the second clause text to the client so as to perform manual correction processing on the second clause text.
The application also provides a voice translation text correction method, which comprises the following steps:
playing the voice data;
displaying a text fragment of a source language corresponding to the voice data playing progress sent by a server;
And determining a first clause text, and sending the first clause text to the server side so that the server side determines a second clause text of the target language corresponding to the first clause text.
Optionally, the method further comprises:
displaying a second clause text sent by the server;
and determining a manually corrected third clause text corresponding to the second clause text.
The present application also provides a computer-readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the application has the following advantages:
according to the voice translation text correction system provided by the embodiment of the application, a source language text segment corresponding to voice stream data acquired by a client in real time is determined through a server, and the text segment is sent to the client; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client; the client acquires voice stream data in real time and sends the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text; according to the processing mode, along with the real-time voice recognition progress, the source language clause text (such as comma separated half-sentence) is manually corrected, and before one sentence recognition is completed, the manually corrected source language clause text is translated, so that the translation text correction processing of clause granularity is realized, and the wrong translation text is prevented from staying on a screen for a longer time; therefore, the correction efficiency of the voice translation text can be effectively improved, and the display time of the error translation text can be effectively shortened. In addition, because the source language clause text based on manual correction is translated, the correction quality of the voice translation text can be effectively improved. In addition, the difficulty of judging the text error and intervening is smaller than that of judging the translation error and intervening, so that the correction efficiency and the correction quality can be further improved.
In the speech translation text correction system provided by the embodiments of the application, the server determines a source-language text segment corresponding to the playing progress of voice data and sends the text segment to the client; receives the manually corrected first clause text sent by the client, and determines a second clause text in the target language corresponding to the first clause text. The client plays the voice data, displays the text segment, determines the first clause text, and sends it. With this processing mode, the source-language clause text (for example, a comma-separated half-sentence) is manually corrected as voice playback progresses, and the corrected clause text is translated before recognition of the full sentence is complete, realizing translation-text correction at clause granularity; the correction quality and correction efficiency of the speech translation text can therefore be effectively improved.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a speech translation text correction system provided herein;
FIG. 2 is a schematic diagram of an application scenario of an embodiment of a speech translation text correction system provided herein;
FIG. 3 is a schematic diagram of device interactions of an embodiment of a speech translation text correction system provided herein;
FIG. 4 is a schematic diagram of a manual correction interface for an embodiment of a speech translation text correction system provided herein.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application can be embodied in many ways other than those described herein, and similar generalizations can be made by those skilled in the art without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
In the application, a system, a method and a device for correcting a speech translation text are provided, and an electronic device. The various schemes are described in detail one by one in the examples below.
First embodiment
Referring to fig. 1, a block diagram of an embodiment of a speech translation text correction system of the present application is shown. The system comprises: server 1, client 2.
The server 1 may be a server deployed in the cloud, or a server dedicated to speech translation and text intervention processing deployed in a data center. It may be a cluster server or a single server.
The client 2 includes, but is not limited to, mobile communication devices such as mobile phones and smartphones, as well as terminal devices such as personal computers and tablets (PAD, iPad).
Please refer to fig. 2, which is a schematic diagram of an application scenario of the speech translation text correction system of the present application. The server and the client can be connected through a network; for example, the client can connect via WIFI or similar means. In scenes such as court trials and multi-person conferences, the client collects the live voice stream data in real time and sends it to the server; the server determines the source-language text corresponding to the voice stream data through a speech recognition model, determines the target-language text through a speech translation model, and sends both texts to the client; the client projects them onto a large on-site screen for the live audience to watch. Meanwhile, the text-correction user manually corrects the two texts through the client, and the correction results are synchronously updated to the on-site large screen.
Please refer to fig. 3, which is a schematic diagram illustrating device interaction of an embodiment of the speech translation text correction system of the present application. In this embodiment, the server is configured to determine a text segment of a source language corresponding to voice stream data collected by the client in real time, and send the text segment to the client; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client; the client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment, determining a first clause text, and sending the first clause text; and displaying the second clause text.
In the process of implementing the invention, the inventor found that since the translation result (the translation) is obtained by translating the speech recognition result (the original text), translation errors are in most cases caused by errors in the speech recognition result; at the same time, judging and intervening on original-text errors is easier than judging and intervening on translation errors, so the system intervenes on (corrects) the original text first. After the original text has been intervened on, the corrected original-text clauses are combined with the idea of "fast translation of streaming results" to achieve fast intervention on the translated text: manual intervention can be carried out without waiting for recognition of the whole sentence to complete, shortening the time an error stays visible and, in real-time on-screen subtitle scenes, preventing an erroneous subtitle from staying on screen for a long time.
In this embodiment, the client deploys a speech recognition result editing module, which may also be called the original-text editing module; it is the core module of the system and is responsible for editing the speech recognition result (the original text). The module is implemented based on editor principles; a specific embodiment may set the contenteditable="true" attribute on the text label so that it becomes editable.
In the specific implementation, the original text is corrected manually, and at least one of the following modes can be adopted:
determining the source-language history text that the second display device has finished displaying; and adjusting that history text, as displayed on the first display device, to a third display attribute.
In the present embodiment, the first user (the text corrector) performs manual original-text correction through the first display device (the correction processing screen) of the client; the source language text and target language text corresponding to the voice progress are displayed to the second user (the live audience) through the second display device (the live presentation screen).
In a subtitle screen-casting scene, by judging which sentences have left the display area of the on-site display screen and then modifying their css styles, those sentences can be grayed out on the correction screen to prompt on-site control personnel that they are no longer displayed on the on-site screen, so that no excessive effort is spent intervening on them, avoiding wasted time and effort.
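The graying logic can be sketched independently of the css details: given the set of sentence ids still inside the live screen's display area, every other sentence is flagged, and the correction screen maps the flag to a gray css class. The data shape is an assumption; the application only describes modifying css styles.

```typescript
// Flag sentences that have scrolled off the live screen so the correction UI can gray them.
type Sentence = { id: number; text: string; grayed: boolean };

function grayOffscreen(sentences: Sentence[], visibleIds: Set<number>): Sentence[] {
  return sentences.map(s => ({ ...s, grayed: !visibleIds.has(s.id) }));
}
```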
And in a second mode, adjusting the text of the first clause according to the adjusted punctuation mark.
This allows one sentence to be split into two by inserting sentence-ending punctuation such as a period into the original text, or a new merged sentence to be formed by deleting the ending punctuation or changing it into an in-sentence punctuation mark. Because the translation service is called in units of single sentences (which may be clauses), reasonable sentence-splitting and sentence-merging operations can effectively improve the quality of the translation result, thereby improving the correction quality of the translated text.
In implementation, before recognition of a sentence is complete, if a comma appears, the clause content before the comma is sent in advance to the background optimization service and the translation service, without waiting for whole-sentence recognition to finish. The advantage is that, in scenes where the original text and the translation must be displayed on screen in real time, the more accurate recognition result after background optimization can be shown to the audience sooner, and likewise the translation result of partial clauses can be shown to the audience in advance.
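The comma-triggered early dispatch described above can be sketched as a small splitter over the growing recognition string: everything up to the latest comma is "ready" for the optimization and translation services, while the tail remains pending. Restricting the punctuation set to ASCII and fullwidth commas is an illustrative simplification.

```typescript
// Split a partial recognition result into clauses ready for early translation
// (ending in a comma) and the still-growing pending tail.
function splitReadyClauses(partial: string): { ready: string[]; pending: string } {
  const parts = partial.split(/(?<=[,，])/);  // zero-width split keeps the comma with its clause
  const last = parts[parts.length - 1];
  if (/[,，]$/.test(last)) return { ready: parts, pending: "" };
  return { ready: parts.slice(0, -1), pending: last };
}
```

Each entry of `ready` corresponds to a "first clause text" that can be corrected and translated before the period arrives.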
And thirdly, restoring the text before modification in the last step according to the single-step rollback instruction.
In specific implementation, a listener for the command+z shortcut under macOS (the ctrl+z shortcut under Windows) can perform a rollback operation on the content being edited, so that on-site control personnel can quickly roll back one step when an intervention error occurs.
And fourthly, restoring the sentence text before modification according to the sentence rollback instruction.
In particular, the method can support listening for esc key events and performing a one-key restore operation on the sentence being edited. For example, during manual intervention it may turn out that the sentence does not need intervention; pressing the esc key then forces loss of focus. A specific implementation may execute the blur() method on the editor and replace the edited content with the original content from before editing, so that on-site control personnel can quickly perform a complete rollback of the whole sentence when an intervention error occurs.
And fifthly, displaying each sentence text in a sentence isolation mode.
In specific implementation, CSS style processing can be applied to each complete single sentence. For example, as shown in fig. 4, a closed border is added around each sentence, and a certain spacing is added between the borders of adjacent sentences, so that field control personnel can clearly distinguish each sentence in the recognition result.
And in a sixth mode, displaying the sentence text focused by the cursor with a first display attribute, and displaying sentence text not focused by the cursor with a second display attribute.
In specific implementation, when the cursor focuses on the content of a sentence, the sentence is highlighted by modifying its CSS style upon detection of the focus event, so that field control personnel can accurately locate the content being edited and the complete content of the sentence.
And seventh, if the text selection operation is executed, displaying a text processing shortcut operation option.
The text processing shortcut options include: an add-hotword option, an add-entity-word-replacement-rule option, a personal-pronoun quick-switch option, a punctuation quick-switch option, a selected-region delete option, and a whole-sentence delete option.
In specific implementation, text can be selected with the mouse, and a shortcut operation bar is automatically displayed near the selection area for one-key intervention. The shortcut operations include: adding a hotword, adding an entity word replacement rule, quickly switching personal pronouns, quickly switching punctuation marks, deleting the selected region, and deleting the whole sentence.
In one example, the system may also correct the second clause text (the translation) through the client: the client determines a manually corrected third clause text corresponding to the second clause text, and updates the second clause text shown on the on-site large screen to the third clause text. This processing mode supports direct editing of the translation result and applies to the case where the original text is recognized correctly but the translation result is inaccurate, allowing the translated content to be edited directly.
In this embodiment, the client deploys a machine translation result editing module, which may also be called a translation editing module. This module is a secondary module in the system: since the translation result is obtained by translating the speech recognition result, translation errors are in most cases caused by speech recognition errors; moreover, judging and intervening on original-text errors is easier than judging and intervening on translation errors. Translation intervention therefore accounts for a smaller proportion than original-text intervention, and the translation intervention function is correspondingly smaller.
In specific implementation, the translation is corrected manually, and at least one of the following modes can be adopted:
In a first mode, each sentence text is displayed in a sentence-isolated manner.
In specific implementation, the translation result can be displayed sentence by sentence, with sentences isolated through CSS styles, so that the content of the current sentence can be clearly identified when it is being edited or deleted.
And in the second mode, displaying the sentence text of the target language focused by the cursor with a first display attribute, and displaying sentence text of the target language not focused by the cursor with a second display attribute.
In specific implementation, when the mouse focuses on a sentence, the current sentence is highlighted by modifying its CSS style.
In specific implementation, the method can also determine the sentence text of the source language corresponding to the sentence text of the target language focused by the cursor, and display that source-language sentence text with the first display attribute. With this processing mode, the corresponding original-text content is matched through an id, and the corresponding original-text sentence in the original-text editing area is then highlighted, so that the original content can be consulted for reference when the translation is modified.
And thirdly, deleting the sentence text according to the sentence deleting instruction.
In specific implementation, a translation result can be deleted quickly through the F1–F10 keys. When a translation result is very poor, it can be deleted with one key, preventing the wrong translation from staying on screen for a long time; this suits scenarios where simultaneous interpretation results are displayed on a real-time screen.
And fourthly, determining historical text of the target language that the second display device has finished displaying; and adjusting that historical text of the target language, as displayed by the first display device, to a third display attribute.
In specific implementation, in the scenario of translated subtitles projected to a screen, the system judges which translation results have left the screen display area and then modifies their CSS style to gray them out, prompting field control personnel to devote their attention to the translation results still shown on screen rather than intervening in sentences that have already left it.
In one example, the client is further configured to determine the volume gain of the voice stream data, and to adjust the volume gain of the voice stream data according to the volume gain and a volume gain threshold. This processing mode supports audio stream gain adjustment and improves original-text recognition accuracy at the source.
In this embodiment, the client deploys a gain adjustment module. This module can be understood as volume adjustment: the client can dynamically draw an audio waveform from the audio stream data, judge from the waveform whether the incoming voice is too loud or too quiet, and then change the volume gain by adjusting a volume gain control. The reason is that an audio stream at a reasonable volume positively helps improve the quality of algorithmic recognition. Reasonably adjusting the volume gain therefore improves speech recognition quality and further reduces the workload of manual intervention.
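A minimal sketch of such gain adjustment, under stated assumptions: loudness is estimated as the RMS of an audio frame, and the gain is nudged toward a target band. The threshold values and scaling factors below are illustrative, not taken from the patent.

```python
def rms(samples: list[float]) -> float:
    """Root-mean-square loudness of one audio frame (samples in [-1.0, 1.0])."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def adjust_gain(gain: float, samples: list[float],
                low: float = 0.05, high: float = 0.5) -> float:
    """Boost quiet audio and attenuate loud audio; thresholds are assumptions."""
    level = rms(samples) * gain
    if level < low:          # too quiet: recognition quality suffers, so boost
        return gain * 1.25
    if level > high:         # too loud (clipping risk): attenuate
        return gain * 0.8
    return gain              # within the reasonable band: leave unchanged

quiet_frame = [0.01, -0.01, 0.02, -0.02]
g = adjust_gain(1.0, quiet_frame)     # gain is raised above 1.0
```

In the real client this decision would feed the "volume gain control" the operator sees next to the waveform display.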
In one example, after receiving the manually corrected first clause text sent by the client, the server may further perform correction processing on the first clause text and send the corrected third clause text to the client. Correspondingly, the server determines the second clause text corresponding to the third clause text, and the client may also display the corrected third clause text, corresponding to the first clause text, returned by the server. With this processing mode the original text is optimized by machine, and translating the optimized original text can improve the quality of the translation, thereby improving the correction quality of the translated text.
In specific implementation, the machine correction of the original text may be text optimization, such as inverse text normalization (Inverse Text Normalization, ITN). ITN presents objects such as dates, times, addresses, and amounts in standard written formats.
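As a toy illustration of ITN (real ITN systems typically use weighted grammars or sequence models; this rule table and its entries are assumptions for demonstration only), spoken-form phrases are rewritten into their standard written forms:

```python
# Illustrative spoken-form -> written-form rules; not from the patent.
SPOKEN_FORMS = {
    "one hundred and twenty three": "123",   # number
    "three thirty pm": "3:30 PM",            # time
    "twenty dollars": "$20",                 # amount
}

def itn(text: str) -> str:
    """Apply simple inverse-text-normalization rewrite rules."""
    for spoken, written in SPOKEN_FORMS.items():
        text = text.replace(spoken, written)
    return text

itn("the meeting is at three thirty pm")   # -> "the meeting is at 3:30 PM"
```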
In specific implementation, the machine correction of the original text may perform entity word replacement processing on the first clause text according to entity word replacement rule information. The entity word replacement rules include, but are not limited to: person-name replacement rules and business entity name replacement rules. Entity word replacement module: this supports automatically replacing a certain entity word A with an entity word B, and is used to fix frequently occurring, fixed errors and to reduce the manual correction cost in certain scenarios, such as "Hema Mr" => "Mr. Hema". With this entity word replacement processing mode, the quality of the original text can be improved by automatic means.
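A sketch of the entity-word replacement rule described above. The rule table contents are invented examples (the patent's own example pair is garbled in translation), but the mechanism — a fixed A-to-B substitution applied to every clause — is as described:

```python
# entity word A -> entity word B; rule contents are illustrative assumptions
ENTITY_RULES = {
    "Hema Mr": "Mr. Hema",   # person-name replacement rule (assumed example)
    "ali baba": "Alibaba",   # business-entity-name replacement rule (assumed example)
}

def replace_entities(clause: str) -> str:
    """Apply every fixed entity replacement rule to one clause."""
    for wrong, right in ENTITY_RULES.items():
        clause = clause.replace(wrong, right)
    return clause
```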
In specific implementation, the machine correction may perform blacklist filtering on the first clause text according to blacklist filtering rule information. Blacklist filtering module: in real-time speech recognition and machine translation scenarios, before subtitles are displayed to the audience, all speech recognition results and machine translation results are filtered against a blacklist vocabulary to remove illegal words, such as pornography-related or violence-related terms. The blacklist vocabulary is large, ranging from a few hundred to tens of thousands of entries; technically, blacklist terms are matched with an Aho-Corasick (AC) automaton algorithm, and each match is then replaced with an empty string using string replacement.
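The AC-automaton-plus-empty-string-replacement approach can be sketched compactly. This is a minimal textbook Aho-Corasick implementation written for illustration (the patent names the algorithm but shows no code); a production blacklist with tens of thousands of entries would build the automaton once and reuse it for every clause:

```python
from collections import deque

class ACFilter:
    """Aho-Corasick multi-pattern matcher that censors hits with empty strings."""

    def __init__(self, words: list[str]):
        self.goto = [{}]   # trie transitions per node
        self.fail = [0]    # failure links
        self.out = [[]]    # lengths of blacklist words ending at each node
        for w in words:                      # build the trie
            s = 0
            for ch in w:
                if ch not in self.goto[s]:
                    self.goto[s][ch] = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                s = self.goto[s][ch]
            self.out[s].append(len(w))
        q = deque(self.goto[0].values())     # BFS to set failure links
        while q:
            s = q.popleft()
            for ch, t in self.goto[s].items():
                q.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def censor(self, text: str) -> str:
        """Mark the span of every blacklist hit, then drop those characters."""
        drop = [False] * len(text)
        s = 0
        for i, ch in enumerate(text):
            while s and ch not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(ch, 0)
            for length in self.out[s]:
                for j in range(i - length + 1, i + 1):
                    drop[j] = True
        return "".join(c for c, d in zip(text, drop) if not d)
```

Building the automaton is linear in the total length of the blacklist, and filtering is linear in the clause length, which is what makes this viable at tens of thousands of entries.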
In specific implementation, the machine correction may also perform dialect correction processing on the first clause text. A dialect is a regional variety of a language with its own special vocabulary and pronunciation, also called vernacular or local speech, such as the Beijing dialect. With this processing mode, dialect in the voice stream data is converted into the standard language so that the correct translation can be determined; the correction quality of the translated text can thus be effectively improved.
In one example, the server is further configured to perform correction processing on the second clause text, such as translation entity word replacement, blacklist filtering, and the like.
In one example, the server is further configured to optimize a speech recognition model and/or a speech translation model according to hotword information. Hotword management module: a hotword differs from entity word replacement in that it can be used to optimize the algorithm model; it is sent to the server to increase the probability that the hotword appears. When a user configures a hotword through the client, the user can also configure a weight value for the hotword: the higher the weight, the higher the probability that the hotword appears.
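One simple way to realize weighted hotwords is to rescore recognition hypotheses: hypotheses containing a configured hotword have their score boosted in proportion to the hotword's weight. This rescoring scheme and the table contents are assumptions for illustration; the patent only states that higher weight means higher probability of the hotword appearing.

```python
# hotword -> weight; configured by the user through the client (illustrative)
HOTWORDS = {"Alibaba": 2.0, "DAMO": 1.5}

def boost(hypothesis: str, base_score: float) -> float:
    """Scale a hypothesis score by the weight of each hotword it contains."""
    score = base_score
    for word, weight in HOTWORDS.items():
        if word in hypothesis:
            score *= weight
    return score

boost("DAMO academy", 4.0)   # -> 6.0: the hotword-bearing hypothesis wins ties
```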
In one example, the client is further configured to display the first clause text in the source language and the second clause text in the target language in a sentence-aligned manner. With this processing mode, the original text and the translation are displayed side by side in the same area, so that the user can consult each for reference during correction; the correction quality and the user experience are thus effectively improved.
In one example, the server may be further configured to determine a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word; sending the word with uncertain translation and the candidate translation word to a client so that a client user modifies the word with uncertain translation according to the candidate translation word; correspondingly, the client is further used for receiving the words with uncertain translations and a plurality of candidate translation words of the words with uncertain translations, which are included in the second clause text sent by the server; and modifying the words with uncertain translations according to the candidate translation words.
A translation-uncertain word is a translation whose original text has multiple meanings, where the machine still cannot determine from the context information which translation is more accurate. In this case, the server can attach a flag (i.e., multiple candidate words) indicating that the word's translation may be inaccurate and prompting manual correction or confirmation; there may be several identical or similar candidate words. In specific implementation, the translation-uncertain word and the multiple candidate translation words may be determined based on a similar-word list. This processing mode effectively improves the correction quality of the translated text.
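A sketch of similar-word-list flagging, with invented list contents (the patent specifies only that uncertain words and their candidates come from a similar-word list): any translated word that appears in the list with more than one near-synonymous alternative is flagged, and its candidates are what the client shows the operator for confirmation.

```python
# similar-word list: translation -> near-synonymous alternatives (illustrative)
SIMILAR_WORDS = {
    "bank": ["bank", "shore", "riverside"],
    "interest": ["interest", "hobby"],
}

def flag_uncertain(translation: str) -> dict[str, list[str]]:
    """Return {uncertain word: candidate translations} for manual confirmation."""
    flags: dict[str, list[str]] = {}
    for word in translation.split():
        candidates = SIMILAR_WORDS.get(word)
        if candidates and len(candidates) > 1:
            flags[word] = candidates   # sent to the client as choices
    return flags

flag_uncertain("the bank of the river")   # flags "bank" with three candidates
```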
As can be seen from the above embodiments, in the speech translation text correction system provided in the embodiments of the present application, the server determines a text segment in the source language corresponding to voice stream data collected by the client in real time and sends the text segment to the client; it then receives the manually corrected first clause text sent by the client, determines a second clause text of the target language corresponding to the first clause text, and sends the second clause text to the client. The client collects voice stream data in real time and sends it; displays the text segment, determines the first clause text, and sends it; and displays the second clause text. With this processing mode, the source-language clause text (such as a comma-separated half-sentence) is manually corrected as real-time speech recognition progresses, and the manually corrected clause text is translated before recognition of the whole sentence is complete, realizing translation text correction at clause granularity and preventing an erroneous translation from staying on screen for a long time. The correction efficiency of the speech translation text can therefore be effectively improved, and the display time of erroneous translations effectively shortened. In addition, because translation is based on the manually corrected source-language clause text, the correction quality of the speech translation text can be effectively improved. Moreover, judging and intervening on original-text errors is easier than judging and intervening on translation errors, so correction efficiency and correction quality can be further improved.
Second embodiment
Corresponding to the above speech translation text correction system, the present application also provides a speech translation text correction method; the execution subject of the method includes, but is not limited to, a server. Parts of this embodiment that are the same as the first embodiment will not be described again; please refer to the corresponding parts of the first embodiment.
In this embodiment, the method includes the steps of:
step 1: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client;
step 2: receiving a first clause text after manual correction sent by a client;
step 3: and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
In one example, the method may further comprise the steps of: performing correction processing on the first clause text to obtain a corrected third clause text; accordingly, the determining of the second clause text of the target language corresponding to the first clause text may be performed as follows: determining the second clause text corresponding to the third clause text.
In one example, the correction processing performed on the first clause text may be as follows: and executing dialect correction processing on the first clause text.
In one example, the correction processing performed on the first clause text may be as follows: and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
The entity word replacement rule includes: name replacement rules, business entity name replacement rules.
In one example, the correction processing performed on the first clause text may be as follows: and executing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
In one example, the method may further comprise the steps of: and performing correction processing on the second clause text.
In one example, the method may further comprise the steps of: and optimizing the voice recognition model and/or the voice translation model according to the hotword information.
In one example, the method may further comprise the steps of: determining a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word; and sending the word with the uncertain translation and the candidate translation word to the client so that a user of the client can modify the word with the uncertain translation according to the candidate translation word.
In particular implementations, the term of uncertainty of the translation and the plurality of candidate translation terms may be determined based on a list of similar terms.
Third embodiment
In the above embodiment, a method for correcting a speech translation text is provided, and correspondingly, the present application also provides a device for correcting a speech translation text. The device corresponds to the embodiment of the method described above.
The same parts of the present embodiment as those of the first embodiment will not be described again, please refer to the corresponding parts in the first embodiment. The device for correcting the text of the voice translation provided by the application comprises:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
and the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text and sending the second clause text to the client.
Fourth embodiment
The application also provides electronic equipment. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
An electronic device of the present embodiment includes: a processor and a memory; the memory is used for storing a program implementing the speech translation text correction method. After the device is powered on and the program for the method is run by the processor, the following steps are performed: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client; receiving a manually corrected first clause text sent by the client; and determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client.
Fifth embodiment
Corresponding to the above speech translation text correction system, the present application also provides a speech translation text correction method; the execution subject of the method includes, but is not limited to, a client, or any device capable of implementing the method. Parts of this embodiment that are the same as the first embodiment will not be described again; please refer to the corresponding parts of the first embodiment.
In this embodiment, the method includes the steps of:
step 1: collecting voice stream data in real time, and sending the voice stream data to a server;
step 2: displaying a text segment of a source language corresponding to the voice stream data and returned by the server;
step 3: determining a first clause text after manual correction, and sending the first clause text to a server;
step 4: and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
In one example, the method may further comprise the steps of: determining a manually corrected third clause text corresponding to the second clause text; and updating the displayed second clause text into a third clause text.
In one example, a manual correction original text process is performed by a first display device; and displaying the source language text and the target language text corresponding to the voice progress through a second display device.
In one example, the determining the manually corrected first clause text may include the sub-steps of: determining historical text of the source language that the second display device has finished displaying; and adjusting the displayed historical text of the source language on the first display device to a third display attribute.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and adjusting the first clause text according to the adjusted punctuation marks.
In one example, the determining the manually corrected first clause text may include the sub-steps of: restoring the text to its state before the single-step modification according to a single-step rollback instruction.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and restoring the sentence text before modification according to the sentence back instruction.
In one example, the determining the manually corrected first clause text may include the sub-steps of: each sentence text is displayed in a sentence-isolated manner.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and displaying the sentence text focused by the cursor with the first display attribute, and displaying the sentence text focused by the non-cursor with the second display attribute.
In one example, the determining the manually corrected first clause text may include the sub-steps of: and if the text selection operation is executed, displaying a text processing shortcut operation option.
The text processing shortcut options include, but are not limited to: the method comprises the steps of adding a hotword option, adding an entity word replacement rule option, a human-name pronoun quick switching option, a punctuation quick switching option, a text region deleting option and a whole sentence deleting option.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: each sentence text is displayed in a sentence-isolated manner.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: the sentence text of the target language focused by the cursor is displayed with the first display attribute, and the sentence text of the target language focused by the non-cursor is displayed with the second display attribute.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: determining sentence text in a source language corresponding to sentence text in a target language focused by a cursor; and displaying sentence text of the source language with a first display attribute.
The first display attribute includes: highlighting; the second display attribute includes: non-highlighting.
In one example, the determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: and deleting the sentence text according to the sentence deleting instruction.
In one example, a manual correction translation process is performed by a first display device, and a source language text and a target language text corresponding to the voice progress are displayed by a second display device. The determining the manually corrected third clause text corresponding to the second clause text may include the sub-steps of: determining historical text of the target language that the second display device has finished displaying; and adjusting the displayed historical text of the target language on the first display device to a third display attribute.
In one example, the method may further comprise the steps of: determining a volume gain of the voice stream data; and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
In one example, the method may further comprise the steps of: the first clause text of the source language and the second clause text of the target language are displayed in sentence alignment.
In one example, the method may further comprise the steps of: receiving a word with uncertain translation and a plurality of candidate translation words of the word with uncertain translation, which are included in the second clause text and sent by a server; and modifying the words with uncertain translations according to the candidate translation words.
Sixth embodiment
In the above embodiment, a method for correcting a speech translation text is provided, and correspondingly, the present application also provides a device for correcting a speech translation text. The device corresponds to the embodiment of the method described above.
The same parts of the present embodiment as those of the first embodiment will not be described again, please refer to the corresponding parts in the first embodiment. The device for correcting the text of the voice translation provided by the application comprises:
the voice data acquisition and transmission unit is used for acquiring voice stream data in real time and transmitting the voice stream data to the server;
The original text display unit is used for displaying text fragments of a source language corresponding to the voice stream data, which are returned by the server side;
the original text correction unit is used for determining a first clause text after manual correction and sending the first clause text to the server;
and the translation display unit is used for displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
Seventh embodiment
The application also provides an electronic device embodiment. Since the apparatus embodiments are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
An electronic device of the present embodiment includes: a processor and a memory; a memory for storing a program for implementing a method for correcting a speech translation text, the apparatus being powered on and executing the program for the method by the processor, and performing the steps of: collecting voice stream data in real time, and sending the voice stream data to a server; displaying a text segment of a source language corresponding to the voice stream data and returned by the server; determining a first clause text after manual correction, and sending the first clause text to a server; and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
Eighth embodiment
Corresponding to the above speech translation text correction system, the present application also provides another speech translation text correction system. Parts of this embodiment that are the same as the first embodiment will not be described again; please refer to the corresponding parts of the first embodiment.
In this embodiment, the system includes a server and a client. The server side is used for determining a text fragment of a source language corresponding to the voice data playing progress and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text; the client is used for playing the voice data, displaying the text fragment, determining the first clause text and sending the first clause text.
The system provided in this embodiment differs from the system provided in the first embodiment in that the voice data is different: the voice data in this embodiment may be complete voice data collected in advance, such as a complete audio file submitted by a user, rather than a voice data stream collected and uploaded in real time.
In specific implementation, the server is further configured to send the second clause text to the client so that manual correction processing is performed on the second clause text; correspondingly, the client is also used to display the second clause text sent by the server, and to determine the manually corrected text corresponding to the second clause text.
As can be seen from the above embodiments, in the speech translation text correction system provided in the embodiments of the present application, the server determines a text segment of the source language corresponding to the voice data playing progress and sends the text segment to the client; it then receives the manually corrected first clause text sent by the client and determines a second clause text of the target language corresponding to the first clause text. The client plays the voice data, displays the text segment, determines the first clause text, and sends it. With this processing mode, the source-language clause text (such as a comma-separated half-sentence) is manually corrected as voice playback progresses, and the manually corrected clause text is translated before recognition of the whole sentence is complete, realizing translation text correction at clause granularity; the correction quality and correction efficiency of the speech translation text can therefore be effectively improved.
While preferred embodiments have been described above, they are not intended to limit the invention. Any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present invention; the protection scope of the present invention shall therefore be defined by the claims of the present application.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer readable media, as defined herein, do not include transitory computer readable media (transmission media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.

Claims (38)

1. A speech translation text correction system, comprising:
the server side is used for determining a text fragment of a source language corresponding to voice stream data acquired by the client side in real time and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client;
the client is used for collecting voice stream data in real time and sending the voice stream data; displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; transmitting the first clause text; and displaying the second clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
2. A method for correcting a speech translation text, comprising:
determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client;
receiving a first clause text after manual correction sent by a client;
determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client so that the client displays the second clause text;
wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
3. The method as recited in claim 2, further comprising:
performing correction processing on the first clause text to obtain a corrected third clause text;
the determining the second clause text of the target language corresponding to the first clause text includes:
and determining the second clause text corresponding to the third clause text.
4. The method of claim 3, wherein:
the performing correction processing on the first clause text includes:
and executing dialect correction processing on the first clause text.
5. The method of claim 3, wherein:
the performing correction processing on the first clause text includes:
and executing entity word replacement processing on the first clause text according to the entity word replacement rule information.
6. The method of claim 5, wherein:
the entity word replacement rule information includes: a person name replacement rule and a business entity name replacement rule.
7. The method of claim 3, wherein:
the performing correction processing on the first clause text includes:
and executing blacklist filtering processing on the first clause text according to the blacklist filtering rule information.
8. The method as recited in claim 2, further comprising:
and performing correction processing on the second clause text.
9. The method as recited in claim 2, further comprising:
and optimizing the voice recognition model and/or the voice translation model according to the hotword information.
10. The method as recited in claim 2, further comprising:
determining a translation uncertainty word included in the second clause text and a plurality of candidate translation words of the translation uncertainty word;
and sending the word with the uncertain translation and the candidate translation word to the client so that a user of the client can modify the word with the uncertain translation according to the candidate translation word.
11. The method of claim 10, wherein:
and determining the words with uncertain translations and the candidate translation words according to the similar word list.
12. A method for correcting a speech translation text for a client, comprising:
collecting voice stream data in real time, and sending the voice stream data to a server;
displaying a text segment of a source language corresponding to the voice stream data and returned by the server;
performing manual correction original text processing through a first display device to determine a first clause text after manual correction; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying a source language text and a target language text corresponding to the voice progress;
transmitting the first clause text to the server;
and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
13. The method as recited in claim 12, further comprising:
determining a manually corrected third clause text corresponding to the second clause text;
and updating the displayed second clause text into a third clause text.
14. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and adjusting the first clause text according to the adjusted punctuation marks.
15. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and, according to a single-step rollback instruction, restoring the text to its state before the single-step modification.
16. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and, according to a sentence rollback instruction, restoring the sentence text to its state before modification.
17. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
each sentence text is displayed in a sentence-isolated manner.
18. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and displaying the sentence text focused by the cursor with a first display attribute, and displaying the sentence text not focused by the cursor with a second display attribute.
19. The method of claim 12, wherein:
the determining the manually corrected first clause text comprises the following steps:
and if the text selection operation is executed, displaying a text processing shortcut operation option.
20. The method of claim 19, wherein:
the text processing shortcut operation options include: a hotword adding option, an entity word replacement rule adding option, a person-pronoun quick switching option, a punctuation quick switching option, a text region deletion option, and a whole sentence deletion option.
21. The method of claim 13, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
each sentence text is displayed in a sentence-isolated manner.
22. The method of claim 13, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
displaying the sentence text of the target language focused by the cursor with the first display attribute, and displaying the sentence text of the target language not focused by the cursor with the second display attribute.
23. The method of claim 22, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
determining sentence text in a source language corresponding to sentence text in a target language focused by a cursor;
and displaying sentence text of the source language with a first display attribute.
24. The method of claim 22, wherein:
the first display attribute includes: highlighting;
the second display attribute includes: non-highlighting.
25. The method of claim 13, wherein:
the determining the manually corrected third clause text corresponding to the second clause text includes:
and deleting the sentence text according to the sentence deleting instruction.
26. The method of claim 13, wherein:
performing manual correction translation processing by the first display device;
displaying a source language text and a target language text corresponding to the voice progress through a second display device;
The determining the manually corrected third clause text corresponding to the second clause text includes:
determining that a second display device of the first display device has displayed a history text of the completed target language;
and adjusting the historical text of the target language displayed by the first display device to be a third display attribute.
27. The method as recited in claim 13, further comprising:
determining a volume gain of the voice stream data;
and adjusting the volume gain of the voice stream data according to the volume gain and the volume gain threshold value.
28. The method as recited in claim 12, further comprising:
the first clause text of the source language and the second clause text of the target language are displayed in sentence alignment.
29. The method as recited in claim 12, further comprising:
receiving a word with uncertain translation and a plurality of candidate translation words of the word with uncertain translation, which are included in the second clause text and sent by a server;
and modifying the words with uncertain translations according to the candidate translation words.
30. A speech translation text correction apparatus, comprising:
the voice recognition unit is used for determining a text segment of a source language corresponding to voice stream data acquired by the client in real time and sending the text segment to the client;
the data receiving unit is used for receiving the manually corrected first clause text sent by the client;
the voice translation unit is used for determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client so that the client displays the second clause text;
wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
31. An electronic device, comprising:
a processor; and
a memory for storing a program implementing a speech translation text correction method; after the device is powered on, the program of the method is run by the processor and the following steps are performed: determining a text segment of a source language corresponding to voice stream data acquired by a client in real time, and transmitting the text segment to the client; receiving a first clause text after manual correction sent by the client; determining a second clause text of the target language corresponding to the first clause text, and sending the second clause text to the client so that the client displays the second clause text; wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the voice progress.
32. A speech translation text correction apparatus for a client, comprising:
the voice data acquisition and transmission unit is used for acquiring voice stream data in real time and transmitting the voice stream data to the server;
the original text display unit is used for displaying text fragments of a source language corresponding to the voice stream data, which are returned by the server side;
an original text correction unit for performing manual correction original text processing through the first display device to determine a manually corrected first clause text, and transmitting the first clause text to a server; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute, the second display device being used for displaying the source language text and the target language text corresponding to the voice progress;
and the translation display unit is used for displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
33. An electronic device, comprising:
a processor; and
a memory for storing a program implementing a speech translation text correction method; after the device is powered on, the program of the method is run by the processor and the following steps are performed: collecting voice stream data in real time, and sending the voice stream data to a server; displaying a text segment of a source language corresponding to the voice stream data and returned by the server; performing manual correction original text processing through a first display device to determine a first clause text after manual correction; transmitting the first clause text to the server; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute, the second display device being used for displaying the source language text and the target language text corresponding to the voice progress; and displaying a second clause text of the target language corresponding to the first clause text and returned by the server.
34. A speech translation text correction system, comprising:
the server side is used for determining a text fragment of the source language corresponding to the voice data playing progress and sending the text fragment to the client side; receiving the manually corrected first clause text sent by the client, and determining a second clause text of the target language corresponding to the first clause text;
the client is used for playing the voice data and displaying the text fragment; performing manual correction original text processing through a first display device to determine a first clause text after manual correction; and transmitting the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the playing progress.
35. A method for correcting a speech translation text, comprising:
determining a text segment of a source language corresponding to the voice data playing progress, and sending the text segment to a client;
receiving a first clause text after manual correction sent by a client;
determining a second clause text of the target language corresponding to the first clause text;
wherein the first clause text is processed by the client in the following manner: displaying the text segment; performing, by a first display device, manual correction original text processing to determine the first clause text; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying the source language text and the target language text corresponding to the playing progress.
36. The method as recited in claim 35, further comprising:
and sending the second clause text to the client so as to perform manual correction processing on the second clause text.
37. A method for correcting a speech translation text for a client, comprising:
playing the voice data;
displaying a text fragment of a source language corresponding to the voice data playing progress sent by a server;
performing manual correction original text processing through a first display device to determine a first clause text after manual correction; the manual correction original text processing executed through the first display device comprises the following steps: determining that a second display device of the first display device has displayed a history text of the completed source language; adjusting the historical text of the source language already displayed by the first display device to a third display attribute; the second display device is used for displaying a source language text and a target language text corresponding to the playing progress;
and sending the first clause text to the server side so that the server side determines a second clause text of the target language corresponding to the first clause text.
38. The method as recited in claim 37, further comprising:
Displaying a second clause text sent by the server;
and determining the manually corrected second clause text corresponding to the second clause text.
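Claims 5 through 7 describe rule-based correction of the first clause text: entity word replacement according to rule information, and blacklist filtering. A minimal sketch of how such rules might be applied, assuming simple substring rules (all names and rule contents here are hypothetical, not taken from the patent):

```python
# Illustrative sketch of the rule-based correction in claims 5-7:
# entity word replacement by rule information, then blacklist filtering.
def apply_entity_rules(text: str, rules: dict[str, str]) -> str:
    # Each rule maps a misrecognized form to the canonical entity name
    # (e.g. person names, business entity names).
    for wrong, right in rules.items():
        text = text.replace(wrong, right)
    return text

def apply_blacklist(text: str, blacklist: list[str]) -> str:
    # Blacklisted terms (e.g. filler words) are removed from the clause.
    for term in blacklist:
        text = text.replace(term, "")
    return text

rules = {"阿里爸爸": "阿里巴巴"}   # hypothetical entity replacement rule
blacklist = ["嗯", "呃"]           # hypothetical filler-word blacklist
first_clause = "嗯阿里爸爸发布了新产品"
third_clause = apply_blacklist(apply_entity_rules(first_clause, rules), blacklist)
print(third_clause)  # → 阿里巴巴发布了新产品
```

The corrected result plays the role of the "third clause text" of claim 3: it, rather than the raw first clause text, is what would be passed on for target-language translation.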
CN202010366777.5A 2020-04-30 2020-04-30 Speech translation text correction system, method, device and equipment Active CN113591491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366777.5A CN113591491B (en) 2020-04-30 2020-04-30 Speech translation text correction system, method, device and equipment


Publications (2)

Publication Number Publication Date
CN113591491A CN113591491A (en) 2021-11-02
CN113591491B true CN113591491B (en) 2023-12-26

Family

ID=78237626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366777.5A Active CN113591491B (en) 2020-04-30 2020-04-30 Speech translation text correction system, method, device and equipment

Country Status (1)

Country Link
CN (1) CN113591491B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117672190A (en) * 2022-09-07 2024-03-08 华为技术有限公司 Transliteration method and electronic equipment
CN115860015B (en) * 2022-12-29 2023-06-20 北京中科智加科技有限公司 Translation memory-based transcription text translation method and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000259632A (en) * 1999-03-09 2000-09-22 Toshiba Corp Automatic interpretation system, interpretation program transmission system, recording medium, and information transmission medium
CN104714943A (en) * 2015-03-26 2015-06-17 百度在线网络技术(北京)有限公司 Translation method and system
JP2016192599A (en) * 2015-03-30 2016-11-10 株式会社エヌ・ティ・ティ・データ Device and method combining video conference system and speech recognition technology
CN109408833A (en) * 2018-10-30 2019-03-01 科大讯飞股份有限公司 A kind of interpretation method, device, equipment and readable storage medium storing program for executing
CN109446532A (en) * 2018-09-11 2019-03-08 深圳市沃特沃德股份有限公司 Translate bearing calibration, device and translation calibration equipment
CN110047488A (en) * 2019-03-01 2019-07-23 北京彩云环太平洋科技有限公司 Voice translation method, device, equipment and control equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8626486B2 (en) * 2006-09-05 2014-01-07 Google Inc. Automatic spelling correction for machine translation
KR102516364B1 (en) * 2018-02-12 2023-03-31 삼성전자주식회사 Machine translation method and apparatus


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Implementation of an Artificial Intelligence Speech System; Yang Ziyi; Wangyou Shijie (06); 27-29 *

Also Published As

Publication number Publication date
CN113591491A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
WO2018157703A1 (en) Natural language semantic extraction method and device, and computer storage medium
CN110164435A (en) Audio recognition method, device, equipment and computer readable storage medium
CN105590627B (en) Image display apparatus, method for driving image display apparatus, and computer-readable recording medium
JP5478478B2 (en) Text correction apparatus and program
KR20170030297A (en) System, Apparatus and Method For Processing Natural Language, and Computer Readable Recording Medium
CN113591491B (en) Speech translation text correction system, method, device and equipment
US20170242832A1 (en) Character editing method and device for screen display device
CN110740275B (en) Nonlinear editing system
CN111885416B (en) Audio and video correction method, device, medium and computing equipment
CN103984772A (en) Method and device for generating text retrieval subtitle library and video retrieval method and device
CN104871240A (en) Information processing device, information processing method and program
EP4322029A1 (en) Method and apparatus for generating video corpus, and related device
CN109326284A (en) The method, apparatus and storage medium of phonetic search
CN109782997B (en) Data processing method, device and storage medium
JP2012181358A (en) Text display time determination device, text display system, method, and program
CN113035199A (en) Audio processing method, device, equipment and readable storage medium
KR20190083532A (en) System for learning languages using the video selected by the learners and learning contents production method thereof
CN113948066A (en) Error correction method, system, storage medium and device for real-time translation text
CN110853627B (en) Method and system for voice annotation
CN111326144A (en) Voice data processing method, device, medium and computing equipment
CN111062221A (en) Data processing method, data processing device, electronic equipment and storage medium
CN113435198A (en) Automatic correction display method and device for caption dialect words
CN111767214B (en) Automatic testing method and device for software UI
CN113923479A (en) Audio and video editing method and device
CN113221514A (en) Text processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant