CN111898388A - Video subtitle translation editing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111898388A
CN111898388A (application CN202010700313.3A)
Authority
CN
China
Prior art keywords
text
translation
input
matched
editing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010700313.3A
Other languages
Chinese (zh)
Inventor
李秋平
杜育璋
李磊
王明轩
朱培豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority application: CN202010700313.3A
Publication: CN111898388A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/106Display of layout of documents; Previewing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The disclosed embodiments provide a video subtitle translation editing method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring a source video and generating a translation content display page from it, wherein the page comprises at least one time range and the translation text matched with each time range; when a translation editing instruction is received, obtaining the translation to be edited in the target translation text matched with the instruction; generating and displaying an input prompt text according to the translation to be edited, the input prompt text instructing the user to update the translation to be edited; and acquiring the text the user inputs in response to the input prompt text and updating the target translation text accordingly. The disclosed embodiments can improve the efficiency and accuracy of editing video speech translations.

Description

Video subtitle translation editing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the field of translation editing, and in particular, to a method and an apparatus for translating and editing video subtitles, an electronic device, and a storage medium.
Background
As networks have evolved, more and more viewers choose to watch online videos, which include not only domestic videos but also foreign ones. Language barriers degrade the viewing experience when watching foreign videos.
At present, the speech in a video can be extracted, speech recognition can be performed on it to generate text content, and the text content can then be translated into a specified language; the translated text is added to the video as subtitles. A user playing the subtitled video obtains, in real time, a translation of speech that is not in the user's native language.
The translated text content is usually produced by machine translation in a single pass. If the machine translation result is wrong, it can only be corrected manually. Moreover, if the language of the translation to be edited is not one the user commonly uses, it is difficult to guarantee that the user's corrections are themselves correct.
Disclosure of Invention
The disclosed embodiments provide a video subtitle translation editing method and apparatus, an electronic device, and a storage medium, which can improve the efficiency and accuracy of editing video speech translations.
In a first aspect, an embodiment of the present disclosure provides a method for translating and editing a video subtitle, including:
acquiring a source video, and generating a translation content display page according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges;
when a translation editing instruction is received, obtaining a translation to be edited in a target translation text matched with the translation editing instruction;
generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text is used for indicating a user to update the translation to be edited according to the input prompt text;
and acquiring a text input by the user aiming at the input prompt text, and updating the target translation text.
In a second aspect, an embodiment of the present disclosure further provides a video subtitle translation editing apparatus, including:
the translation content display page generating module is used for acquiring a source video and generating a translation content display page according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges;
the translation editing device comprises a translation editing instruction acquisition module, a translation editing module and a translation editing module, wherein the translation editing instruction acquisition module is used for acquiring a translation to be edited in a target translation text matched with the translation editing instruction when the translation editing instruction is received;
the input prompt text generation module is used for generating and displaying an input prompt text according to the translation to be edited, and the input prompt text is used for indicating a user to update the translation to be edited according to the input prompt text;
and the translation text editing module is used for acquiring the text input by the user aiming at the input prompt text and updating the target translation text.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the video subtitle translation editing method according to any one of the embodiments of the present disclosure when executing the computer program.
In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the video subtitle translation editing method according to any one of the embodiments of the present disclosure.
In the disclosed embodiments, speech in the source video undergoes speech recognition and text translation to produce the translation text, which is displayed on the translation content display page. When a translation editing instruction is received, the translation to be edited matched with the instruction is obtained, and an input prompt text matched with it is generated and shown to the user to help modify the translation text. This addresses the prior-art problem of inaccurate user modifications to translated subtitles: a ready-made candidate modification is offered, which lowers the language proficiency required to modify the translation text, reduces the difficulty of translation editing, and improves its accuracy. At the same time, because the user can update the translation text directly from the input prompt text, editing time is shortened and editing efficiency is improved.
Drawings
Fig. 1 is a flowchart of a video subtitle translation editing method in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a translated content presentation page in an embodiment of the disclosure;
FIG. 3 is a schematic diagram of an input prompt text in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an input prompt text in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a translated content presentation page in an embodiment of the disclosure;
FIG. 6 is a schematic diagram of a dynamic prompt text in an embodiment of the present disclosure;
fig. 7 is a flowchart of a video subtitle translation editing method in an embodiment of the present disclosure;
fig. 8 is a flowchart of a video subtitle translation editing method in an embodiment of the present disclosure;
fig. 9 is a flowchart of a video subtitle translation editing method in an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a translated content presentation page that includes original text units, in an embodiment of the disclosure;
fig. 11 is a schematic diagram of a translated content presentation page including a video playback area in an embodiment of the disclosure;
FIG. 12 is a schematic diagram of a translated content presentation page that includes dynamic prompt text, in an embodiment of the disclosure;
FIG. 13 is a schematic diagram of an application scenario to which embodiments of the present disclosure are applicable;
FIG. 14 is a diagram of an original text unit, a translated text, and a dynamic prompt text in an embodiment of the disclosure;
FIG. 15 is a schematic diagram of a translated text upon receipt of a translation follow close instruction in an embodiment of the present disclosure;
FIG. 16 is a schematic illustration of a translated text upon receipt of a translation follower switch command in an embodiment of the disclosure;
fig. 17 is a schematic structural diagram of a video subtitle translation editing apparatus in an embodiment of the present disclosure;
fig. 18 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Examples
Fig. 1 is a flowchart of a video subtitle translation editing method in an embodiment of the present disclosure. The method is applicable to editing translated video subtitles and may be executed by a video subtitle translation editing apparatus. The apparatus may be implemented in software and/or hardware and configured in an electronic device, for example a terminal device such as a mobile phone, a vehicle-mounted terminal, or a notebook computer. As shown in fig. 1, the method specifically includes the following steps:
s110, a source video is obtained, and a translation content display page is generated according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges.
The source video is the video to which subtitles are to be added; it may be a video uploaded by a user. The source video may include speech in at least one language. For example, the speech may be entirely Chinese (a sentence meaning "I love singing"), or mixed English and Chinese, e.g. the English fragment "I love to" followed by the Chinese word for "singing". Optionally, the source video includes speech in a single language.
Speech in the source video is recognized to form the original text, the original text is converted into text in a specified language, and that text is determined to be the translation text. Speech recognition can be implemented with an automatic speech recognition pipeline, specifically: establishing a speech database and training an acoustic model; establishing a text database and training a language model; acquiring the speech to be recognized, extracting speech features, and encoding them into feature vectors; and decoding and searching the feature vectors against the pre-trained acoustic model and language model to generate the recognized text. Any speech recognition method known to those skilled in the art may also be applied to this embodiment; the embodiments of the present disclosure place no particular limitation on it.
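For illustration only, the following Python sketch mirrors the recognition pipeline described above (feature extraction, then decoding against acoustic and language models) with toy stand-in models; all names, scores, and the tiny vocabulary are assumptions for the sketch, not part of the disclosure:
```python
import math

# Toy stand-ins for the pre-trained models the text mentions; a real
# system would load trained acoustic and language models instead.
class AcousticModel:
    def score(self, frame: list[float], token: str) -> float:
        # Pretend louder frames look like "sing", quieter ones like "I".
        energy = sum(abs(x) for x in frame) / max(len(frame), 1)
        return -abs(energy - {"I": 0.1, "love": 0.4, "sing": 0.8}.get(token, 0.5))

class LanguageModel:
    BIGRAMS = {("<s>", "I"): 0.9, ("I", "love"): 0.8, ("love", "sing"): 0.7}
    def score(self, prev: str, token: str) -> float:
        return math.log(self.BIGRAMS.get((prev, token), 1e-3))

def extract_features(signal: list[float], frame_size: int = 4) -> list[list[float]]:
    """Encode the waveform into per-frame feature vectors (here: raw frames)."""
    return [signal[i:i + frame_size] for i in range(0, len(signal), frame_size)]

def recognize(signal: list[float]) -> str:
    """Greedy decode: pick the best token per frame by combined AM+LM score."""
    am, lm, vocab = AcousticModel(), LanguageModel(), ["I", "love", "sing"]
    prev, out = "<s>", []
    for frame in extract_features(signal):
        best = max(vocab, key=lambda t: am.score(frame, t) + lm.score(prev, t))
        out.append(best)
        prev = best
    return " ".join(out)

print(recognize([0.1, 0.1, 0.1, 0.1, 0.8, 0.9, 0.7, 0.8]))  # -> "I love"
```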
A time range is a period within the source video delimited by two time points, together with the duration between them. Different time points of the source video correspond to different speech, hence to different recognized original text and different translated text. Since the video duration of the source video is composed of multiple time ranges, the translation data can be divided into multiple translation texts by time range, each translation text matching one time range.
The translation text is text in a specified language whose semantics are the same as those of the original text corresponding to the source video. The specified language may be chosen by the user and differs from the language of the original text. The original text may be divided into multiple original text units by time range, and each original text unit is converted into translation text in the specified language, so that time ranges, original text units, and translation texts correspond one to one. The conversion can be performed with a machine translation algorithm, for example an analysis-and-transfer-based, interlingua-based, statistical, or example-based machine translation method. The translation text may include text in at least one language. Optionally, the original text is in a single language and the translation text is in a single language different from it.
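As a minimal sketch of the one-to-one correspondence just described (time range, original text unit, translation text), with a placeholder machine_translate standing in for whichever machine translation algorithm is used:
```python
from dataclasses import dataclass

@dataclass
class SubtitleSegment:
    start: float           # time range start, in seconds
    end: float             # time range end, in seconds
    source_text: str       # original text unit from speech recognition
    translated_text: str   # translation in the user-specified language

def machine_translate(text: str, target_lang: str) -> str:
    # Placeholder for any MT backend (rule-based, statistical, neural, ...).
    return f"[{target_lang}] {text}"

def translate_units(units: list[tuple[float, float, str]], lang: str) -> list[SubtitleSegment]:
    """Convert each recognized unit into a segment carrying its translation."""
    return [SubtitleSegment(s, e, text, machine_translate(text, lang))
            for s, e, text in units]

segments = translate_units([(8.0, 12.0, "I have no exercise talent")], "en")
print(segments[0])
```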
The translation content display page presents the translation of the original text recognized from the source video into translation text in the specified language; as shown in fig. 2, each time range appears to the left of the translation text it matches. Optionally, the page also displays the original text alongside the translation text, so the user can browse both conveniently. The user can view the page, i.e. the original text and the translation text, through a client or a browser. The page supports online display and editing of both texts; edits can be saved in real time to a file at a designated storage location on the electronic device, or to a designated network database, where the online real-time save stores the result of each editing operation as it happens. A server associated with the page performs speech acquisition, speech recognition, machine translation, and the like on the uploaded source video, generates the page data, and returns it to the electronic device, which renders the translation content display page from it. Alternatively, the electronic device itself performs speech acquisition, speech recognition, and machine translation on a source video specified by the user, generates the page data, and renders the page.
It should be noted that, in the translation content display page, the display content, the display position, the display style, and the like of the translation text and the original text, as well as the layout, the page content, the page style, and the like of the translation content display page may be set as required, and therefore, the embodiment of the present disclosure is not particularly limited. For example, the specified language is english, and the translation content presentation page may include an original text and a translation text, and a presentation position of the translation text is below a presentation position of the original text.
And S120, when the translation editing instruction is received, acquiring a translation to be edited in the target translation text matched with the translation editing instruction.
The translation editing instruction is used for specifying a translation text to be edited and operating the target translation text. The target translation text may be any one of the translation texts, and the operation may include text editing operations such as addition, modification, deletion, and the like. The translation editing instruction may refer to an operation instruction input by a user for the target translation text in the translation content presentation page. The translation to be edited may be text in the text of the target translation. When the number of the translated texts is multiple, taking the translated text hit by the translation editing instruction as a target translated text; when the number of the translation texts is one, the translation text is the target translation text.
The translation editing instruction may be triggered when the user clicks an editing button in the translation content display page, or when the user clicks an area where any one of the translation texts in the translation content display page is located, or when the user inputs a specific touch screen gesture (for example, a text range may be selected by long pressing and left sliding or long pressing and right sliding), and in addition, the translation editing instruction triggering manner adopted by a person skilled in the art may be applied to this embodiment, and therefore, the embodiment of the present disclosure is not limited specifically.
For example, the translation to be edited may be text the user deletes from the target translation text: if the user deletes "like" from the target translation text "i like singing", the translation to be edited is "like". Or it may be the text in the target translation text matched by what the user types: if the user types "love" within the target translation text "i like singing", then although "like" has not been deleted, the typed "love" differs from the "like" at the matching position, so the translation to be edited is "like".
And S130, generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text is used for instructing a user to update the translation to be edited according to the input prompt text.
The input prompt text displays text that can be entered: its semantics are the same as those of the translation to be edited, while its content is different. It is used to update the translation to be edited in the target translation text, for example by replacing it. Specifically, the input prompt text can be combined with the rest of the target translation text (everything other than the translation to be edited) to produce a modified target translation text whose semantics match both the pre-modification target translation text and the original text. The input prompt text thus assists the user in modifying the translation text. For example, if the user needs to change a word in the target translation text, that word is the translation to be edited and the input prompt text offers a near-synonym or synonym for it.
When a user modifies the translation text, manually typed text may contain grammar errors and/or spelling errors. The input prompt text supplies correct text that differs in content from the translation to be edited but carries the same meaning, helping the user modify the text and improving the correctness of translation editing.
Optionally, displaying the input prompt text includes: when the translation to be edited is located in the middle of the target translation text, generating a prompt following area at a position associated with the translation editing instruction and displaying the input prompt text in that area; and when the translation to be edited is located at the end of the target translation text, displaying the input prompt text within the target translation text in a text style different from that of the target translation text.
The translation to be edited being in the middle means the user is editing text in the middle of the target translation text; the middle position may be within a paragraph or within a sentence containing the translation to be edited. The input prompt text can be shown as an inserted pop-over, so the positions of the other, unedited parts of the target translation text are not disturbed (shifting them would enlarge the edit and invite errors). For example, a prompt following area is generated near the input position and the input prompt text is displayed inside it. The prompt following area is an area associated with the position to be input and may overlay other text when displayed. In practice, when text is edited in a document, an input identification image (such as a cursor) marks the position to be input, so the user can locate the input position from it. Illustratively, the prompt following area sits a set distance from the input identification image; in one specific example it is 1 mm to the right of it. As shown in fig. 3, the input identification image is a cursor, the prompt following area lies to the right of the cursor, and the input prompt text in the area is "am not".
The text style identifies and distinguishes text, in particular distinguishing the input prompt text from the translation text. It may include at least one of: font, font size, position, color, background, bold, italics, underline, superscript, text effects, and the like.
The translation to be edited being at the end means the user is editing text at the end of the target translation text; the end position may be the end of a paragraph or of a sentence containing the translation to be edited. Since no other text follows the translation to be edited, nothing else is displaced, and the input prompt text can simply be appended at the end of the paragraph or sentence. Typically, to highlight it, the input prompt text is given a style different from the target translation text; illustratively, the target translation text is black and the input prompt text gray. In one specific example, the target translation text is "I have no exercise talent" and the translation to be edited is "have no exercise talent". When the user deletes the translation to be edited, the input prompt text is "have no exercise talent", i.e. the same as the translation to be edited, as long as the user has not yet typed characters different from it. If, after the deletion, the user types "am", the input prompt text becomes "not athletic". As shown in fig. 4, the text "I" before the cursor is the unedited target translation text, and the text after the cursor is the input prompt text: "not athletic".
In fact, whether the translation to be edited is in the middle or at the end of the text, the input prompt text is displayed nearby so the user can browse it quickly.
Generating a prompt following area and inserting the input prompt text when editing mid-text, and appending the highlighted input prompt text when editing at the end, lets the user quickly distinguish the prompt while keeping it close to the editing position. This improves the timeliness of displaying the prompt text, and editing the translation text against the input prompt text improves the editing efficiency of the translation text.
And S140, acquiring a text input by the user aiming at the input prompt text, and updating the target translation text.
The text the user inputs in response to the input prompt text replaces the translation to be edited in the target translation text, forming the edited text. That text may or may not equal the input prompt text. If the user considers the input prompt text wrong, or not the desired translation result, the user can type different text; the typed characters are taken as the input text. If the user considers the input prompt text correct, the user may type the same characters, which are taken as the input text; or issue a trigger instruction that adopts the input prompt text as the input text, for example by clicking the area where the input prompt text is displayed, or by pressing a preset key (e.g., the Tab key).
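A minimal sketch of this commit rule, assuming the accept action is a Tab press or a click on the prompt area as described above:
```python
def resolve_input(prompt_text: str, typed: str, accepted: bool) -> str:
    """Text committed to the target translation text: an accepted prompt
    wins; otherwise the user's own characters are used as the input text."""
    return prompt_text if accepted else typed

print(resolve_input("not athletic", "", accepted=True))             # Tab / click
print(resolve_input("not athletic", "not sporty", accepted=False))  # free typing
```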
It should be noted that the input of the prompt text is used to prompt the user of diversified translation results, so as to reduce the editing workload of the user. In fact, whether the target translation text is updated or not and the specific content after the update are determined by the operation input by the user.
Optionally, before receiving the translation editing instruction, the method further includes: when a prompt display trigger instruction is received, generating a dynamic prompt text identical to the target translation text. While the input prompt text is displayed, the method further includes: in the dynamic prompt text, replacing the text matching the translation to be edited with the input prompt text, where the text style of the input prompt text differs from that of the rest of the dynamic prompt text.
And the prompt display triggering instruction is used for generating and displaying the dynamic prompt text. The prompt display triggering instruction may be triggered when the user clicks a modification display button in the translation content display page, or when the user clicks an area where a target translation text in the translation content display page is located, or when the user inputs a specific touch screen gesture (for example, the user may slide to the area where the target translation text is located).
The dynamic prompt text shows the editing effect on the target translation text once the translation to be edited is replaced by the input prompt text; it is, in effect, a preview of the target translation text edited with the input prompt text as the editing content. When no editing instruction has been issued, the dynamic prompt text is identical to the target translation text in both semantics and language.
The position relationship between the dynamic prompt text and the target translation text may be set as required; for example, the dynamic prompt text may be placed below the translation text. Any display mode of the dynamic prompt text known to those skilled in the art may be applied to this embodiment, so the embodiments of the present disclosure place no particular limitation on it.
In a specific example, as shown in fig. 5, the translated text and the dynamic prompt text are included in the translated content presentation page. Wherein, the translation text is positioned above the dynamic prompt text.
When the input prompt text is displayed, the editing effect of applying it to the target translation text can correspondingly be shown in the dynamic prompt text. Specifically, in the target translation text, the text before and/or after the translation to be edited is kept unchanged, the input prompt text is filled into the position of the translation to be edited, and the updated dynamic prompt text is generated from the input prompt text plus the unchanged text. As shown in fig. 6, the time range 101 runs from the start point 0:00:08 to the end point 0:00:12; the text in the translation area 102 is the target translation text, the text in the dynamic prompt text area 103 is the dynamic prompt text, and the text in the prompt following area 104 is the input prompt text. The updated target translation text, formed by replacing the translation to be edited (already deleted and thus not shown in the translation area 102) with the input prompt text, is the text in the dynamic prompt text area 103. When the translation to be edited is deleted text, replacing it with the input prompt text amounts to adding the input prompt text to the remaining target translation text.
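The splicing described here (unchanged text on both sides of the edited span, input prompt text filled into the span) reduces to a one-line operation; the span indices below are illustrative:
```python
def render_dynamic_prompt(target: str, span_start: int, span_end: int, prompt: str) -> str:
    """Rebuild the dynamic prompt text: keep the unedited text on both
    sides of the to-be-edited span and fill the input prompt text in."""
    return target[:span_start] + prompt + target[span_end:]

# "have no exercise talent" (span 2..25) replaced by "am not athletic":
print(render_dynamic_prompt("I have no exercise talent", 2, 25, "am not athletic"))
# -> "I am not athletic"
```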
Typically, to highlight the input prompt text, in dynamic prompt text, the style of the input prompt text is configured to be different from the original dynamic prompt text. For example, the filling color of the text background of the dynamic prompt text is none, and the filling color of the text background of the input prompt text is blue, and in addition, other different styles may be set as required, and thus, the embodiment of the present disclosure is not particularly limited.
By generating and displaying the dynamic prompt text alongside the input prompt text, the editing effect of applying the input prompt text to the target translation text is previewed, so the user can intuitively and quickly see the expected result of the edit. This improves the timeliness of the translation editing display and, since the user can edit the target translation text against the input prompt text, the editing efficiency of the translation text.
In the disclosed embodiments, speech in the source video undergoes speech recognition and text translation to produce the translation text, which is displayed on the translation content display page. When a translation editing instruction is received, the translation to be edited matched with the instruction is obtained, and an input prompt text matched with it is generated and shown to the user to help modify the translation text. This addresses the prior-art problem of inaccurate user modifications to translated subtitles: a ready-made candidate modification is offered, which lowers the language proficiency required to modify the translation text, reduces the difficulty of translation editing, and improves its accuracy. At the same time, because the user can update the translation text directly from the input prompt text, editing time is shortened and editing efficiency is improved.
In an exemplary implementation manner, fig. 7 is a flowchart of a video subtitle translation editing method in an embodiment of the present disclosure, where the method specifically includes the following steps:
s210, a source video is obtained, and a translation content display page is generated according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges.
For details of this embodiment not described here, reference may be made to the foregoing description.
S220, when a translation editing instruction is received, obtaining a translation to be edited in a target translation text matched with the translation editing instruction, wherein the translation editing instruction comprises a character editing instruction or a word editing instruction; the character editing instruction is used for editing characters in a first translation to be edited, which consists of one word, and the word editing instruction is used for editing a second translation to be edited, which comprises at least one word.
The character editing instruction is used for editing some of the characters of one segmented word in the target translation text. The word editing instruction is used for editing at least one segmented word in the target translation text. Editing may include deletion or addition. The smallest editable text element of the translation text may be a character.
If the translation editing instruction is a character editing instruction, the translation to be edited is one word and is determined to be the first translation to be edited. In practice, a character editing instruction edits at least one character within a word, i.e. it edits in units of characters.
If the translation editing instruction is a word editing instruction, the translation to be edited is a phrase or multi-word combination formed from at least one word and is determined to be the second translation to be edited. A word editing instruction edits one or more whole words, i.e. it edits in units of words.
It should be noted that the character editing instruction and the word editing instruction depend on how characters, words, and sentences are divided. Generally, in Chinese text, one Chinese character is a character, a word comprises at least one character, and a sentence comprises at least one word; words must be divided by a word segmentation algorithm. For example, a Chinese sentence meaning "I have no exercise talent" may segment into four words meaning "I", "have no", "exercise", and "talent", where the first word contains one character and the other three contain two characters each. Sentences can be divided by punctuation marks, e.g. the characters between one punctuation mark and the next form a sentence. Generally, in English text, one word is a word, a word comprises at least one character, and a sentence comprises at least one word; words can be divided by spaces, e.g. the characters between one space and the next form a word, where a punctuation mark may take the place of a space.
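For illustration, the sketch below splits English text on letter runs and segments Chinese with a classic greedy forward maximum-match over a toy lexicon; the disclosure only requires "a word segmentation algorithm", so the specific algorithm and lexicon here are assumptions:
```python
import re

def split_english(sentence: str) -> list[str]:
    # Words are delimited by spaces or punctuation in English text.
    return re.findall(r"[A-Za-z']+", sentence)

def split_chinese(sentence: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            if size == 1 or sentence[i:i + size] in lexicon:
                words.append(sentence[i:i + size])
                i += size
                break
    return words

print(split_english("I have no exercise talent."))
lexicon = {"没有", "运动", "天赋"}
print(split_chinese("我没有运动天赋", lexicon))  # -> ['我', '没有', '运动', '天赋']
```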
Illustratively, the target translation text is "I has no exercise talent". If the user selects and deletes all the words after "I", the instruction deletes multiple words, so it is a word editing instruction and the translation to be edited is "has no exercise talent". If the user then adds "a" to the remaining text without typing a space or punctuation mark, it can be determined that the user intends to edit a word; the instruction adding "a" is a character editing instruction and the translation to be edited is "has".
The language of the target translation text may include chinese, english, and japanese, and may further include korean, french, german, russian, and the like, to which the embodiments of the present disclosure are not particularly limited.
And S230, generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text is used for instructing a user to update the translation to be edited according to the input prompt text.
If the user has not typed any characters, the input prompt text is the same as the translation to be edited. If the user types characters different from the translation to be edited, the input prompt text can be determined from the translation to be edited together with the characters associated with the translation editing instruction: the server associated with the translation content display page can re-translate the source of the translation to be edited and select, from the candidate results, the one matching the characters associated with the instruction as the input prompt text.
Illustratively, the original text in the source video is a sentence meaning "I have no exercise talent"; the target translation text is "I have no exercise talent"; and the translation to be edited is "have no exercise talent". The user deletes the translation to be edited, and the input prompt text is "have no exercise talent". The user then types "am", so the characters associated with the translation editing instruction are "am"; re-translating the source according to those characters and the translation to be edited yields the input prompt text "not athletic".
Optionally, the translation editing instruction comprises a character editing instruction, and generating and displaying an input prompt text according to the translation to be edited comprises: acquiring the input characters matched with the character editing instruction; acquiring, in the first translation to be edited, the unedited characters matched with the character editing instruction; and generating at least one prompt word from the input characters and the unedited characters and determining the prompt word as the input prompt text, wherein each prompt word comprises the input characters and the unedited characters.
The input characters are the characters typed by the user. The unedited characters are the characters of the translation to be edited that the user has not deleted; they may be empty. The prompt words prompt the remaining characters of a word so as to suggest the word the user is expected to type, reducing spelling errors, improving editing accuracy, and lowering editing difficulty. A prompt following area may be generated and at least one prompt word displayed in it as a list. Matching prompt words can be queried from a preset word database using the characters remaining after deletion plus the input characters, so the user directly obtains every character of the word and spelling errors are avoided; a matching prompt word is a word that includes the input characters and the unedited characters. At least one prompt word is used as the input prompt text. Prompt words may be chosen from the most frequently used words matching the input and unedited characters, or words with the same semantics or the same part of speech as the first translation to be edited may be used. A word "including the input characters and the unedited characters" is one whose first n consecutive characters equal the concatenation of the unedited and input characters, where n is their total count and the order of those n characters matches the input order.
Illustratively, the first translation to be edited is "third". The user partially deletes it via a character editing instruction, leaving "th"; at this point either no prompt word is generated or only content consistent with the translation to be edited is prompted (e.g. the prompt word is "third"). The unedited characters are "th". The user then appends the input character "e"; since "e" differs from the character "i" at the matching position in the translation to be edited, it is determined that the user has added an input character via a character editing instruction. The character string formed by the unedited and input characters ("the") is used to query prompt words, such as "the", "there", or "these". If the query result is empty, no prompt is generated. Typically the number of prompt words is 0-3. As another example, as shown in fig. 8, the user enters the character "a" and the prompt words include "an", "am", and "are".
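A sketch of the prefix query over a frequency-ordered word database described above, capped at the "typically 0-3" prompt words the text mentions; the lexicon is a toy stand-in:
```python
def suggest_words(unedited: str, typed: str, lexicon: list[str], limit: int = 3) -> list[str]:
    """Prompt words: database entries whose first characters match the
    surviving characters plus the newly typed ones, in typing order.
    `lexicon` is assumed to be sorted by frequency of use."""
    prefix = unedited + typed
    return [w for w in lexicon if w.startswith(prefix)][:limit]

lexicon = ["the", "there", "these", "third", "an", "am", "are"]
# User kept "th" from "third" and typed "e":
print(suggest_words("th", "e", lexicon))  # -> ['the', 'there', 'these']
print(suggest_words("", "a", lexicon))    # -> ['an', 'am', 'are']
```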
It should be noted that when the prompt words differ from the editing result the user wants, the user can continue adding input characters; each added character re-filters the prompt words matching the input and unedited characters. Character editing instructions arrive one character at a time, however: if the user types very fast, no prompt is needed, and if too few unedited and input characters are available, the prompt would be inaccurate. The system can therefore wait a set time after a character editing instruction before generating prompt words as the input prompt text to display to the user.
In fact, if a character editing instruction only deletes some characters of a word, no prompt word is generated, or only content consistent with the translation to be edited is generated. Only when the user adds at least one input character via the character editing instruction, and that character differs from the character in the translation to be edited, is a subsequent input prompt generated from the added input characters. Prompts at the word dimension ignore the user's character-level editing state: an English word may be misspelled, a Chinese word may contain a wrong character, and so on. Word content is therefore prompted first, and only once the word content is settled is the complete phrase prompted, improving the accuracy of the prompt content.
It will be appreciated that if the user deletes all the characters in a word, the deletion instruction is actually a word editing instruction, and accordingly, the input prompt text is a prompt for a plurality of words and not a prompt for a word including characters.
The character editing instruction can prompt a part of characters included in a certain word, so that the prompt of a user in character dimension can be improved, the probability of hitting the editing result expected by the user can be improved, the editing result expected by the user can be accurately determined, and the accuracy of prompting the text can be improved.
Optionally, the translation editing instruction comprises a word editing instruction, and generating and displaying an input prompt text according to the translation to be edited comprises: acquiring at least one input word matched with the word editing instruction; and generating a prompt phrase from the second translation to be edited and the input words, and determining the prompt phrase as the input prompt text, wherein the prompt phrase has the same semantics as the second translation to be edited and comprises at least one word.
The input words are the words typed by the user. A prompt phrase is generated by combining the second translation to be edited with the input words; it prompts the remaining words of the sentence, and its semantics are the same as those of the second translation to be edited.
Illustratively, the target translation text is "I have no exercise talent" and the translation to be edited is "have no exercise talent". The user deletes the translation to be edited, and the input prompt text is "have no exercise talent". The user then types "am" followed by a space, which establishes that a word has been input, i.e. the input words include "am"; at this point the second translation to be edited is "no exercise talent". From the input words and the second translation to be edited associated with the translation editing instruction, the semantics of the prompt phrase are determined to match the source meaning, and the generated prompt phrase is "not athletic".
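A sketch of the prompt-phrase step, with re-translation stubbed out as a fixed candidate list (a real system would query the translation backend again); the filtering keeps only a candidate whose leading words match what the user typed:
```python
from typing import List, Optional

def retranslate(source_text: str) -> List[str]:
    # Stub: a real system would query the MT backend again for
    # alternative renderings of the source sentence.
    return ["have no exercise talent", "am not athletic"]

def suggest_phrase(source_text: str, typed_words: List[str]) -> Optional[str]:
    """Prompt phrase: the remainder of a re-translation whose leading words
    match what the user has typed; semantics stay those of the source."""
    for candidate in retranslate(source_text):
        words = candidate.split()
        if words[:len(typed_words)] == typed_words:
            return " ".join(words[len(typed_words):])
    return None

# After deleting "have no exercise talent" the user typed "am ":
print(suggest_phrase("<source sentence>", ["am"]))  # -> "not athletic"
```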
The word editing instruction can prompt a plurality of words in the sentence, so that the prompt of the user in word dimension can be improved, the probability of hitting the editing result expected by the user can be improved, the editing result expected by the user can be accurately determined, and the accuracy of prompting the text can be improved.
In one specific example, the target translation text is "I have no exercise talent" and the user deletes everything after "I". When the user types the new character "a", no complete word has been detected, so a character editing instruction is determined; from "a", prompt words such as "am", "are", and "an" are generated and determined as the input prompt text. After the user types "am" plus a space, a complete word is detected and a word editing instruction is determined; from "am", the prompt phrase "not athletic" is generated and determined as the input prompt text.
Optionally, after displaying the input prompt text, the method further includes: receiving an input instruction containing the prompt ending content, and/or receiving the user's determination instruction for the input prompt text; and stopping displaying the input prompt text.
The prompt ending content separates words in the text: it follows a run of consecutive characters and thereby marks those characters as one word. It mainly applies to alphabetic languages such as English, Russian, French, or Italian, which usually separate adjacent words with a space.
Illustratively, English divides words by spaces, so the prompt ending content is a space. Specifically, "character + space" is the criterion for determining that a word operation is complete; when the user enters "character + space", the input identification image (cursor) is positioned after the space.
For Chinese, there is no specific content separating words. Typically, the input characters are queried in a preset word database: if a word consisting exactly of the input characters (and no other characters) is found, the input characters are determined to be a word. Other word division and judgment methods also exist; the embodiments of the present disclosure place no particular limitation on them.
Receiving an input instruction containing the prompt ending content indicates the user did not accept the input prompt text and manually typed a complete word. A determination instruction indicates the user accepted the input prompt text, from which the translation text is updated. The determination instruction may be a trigger operation (a click, a specific gesture, etc.) on any prompt word or prompt phrase. For languages such as English, if the input prompt text is a prompt word, the prompt word plus a space is added at the matching position in the translation text.
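A sketch of the dismissal logic, using the "character + space" completion criterion for space-delimited languages together with the accept ("hit") case:
```python
def word_completed(buffer: str) -> bool:
    """'Character + space' marks completion of a word operation in
    space-delimited languages such as English."""
    return len(buffer) >= 2 and buffer.endswith(" ") and buffer[-2] != " "

def should_dismiss_prompt(buffer: str, prompt_accepted: bool) -> bool:
    # The current input prompt text stops being displayed once the user
    # accepts it, or finishes a word without accepting it.
    return prompt_accepted or word_completed(buffer)

print(should_dismiss_prompt("am ", prompt_accepted=False))  # True  -> stop display
print(should_dismiss_prompt("am", prompt_accepted=False))   # False -> keep prompting
```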
Once the user accepts the input prompt text, or finishes editing a word without accepting it, the prompt has served its purpose and its display can stop. Only the current input prompt text stops being displayed; new input prompt text can be generated from subsequent updates.
By stopping displaying the input prompt text after the user finishes word editing or directly hits the input prompt text, whether the input prompt text is displayed or not can be flexibly adjusted, and user experience is improved.
S240, acquiring a text input by the user aiming at the input prompt text, and updating the target translation text.
In the disclosed embodiments, the translation editing instruction is configured as a character editing instruction and/or a word editing instruction that triggers acquisition of the translation to be edited; input prompt text matching the translation to be edited is generated and shown to the user, who edits the translation text against it. This realizes real-time editing of the translation text and improves its timeliness, accuracy, and efficiency.
In an exemplary implementation manner, fig. 9 is a flowchart of a video subtitle translation editing method in an embodiment of the present disclosure, where the method specifically includes the following steps:
s310, acquiring a source video, acquiring audio data matched with the source video, and performing voice recognition on the audio data matched with the source video to generate an original text.
For details of this embodiment not described here, reference may be made to the foregoing description.
The audio data is the speech signal in the source video, and the original text is the speech recognition result of that audio data; the original text can serve as the original-language subtitles of the source video.
S320, obtaining the video time length of the source video, dividing the video time length, generating at least one time range and determining the original text units matched with the time ranges.
The video duration is the length of the source video. Generally, subtitles of a source video are displayed one sentence or one word at a time, so the text corresponding to one time range is at least one word or one sentence. The speech duration matched to the audio data is obtained from the source video; during recognition, the audio can be divided by words or sentences, with the corresponding speech duration divided accordingly to form at least one time range. Each divided word or sentence is recognized into an original text unit, and the speech duration matched to it becomes the time range matched to that unit. Periods of the source video without speech are not part of any speech duration and are not taken as time ranges.
All the original text units together form the text recognized from the speech of the source video. Equivalently, the original text corresponding to the source video is divided into a plurality of original text units by sentence or word.
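A sketch of this division step is given below, assuming the speech recognizer already returns sentence-level segments with start and end times (the segment format and field names are hypothetical):

    from dataclasses import dataclass

    @dataclass
    class TextUnit:
        start: float    # seconds into the source video
        end: float
        original: str   # recognized sentence for this time range

    def build_units(segments: list[dict]) -> list[TextUnit]:
        """Turn recognizer segments into (time range, original text unit)
        pairs. Silent stretches between segments produce no unit, matching
        the rule that time without voice is not taken as a time range."""
        return [TextUnit(s["start"], s["end"], s["text"]) for s in segments]

    units = build_units([
        {"start": 8.0, "end": 12.0, "text": "I am the third sentence of this caption."},
    ])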
And S330, performing text conversion on each original text unit according to the specified language to generate a translation text matched with each original text unit, and using the translation text as a translation text matched with each time range.
The plurality of translated texts constitute a translation of the original text. A translated text is matched with an original text unit, and the semantics of the translated text is the same as those of the matched original text unit but the language is different.
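Continuing the sketch, each original text unit can be passed through a machine-translation backend to obtain a translated text sharing the unit's time range (machine_translate is a placeholder; the patent does not name a particular translation engine):

    def machine_translate(text: str, target_lang: str) -> str:
        # Placeholder for any machine-translation model or service.
        return text  # identity stub so the sketch runs as-is

    def translate_units(units: list[dict], target_lang: str) -> list[dict]:
        """Attach to each original text unit a translated text with the same
        semantics; the translation inherits the unit's time range."""
        return [dict(u, translation=machine_translate(u["text"], target_lang))
                for u in units]

    translated = translate_units(
        [{"start": 8.0, "end": 12.0, "text": "我是这个字幕的第三句话"}], "en")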
And S340, generating a translation content display page according to each time range and the matched translation text, wherein the translation content display page comprises at least one translation area matched with each time range, and the translation area matched with each time range is used for displaying the translation text matched with each time range.
Each translation area is used for displaying one translated text, and the translation area is an operable area.
The time range can be divided through a time division triggering instruction, and the translated text matched with that time range is divided correspondingly. For example, the translated text may be split at the input identification image (cursor) into a preceding part and a following part, and the time range may be split into a time range matching the preceding part and a time range matching the following part, so that one translated text becomes two translated texts, each matched with its own time range.
Two adjacent time ranges can be combined through a time combination triggering instruction, and the translated texts matched with them are combined correspondingly. The two adjacent time ranges merge into a single time range, and the two matched translated texts are spliced in time order into one translated text, the later text being appended after the earlier one; the result serves as the translated text matched with the combined time range.
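A minimal sketch of both operations, representing each subtitle segment as a dict (how the cursor position maps to a split time is left open by the patent; time_at_cursor here is supplied by the caller):

    def split_segment(seg: dict, cursor: int, time_at_cursor: float) -> tuple[dict, dict]:
        """Split one (time range, translated text) pair at the cursor into a
        preceding and a following segment."""
        first = {"start": seg["start"], "end": time_at_cursor,
                 "text": seg["text"][:cursor].rstrip()}
        second = {"start": time_at_cursor, "end": seg["end"],
                  "text": seg["text"][cursor:].lstrip()}
        return first, second

    def merge_segments(a: dict, b: dict) -> dict:
        """Merge two adjacent segments: the time ranges are joined and the
        later translated text is spliced after the earlier one."""
        return {"start": a["start"], "end": b["end"],
                "text": a["text"] + " " + b["text"]}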
Optionally, the generating a translation content presentation page according to each time range and the matched translation text includes: generating a translation content display page according to each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises: and each time range matched original text area is used for displaying the time range matched original text units.
One original text area is used for displaying one original text unit. The original text areas correspond to the translation areas one to one; accordingly, the original text unit in an original text area is matched with the translated text in the corresponding translation area, having the same semantics but a different language. The original text area is likewise an operable area.
In a specific example, as shown in fig. 10, the translation content presentation page includes time ranges, an original text unit matched with the time ranges, and a translation text matched with the time ranges, where the time ranges are on the left of the original text region and the translation region, the original text region is located above the translation region, and at the same time, the original text region only displays a part of the original text, so that a user can view the original text and the corresponding translation locally, and the editing efficiency is improved.
Optionally, the generating a translation content presentation page according to each of the time ranges, the matched original text unit, and the matched translated text unit includes: adding the original text unit matched with each time range and the matched translated text into the source video to generate a subtitle video; generating a translation content display page according to the subtitle video, each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises: and the video playing area is used for playing the subtitle video.
In a specific example, as shown in fig. 11, the translation content presentation page includes a video playing area, time ranges, original text units matched with the time ranges, and translated texts matched with the time ranges, with the video playing area located to the left of the original text area. The original text units are used for generating original-language subtitles, and the translated texts are used for generating translation subtitles. The original subtitles and translation subtitles are added to the source video to generate a subtitle video, which is played in the video playing area so that the edited translation subtitles are displayed.
The original text units and translated texts can be saved as a subtitle file, and the subtitle video can be saved as a video file; both can be stored locally or in a network database.
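As one concrete possibility (the patent names no subtitle format; SRT is assumed here purely for illustration), the units could be serialized as a bilingual SRT file:

    def to_srt(units: list[dict]) -> str:
        """Serialize (time range, original, translation) units as SRT,
        one bilingual cue per time range."""
        def ts(seconds: float) -> str:
            total_ms = round(seconds * 1000)
            h, rem = divmod(total_ms, 3_600_000)
            m, rem = divmod(rem, 60_000)
            s, ms = divmod(rem, 1000)
            return f"{h:02}:{m:02}:{s:02},{ms:03}"

        cues = []
        for i, u in enumerate(units, 1):
            cues.append(f"{i}\n{ts(u['start'])} --> {ts(u['end'])}\n"
                        f"{u['original']}\n{u['translation']}\n")
        return "\n".join(cues)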
In addition, a play control can be configured in the translated content display page for each time range; by triggering the play control, the user plays, in the video playing area, only the portion of the subtitle video matched with that time range.
By configuring the video playing area and showing there the video effect after the edited translated text has been added as translation subtitles, the editing effect of the translated text is displayed in real time, improving the efficiency of generating and editing video translation subtitles.
Optionally, the translated content presentation page further includes a modification display area, one modification display area being used for displaying one dynamic prompt text. The modification display area is generated only when a prompt display triggering instruction is received. Specifically, before receiving the translation editing instruction, the method further includes: when a prompt display triggering instruction is received, generating a dynamic prompt text and displaying it in the modification display area, the dynamic prompt text being the same as the target translation text. While the input prompt text is displayed, the method further includes: in the dynamic prompt text, replacing the text matched with the translation to be edited with the input prompt text, the text style of the input prompt text being different from that of the rest of the dynamic prompt text.
In a specific example, as shown in fig. 12, the translation content presentation page includes time ranges, original text units matched with the time ranges, translated text matched with the time ranges, and a dynamic prompt text matched with a certain time range. The corresponding translated text is placed below the original text unit, so that the user can view the original and its translation side by side, improving editing efficiency. When the user clicks the original text unit with a mouse, or slides the mouse over the region where the original text unit is located, a prompt display triggering instruction is determined to be detected; a dynamic prompt text region matched with the original text unit is generated accordingly, and the corresponding dynamic prompt text is displayed. The dynamic prompt text region matches the time range and is typically located below the translation region.
In a specific example, as shown in fig. 13, the text in the original region 107 is the original text unit corresponding to the target translation text, the text in the translation region 102 is the target translation text, the text in the dynamic prompt text region 103 is the dynamic prompt text, and the text in the prompt following region 104 is the input prompt text. The dynamic prompt text region 103 shows the updated target translation text formed by replacing the translation to be edited (already deleted, hence not shown in the translation region 102) with the input prompt text. Since the translation to be edited is deleted text, the updated target translation text may equally be regarded as formed by adding the input prompt text to the existing target translation text. The starting time point 105 and the ending time point 106 constitute the time range matched with the original text unit: from starting time point 0:00:08 to ending time point 0:00:12.
Similarly, the user can click the translated text with the mouse, or slide the mouse over the region where the translated text is located; a prompt display triggering instruction is then determined to be detected, a dynamic prompt text region matched with the translated text is generated, and the corresponding dynamic prompt text is displayed.
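A sketch of the replacement step inside the dynamic prompt text, with brackets standing in for the distinct text style (the span indices and markup are illustrative only):

    def render_dynamic_prompt(target: str, span: tuple[int, int], prompt: str) -> str:
        """Build the dynamic prompt text: the target translation with the span
        of the translation to be edited replaced by the input prompt text,
        marked up here with brackets in place of a distinct text style."""
        start, end = span
        return target[:start] + "[" + prompt + "]" + target[end:]

    # Replacing the word "translate" (characters 10..18) with the prompt text.
    print(render_dynamic_prompt("I want to translate it again", (10, 19), "retranslate"))
    # -> "I want to [retranslate] it again"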
After the translated content presentation page is generated, the method further includes: when a translation following closing instruction is received, updating a target original text unit according to a received editing instruction for the target original text unit; and when a translation following starting instruction is received, updating the target original text unit according to the received editing instruction for the target original text unit, and correcting the translated text matched with the target original text unit according to the updated target original text unit.
The translation following closing instruction is used to keep the translated text unmodified when the original text is modified. The translation following starting instruction is used to modify the translated text correspondingly when the original text is modified. The translation following closing and starting instructions may refer to trigger operations on a translation following button, which may correspond to the entire original text or to a single original text unit. In addition, when the operation identification image (such as a mouse pointer) clicks an area outside the translation area, the original text area, and the modification display area, a translation following closing instruction is determined to be received, so the user need not trigger the translation following button unit by unit. While translation following is closed, the user can adjust the grammar and structure of the original text or add descriptions without affecting the translation result.
And the editing instruction of the target original text unit is used for operating the target original text unit.
When a translation following closing instruction is received, the translation following state is a closing state, an editing instruction of the target original text unit is detected, only the text included in the target original text unit is operated, and the translated text matched with the target original text unit is not operated.
When a translation following opening instruction is received, the translation following state is the open state. When an editing instruction for the target original text unit is then detected, the text of the target original text unit is operated on and, at the same time, machine translation is re-run on the updated original text unit to generate an updated translated text, automatically realizing retranslation and ensuring correspondence between the original text and the translated text.
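The follow switch can be pictured as below (a minimal sketch; the class, field, and callback names are invented for illustration):

    class SubtitleEditor:
        """Sketch of the translation-follow switch: when follow is on, editing
        an original text unit re-runs machine translation for that unit; when
        it is off, only the original text changes."""

        def __init__(self, units: list[dict], translate):
            self.units = units          # [{"original": ..., "translation": ...}, ...]
            self.translate = translate  # callable: original text -> translated text
            self.follow = False         # translation following closed by default

        def edit_original(self, index: int, new_text: str) -> None:
            self.units[index]["original"] = new_text
            if self.follow:  # retranslate to keep original and translation in step
                self.units[index]["translation"] = self.translate(new_text)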
In a specific example, as shown in fig. 14, the original text unit in the original text area 101 is: I am the third sentence of this caption, and I now want to retranslate this sentence; the translated text in the translation area 102 is: I am the third sentence of the subtitle, now I want to translate it again; and the dynamic prompt text in the dynamic prompt text area 103 is likewise: I am the third sentence of the subtitle, now I want to translate it again. The original text unit, the translated text, and the dynamic prompt text correspond to one another and share the same semantics.
As shown in fig. 15, with a translation following closing instruction in effect, the "three" in the original text unit is modified to "four", so the original text unit becomes: I am the fourth sentence of this caption, and I now want to retranslate this sentence. The translated text does not follow the edit and remains: I am the third sentence of the subtitle, now I want to translate it again. The dynamic prompt text still corresponds to the original text unit and is updated along with its editing: I am the fourth sentence of the subtitle, now I want to translate it again.
As shown in fig. 16, with a translation following starting instruction in effect, the "three" in the original text unit is modified to "four", so the original text unit becomes: I am the fourth sentence of this caption, and I now want to retranslate this sentence. The translated text now corresponds to the original text unit and is updated along with its editing: I am the fourth sentence of the subtitle, now I want to translate it again. The dynamic prompt text likewise corresponds to the original text unit and is updated with it: I am the fourth sentence of the subtitle, now I want to translate it again.
Through the translation following closing and starting instructions, the original text and the translated text can be modified freely, while a misoperation that would overwrite an already edited translated text is avoided, improving the accuracy of translation editing.
And S350, when the translation editing instruction is received, acquiring a translation to be edited in the target translation text matched with the translation editing instruction.
And S360, generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text is used for instructing a user to update the translation to be edited according to the input prompt text.
S370, acquiring the text input by the user aiming at the input prompt text, and updating the target translation text.
In the embodiment of the disclosure, the video duration and the speech recognition result are divided to generate a plurality of time ranges and original text units matched with those time ranges; each original text unit is translated to form a corresponding translated text, and a translation content display page is generated from the original text units and translated texts. The original content and the corresponding translated content can thus be displayed side by side in the translation content display page, so that the user can edit the translated text against the original content, improving editing efficiency.
Fig. 17 is a schematic structural diagram of a video subtitle translation editing apparatus according to an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware, and may be configured in an electronic device. The apparatus may include: the translation content display page generation module 410, the to-be-edited translation acquisition module 420, the input prompt text display module 430 and the translation text editing module 440.
A translation content display page generating module 410, configured to obtain a source video, and generate a translation content display page according to the source video, where the translation content display page includes at least one time range and translation texts matched with each time range;
the to-be-edited translation obtaining module 420 is configured to, when a translation editing instruction is received, obtain a translation to be edited from a target translation text matched with the translation editing instruction;
an input prompt text display module 430, configured to generate and display an input prompt text according to the translation to be edited, where the input prompt text is used to instruct a user to update the translation to be edited according to the input prompt text;
and a translation text editing module 440, configured to acquire a text input by the user for the input prompt text, and update the target translation text.
In the embodiment of the disclosure, the speech in the source video undergoes speech recognition and text translation to obtain the translated text, which is displayed through the translation content display page. When a translation editing instruction is received, the translation to be edited matched with the instruction is obtained, and an input prompt text matched with it is generated and displayed to the user, helping the user modify the translated text. This addresses the prior-art problem that user modifications to translated subtitles are inaccurate: a modifiable candidate is provided, the language ability required to modify the translation is reduced, the difficulty of translation editing is lowered, and the accuracy of translation editing is improved. Since the user can update the translated text directly from the input prompt text, the editing time is also shortened and the efficiency of translation editing improved.
Further, the translation editing instruction includes a character editing instruction or a word editing instruction, the character editing instruction is used for editing characters in the first translation to be edited, the first translation to be edited includes one word, the word editing instruction is used for editing a second translation to be edited, and the second translation to be edited includes at least one word.
Further, the translation editing instruction comprises a character editing instruction; the input prompt text display module 430 includes: the character editing unit is used for acquiring input characters matched with the character editing instruction; in the first to-be-edited translated text, obtaining an unedited character matched with the character editing instruction; and generating at least one cue word according to each input character and the unedited character, and determining the cue word as an input cue text, wherein the cue word comprises each input character and the unedited character.
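One way to realize such a character editing unit is sketched below (the subsequence-matching rule and the vocabulary source are assumptions; the patent only requires that each cue word contain the input characters and the unedited characters):

    def make_prompt_words(typed: str, unedited: str, vocabulary: list[str]) -> list[str]:
        """Generate candidate prompt words containing, in order, both the
        characters the user has typed and the word's unedited characters."""
        def contains_in_order(word: str, chars: str) -> bool:
            it = iter(word)
            return all(c in it for c in chars)  # subsequence test
        return [w for w in vocabulary
                if contains_in_order(w, typed) and contains_in_order(w, unedited)]

    print(make_prompt_words("tra", "te", ["translate", "train", "trace", "transit"]))
    # -> ['translate', 'trace']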
Further, the translation editing instruction includes a word editing instruction, and the input prompt text display module 430 includes: the word editing unit is used for acquiring at least one input word matched with the word editing instruction; and generating a prompt phrase according to the second text to be edited and each input word, determining the prompt phrase as the input prompt text, wherein the semantic meaning of the prompt phrase is the same as that of the second text to be edited, and the prompt phrase comprises at least one word.
Further, the video subtitle translation editing apparatus further includes: the input prompt text stop display module is used for receiving an input instruction of prompt ending content after the input prompt text is displayed and/or receiving a determination instruction of the user for the input prompt text; and stopping displaying the input prompt text.
Further, the input prompt text display module 430 includes: the input prompt text highlighting unit is used for generating a prompt following area at a position related to the translation editing instruction, and displaying the input prompt text in the prompt following area, wherein the translation to be edited is positioned in the middle of the target translation text; and displaying the input prompt text in the target translation text, wherein the text style of the input prompt text is different from that of the target translation text, and the translation to be edited is located at the tail position of the target translation text.
Further, the video subtitle translation editing apparatus further includes: the dynamic prompt text generation module is used for generating a dynamic prompt text when a prompt display triggering instruction is received before a translation editing instruction is received, wherein the dynamic prompt text is the same as the target translation text; the video subtitle translation editing apparatus further includes: and the input prompt text highlighting module is used for replacing the text matched with the translation to be edited with an input prompt text in the dynamic prompt text while displaying the input prompt text, wherein the text style of the input prompt text is different from the text style of the text in the dynamic prompt text.
Further, the translated content presentation page generating module 410 includes: the original text unit dividing unit is used for acquiring the audio data matched with the source video, performing voice recognition on the audio data matched with the source video and generating an original text; acquiring the video time length of the source video, dividing the video time length to generate at least one time range and determining an original text unit matched with each time range; performing text conversion on each original text unit according to a specified language to generate a translation text matched with each original text unit, wherein the translation text is used as a translation text matched with each time range; and generating a translation content display page according to each time range and the matched translation text, wherein the translation content display page comprises at least one translation area matched with each time range, and the translation area matched with each time range is used for displaying the translation text matched with each time range.
Further, the unit for dividing the unit of the original text includes: an original text unit display subunit, configured to generate a translation content display page according to each of the time ranges, the matched original text unit, and the matched translation text, where the translation content display page further includes: and each time range matched original text area is used for displaying the time range matched original text units.
Further, the video subtitle translation editing apparatus further includes: the translation following triggering module is used for updating the target original text unit according to the received editing instruction of the target original text unit when receiving a translation following closing instruction after generating a translation content display page; and when a translation following starting instruction is received, updating the target original text unit according to the received editing instruction of the target original text unit, and correcting the translated text matched with the target original text unit according to the updated target original text unit.
Further, the unit for dividing the unit of the original text includes: the subtitle video display subunit is used for adding the original text unit matched with each time range and the matched translated text into the source video to generate a subtitle video; generating a translation content display page according to the subtitle video, each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises: and the video playing area is used for playing the subtitle video.
The video subtitle translation editing apparatus provided by the embodiment of the present disclosure belongs to the same inventive concept as the video subtitle translation editing method, and the technical details that are not described in detail in the embodiment of the present disclosure can be referred to in the foregoing, and the embodiment of the present disclosure has the same beneficial effects as the embodiment of the foregoing.
Referring now to FIG. 18, a schematic diagram of an electronic device (e.g., the electronic device of FIG. 1) 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 18 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 18, the electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 18 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients, servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText transfer protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a source video, and generating a translation content display page according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges; when a translation editing instruction is received, obtaining a translation to be edited in a target translation text matched with the translation editing instruction; generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text is used for indicating a user to update the translation to be edited according to the input prompt text; and acquiring a text input by the user aiming at the input prompt text, and updating the target translation text.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not in some cases limit the module itself; for example, the translation content display page generating module may also be described as "a module that acquires a source video and generates a translation content display page according to the source video".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a video subtitle translation editing method including:
acquiring a source video, and generating a translation content display page according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges;
when a translation editing instruction is received, obtaining a translation to be edited in a target translation text matched with the translation editing instruction;
generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text is used for indicating a user to update the translation to be edited according to the input prompt text;
and acquiring a text input by the user aiming at the input prompt text, and updating the target translation text.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing method provided by the present disclosure, the receiving a translation editing instruction includes: receiving a character deleting instruction and/or a character inputting instruction, wherein the characters related to the character inputting instruction are different from the characters included in the translation to be edited, and the language of the target translation text comprises a single-character language and a multi-character language.
According to one or more embodiments of the present disclosure, in the video subtitle translation editing method provided by the present disclosure, a language of the target translation text is a multi-character language; the receiving of the character deleting instruction comprises: receiving a deleting instruction of input contents corresponding to the characters in the translation to be edited and the set character end key; the receiving the character input instruction comprises: receiving input characters and input instructions of the character end keys, wherein the input characters are not matched with characters included in the translation to be edited; the character end key is a space key.
According to one or more embodiments of the present disclosure, in the video subtitle translation editing method provided by the present disclosure, after receiving a character input instruction, the method further includes: acquiring at least one input character matched with the character input instruction, wherein the language of the target translation text is a multi-character language; generating and displaying at least one prompt word according to each input character, wherein the prompt word comprises each input character; receiving an input instruction of the character end key and/or receiving a determination instruction of the user for a target prompt word; stopping displaying each of the prompt words.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing method provided by the present disclosure, the displaying input prompt text includes: generating a prompt following area at a position associated with the translation editing instruction, and displaying the input prompt text in the prompt following area, wherein the translation to be edited is located in the middle of the target translation text; and displaying the input prompt text in the target translation text, wherein the text style of the input prompt text is different from that of the target translation text, and the translation to be edited is located at the tail position of the target translation text.
According to one or more embodiments of the present disclosure, before receiving a translation editing instruction, a video subtitle translation editing method further includes: when a prompt display triggering instruction is received, generating a dynamic prompt text, wherein the dynamic prompt text is the same as the target translation text; while displaying the input prompt text, the method further comprises the following steps: and in the dynamic prompt text, replacing the text matched with the translation to be edited with an input prompt text, wherein the text style of the input prompt text is different from the text style of the text in the dynamic prompt text.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing method provided by the present disclosure, the generating a translated content presentation page according to the source video includes: acquiring audio data matched with the source video, and performing voice recognition on the audio data matched with the source video to generate an original text; acquiring the video time length of the source video, dividing the video time length to generate at least one time range and determining an original text unit matched with each time range; performing text conversion on each original text unit according to a specified language to generate a translation text matched with each original text unit, wherein the translation text is used as a translation text matched with each time range; and generating a translation content display page according to each time range and the matched translation text, wherein the translation content display page comprises at least one translation area matched with each time range, and the translation area matched with each time range is used for displaying the translation text matched with each time range.
According to one or more embodiments of the present disclosure, in the video subtitle translation editing method provided by the present disclosure, generating a translation content presentation page according to each of the time ranges and the matched translation text includes: generating a translation content display page according to each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises: and each time range matched original text area is used for displaying the time range matched original text units.
According to one or more embodiments of the present disclosure, after generating a translated content presentation page, the video subtitle translation editing method further includes: when a translation following closing instruction is received, updating a target original text unit according to a received editing instruction of the target original text unit; and when a translation following starting instruction is received, updating the target original text unit according to the received editing instruction of the target original text unit, and correcting the translated text matched with the target original text unit according to the updated target original text unit.
According to one or more embodiments of the present disclosure, in the video subtitle translation editing method provided by the present disclosure, generating a translation content presentation page according to each of the time ranges, the matched original text unit, and the matched translation text includes: adding the original text unit matched with each time range and the matched translated text into the source video to generate a subtitle video; generating a translation content display page according to the subtitle video, each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises: and the video playing area is used for playing the subtitle video.
According to one or more embodiments of the present disclosure, in the video subtitle translation editing method provided by the present disclosure, the multi-character language includes english, and the single-character language includes chinese or japanese.
According to one or more embodiments of the present disclosure, there is provided a video subtitle translation editing apparatus including:
the translation content display page generating module is used for acquiring a source video and generating a translation content display page according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges;
the translation editing device comprises a translation editing instruction acquisition module, a translation editing module and a translation editing module, wherein the translation editing instruction acquisition module is used for acquiring a translation to be edited in a target translation text matched with the translation editing instruction when the translation editing instruction is received;
the input prompt text generation module is used for generating and displaying an input prompt text according to the translation to be edited, and the input prompt text is used for indicating a user to update the translation to be edited according to the input prompt text;
and the translation text editing module is used for acquiring the text input by the user aiming at the input prompt text and updating the target translation text.
According to one or more embodiments of the present disclosure, the video subtitle translation editing apparatus provided by the present disclosure includes a translation editing instruction including a character editing instruction or a word editing instruction, where the character editing instruction is used to edit characters in a first translation to be edited, the first translation to be edited includes one word, and the word editing instruction is used to edit a second translation to be edited, and the second translation to be edited includes at least one word.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing apparatus provided by the present disclosure, the translation editing instruction includes a character editing instruction; the input prompt text display module comprises: the character editing unit is used for acquiring input characters matched with the character editing instruction; in the first to-be-edited translated text, obtaining an unedited character matched with the character editing instruction; and generating at least one cue word according to each input character and the unedited character, and determining the cue word as an input cue text, wherein the cue word comprises each input character and the unedited character.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing apparatus provided by the present disclosure, the translation editing instruction includes a word editing instruction, and the input prompt text display module includes: the word editing unit is used for acquiring at least one input word matched with the word editing instruction; and generating a prompt phrase according to the second text to be edited and each input word, determining the prompt phrase as the input prompt text, wherein the semantic meaning of the prompt phrase is the same as that of the second text to be edited, and the prompt phrase comprises at least one word.
According to one or more embodiments of the present disclosure, the video subtitle translation editing apparatus further includes: the input prompt text stop display module is used for receiving an input instruction of prompt ending content after the input prompt text is displayed and/or receiving a determination instruction of the user for the input prompt text; and stopping displaying the input prompt text.
According to one or more embodiments of the present disclosure, in the video subtitle translation editing apparatus provided by the present disclosure, the input prompt text display module includes: the input prompt text highlighting unit is used for generating a prompt following area at a position related to the translation editing instruction, and displaying the input prompt text in the prompt following area, wherein the translation to be edited is positioned in the middle of the target translation text; and displaying the input prompt text in the target translation text, wherein the text style of the input prompt text is different from that of the target translation text, and the translation to be edited is located at the tail position of the target translation text.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing apparatus provided by the present disclosure, the video subtitle translation editing apparatus further includes: the dynamic prompt text generation module is used for generating a dynamic prompt text when a prompt display triggering instruction is received before a translation editing instruction is received, wherein the dynamic prompt text is the same as the target translation text; the video subtitle translation editing apparatus further includes: and the input prompt text highlighting module is used for replacing the text matched with the translation to be edited with an input prompt text in the dynamic prompt text while displaying the input prompt text, wherein the text style of the input prompt text is different from the text style of the text in the dynamic prompt text.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing apparatus provided by the present disclosure, the translation content presentation page generating module includes: the original text unit dividing unit, configured to acquire the audio data matched with the source video, and perform voice recognition on the audio data matched with the source video to generate an original text; acquire the video time length of the source video, divide the video time length to generate at least one time range, and determine an original text unit matched with each time range; perform text conversion on each original text unit according to a specified language to generate a translation text matched with each original text unit, used as the translation text matched with each time range; and generate a translation content display page according to each time range and the matched translation text, wherein the translation content display page comprises at least one translation area matched with each time range, and the translation area matched with each time range is used for displaying the translation text matched with that time range.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing apparatus provided by the present disclosure, the unit for dividing a text unit of an original text includes: an original text unit display subunit, configured to generate a translation content display page according to each of the time ranges, the matched original text unit, and the matched translation text, where the translation content display page further includes: and each time range matched original text area is used for displaying the time range matched original text units.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing apparatus provided by the present disclosure, the video subtitle translation editing apparatus further includes: the translation following triggering module is used for updating the target original text unit according to the received editing instruction of the target original text unit when receiving a translation following closing instruction after generating a translation content display page; and when a translation following starting instruction is received, updating the target original text unit according to the received editing instruction of the target original text unit, and correcting the translated text matched with the target original text unit according to the updated target original text unit.
According to one or more embodiments of the present disclosure, in a video subtitle translation editing apparatus provided by the present disclosure, the unit for dividing a text unit of an original text includes: the subtitle video display subunit is used for adding the original text unit matched with each time range and the matched translated text into the source video to generate a subtitle video; generating a translation content display page according to the subtitle video, each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises: and the video playing area is used for playing the subtitle video.
In accordance with one or more embodiments of the present disclosure, there is provided an electronic device including: the device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the video caption translation editing method according to any one of the embodiments of the disclosure.
According to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing a video subtitle translation editing method as described in any one of the embodiments of the present disclosure.
The foregoing description is merely illustrative of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to the particular combination of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by mutually replacing the above features with (but not limited to) features disclosed in this disclosure that have similar functions.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (14)

1. A method for translating and editing video subtitles, comprising:
acquiring a source video, and generating a translation content display page according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges;
when a translation editing instruction is received, obtaining a translation to be edited in a target translation text matched with the translation editing instruction;
generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text is used for indicating a user to update the translation to be edited according to the input prompt text;
and acquiring a text input by the user aiming at the input prompt text, and updating the target translation text.
2. The method according to claim 1, wherein the translation editing instruction comprises a character editing instruction or a word editing instruction, the character editing instruction is used for editing characters in a first translation to be edited, the first translation to be edited comprises one word, the word editing instruction is used for editing a second translation to be edited, and the second translation to be edited comprises at least one word.
3. The method of claim 2, wherein the translation editing instructions comprise character editing instructions;
generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text comprises the following steps:
acquiring an input character matched with the character editing instruction;
in the first to-be-edited translated text, obtaining an unedited character matched with the character editing instruction;
and generating at least one cue word according to each input character and the unedited character, and determining the cue word as an input cue text, wherein the cue word comprises each input character and the unedited character.
4. The method of claim 2, wherein the translation editing instructions comprise word editing instructions;
generating and displaying an input prompt text according to the translation to be edited, wherein the input prompt text comprises the following steps:
acquiring at least one input word matched with the word editing instruction;
and generating a prompt phrase according to the second translation to be edited and each input word, and determining the prompt phrase as the input prompt text, wherein the prompt phrase has the same semantics as the second translation to be edited and comprises at least one word.
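As a toy illustration of claim 4 (the paraphrase table stands in for a real paraphrase or translation model; all names are hypothetical), a prompt phrase keeps the semantics of the second translation to be edited while matching the words already typed:

    PARAPHRASES = {
        "very important": ["crucial", "of great importance",
                           "extremely important"],
    }

    def prompt_phrase(to_be_edited: str, input_words: list[str]) -> str | None:
        """Pick a paraphrase of the translation to be edited that begins
        with the user's input words, so the suggestion preserves the
        original meaning while completing the partial input."""
        prefix = " ".join(input_words).lower()
        for candidate in PARAPHRASES.get(to_be_edited, []):
            if candidate.lower().startswith(prefix):
                return candidate
        return None

    print(prompt_phrase("very important", ["of"]))  # of great importance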
5. The method of claim 3 or 4, further comprising, after displaying the input prompt text:
receiving an input instruction that ends the prompt content, and/or receiving a confirmation instruction of the user for the input prompt text;
and stopping displaying the input prompt text.
6. The method of claim 1, wherein the displaying of the input prompt text comprises:
in a case that the translation to be edited is located in the middle of the target translation text, generating a prompt following area at a position associated with the translation editing instruction, and displaying the input prompt text in the prompt following area;
and in a case that the translation to be edited is located at the tail of the target translation text, displaying the input prompt text in the target translation text, wherein the text style of the input prompt text is different from that of the target translation text.
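A schematic Python rendering of the two display strategies of claim 6 (text markers stand in for real UI styling; all names are hypothetical):

    def render_prompt(target_text: str, to_be_edited: str, prompt: str) -> str:
        """Mid-text edits get a floating 'prompt following area' near the
        edit position; tail edits show the prompt inline in a style distinct
        from the target translation text (brackets stand in for styling)."""
        idx = target_text.find(to_be_edited)
        at_tail = idx >= 0 and idx + len(to_be_edited) == len(target_text)
        if at_tail:
            return target_text[:idx] + "[" + prompt + "]"
        return target_text + "   <<prompt area: " + prompt + ">>"

    print(render_prompt("Welcome to the vid", "vid", "video"))
    print(render_prompt("The vid is long", "vid", "video"))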
7. The method of claim 1, prior to receiving the translation editing instruction, further comprising:
when a prompt display triggering instruction is received, generating a dynamic prompt text, wherein the dynamic prompt text is the same as the target translation text;
and while the input prompt text is displayed, the method further comprises:
replacing, in the dynamic prompt text, the text matched with the translation to be edited with the input prompt text, wherein the text style of the input prompt text is different from the text style of the other text in the dynamic prompt text.
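One way to picture claim 7's dynamic prompt text (uppercase stands in for a real style difference such as colour or underline; names are hypothetical):

    def dynamic_prompt(target_text: str, to_be_edited: str, prompt: str) -> str:
        """The dynamic prompt text starts as a copy of the target translation
        text; the span matching the translation to be edited is replaced by
        the input prompt text in a distinct style."""
        return target_text.replace(to_be_edited, prompt.upper(), 1)

    print(dynamic_prompt("Welcome to the vid", "vid", "video"))
    # -> Welcome to the VIDEO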
8. The method of claim 1, wherein generating a translation content display page according to the source video comprises:
acquiring audio data matched with the source video, and performing speech recognition on the audio data to generate an original text;
acquiring the video duration of the source video, dividing the video duration to generate at least one time range, and determining an original text unit matched with each time range;
translating each original text unit into a specified language to generate a translation text matched with the original text unit, the translation text serving as the translation text matched with the corresponding time range;
and generating a translation content display page according to each time range and the matched translation text, wherein the translation content display page comprises at least one translation area matched with each time range, and the translation area matched with each time range is used for displaying the translation text matched with each time range.
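The claim-8 pipeline, sketched in Python with stub functions standing in for real speech recognition and machine translation engines (everything here is hypothetical, not the patented implementation):

    def recognize_speech(audio: bytes) -> str:
        """Stand-in for a real ASR engine."""
        return "hello everyone welcome to this video about subtitles"

    def translate(text: str, target_lang: str) -> str:
        """Stand-in for a real machine translation engine."""
        return "[" + target_lang + "] " + text

    def build_display_page(audio: bytes, duration_ms: int,
                           n_ranges: int, target_lang: str) -> list[dict]:
        """Divide the video duration into time ranges, attach an original
        text unit to each range, and translate each unit."""
        words = recognize_speech(audio).split()
        step = duration_ms // n_ranges
        per_range = max(1, len(words) // n_ranges)
        page = []
        for i in range(n_ranges):
            hi = None if i == n_ranges - 1 else (i + 1) * per_range
            unit = " ".join(words[i * per_range:hi])
            page.append({"range": (i * step, (i + 1) * step),
                         "original_unit": unit,
                         "translated_text": translate(unit, target_lang)})
        return page

    for row in build_display_page(b"", 12000, 3, "zh"):
        print(row)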
9. The method of claim 8, wherein generating a translation content display page according to each of the time ranges and the matched translation text comprises:
generating a translation content display page according to each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises an original text area matched with each time range, and the original text area matched with each time range is used for displaying the original text unit matched with the time range.
10. The method of claim 9, further comprising, after generating the translation content display page:
when an instruction to turn off translation following is received, updating a target original text unit according to a received editing instruction for the target original text unit;
and when an instruction to turn on translation following is received, updating the target original text unit according to the received editing instruction for the target original text unit, and correcting the translation text matched with the target original text unit according to the updated target original text unit.
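A minimal sketch of claim 10's translation-following switch (the translate callable stands in for a real MT engine; all names are hypothetical):

    class OriginalTextEditor:
        """With translation following on, editing an original text unit also
        refreshes its matched translation; with it off, only the original
        text unit changes."""

        def __init__(self, translate_fn):
            self.following = False
            self.translate = translate_fn

        def edit_original(self, row: dict, new_original: str) -> None:
            row["original_unit"] = new_original
            if self.following:
                row["translated_text"] = self.translate(new_original)

    row = {"original_unit": "helo world", "translated_text": "[zh] helo world"}
    editor = OriginalTextEditor(lambda s: "[zh] " + s)
    editor.edit_original(row, "hello world")    # following off: translation kept
    editor.following = True
    editor.edit_original(row, "hello, world!")  # following on: translation refreshed
    print(row)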
11. The method of claim 9, wherein generating a translation content display page according to each of the time ranges, the matched original text units and the matched translation texts comprises:
adding the original text unit matched with each time range and the matched translated text into the source video to generate a subtitle video;
generating a translation content display page according to the subtitle video, each time range, the matched original text unit and the matched translation text, wherein the translation content display page further comprises: and the video playing area is used for playing the subtitle video.
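For claim 11, the per-range original text units and translations map naturally onto subtitle cues; below is a hypothetical SRT serializer as one possible rendering (a muxer such as ffmpeg, not shown here, would then burn the cues into the source video to form the subtitle video):

    def to_srt(page: list[dict]) -> str:
        """Serialize each time range's original text unit and translation
        as one bilingual SRT cue."""
        def ts(ms: int) -> str:
            h, ms = divmod(ms, 3_600_000)
            m, ms = divmod(ms, 60_000)
            s, ms = divmod(ms, 1_000)
            return f"{h:02}:{m:02}:{s:02},{ms:03}"

        cues = []
        for i, row in enumerate(page, start=1):
            start, end = row["range"]
            cues.append(f"{i}\n{ts(start)} --> {ts(end)}\n"
                        f"{row['original_unit']}\n{row['translated_text']}\n")
        return "\n".join(cues)

    page = [{"range": (0, 2500), "original_unit": "hello",
             "translated_text": "你好"}]
    print(to_srt(page))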
12. A video subtitle translation editing apparatus, comprising:
the translation content display page generating module is used for acquiring a source video and generating a translation content display page according to the source video, wherein the translation content display page comprises at least one time range and translation texts matched with the time ranges;
the to-be-edited translation acquiring module is used for, when a translation editing instruction is received, acquiring a translation to be edited in a target translation text matched with the translation editing instruction;
the input prompt text generating module is used for generating and displaying an input prompt text according to the translation to be edited, the input prompt text being used for prompting the user to update the translation to be edited according to the input prompt text;
and the translation text editing module is used for acquiring text input by the user in response to the input prompt text, and updating the target translation text accordingly.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the video subtitle translation editing method according to any one of claims 1-11.
14. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the video subtitle translation editing method according to any one of claims 1-11.
CN202010700313.3A 2020-07-20 2020-07-20 Video subtitle translation editing method and device, electronic equipment and storage medium Pending CN111898388A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010700313.3A CN111898388A (en) 2020-07-20 2020-07-20 Video subtitle translation editing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111898388A true CN111898388A (en) 2020-11-06

Family

ID=73190336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010700313.3A Pending CN111898388A (en) 2020-07-20 2020-07-20 Video subtitle translation editing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111898388A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101071618A (en) * 2006-05-09 2007-11-14 上海乐金广电电子有限公司 Image caption editing method
US20120123765A1 (en) * 2010-11-15 2012-05-17 Google Inc. Providing Alternative Translations
CN103885939A (en) * 2012-12-19 2014-06-25 新疆信息产业有限责任公司 Uyghur-Chinese bi-directional translation memory system construction method
CN103226947A (en) * 2013-03-27 2013-07-31 广东欧珀移动通信有限公司 Mobile terminal-based audio processing method and device
CN104714943A (en) * 2015-03-26 2015-06-17 百度在线网络技术(北京)有限公司 Translation method and system
CN106156012A (en) * 2016-06-28 2016-11-23 乐视控股(北京)有限公司 A kind of method for generating captions and device
JP2018112681A (en) * 2017-01-12 2018-07-19 株式会社エドテック Foreign language learning device
CN111339788A (en) * 2020-02-18 2020-06-26 北京字节跳动网络技术有限公司 Interactive machine translation method, apparatus, device and medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115086691A (en) * 2021-03-16 2022-09-20 北京有竹居网络技术有限公司 Subtitle optimization method and device, electronic equipment and storage medium
CN113225612A (en) * 2021-04-14 2021-08-06 新东方教育科技集团有限公司 Subtitle generating method and device, computer readable storage medium and electronic equipment
CN113225612B (en) * 2021-04-14 2022-10-11 新东方教育科技集团有限公司 Subtitle generating method, device, computer readable storage medium and electronic equipment
CN113891168A (en) * 2021-10-19 2022-01-04 北京有竹居网络技术有限公司 Subtitle processing method, subtitle processing device, electronic equipment and storage medium
CN113891168B (en) * 2021-10-19 2023-12-19 北京有竹居网络技术有限公司 Subtitle processing method, subtitle processing device, electronic equipment and storage medium
CN114143593A (en) * 2021-11-30 2022-03-04 北京字节跳动网络技术有限公司 Video processing method, video processing apparatus, and computer-readable storage medium
CN114143592A (en) * 2021-11-30 2022-03-04 北京字节跳动网络技术有限公司 Video processing method, video processing apparatus, and computer-readable storage medium
WO2023098531A1 (en) * 2021-11-30 2023-06-08 北京字节跳动网络技术有限公司 Video processing method, video processing apparatus, and computer-readable storage medium
WO2023098533A1 (en) * 2021-11-30 2023-06-08 北京字节跳动网络技术有限公司 Video processing method, video processing apparatus, and computer-readable storage medium
CN114143592B (en) * 2021-11-30 2023-10-27 抖音视界有限公司 Video processing method, video processing apparatus, and computer-readable storage medium
CN114268829A (en) * 2021-12-22 2022-04-01 中电金信软件有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN114268829B (en) * 2021-12-22 2024-01-16 中电金信软件有限公司 Video processing method, video processing device, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN111898388A (en) Video subtitle translation editing method and device, electronic equipment and storage medium
CN107632980B (en) Voice translation method and device for voice translation
US9002698B2 (en) Speech translation apparatus, method and program
KR20180032665A (en) Real-time natural language processing of datastreams
CN113010698B (en) Multimedia interaction method, information interaction method, device, equipment and medium
CN111970577A (en) Subtitle editing method and device and electronic equipment
CN111666776B (en) Document translation method and device, storage medium and electronic equipment
CN108304412B (en) Cross-language search method and device for cross-language search
CN111753558B (en) Video translation method and device, storage medium and electronic equipment
CN103984772A (en) Method and device for generating text retrieval subtitle library and video retrieval method and device
CN113010704B (en) Interaction method, device, equipment and medium for conference summary
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN111369978A (en) Data processing method and device and data processing device
CN113886612A (en) Multimedia browsing method, device, equipment and medium
CN112380365A (en) Multimedia subtitle interaction method, device, equipment and medium
CN111860000A (en) Text translation editing method and device, electronic equipment and storage medium
CN113225612B (en) Subtitle generating method, device, computer readable storage medium and electronic equipment
CN113343675A (en) Subtitle generating method and device for generating subtitles
CN109979435B (en) Data processing method and device for data processing
CN111324214A (en) Statement error correction method and device
US20230376699A1 (en) On-Device Real-Time Translation of Media Content on a Mobile Electronic Device
CN112837668B (en) Voice processing method and device for processing voice
CN113343720A (en) Subtitle translation method and device for subtitle translation
CN112711954A (en) Translation method, translation device, electronic equipment and storage medium
CN108983992B (en) Candidate item display method and device with punctuation marks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination