WO2021259221A1 - Video translation method and device, storage medium and electronic device - Google Patents
Video translation method and device, storage medium and electronic device
- Publication number
- WO2021259221A1 (PCT/CN2021/101388)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- translation
- text
- user
- suggestion
- time information
- Prior art date
Links
- 238000013519 translation Methods 0.000 title claims abstract description 440
- 238000000034 method Methods 0.000 title claims abstract description 57
- 238000012790 confirmation Methods 0.000 claims abstract description 48
- 230000004044 response Effects 0.000 claims abstract description 42
- 238000004590 computer program Methods 0.000 claims description 13
- 238000012545 processing Methods 0.000 claims description 11
- 238000006243 chemical reaction Methods 0.000 claims description 4
- 230000008685 targeting Effects 0.000 claims description 4
- 230000006870 function Effects 0.000 description 43
- 230000008901 benefit Effects 0.000 description 13
- 238000010586 diagram Methods 0.000 description 12
- 238000004891 communication Methods 0.000 description 7
- 230000009471 action Effects 0.000 description 6
- 238000012986 modification Methods 0.000 description 6
- 230000004048 modification Effects 0.000 description 6
- 230000003287 optical effect Effects 0.000 description 6
- 230000000694 effects Effects 0.000 description 5
- 238000011156 evaluation Methods 0.000 description 4
- 238000001514 detection method Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 2
- 239000013307 optical fiber Substances 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 230000004397 blinking Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000001915 proofreading effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/51—Translation evaluation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34—Indicating arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234336—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Definitions
- the present disclosure relates to the field of machine translation, and in particular to a video translation method and device, storage medium and electronic equipment.
- this summary is provided to introduce concepts in a brief form; these concepts are described in detail in the specific embodiments that follow.
- this summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
- the present disclosure provides a video translation method, including:
- the first time information is the start time of the text in the video
- the second time information is the end time of the text in the video
- in response to a user's operation on the text or the reference translation, an editing area is displayed; the editing area supports the user in inputting the translation;
- the present disclosure provides a video translation device, including:
- the conversion module is used to convert the voice of the video to be translated into text;
- the display module is used to display the text together with the first time information, the second time information and the reference translation of the text, where the first time information is the start time of the text in the video and the second time information is the end time of the text in the video;
- the display module is further configured to display an editing area in response to a user's operation on the text or the reference translation, and the editing area supports the user to input the translation;
- the suggestion module is used to follow the user's input in the editing area and provide translation suggestions from the reference translation;
- the display module is further configured to display the translation suggestion as the translation result in the editing area when the user's confirmation operation for the translation suggestion is detected; and, when the user's non-confirmation operation for the translation suggestion is detected, to receive a translation input by the user that differs from the translation suggestion, display the translation input by the user as the translation result in the editing area, and update the reference translation in the translation area according to the translation input by the user.
- the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in the first aspect of the present disclosure are implemented.
- an electronic device including:
- a storage device on which a computer program is stored
- the processing device is configured to execute the computer program in the storage device to implement the steps of the method in the first aspect of the present disclosure.
- the voice of the video to be translated can be converted into text
- the first time information, the second time information and the reference translation of the text can be provided, and translation suggestions from the reference translation can be offered following the user's input in the editing area
- Fig. 1 is a flowchart showing a video translation method according to an exemplary disclosed embodiment.
- Fig. 2 is a schematic diagram showing a translation interface according to an exemplary disclosed embodiment.
- Fig. 3 is a schematic diagram showing a text splitting manner according to an exemplary disclosed embodiment.
- Fig. 4 is a block diagram showing a video translation device according to an exemplary disclosed embodiment.
- Fig. 5 is a block diagram showing an electronic device according to an exemplary disclosed embodiment.
- Figure 1 is a flow chart showing a video translation method according to an exemplary disclosed embodiment.
- This method can be used in terminals, servers and other independent electronic devices, and can also be applied to a translation system.
- each step of the method can be completed by multiple devices in the translation system.
- S12 and S14 shown in FIG. 1 can be executed by the terminal, and S11 and S13 can be executed by the server.
- the video translation method includes the following steps:
- the voice content of the video to be translated, such as its audio track, can be extracted, and the voice content converted into text content through speech recognition technology.
- the text content can be divided into multiple sentences according to the clauses in the voice content, and the text content of each sentence can correspond to the time information extracted from the speech content of that clause, which is used as the timeline information of the text content of the sentence.
- the voice content of the video to be translated is recognized as multiple sentences.
- the first sentence is "First introduce what is a hot spot"
- the sentence is located between the second and the fifth second of the video
- the timeline information corresponding to the text content of the sentence is "00:00:02-00:00:05"
- the second sentence is "You can see from the right side of the PPT"
- this sentence is located between the 5th and 7th seconds of the video
- the timeline information corresponding to the text content of the sentence is "00:00:05-00:00:07".
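As a concrete (purely illustrative) sketch, a clause and its timeline information could be represented as a small record; the `ClauseText` name and fields below are hypothetical and not from the patent:

```python
from dataclasses import dataclass

@dataclass
class ClauseText:
    text: str   # recognized text content of the clause
    start: str  # first time information (start time in the video)
    end: str    # second time information (end time in the video)

    def timeline(self) -> str:
        # Timeline information in the "start-end" form used above
        return f"{self.start}-{self.end}"

# The first sentence from the example above
clause = ClauseText("First introduce what is a hot spot", "00:00:02", "00:00:05")
```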
- the text may be segmented according to the time information and/or picture frames corresponding to the text in the video to obtain the multiple clause texts. For example, the recognized text of the voice within multiple consecutive seconds may be treated as one clause, or the recognized text of the voice appearing in multiple consecutive frames may be used as one clause. The text may also be segmented according to pauses in the voice content; for example, a pause threshold can be set, and when no human voice is recognized within the pause threshold, the sentence can be segmented at the position where no human voice is recognized. The text may also be segmented according to the semantics of the voice content.
- the recognized text content can be segmented through a sentence-segmentation model to obtain the segmented text content.
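A rough sketch of the pause-threshold segmentation described above, assuming the speech recognizer emits per-word timestamps (the function, the word list and the 0.5-second threshold are all illustrative assumptions, not taken from the patent):

```python
def split_by_pause(words, pause_threshold=0.5):
    """Split (word, start_sec, end_sec) tuples into clauses wherever the
    silence between consecutive words exceeds the pause threshold."""
    clauses, current = [], []
    for word in words:
        if current and word[1] - current[-1][2] > pause_threshold:
            clauses.append(current)  # pause detected: close the clause
            current = []
        current.append(word)
    if current:
        clauses.append(current)
    return clauses

# Hypothetical recognizer output for the two example sentences
words = [("First", 2.0, 2.4), ("introduce", 2.5, 3.0), ("what", 3.1, 3.4),
         ("is", 3.4, 3.6), ("a", 3.6, 3.7), ("hot", 3.7, 4.0), ("spot", 4.0, 4.5),
         ("You", 5.2, 5.4), ("can", 5.4, 5.6), ("see", 5.6, 6.0)]
clauses = split_by_pause(words)  # the 0.7 s gap before "You" starts a new clause
```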
- the first time information is the start time of the text in the video
- the second time information is the end time of the text in the video
- the text may be a text that has been divided into sentences
- the first time information is the start time of the current clause of the segmented text in the video
- the second time information is the end time of the current clause of the segmented text in the video.
- the editing area can be displayed above the reference translation of the text.
- the editing area supports the user in inputting the translation, and the user can perform editing operations in the editing area to obtain the translation result of the text. The editing area can be displayed above the reference translation so that the user can compare and modify.
- the text may be a text that has been divided into sentences, and each sentence text is displayed in a different area, and for each sentence text, the first time information, the second time information, and the reference translation of the sentence text are displayed.
- the text includes a plurality of clause texts, each of which is displayed in a different area; a split function bar that allows the user to split a clause text can also be displayed, and, in response to the user's splitting operation on any one of the clause texts, the clause text is split into at least two sub-clause texts, with each sub-clause text displayed in association with its first time information, its second time information and its reference translation.
- the split function bar may be provided in response to a user's operation on the clause text or the reference translation, and the split function bar may be hidden before the user selects the clause text or the reference translation.
- the timeline information of this piece of text is "00:00:15-00:00:18"
- the first time information is 00:00:15
- the second time information is 00:00:18.
- if the user divides it into two clauses, "What I want to introduce to you today" and "The three cities that are about to rise in our country", then the time axis for each clause can be set according to the length of the text before editing and the length of the text of each clause after editing.
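The patent leaves the exact apportioning rule open; one plausible reading of setting the time axis "according to the length of the text" is to split the time span in proportion to the character lengths of the two clauses. A hypothetical sketch (function name and inputs are illustrative):

```python
def split_timeline(start_sec, end_sec, left_text, right_text):
    """Apportion a clause's [start, end] span between two new clauses in
    proportion to their text lengths (one possible heuristic)."""
    total = len(left_text) + len(right_text)
    boundary = start_sec + (end_sec - start_sec) * len(left_text) / total
    return (start_sec, boundary), (boundary, end_sec)

# The "00:00:15-00:00:18" example: split a 3-second clause into two.
left, right = split_timeline(
    15.0, 18.0,
    "What I want to introduce to you today",
    "are the three cities about to rise in our country")
```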
- Figure 3 shows a schematic diagram of a possible text splitting method.
- the user can use the cursor to select the position where the sentence needs to be split and click the split button; the text will then be split into two sub-clauses.
- the first time information and the second time information of each sub-clause are obtained by splitting the first time information and the second time information of the text before the split.
- a section of text in the dashed box before splitting is split into two subsections in the dashed box.
- the text includes a plurality of clause texts, each of which is displayed in a different area; a merge function bar that allows the user to merge clauses may also be displayed, and, in response to the user's merging operation on any two adjacent clause texts, the two adjacent clause texts are merged into a new clause text, and the new clause text, its first time information, its second time information and its reference translation are displayed in association.
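The merge operation is the inverse of the split: concatenate the texts and keep the earlier clause's first time information and the later clause's second time information. A hypothetical sketch (the dictionary shape and sample texts are illustrative):

```python
def merge_clauses(a, b):
    """Merge two adjacent clause texts into a new clause text."""
    return {
        "text": a["text"] + " " + b["text"],
        "start": a["start"],  # first time information of the earlier clause
        "end": b["end"],      # second time information of the later clause
    }

a = {"text": "What I want to introduce to you today", "start": "00:00:15", "end": "00:00:16"}
b = {"text": "are three rising cities", "start": "00:00:16", "end": "00:00:18"}
merged = merge_clauses(a, b)
```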
- the merge function bar may be provided in response to the user's operation on the clause text or the reference translation, and the merge function bar may be hidden before the user selects the clause text or the reference translation.
- the text includes multiple clause texts, each of which is displayed in a different area; a play function bar that allows the user to play a clause text may also be displayed, and, in response to the user's operation on the play function bar, the voice corresponding to the clause text is played.
- the play function bar may be provided in response to the user's operation on the clause text or the reference translation, and the play function bar may be hidden before the user selects the clause text or the reference translation.
- the reference translation or the translation result may be used as a subtitle to play the video corresponding to the sentence text, so that the user can view the effect of the translated subtitle.
- FIG 2 is a schematic diagram of a possible translation interface.
- the inside of the dashed box is the translation interface of a section of text content that has been selected by the user.
- the selected text content will display the editing area and the play, merge, and split function bars.
- the text content of the video to be translated is displayed above the reference translation, and different clauses have different display areas. Each display area can be translated independently and will not be updated due to the modification of other areas.
- the user can enter characters in the editing area or modify the characters of the text to be translated.
- the translation interface may also include time axis information, including first time information representing the start time and second time information representing the end time.
- the reference translation is gray, and the input translation is black.
- the reference translation can be moved down one line to align with the function bar, while the area where the reference translation was originally located becomes the editing area, which is used to display translation suggestions and receive user modifications.
- the method provided by the embodiment of the present disclosure includes displaying the translation suggestion as the translation result in the editing area when the user's confirmation operation for the translation suggestion is detected, and, when the user's non-confirmation operation for the translation suggestion is detected, receiving a translation input by the user that differs from the translation suggestion and updating the reference translation in the translation area according to the translation input by the user.
- the above confirmation operation may be the user's operation on a preset shortcut key.
- the translation suggestion is displayed in the editing area as the translation result.
- the action of displaying the translation suggestion as the translation result in the editing area counts as the user's input in the editing area described in step S14; that is, in this case, step S14 indicates that the method provided by the embodiment of the present disclosure can provide the next translation suggestion from the reference translation in response to the current translation suggestion being displayed as the translation result in the editing area (the next translation suggestion may be the portion of the reference translation that follows the suggestion just provided).
- detection of the user's non-confirmation operation for the translation suggestion may be detection of an inconsistency between the translation input by the user and the translation suggestion provided this time.
- in this case, the method provided by the embodiment of the present disclosure can receive a translation input by the user that differs from the translation suggestion, and update the reference translation in the translation area according to the translation input by the user.
- here, step S14 indicates that the method provided by the disclosed embodiment can provide the next translation suggestion from the reference translation as updated according to the translation input by the user, in response to the user inputting a translation in the editing area that differs from the translation suggestion.
- for example, the translation suggestion provided this time is "my". If the translation input by the user is detected to be "I", which differs from the translation suggestion "my", the reference translation is updated according to the translation "I", and the next translation suggestion after "I" is provided from the updated reference translation.
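The suggestion/update cycle can be sketched at word granularity as follows. This is a deliberate simplification: a real system would re-run machine translation on the edited prefix rather than splice the user's word in, and all names here are illustrative:

```python
def next_suggestion(reference_words, confirmed_count):
    """Next word-level translation suggestion from the reference
    translation, or None once it is exhausted."""
    if confirmed_count < len(reference_words):
        return reference_words[confirmed_count]
    return None

def apply_user_input(reference_words, position, user_word):
    """Non-confirmation case: the user's word replaces the suggestion at
    this position, updating the reference translation."""
    return reference_words[:position] + [user_word] + reference_words[position + 1:]

ref = "Some cities continue to rise with the advantage".split()
updated = apply_user_input(ref, 5, "because")  # user rejects the suggestion "with"
```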
- the translation suggestions from the reference translation can be provided according to the user's input, and the user can directly use the translation suggestion as the translation result through the confirmation operation, reducing the user's input time.
- the present disclosure combines manual accuracy with machine efficiency, which can improve both the efficiency and the quality of translation.
- the providing translation suggestions in step S14 may include: highlighting the translation suggestions from the reference translation in the translation area.
- the highlighting of the translation suggestion in the translation area can be cancelled.
- the highlighting can be bold fonts, highlighted fonts, differently colored characters, differently colored backgrounds, shading effects, or any other display mode that makes the translation suggestion stand out.
- the highlighting may be a display mode different from that of the input translation; for example, the input translation may be in a bold font while the translation suggestion is in a normal font, or the input translation may be in black while the translation suggestion is in gray, etc.
- after the user adopts a translation suggestion, the display mode of the translation suggestion can be adjusted to be the same as the display mode of the input translation.
- for example, the input translation may be in bold font while the translation suggestion is in normal font.
- in that case, the adopted translation suggestion is adjusted to be displayed in bold font.
- the confirmation operation may be a user's input operation on a shortcut key of the electronic device.
- the electronic device may be a mobile phone, and the shortcut key may be a virtual key in the display area of the mobile phone or a physical key of the mobile phone, for example a volume key
- the user can operate the above shortcut key to adopt the translation suggestion; when the user's input operation on the shortcut key is detected, the translation suggestion can be displayed in the editing area as the translation result;
- the electronic device can also be a computer, and the shortcut key can be a designated or custom key on the computer keyboard or mouse (for example: keyboard alt key, mouse side key, etc.).
- the confirmation operation may also be a gesture confirmation operation recognized after being captured by the camera, such as nodding, blinking, making a preset gesture, etc.; it may also be a voice operation recognized after being captured by a microphone.
- the translation suggestion from the reference translation includes at least one of a word, a phrase, and a sentence.
- when the user translates the text content, the user can refer to the reference translation displayed in the translation area and provide input in the editing area (it is worth noting that the input here includes character input, such as typing letters or words, as well as key-operation input, such as clicking the editing area), upon which translation suggestions from the reference translation can be provided.
- the translation suggestion can be a translation suggestion for the whole sentence of a clause, or a more fine-grained translation suggestion provided word by word or phrase by phrase.
- the user can adopt the translation suggestion through the confirmation operation, and the confirmation operation serves as an input operation in the editing area that triggers the next translation suggestion from the reference translation; for example, when the user's confirmation operation for "Some" is detected, "Some" is displayed as the translation result in the editing area, and the next translation suggestion "cities" is provided to the user.
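The confirmation flow from the "Some"/"cities" example can be sketched as a single step that adopts the current suggestion and advances the cursor (the function and variable names are illustrative assumptions):

```python
def on_confirm(editing_area, reference_words, cursor):
    """Adopt the current suggestion as part of the translation result and
    return the advanced cursor plus the next suggestion (or None)."""
    editing_area.append(reference_words[cursor])
    cursor += 1
    next_sug = reference_words[cursor] if cursor < len(reference_words) else None
    return cursor, next_sug

ref = "Some cities continue to rise".split()
area = []  # translation result shown in the editing area
cursor, suggestion = on_confirm(area, ref, 0)  # user confirms "Some"
```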
- the non-confirmation operation can be a preset operation that represents non-confirmation (clicking a preset button, making a preset action, etc.), or it can refer to conditions other than the aforementioned confirmation operation, for example, the confirmation operation not being performed within a set time, or an operation to continue input being performed.
- the reference translation of the text "Some cities continue to rise with the advantage of a complete high-speed rail network" is "Some cities continue to rise with the advantage of the perfect high-speed rail network."
- in response to the click input operation, the translation suggestion "Some" is provided from the reference translation, and based on the user's confirmation operation, "Some" is displayed as the translation result in the editing area; the next translation suggestion "cities" then continues to be provided to the user.
- the translation suggestion "with” if you receive the user's input "b” that is different from the translation suggestion, you can update the reference translation to "Some cities continue to rise because of the advantage of the perfect high-speed” based on the translation input by the user rail network.” and provide users with translation suggestions "because”.
- the user can directly edit the translation suggestion in the editing area, for example, insert a word in the translation suggestion, delete a word in the translation suggestion, and change the translation Suggested words etc.
- the translation suggestion for the text "Some cities continue to rise by virtue of the perfect high-speed rail network" is the same as the reference translation, which is "Some cities continue to rise with the advantage of the perfect high-speed rail network."
- the user can directly modify "with" to "because of" in the translation suggestion; according to the user's modification, the reference translation is updated to "Some cities continue to rise because of the advantage of the perfect high-speed rail network.", a translation suggestion from the updated reference translation is provided to the user, and the user can confirm the translation suggestion as the translation result.
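The suggestion-following behavior in the examples above can be sketched as a word-level prefix match against the reference translation. This is a minimal illustration only; the function name, word granularity, and divergence handling are assumptions, not the disclosed implementation:

```python
def next_suggestion(reference, confirmed):
    """Return the next word of the reference translation after the
    confirmed prefix, or None if the user's input has diverged
    (at which point the reference translation would be regenerated)."""
    ref_words = reference.split()
    done = confirmed.split()
    if ref_words[:len(done)] != done:
        return None  # diverged: update the reference from the user's input
    if len(done) == len(ref_words):
        return None  # translation is complete
    return ref_words[len(done)]

ref = ("Some cities continue to rise with the advantage "
       "of the perfect high-speed rail network.")
print(next_suggestion(ref, ""))            # -> Some
print(next_suggestion(ref, "Some"))        # -> cities
print(next_suggestion(ref, "Some towns"))  # -> None
```

On divergence (`None`), a real system would call the machine translation model again, constrained by the user's already-entered translation, to produce a fresh reference translation.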
- the reference translation and translation suggestions can be provided by machine translation (for example, a deep learning translation model). It is worth noting that, when a reference translation conforming to the text content cannot be generated based on the translation entered by the user in the editing area, the translation characters entered by the user can be corrected based on pre-stored dictionary content, and the reference translation can be updated based on the corrected translation.
- the reference translation can be provided by machine translation (for example, a deep learning translation model, etc.).
- although this disclosure uses English as the translation language and Chinese as the original language in its examples, this disclosure does not limit the language of the translation or the language of the original text.
- the original text in this disclosure may also be classical Chinese with the translated text in vernacular Chinese, or the original text may be Japanese with the translated text in English, among other combinations.
- the original text display area is an editable area, and in response to a user's operation to modify the text content in the original text display area, the reference translation in the translation area can be updated.
- before or after the user enters the translation in the translation area, the user can edit the text content, that is, the original text; the entered translation will not be overwritten by the modification of the original text, but the translation result will be updated based on the text content modified by the user and the entered translation characters.
- the text content before editing is "Some cities continue to rise by virtue of the perfect Qualcomm network”
- the corresponding translation suggestion is "Some cities continue to rise with the advantage of a perfect Qualcomm network.”
- the translation result input by the user in the editing area is "Some cities continue to rise b". If the input "b", which differs from the translation suggestion, is received, the reference translation can be updated to "Some cities continue to rise because of the advantage of a perfect Qualcomm network."
- the text content of the sentence may be misrecognized text due to noise, the accent of the speaker, and other factors.
- when the length of the text content after editing is greater than the length of the text content before editing, the timeline information of the edited text content is obtained through interpolation processing according to the timeline information of the text content before editing.
- the timeline information of each character will be proportionally rescaled (for example, to 9/11 of the original), and when the user subsequently performs operations such as splitting or merging, the timeline information of the split or merged sub-segment is determined based on the timeline information of each character.
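One way to realize the interpolation described above is to place the pre-edit characters' timestamps on a time axis and linearly resample them at the new length, so that editing 11 characters down to 9 rescales each position to 9/11 of the axis. This is a hypothetical sketch; the disclosure does not fix the interpolation formula:

```python
def rescale_times(char_times, new_len):
    """Linearly resample per-character timestamps (seconds) from the
    pre-edit text onto an edited text of new_len characters."""
    old_len = len(char_times)
    if new_len <= 1 or old_len == 1:
        return char_times[:1] * max(new_len, 0)
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # position on the old axis
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        out.append(char_times[lo] + (pos - lo) * (char_times[hi] - char_times[lo]))
    return out

# 11 characters spoken evenly between second 2 and second 5
old = [2.0 + 3.0 * i / 10 for i in range(11)]
new = rescale_times(old, 9)  # timeline for the 9-character edited text
```

The endpoints of the speech span are preserved (the first edited character keeps the original start time, the last keeps the original end time), which is what later split/merge operations rely on.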
- the translation result may be added as a subtitle to the frame of the video to be translated.
- the timeline of the translation result of the first sentence of the video to be translated is "00:00:00-00:00:02" (the first time information is 00:00:00 and the second time information is 00:00:02), and the timeline of the second translation result is "00:00:03-00:00:07" (the first time information is 00:00:03 and the second time information is 00:00:07). The translation result with the timeline "00:00:00-00:00:02" can be inserted from the 0th to the 2nd second of the video to be translated, and the translation result with the timeline "00:00:03-00:00:07" can be inserted from the 3rd to the 7th second, both in the form of subtitles.
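The subtitle-insertion step can be illustrated by emitting the translation results in SubRip (SRT) form, keyed by the first and second time information. Actually burning or muxing the subtitles into the video is outside this sketch, and the entry texts are invented examples:

```python
def to_timestamp(seconds):
    """Format whole seconds as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def make_subtitles(entries):
    """entries: list of (start_sec, end_sec, translation_result)."""
    lines = []
    for idx, (start, end, text) in enumerate(entries, 1):
        lines += [str(idx), f"{to_timestamp(start)} --> {to_timestamp(end)}", text, ""]
    return "\n".join(lines)

subs = make_subtitles([
    (0, 2, "First translated sentence."),
    (3, 7, "Second translated sentence."),
])
```

A tool such as a video muxer could then attach the generated subtitle track to the video to be translated and export it in the format specified by the user.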
- the translated video can be generated in a format specified by the user and provided to the user for download.
- the voice of the video to be translated can be converted into text; the first time information, second time information, and reference translation of the text can be provided; translation suggestions from the reference translation can be provided following the user's input in the editing area; and a translation suggestion can be taken as the translation result in response to the user's confirmation operation, saving the user input time and combining human accuracy with machine efficiency, thereby improving the efficiency and quality of video translation.
- Fig. 4 is a block diagram showing a video translation device according to an exemplary disclosed embodiment. As shown in FIG. 4, the video translation device 400 includes:
- the conversion module 410 is used to convert the voice of the video to be translated into text.
- the display module 420 is used to display the text and the first time information, second time information, and reference translation of the text, where the first time information is the start time of the text in the video and the second time information is the end time of the text in the video.
- the display module 420 is further configured to display an editing area in response to a user's operation on the text or the reference translation, and the editing area supports the user to input a translation.
- the suggestion module 430 is configured to follow the user's input in the editing area and provide translation suggestions from the reference translation;
- the display module 420 is further configured to display the translation suggestion as a translation result in the editing area when the user's confirmation operation for the translation suggestion is detected; and, when the user's non-confirmation operation for the translation suggestion is detected, to receive a translation input by the user that differs from the translation suggestion, display the translation input by the user in the editing area as the translation result, and update the reference translation in the translation area according to the translation input by the user.
- the display module 420 is further configured to segment the text according to the time information and/or picture frames corresponding to the text in the video to obtain the multiple segmented texts, and, for each segmented text, to display the segmented text together with its first time information, second time information, and reference translation.
- the text includes a plurality of sentence texts, each of which is displayed in a different area, and the device further includes a splitting module for displaying a splitting function bar, where the splitting function bar supports the user in splitting the sentence text; in response to the user's split operation on any one of the sentence texts, the sentence text is split into at least two split sentence texts, and each split sentence text is displayed in association with its first time information, second time information, and reference translation.
- the text includes a plurality of clause texts, each of which is displayed in a different area, and the device further includes a merging module for displaying a merge function bar, where the merge function bar supports the user in merging the clause texts; in response to the user's merging operation on any two adjacent clause texts, the two adjacent clause texts are merged into a new clause text, and the new clause text is displayed in association with its first time information, second time information, and reference translation.
- the text includes a plurality of sentence texts, each of which is displayed in a different area, and the device further includes a play module for displaying a play function bar, where the play function bar supports the user in playing the voice corresponding to the sentence text; in response to the user's operation on the play function bar, the voice corresponding to the sentence text is played.
- the suggestion module 430 is configured to display the translation suggestion in the editing area in a display manner different from that of the input translation; and the displaying of the translation suggestion as the translation result in the editing area in response to the user's confirmation operation includes: in response to the user's confirmation of the translation suggestion, displaying the translation suggestion as the translation result in the editing area in the same manner as the input translation is displayed.
- the suggestion module 430 is further configured to display the translation suggestion as a translation result in the editing area in response to a user's triggering operation of the shortcut key.
- the voice of the video to be translated can be converted into text; the first time information, second time information, and reference translation of the text can be provided; translation suggestions from the reference translation can be provided following the user's input in the editing area; and a translation suggestion can be taken as the translation result in response to the user's confirmation operation, saving the user input time and combining human accuracy with machine efficiency, thereby improving the efficiency and quality of video translation.
- FIG. 5 shows a schematic structural diagram of an electronic device (for example, the terminal device or the server in FIG. 1) 500 suitable for implementing the embodiments of the present disclosure.
- the terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (portable android devices, PADs), portable multimedia players (PMPs), and in-vehicle terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital televisions (DTVs) and desktop computers.
- the electronic device shown in FIG. 5 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
- the electronic device 500 may include a processing device (such as a central processing unit or a graphics processor) 501, which may execute various appropriate actions and processing based on a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503.
- the RAM 503 also stores various programs and data required for the operation of the electronic device 500.
- the processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
- An input/output (I/O) interface 505 is also connected to the bus 504.
- the following devices can be connected to the I/O interface 505: input devices 506 such as a touch screen, touch panel, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 507 such as a liquid crystal display (LCD), speakers, and vibrators; storage devices 508 such as magnetic tapes and hard disks; and communication devices 509.
- the communication device 509 may allow the electronic device 500 to perform wireless or wired communication with other devices to exchange data.
- although FIG. 5 shows an electronic device 500 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
- an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer readable medium, and the computer program contains program code for executing the method shown in the flowchart.
- the computer program may be downloaded and installed from the network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502.
- when the computer program is executed by the processing device 501, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
- the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
- the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
- Computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
- This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
- the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
- the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
- the client and server can communicate using any currently known or future-developed network protocol, such as the hypertext transfer protocol (HTTP), and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
- the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: obtains at least two Internet Protocol addresses; sends to the node evaluation device a node evaluation request including the at least two Internet Protocol addresses, where the node evaluation device selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; and receives the Internet Protocol address returned by the node evaluation device, where the obtained Internet Protocol address indicates an edge node in the content distribution network.
- the aforementioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: receives a node evaluation request including at least two Internet Protocol addresses; selects an Internet Protocol address from the at least two Internet Protocol addresses; and returns the selected Internet Protocol address, where the received Internet Protocol address indicates an edge node in the content distribution network.
- the computer program code used to perform the operations of the present disclosure can be written in one or more programming languages or a combination thereof.
- the above-mentioned programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings; for example, two blocks shown in succession can actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
- each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- the modules involved in the embodiments described in the present disclosure can be implemented in software or hardware, and the name of a module does not, under certain circumstances, constitute a limitation on the module itself; for example, the first obtaining module can also be described as "a module for obtaining at least two Internet Protocol addresses".
- exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), etc.
- a machine-readable medium may be a tangible medium, which may contain or store a program for use by the instruction execution system, apparatus, or device or in combination with the instruction execution system, apparatus, or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- the machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
- Example 1 provides a video translation method, including: converting the voice of the video to be translated into text; displaying the text and the first time information, second time information, and reference translation of the text, where the first time information is the start time of the text in the video and the second time information is the end time of the text in the video; in response to the user's operation on the text or the reference translation, displaying an editing area, where the editing area supports the user's input of a translation; and following the user's input in the editing area, providing translation suggestions from the reference translation; where, when the user's confirmation operation for the translation suggestion is detected, the translation suggestion is displayed in the editing area as the translation result; and, when the user's non-confirmation operation for the translation suggestion is detected, a translation input by the user that differs from the translation suggestion is received, the translation input by the user is displayed in the editing area as the translation result, and the reference translation in the translation area is updated according to the translation input by the user.
- Example 2 provides the method of Example 1, where the text is segmented according to the time information and/or picture frames corresponding to the text in the video to obtain the multiple clause texts; for each clause text, the clause text and its first time information, second time information, and reference translation are displayed.
- Example 3 provides the method of Example 1, where the text includes a plurality of clause texts, each of which is displayed in a different area, and the method further includes: displaying a split function bar, where the split function bar supports the user in splitting the clause text; and, in response to the user's split operation on any one of the clause texts, splitting the clause text into at least two split clause texts and, for each split clause text, displaying the split clause text in association with its first time information, second time information, and reference translation.
- Example 4 provides the method of Example 1, where the text includes a plurality of clause texts, each of which is displayed in a different area, and the method further includes: displaying a merge function bar, where the merge function bar supports the user in merging the clause texts; and, in response to the user's merging operation on any two adjacent clause texts, merging the two adjacent clause texts into a new clause text and displaying the new clause text in association with its first time information, second time information, and reference translation.
- Example 5 provides the method of Examples 1-4, where the text includes a plurality of clause texts, each of which is displayed in a different area, and the method further includes: displaying a play function bar, where the play function bar supports the user in playing the voice corresponding to the clause text; and, in response to the user's operation on the play function bar, playing the voice corresponding to the clause text.
- Example 6 provides the method of Examples 1-4, where the providing of translation suggestions from the reference translation includes: displaying the translation suggestion in the editing area in a manner different from that of the input translation; and where the displaying of the translation suggestion in the editing area as a translation result in response to the user's confirmation operation includes: in response to the user's confirmation operation on the translation suggestion, displaying the translation suggestion as the translation result in the editing area in the same manner as the input translation is displayed.
- Example 7 provides the method of Examples 1-4, where the displaying of the translation suggestion as a translation result in the editing area includes: in response to the user's triggering operation on a shortcut key, displaying the translation suggestion as the translation result in the editing area.
- Example 8 provides a video translation device, including: a conversion module, used to convert the voice of a video to be translated into text; and a display module, used to display the text and the first time information, second time information, and reference translation of the text, where the first time information is the start time of the text in the video and the second time information is the end time of the text in the video;
- the display module is also used to display the editing area in response to the user's operation on the text or the reference translation, and the editing area supports the user to input the translation;
- the suggestion module is used to follow the user's input in the editing area and provide translation suggestions from the reference translation;
- the display module is also used to display the translation suggestion as a translation result in the editing area when the user's confirmation operation for the translation suggestion is detected; and, when a non-confirmation operation of the user for the translation suggestion is detected, to receive a translation input by the user that differs from the translation suggestion, display the translation input by the user as the translation result in the editing area, and update the reference translation in the translation area according to the translation input by the user.
- Example 9 provides the device of Example 8, where the display module is further configured to segment the text according to the time information and/or picture frames corresponding to the text in the video to obtain the multiple clause texts, and, for each clause text, to display the clause text together with its first time information, second time information, and reference translation.
- Example 10 provides the device of Example 8, where the text includes a plurality of clause texts, each of which is displayed in a different area, and the device further includes a splitting module used to display the split function bar, where the split function bar supports the user in splitting the clause text; in response to the user's split operation on any one of the clause texts, the clause text is split into at least two split clause texts, and each split clause text is displayed in association with its first time information, second time information, and reference translation.
- Example 11 provides the device of Example 8, where the text includes a plurality of clause texts, each of which is displayed in a different area, and the device further includes a merging module used to display the merge function bar, where the merge function bar supports the user in merging the clause texts; in response to the user's merging operation on any two adjacent clause texts, the two adjacent clause texts are merged into a new clause text, and the new clause text is displayed in association with its first time information, second time information, and reference translation.
- Example 12 provides the device of Examples 8-11, where the text includes a plurality of clause texts, each of which is displayed in a different area, and the apparatus further includes a play module used to display the playback function bar, where the playback function bar supports the user in playing the voice corresponding to the clause text; in response to the user's operation on the playback function bar, the voice corresponding to the clause text is played.
- Example 13 provides the device of Examples 8-11, where the suggestion module is used to display the translation suggestion in the editing area in a display manner different from that of the input translation; and the displaying of the translation suggestion as a translation result in the editing area in response to the user's confirmation operation includes: in response to the user's confirmation operation on the translation suggestion, displaying the translation suggestion as the translation result in the editing area in the same manner as the input translation is displayed.
- Example 14 provides the device of Examples 8-11, where the suggestion module is further configured to display the translation suggestion as a translation result in the editing area in response to the user's triggering operation on a shortcut key.
Abstract
The present disclosure relates to a video translation method and apparatus, a storage medium, and an electronic device. The method includes: converting the speech of a video to be translated into text; displaying the text and first time information, second time information, and a reference translation of the text; in response to a user's operation on the text or the reference translation, displaying an editing area, where the editing area supports the user's input of a translation; and, following the user's input in the editing area, providing translation suggestions from the reference translation; where, when the user's confirmation operation for the translation suggestion is detected, the translation suggestion is displayed in the editing area as the translation result; and, when the user's non-confirmation operation for the translation suggestion is detected, a translation input by the user that differs from the translation suggestion is received, the translation input by the user is displayed in the editing area as the translation result, and the reference translation in the translation area is updated according to the translation input by the user. The present disclosure can improve the efficiency and the quality of translation.
Description
This application claims priority to Chinese patent application No. 202010583177.4, titled "Video translation method and apparatus, storage medium and electronic device", filed with the China National Intellectual Property Administration on June 23, 2020, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of machine translation, and in particular to a video translation method and apparatus, a storage medium, and an electronic device.
Now that machine translation is well developed, simple translation tasks can already be handled by machine translation plus human proofreading, that is, in the machine translation post-editing (MTPE) mode. Under current MTPE technology, however, manual revision and machine translation are not well integrated, so translation quality still falls short of human translation. These problems seriously constrain the development of MTPE. In translation scenarios with high accuracy requirements, human translation is usually still used, but its low efficiency and slow speed remain unsolved. Moreover, when a video needs to be translated, a person has to listen to and translate the video content sentence by sentence. This approach is inefficient and cannot meet the large demand for video translation in today's globalized context.
Summary of the Invention
This summary is provided to introduce concepts in a brief form; these concepts are described in detail in the detailed description that follows. This summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a video translation method, including:
converting the speech of a video to be translated into text;
displaying the text and first time information, second time information, and a reference translation of the text, where the first time information is the start time of the text in the video and the second time information is the end time of the text in the video;
in response to a user's operation on the text or the reference translation, displaying an editing area, where the editing area supports the user's input of a translation;
following the user's input in the editing area, providing translation suggestions from the reference translation;
where, when a confirmation operation of the user for the translation suggestion is detected, the translation suggestion is displayed in the editing area as the translation result;
and, when a non-confirmation operation of the user for the translation suggestion is detected, a translation input by the user that differs from the translation suggestion is received, the translation input by the user is displayed in the editing area as the translation result, and the reference translation in the translation area is updated according to the translation input by the user.
In a second aspect, the present disclosure provides a video translation apparatus, including:
a conversion module, configured to convert the speech of a video to be translated into text;
a display module, configured to display the text and first time information, second time information, and a reference translation of the text, where the first time information is the start time of the text in the video and the second time information is the end time of the text in the video;
the display module being further configured to display an editing area in response to a user's operation on the text or the reference translation, where the editing area supports the user's input of a translation;
a suggestion module, configured to follow the user's input in the editing area and provide translation suggestions from the reference translation;
the display module being further configured to display the translation suggestion in the editing area as the translation result when a confirmation operation of the user for the translation suggestion is detected; and, when a non-confirmation operation of the user for the translation suggestion is detected, to receive a translation input by the user that differs from the translation suggestion, display the translation input by the user in the editing area as the translation result, and update the reference translation in the translation area according to the translation input by the user.
第三方面,本公开提供一种计算机可读介质,其上存储有计算机程序,该程序被处理装置执行时实现本公开第一方面中所述方法的步骤。
第四方面,本公开提供一种电子设备,包括:
存储装置,其上存储有计算机程序;
处理装置,用于执行所述存储装置中的所述计算机程序,以实现本公开第一方面中所述方法的步骤。
基于上述的技术方案,至少可以达到以下技术效果:可以将待翻译视频的语音转换为文本,并提供该文本的第一时间信息、第二时间信息及参考译文,并跟随用户在编辑区域的输入提供来自参考译文的译文建议,并响应于用户的确认操作将译文建议作为译文结果,从而可以节省用户的输入时间,结合人工的准确性及机器的高效性,进而提升视频翻译的效率以及视频翻译的质量。
本公开的其他特征和优点将在随后的具体实施方式部分予以详细说明。
结合附图并参考以下具体实施方式,本公开各实施例的上述和其他特征、优点及方面将变得更加明显。贯穿附图中,相同或相似的附图标记表示相同或相似的元素。应当理解附图是示意性的,元件和元素不一定按照比例绘制。在附图中:
图1是根据一示例性公开实施例示出的一种视频翻译方法的流程图。
图2是根据一示例性公开实施例示出的一种译文界面的示意图。
图3是根据一示例性公开实施例示出的一种文本拆分方式的示意图。
图4是根据一示例性公开实施例示出的一种视频翻译装置的框图。
图5是根据一示例性公开实施例示出的一种电子设备的框图。
下面将参照附图更详细地描述本公开的实施例。虽然附图中显示了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。
应当理解,本公开的方法实施方式中记载的各个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
图1是根据一示例性公开实施例示出的一种视频翻译方法的流程图,本方法可以用于终端、服务器及其他独立电子设备,也可以应用于翻译系统,此种情况下,方法中的各个步骤可以由翻译系统中的多个设备配合完成,例如图1中所示的S12和S14可以由终端执行,S11和S13可以由服务器执行等。如图1所示,所述视频翻译方法包括以下步骤:
S11、将待翻译的视频的语音转换为文本。
可以提取待翻译视频中的语音内容,如音轨等,并通过语音识别技术,将该语音内容转换为文本内容。值得说明的是,在将语音内容转换为文本内容时,可以根据语音内容中的分句将文本内容分为多句,且每句文本内容可以对应一个提取到该分句的语音内容的时刻信息,将其作为该句文本内容的时间轴信息。
例如,待翻译视频的语音内容被识别为了多句话,其中,第一句为“首先介绍一下什么是热点”,该句话位于视频的第2秒至第5秒之间,则该句文本内容对应的时间轴信息为“00:00:02-00:00:05”;第二句为“从ppt的右边可以看到”,该句话位于视频的第5秒至第7秒之间,则该句文本内容对应的时间轴信息为“00:00:05-00:00:07”。
在将待翻译视频的语音内容转换为文本内容时,可以根据所述文本在所述视频中对应的时刻信息和/或画面帧对所述文本进行分句,得到所述多个分句文本,例如,将每连续多秒内的语音的识别文本作为一个分句,或者将连续多个画面帧内出现的语音的识别文本作为一个分句;还可以根据语音内容中的停顿进行分句,例如,可以设置一个停顿阈值,在该停顿阈值内没有识别到人声内容时,可以在没有识别到人声内容的任意位置进行分句;还可以根据语音内容的语义进行分句,在分句词前后进行分句,例如,可以设置将完整的“主语+谓语+宾语”结构的“宾语”作为分句词,对语音内容进行分句,还可以将时间助词、停顿词等作为分句词,在这些词前后进行分句。具体的,可以通过分句模型对识别到的文本内容进行分句,得到分句后的文本内容。
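上述根据停顿进行分句的方式,可以用如下Python草图示意(其中的识别片段数据与停顿阈值均为便于说明而假设的示例,并非本公开的具体实现):

```python
# 示意性草图:按停顿阈值对带时间戳的语音识别片段进行分句。
# 片段形如 (文本, 起始秒, 结束秒),均为假设的示例数据。

def split_sentences(segments, pause_threshold=0.8):
    """当相邻片段之间的停顿超过阈值(秒)时分句,
    返回 (分句文本, 第一时间信息, 第二时间信息) 的列表。"""
    sentences = []
    current_text, start, end = "", None, None
    for text, seg_start, seg_end in segments:
        if start is None:
            # 第一个片段,开启新分句
            current_text, start, end = text, seg_start, seg_end
        elif seg_start - end > pause_threshold:
            # 停顿超过阈值:结束当前分句,开启新分句
            sentences.append((current_text, start, end))
            current_text, start, end = text, seg_start, seg_end
        else:
            # 停顿较短:并入当前分句,并延长结束时间
            current_text += text
            end = seg_end
    if start is not None:
        sentences.append((current_text, start, end))
    return sentences

segments = [
    ("首先介绍一下", 2.0, 3.5), ("什么是热点", 3.6, 5.0),
    ("从ppt的右边", 6.0, 6.5), ("可以看到", 6.6, 7.0),
]
print(split_sentences(segments))
```

其中第5.0秒至第6.0秒之间的停顿超过了阈值,因此识别片段被分为两个分句,与前文的时间轴示例一致。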
S12、展示所述文本和所述文本的第一时间信息、第二时间信息及参考译文。
其中,所述第一时间信息为所述文本在所述视频中的起始时间,所述第二时间信息为所述文本在所述视频中的结束时间。
其中,该文本可以是已经分句后的文本,该第一时间信息为分句后的文本的当前分句在视频中的起始时间,该第二时间信息为分句后的文本的当前分句在视频中的结束时间。对每一分句可以展示所述分句文本和所述分句文本的第一时间信息、第二时间信息及参考译文。
S13、响应于用户对所述文本或所述参考译文的操作,展示编辑区域。
在用户选中了文本或参考译文对应的区域后,可以在该文本的参考译文上方展示编辑区域,编辑区域支持用户输入译文,用户可以在编辑区域中进行编辑操作,以获得该文本的翻译结果。其中,该编辑区域可以显示于参考译文上方,以便用户对照修改。
其中,该文本可以是已经分句后的文本,每一分句文本在不同的区域内展示,针对每一分句文本展示该分句文本的第一时间信息、第二时间信息及参考译文。
在一种可能的实施方式中,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,还可以展示提供所述用户对所述分句文本进行拆分的拆分功能栏,并响应于所述用户针对任一所述分句文本的拆分操作,将所述分句文本拆分成至少两句分句子文本,并针对每一所述分句子文本,关联显示所述分句子文本、所述分句子文本的第一时间信息、第二时间信息以及所述分句子文本的参考译文。可选地,该拆分功能栏可以是响应于用户对所述分句文本或参考译文的操作所提供的,在用户选中该分句文本或参考译文之前可以隐藏该拆分功能栏。
例如,对一段文本内容“今天要为大家介绍的就是我国即将崛起的三座城市”而言,该段文本内容的时间轴信息为“00:00:15-00:00:18”,其中第一时间信息为00:00:15、第二时间信息为00:00:18,用户将其分为了两个子句“今天要为大家介绍的就是”和“我国即将崛起的三座城市”,则可以根据编辑前的文本的长度和编辑后的各个子句的文本的长度,为各个子句设置时间轴,例如,可以将原有的时间轴拆分为长度相同的两段,将第一个子段“今天要为大家介绍的就是”的时间轴设置为“00:00:15-00:00:16”,将第二个子段“我国即将崛起的三座城市”的时间轴设置为“00:00:17-00:00:18”。
还可以根据文本内容的字数,为该段文本内容的每个字分配时间轴,并在进行分句后为分句后的子段分配其对应的字数的时间轴。
如图3所示的是一种可能的文本拆分方式的示意图,如图所示,用户可以通过光标选择需要分句的位置,并点击分句按键;分句前的文本会拆分为两个子句,并按照顺序显示,各个子句的第一时间信息和第二时间信息均是由分句前的第一时间信息和第二时间信息拆分得到。图3中,拆分前的虚线框中的一段文本内容,被拆分成了虚线框中的两个子段。
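上述按子句字数占比拆分时间轴的做法,可以用如下Python草图示意(函数名与示例数据均为便于说明而假设):

```python
def split_time_axis(start, end, parts):
    """按各子句字数占比,将区间 [start, end](秒)拆分为连续的子区间,
    返回 (子句文本, 第一时间信息, 第二时间信息) 的列表。"""
    total = sum(len(p) for p in parts)
    result, cursor = [], start
    for i, p in enumerate(parts):
        if i == len(parts) - 1:
            seg_end = end  # 最后一段对齐到原结束时间,避免舍入误差
        else:
            seg_end = cursor + (end - start) * len(p) / total
        result.append((p, round(cursor, 2), round(seg_end, 2)))
        cursor = seg_end
    return result

print(split_time_axis(15, 18, ["今天要为大家介绍的就是", "我国即将崛起的三座城市"]))
```

两个子句字数相同,因此原时间轴被平均拆分为两段,与前文示例的效果一致。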
在一种可能的实施方式中,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,还可以展示提供所述用户对所述分句文本进行合并的合并功能栏,并响应于所述用户针对任意相邻两个分句文本的合并操作,将所述相邻两个分句文本合并成一段新的分句文本,并针对所述新的分句文本,关联显示所述新的分句文本、所述新的分句文本的第一时间信息、第二时间信息以及所述新的分句文本的参考译文。可选地,该合并功能栏可以是响应于用户对所述分句文本或参考译文的操作所提供的,在用户选中该分句文本或参考译文之前可以隐藏该合并功能栏。
在一种可能的实施方式中,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,还可以展示提供所述用户对所述分句文本进行播放的播放功能栏,并响应于所述用户针对所述播放功能栏的操作,播放所述分句文本对应的语音。可选地,该播放功能栏可以是响应于用户对所述分句文本或参考译文的操作所提供的,在用户选中该分句文本或参考译文之前可以隐藏该播放功能栏。并且,在一种可能的实施方式中,可以以参考译文或者译文结果作为字幕播放该分句文本对应的视频,以便用户查看译文字幕的效果。
如图2所示的是一种可能的译文界面的示意图,其中,虚线框内部为一段已被用户选中的文本内容的译文界面,图2中一共有三段不同的文本内容,其中,被用户选中的文本内容会展示编辑区域和播放、合并、拆分功能栏。待翻译视频的文本内容显示在参考译文上方,且不同的分句有不同的展示区域,各个展示区域可以独立翻译,不因其他区域的修改而更新。用户可以在编辑区域输入字符,或修改待翻译的文本的字符。译文界面还可以包括时间轴信息,包括表征起始时刻的第一时间信息和表征结束时刻的第二时间信息。在图2中,参考译文为灰色字,译文建议为黑色字,如图2所示,在用户选中一段文本内容后,参考译文可以下移一行,与功能栏同行,而原本的参考译文所在的区域成为编辑区域,用于展示译文建议并接收用户的修改。
S14、跟随用户在所述编辑区域的输入,提供来自所述参考译文的译文建议。
其中,基于该译文建议,本公开实施例提供的方法包括在检测到用户针对该译文建议的确认操作的情况下,将该译文建议作为译文结果显示到该编辑区域,以及在检测到用户针对该译文建议的非确认操作的情况下,接收用户输入的不同于该译文建议的译文,并根据用户输入的译文更新该译文区域中的参考译文。
在具体实施时,上述确认操作可以是用户针对预设的快捷键的操作,例如,用户通过点击该快捷键表明采纳译文建议的意图,因此,可以在检测到用户点击快捷键的操作的情况下,将该译文建议作为译文结果显示到该编辑区域。值得说明的是,将译文建议作为译文结果显示到该编辑区域这一动作将作为步骤S14中所述的用户在该编辑区域内的输入,也就是说,在此种情况下,步骤S14即是表明,本公开实施例提供的方法可以响应于将本次提供的译文建议作为译文结果显示到该编辑区域,提供来自该参考译文的下一个译文建议(该下一个译文建议可以是已提供的译文建议在参考译文中的后续译文)。
可选地,上述检测到用户针对该译文建议的非确认操作的情况可以是检测到用户输入的译文与本次提供的译文建议不一致的情况,在此种情况下,本公开实施例提供的方法可以接收用户输入的不同于该译文建议的译文,并根据用户输入的译文更新该译文区域中的参考译文。同样值得说明的是,用户输入的与译文建议不同的译文,将作为步骤S14中所述的用户在该编辑区域内的输入,也就是说,在此种情况下,步骤S14即是表明,本公开实施例提供的方法可以响应于用户在该编辑区域输入与译文建议不同的译文,提供来自根据用户输入的译文更新后的参考译文的下一个译文建议。例如,本次提供的译文建议为“my”,在检测到用户输入的译文为不同于译文建议“my”的译文“I”的情况下,根据译文“I”更新参考译文,并从更新后的参考译文中提供译文“I”的下一个译文建议。
采用上述方法,可以根据用户的输入提供来自参考译文的译文建议,并且,用户可以通过确认操作直接将译文建议作为译文结果,减少用户的输入时间,本公开结合人工的准确性及机器的高效性,可以提升翻译的效率以及翻译的质量。
为了使本领域技术人员更加理解本公开实施例提供的技术方案,下面对本公开实施例提供的视频翻译方法进行详细说明。
可选地,步骤S14中的所述提供译文建议可以包括:在所述译文区域突出显示来自所述参考译文的所述译文建议。在此种情况下,在检测到用户针对该译文建议的确认操作的情况下,可以取消所述译文建议在所述译文区域的突出显示。该突出显示可以是加粗字体、高亮字体、异色字、异色背景、底纹效果等可以将译文建议突出展示的显示方式。
在一种可能的实施方式中,该突出显示可以是与已输入译文的显示方式不同的显示方式,例如,已输入的译文可以是加粗字体,译文建议是正常字体,或者,已输入的译文可以是黑色字,译文建议是灰色字等。在检测到用户针对译文建议的确认操作的情况下,可以将译文建议的显示方式调整为与已输入译文的显示方式相同。例如,已输入的译文可以是加粗字体,译文建议是正常字体,在检测到用户的确认操作的情况下,将该译文建议调整为加粗字体显示。
在一种可能的实施方式中,该确认操作可以是用户对电子设备的快捷键的输入操作,例如,该电子设备可以是手机,该快捷键可以是手机显示区域上的虚拟键或手机的实体键(例如:音量键),用户可以对上述的快捷键进行操作以采纳该译文建议,则在检测到用户对上述快捷键的输入操作的情况下,可以将译文建议作为译文结果显示到编辑区域;该电子设备还可以是电脑,该快捷键可以是电脑键盘或鼠标上的指定或自定义按键(例如:键盘alt键、鼠标侧键等)。
该确认操作还可以是由摄像头获取后识别得到的姿势确认操作,如点头、眨眼、作出预设手势等;还可以是由麦克风获取后识别得到的语音操作。
在一种可能的实施方式中,该来自所述参考译文的译文建议包括词、词组、句子中的至少一项。
下面对译文建议的提供方式进行详细阐述:
在用户针对文本内容进行翻译时,可以参考在译文区域中显示的参考译文,在编辑区域内进行输入(值得说明的是,此处的输入包括字符的输入,例如键入字母、单词等,也包括按键操作输入,例如点击编辑区域等),可以提供来自参考译文的译文建议。
其中,该译文建议可以是针对分句的整句的译文建议,也可以是逐词、逐短语提供的更细粒度的译文建议。
例如,文本为“有些城市凭借着完善的高铁网络这一优势不断崛起”,参考译文为“Some cities continue to rise with the advantage of the perfect high-speed rail network”,则在用户点击编辑区域,或者在编辑区域输入了字符“S”之后,可以提供来自参考译文的译文建议“Some”(或者“Some cities continue to rise”等更粗粒度的译文建议)。
用户可以通过确认操作采纳该译文建议,并且,将该确认操作作为在编辑区域内的输入操作,继续提供来自参考译文的译文建议,例如,在检测到用户针对“Some”的确认操作的情况下,将“Some”作为译文结果显示到编辑区域,并为用户提供下一译文建议“cities”。
在检测到用户针对译文建议的非确认操作的情况下,接收用户输入的不同于译文建议的译文,并根据用户输入的译文更新译文区域中的参考译文。其中,该非确认操作可以是进行了预设的代表非确认的操作(点击预设按键、做出预设动作等),也可以是指除了前述的确认操作以外的其他情况,例如,在预设时间内没有进行确认操作,或者进行了继续输入的操作。
例如,文本内容“有些城市凭借着完善的高铁网络这一优势不断崛起”的参考译文为“Some cities continue to rise with the advantage of the perfect high-speed rail network.”,在接收到用户对编辑区域的点击输入操作后,提供来自参考译文的译文建议“Some”,并基于用户的确认操作,将译文建议“Some”作为译文结果显示到编辑区域,并继续为用户提供下一译文建议“cities”。在提供译文建议“with”时,接收到了用户不同于译文建议的输入“b”,则可以基于用户输入的译文,更新参考译文为“Some cities continue to rise because of the advantage of the perfect high-speed rail network.”,并为用户提供译文建议“because”。
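上述“跟随输入提供译文建议、输入不一致时更新参考译文”的流程,可以用如下简化的Python草图示意(仅演示词级前缀匹配这一种可能做法,函数名与示例译文均为假设,实际的参考译文更新由机器翻译模型完成):

```python
def next_suggestion(reference, typed):
    """给出下一个来自参考译文的词级译文建议。
    typed 为用户已输入的译文;若其与参考译文前缀一致,
    返回参考译文中的下一个词;否则返回 None,
    表示检测到非确认操作,应根据用户输入更新参考译文。"""
    ref_words = reference.split()
    typed_words = typed.split()
    if ref_words[:len(typed_words)] == typed_words:
        if len(typed_words) < len(ref_words):
            return ref_words[len(typed_words)]
        return None  # 已输入完整译文,无需再建议
    return None  # 前缀不一致:触发参考译文更新

ref = "Some cities continue to rise with the advantage"
print(next_suggestion(ref, ""))                 # 首个建议 "Some"
print(next_suggestion(ref, "Some cities"))      # 下一个建议 "continue"
print(next_suggestion(ref, "Some cities keep")) # None:需更新参考译文
```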
值得说明的是,当译文建议为该分句的整句参考译文时,用户可以直接在编辑区域中对译文建议进行编辑,例如,在译文建议中插入单词、删除译文建议中的单词、更改译文建议中的单词等。
例如,文本内容“有些城市凭借着完善的高铁网络这一优势不断崛起”的译文建议与参考译文相同,为“Some cities continue to rise with the advantage of the perfect high-speed rail network.”,用户可以直接在译文建议中将“with”修改为“because of”,并根据用户的修改将参考译文更新为“Some cities continue to rise because of the advantage of the perfect high-speed rail network.”,并向用户提供来自该参考译文的译文建议,用户可以通过确认操作将该译文建议作为译文结果。
其中,参考译文以及译文建议可以由机器翻译(例如深度学习翻译模型等)提供。值得说明的是,当基于所述用户在编辑区域输入的译文无法生成符合文本内容的参考译文时,可以基于预存的词典内容对用户输入的译文字符进行纠错,并根据纠错后的译文更新该参考译文。
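上述基于预存词典对用户输入进行纠错的思路,可以用Python标准库difflib做一个简化示意(词典内容、相似度阈值均为便于说明而假设):

```python
import difflib

def correct_word(word, dictionary):
    """基于预存词典对用户输入的译文单词做简单纠错:
    取相似度最高的近似词,没有足够接近的词时原样返回。"""
    matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=0.8)
    return matches[0] if matches else word

dictionary = ["because", "become", "city", "network", "advantage"]
print(correct_word("becaus", dictionary))  # 纠错为 "because"
print(correct_word("xyz", dictionary))     # 无近似词,原样返回 "xyz"
```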
值得说明的是,本公开中虽然以翻译语言为英文、原文为中文的案例进行举例,但是,本公开不对翻译的语言以及原文的语言进行限制,本公开中的原文还可以为中文文言文、译文可以为中文白话文,或者原文为日文、译文为英文等各种组合。
在一种可能的实施方式中,原文显示区域为可编辑区域,响应于用户对所述原文显示区域中的文本内容进行修改的操作,可以更新所述译文区域内的参考译文。
在用户在译文区域输入译文之前或之后,用户都可以对文本的内容,即翻译原文进行编辑,并且,已经输入的译文不会因为原文的修改而被覆盖,而是会根据用户修改后的文本内容及输入的译文字符更新翻译结果。
例如,编辑前的文本内容为“有些城市凭借着完善的高通网路这一优势不断崛起”,对应的译文建议为“Some cities continue to rise with the advantage of a perfect Qualcomm network.”,用户在编辑区域输入的译文结果为“Some cities continue to rise b”,其中,不同于译文建议的译文为“b”,则可以更新参考译文为“Some cities continue to rise because of the advantage of a perfect Qualcomm network.”。但是,该句文本内容可能是由于杂音、语音讲述人的口音等因素导致的误识别文本,用户发现其原本的文本应当是“有些城市凭借着完善的高铁网络这一优势不断崛起。”,则用户可以将文本内容中的“高通网路”编辑为“高铁网络”,则更新后的参考译文变为“Some cities continue to rise because of the advantage of the perfect high-speed rail network.”,并为用户提供来自更新后的参考译文的译文建议。
在一种可能的实施方式中,在编辑后的文本内容的长度大于编辑前的文本内容的长度,根据编辑前的文本内容的时间轴信息,通过插值处理得到编辑后的文本内容的时间轴信息。
例如,编辑前的文本内容为“今天要为大家介绍的就是我国的三座城市”,编辑后的文本内容为“今天要为大家介绍的就是我国即将崛起的三座城市”,则编辑后的文本内容中,各个文字的时间轴信息都会被重置为原来的9/11,并且在后续用户进行分句、合并等操作时,基于各个文字的时间轴信息确定分句或合并后的子段的时间轴信息。
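上述按比例为各个文字重新分配时间轴的插值处理,可以用如下Python草图示意(平均分配仅是一种可能的实现方式,函数名为假设):

```python
def char_time_axis(start, end, n_chars):
    """将区间 [start, end](秒)平均分配给 n_chars 个字,
    返回每个字的 (起始时间, 结束时间) 列表。"""
    dur = (end - start) / n_chars
    return [(start + i * dur, start + (i + 1) * dur) for i in range(n_chars)]

before = char_time_axis(15.0, 18.0, 18)  # 编辑前 18 个字
after = char_time_axis(15.0, 18.0, 22)   # 编辑后 22 个字
# 每个字的时长由 3/18 秒缩小为 3/22 秒,即原来的 18/22 = 9/11
```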
在一种可能的实施方式中,可以基于所述第一时间信息和第二时间信息,将所述翻译结果作为字幕添加至所述待翻译视频的画面帧中。
例如,待翻译视频的第一句翻译结果的时间轴为“00:00:00-00:00:02”(第一时间信息为00:00:00,第二时间信息为00:00:02),第二段翻译结果的时间轴为“00:00:03-00:00:07”(第一时间信息为00:00:03,第二时间信息为00:00:07),则可以在待翻译视频的第0秒至第2秒间,插入时间轴为“00:00:00-00:00:02”的翻译结果,在待翻译视频的第3秒至第7秒间,插入时间轴为“00:00:03-00:00:07”的翻译结果,该翻译结果可以以字幕的形式插入待翻译视频中。
在所有的翻译结果均插入待翻译视频后,可以将翻译完成的视频以用户指定的格式生成,并提供给用户进行下载。
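上述基于第一时间信息和第二时间信息将翻译结果作为字幕插入的过程,可以用生成SRT字幕文本的Python草图示意(SRT仅为一种可能的字幕格式,函数名与示例译文均为假设):

```python
def to_timestamp(seconds):
    """将秒数转换为 SRT 时间戳,例如 2 -> "00:00:02,000"。"""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    ms = int(round((seconds - int(seconds)) * 1000))
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(results):
    """results 为 (译文结果, 第一时间信息, 第二时间信息) 的列表,
    按时间信息生成 SRT 格式的字幕文本。"""
    blocks = []
    for i, (text, start, end) in enumerate(results, 1):
        blocks.append(f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

srt = to_srt([("First subtitle", 0, 2), ("Second subtitle", 3, 7)])
print(srt)
```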
基于上述的技术方案,至少可以达到以下技术效果:可以将待翻译视频的语音转换为文本,并提供该文本的第一时间信息、第二时间信息及参考译文,并跟随用户在编辑区域的输入提供来自参考译文的译文建议,并响应于用户的确认操作将译文建议作为译文结果,从而可以节省用户的输入时间,结合人工的准确性及机器的高效性,进而提升视频翻译的效率以及视频翻译的质量。
图4是根据一示例性公开实施例示出的一种视频翻译装置的框图。如图4所示,所述视频翻译装置400包括:
转换模块410,用于将待翻译的视频的语音转换为文本。
展示模块420,用于展示所述文本和所述文本的第一时间信息、第二时间信息及参考译文,所述第一时间信息为所述文本在所述视频中的起始时间,所述第二时间信息为所述文本在所述视频中的结束时间。
展示模块420,还用于响应于用户对所述文本或所述参考译文的操作,展示编辑区域,所述编辑区域支持所述用户输入译文。
建议模块430,用于跟随用户在所述编辑区域的输入,提供来自所述参考译文的译文建议;
所述展示模块420,还用于在检测到所述用户针对所述译文建议的确认操作的情况下,将所述译文建议作为译文结果显示到所述编辑区域;以及,在检测到所述用户针对所述译文建议的非确认操作的情况下,接收所述用户输入的不同于所述译文建议的译文,将所述用户输入的所述译文作为所述译文结果显示到所述编辑区域,根据用户输入的所述译文更新所述译文区域中的参考译文。可选地,所述展示模块420还用于根据所述文本在所述视频中对应的时刻信息和/或画面帧对所述文本进行分句,得到所述多个分句文本;针对每一所述分句文本,展示所述分句文本和所述分句文本的第一时间信息、第二时间信息及参考译文。
可选地,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述装置还包括拆分模块,用于展示拆分功能栏,所述拆分功能栏支持所述用户对所述分句文本进行拆分;响应于所述用户针对任一所述分句文本的拆分操作,将所述分句文本拆分成至少两句分句子文本,并针对每一所述分句子文本,关联显示所述分句子文本、所述分句子文本的第一时间信息、第二时间信息以及所述分句子文本的参考译文。
可选地,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述装置还包括合并模块,用于展示合并功能栏,所述合并功能栏支持所述用户对所述分句文本进行合并;响应于所述用户针对任意相邻两个分句文本的合并操作,将所述相邻两个分句文本合并成一段新的分句文本,并针对所述新的分句文本,关联显示所述新的分句文本、所述新的分句文本的第一时间信息、第二时间信息以及所述新的分句文本的参考译文。
可选地,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述装置还包括播放模块,用于展示播放功能栏,所述播放功能栏支持所述用户对所述分句文本对应的语音进行播放;响应于所述用户针对所述播放功能栏的操作,播放所述分句文本对应的语音。
可选地,所述建议模块430用于在所述编辑区域以不同于已输入译文的显示方式来显示所述译文建议;所述响应于所述用户对所述译文建议的确认操作,将所述译文建议作为译文结果显示到所述编辑区域包括:响应于所述用户对所述译文建议的确认操作,以与已输入译文的显示方式相同的方式来在所述编辑区域内显示作为译文结果的所述译文建议。
可选地,所述建议模块430,还用于响应于用户对快捷键的触发操作,将所述译文建议作为译文结果显示到所述编辑区域。
上述各模块的功能在上一实施例中的方法步骤中已详细阐述,在此不做赘述。
基于上述的技术方案,至少可以达到以下技术效果:可以将待翻译视频的语音转换为文本,并提供该文本的第一时间信息、第二时间信息及参考译文,并跟随用户在编辑区域的输入提供来自参考译文的译文建议,并响应于用户的确认操作将译文建议作为译文结果,从而可以节省用户的输入时间,结合人工的准确性及机器的高效性,进而提升视频翻译的效率以及视频翻译的质量。
下面参考图5,其示出了适于用来实现本公开实施例的电子设备(例如图1中的终端设备或服务器)500的结构示意图。本公开实施例中的终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、个人数字助理(personal digital assistant,PDA)、平板电脑(portable android device,PAD)、便携式多媒体播放器(portable media player,PMP)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字电视(digital television,DTV)、台式计算机等等的固定终端。图5示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图5所示,电子设备500可以包括处理装置(例如中央处理器、图形处理器等)501,其可以根据存储在只读存储器(read-only memory,ROM)502中的程序或者从存储装置508加载到随机访问存储器(random access memory,RAM)503中的程序而执行各种适当的动作和处理。在RAM 503中,还存储有电子设备500操作所需的各种程序和数据。处理装置501、ROM 502以及RAM 503通过总线504彼此相连。输入/输出(I/O)接口505也连接至总线504。
通常,以下装置可以连接至I/O接口505:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置506;包括例如液晶显示器(liquid crystal display,LCD)、扬声器、振动器等的输出装置507;包括例如磁带、硬盘等的存储装置508;以及通信装置509。通信装置509可以允许电子设备500与其他设备进行无线或有线通信以交换数据。虽然图5示出了具有各种装置的电子设备500,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置509从网络上被下载和安装,或者从存储装置508被安装,或者从ROM 502被安装。在该计算机程序被处理装置501执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(erasable programmable read-only memory,EPROM)或闪存、光纤、便携式紧凑磁盘只读存储器(compact disc read-only memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、射频(radio frequency,RF)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如超文本传输协议(hypertext transfer protocol,HTTP)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(local area network,LAN),广域网(wide area network,WAN),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:获取至少两个网际协议地址;向节点评价设备发送包括所述至少两个网际协议地址的节点评价请求,其中,所述节点评价设备从所述至少两个网际协议地址中,选取网际协议地址并返回;接收所述节点评价设备返回的网际协议地址;其中,所获取的网际协议地址指示内容分发网络中的边缘节点。
或者,上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:接收包括至少两个网际协议地址的节点评价请求;从所述至少两个网际协议地址中,选取网际协议地址;返回选取出的网际协议地址;其中,接收到的网际协议地址指示内容分发网络中的边缘节点。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定,例如,第一获取模块还可以被描述为“获取至少两个网际协议地址的模块”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(field programmable gate array,FPGA)、专用集成电路(application specific integrated circuit,ASIC)、专用标准产品(application specific standard parts,ASSP)、片上系统(system on a chip,SOC)、复杂可编程逻辑设备(complex programmable logic device,CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例,示例1提供了一种视频翻译方法,包括将待翻译的视频的语音转换为文本;展示所述文本和所述文本的第一时间信息、第二时间信息及参考译文,所述第一时间信息为所述文本在所述视频中的起始时间,所述第二时间信息为所述文本在所述视频中的结束时间;响应于用户对所述文本或所述参考译文的操作,展示编辑区域,所述编辑区域支持所述用户输入译文;跟随所述用户在所述编辑区域的输入,提供来自所述参考译文的译文建议;其中,在检测到所述用户针对所述译文建议的确认操作的情况下,将所述译文建议作为译文结果显示到所述编辑区域;以及,在检测到所述用户针对所述译文建议的非确认操作的情况下,接收所述用户输入的不同于所述译文建议的译文,将所述用户输入的所述译文作为所述译文结果显示到所述编辑区域,根据所述用户输入的所述译文更新所述译文区域中的参考译文。根据本公开的一个或多个实施例,示例2提供了示例1的方法,根据所述文本在所述视频中对应的时刻信息和/或画面帧对所述文本进行分句,得到所述多个分句文本;针对每一所述分句文本,展示所述分句文本和所述分句文本的第一时间信息、第二时间信息及参考译文。
根据本公开的一个或多个实施例,示例3提供了示例1的方法,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述方法还包括:展示拆分功能栏,所述拆分功能栏支持所述用户对所述分句文本进行拆分;响应于所述用户针对任一所述分句文本的拆分操作,将所述分句文本拆分成至少两句分句子文本,并针对每一所述分句子文本,关联显示所述分句子文本、所述分句子文本的第一时间信息、第二时间信息以及所述分句子文本的参考译文。
根据本公开的一个或多个实施例,示例4提供了示例1的方法,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述方法还包括:展示合并功能栏,所述合并功能栏支持所述用户对所述分句文本进行合并;响应于所述用户针对任意相邻两个分句文本的合并操作,将所述相邻两个分句文本合并成一段新的分句文本,并针对所述新的分句文本,关联显示所述新的分句文本、所述新的分句文本的第一时间信息、第二时间信息以及所述新的分句文本的参考译文。
根据本公开的一个或多个实施例,示例5提供了示例1-4的方法,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述方法还包括:展示播放功能栏,所述播放功能栏支持所述用户对所述分句文本对应的语音进行播放;响应于所述用户针对所述播放功能栏的操作,播放所述分句文本对应的语音。
根据本公开的一个或多个实施例,示例6提供了示例1-4的方法,所述提供来自所述参考译文的译文建议包括:在所述编辑区域以不同于已输入译文的显示方式来显示所述译文建议;所述响应于所述用户对所述译文建议的确认操作,将所述译文建议作为译文结果显示到所述编辑区域包括:响应于所述用户对所述译文建议的确认操作,以与已输入译文的显示方式相同的方式来在所述编辑区域内显示作为译文结果的所述译文建议。
根据本公开的一个或多个实施例,示例7提供了示例1-4的方法,所述响应于所述用户对所述译文建议的确认操作,将所述译文建议作为译文结果显示到所述编辑区域,包括:响应于用户对快捷键的输入操作,将所述译文建议作为译文结果显示到所述编辑区域。
根据本公开的一个或多个实施例,示例8提供了一种视频翻译装置,转换模块,用于将待翻译视频的语音转换为文本;展示模块,用于展示所述文本和所述文本的第一时间信息、第二时间信息及参考译文,所述第一时间信息为所述文本在所述视频中的起始时间,所述第二时间信息为所述文本在所述视频中的结束时间;展示模块,还用于响应于用户对所述文本或所述参考译文的操作,展示编辑区域,所述编辑区域支持所述用户输入译文;建议模块,用于跟随用户在所述编辑区域的输入,提供来自所述参考译文的译文建议;所述展示模块,还用于在检测到所述用户针对所述译文建议的确认操作的情况下,将所述译文建议作为译文结果显示到所述编辑区域;以及,在检测到所述用户针对所述译文建议的非确认操作的情况下,接收所述用户输入的不同于所述译文建议的译文,将所述用户输入的所述译文作为所述译文结果显示到所述编辑区域,根据所述用户输入的所述译文更新所述译文区域中的参考译文。
根据本公开的一个或多个实施例,示例9提供了示例8的装置,所述展示模块还用于根据所述文本在所述视频中对应的时刻信息和/或画面帧对所述文本进行分句,得到所述多个分句文本;针对每一所述分句文本,展示所述分句文本和所述分句文本的第一时间信息、第二时间信息及参考译文。
根据本公开的一个或多个实施例,示例10提供了示例8的装置,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述装置还包括拆分模块,用于展示拆分功能栏,所述拆分功能栏支持所述用户对所述分句文本进行拆分;响应于所述用户针对任一所述分句文本的拆分操作,将所述分句文本拆分成至少两句分句子文本,并针对每一所述分句子文本,关联显示所述分句子文本、所述分句子文本的第一时间信息、第二时间信息以及所述分句子文本的参考译文。
根据本公开的一个或多个实施例,示例11提供了示例8的装置,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述装置还包括合并模块,用于展示合并功能栏,所述合并功能栏支持所述用户对所述分句文本进行合并;响应于所述用户针对任意相邻两个分句文本的合并操作,将所述相邻两个分句文本合并成一段新的分句文本,并针对所述新的分句文本,关联显示所述新的分句文本、所述新的分句文本的第一时间信息、第二时间信息以及所述新的分句文本的参考译文。
根据本公开的一个或多个实施例,示例12提供了示例8-11的装置,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述装置还包括播放模块,用于展示播放功能栏,所述播放功能栏支持所述用户对所述分句文本对应的语音进行播放;响应于所述用户针对播放功能栏的操作,播放所述分句文本对应的语音。
根据本公开的一个或多个实施例,示例13提供了示例8-11的装置,所述建议模块用于在所述编辑区域以不同于已输入译文的显示方式来显示所述译文建议;所述响应于所述用户对所述译文建议的确认操作,将所述译文建议作为译文结果显示到所述编辑区域包括:响应于所述用户对所述译文建议的确认操作,以与已输入译文的显示方式相同的方式来在所述编辑区域内显示作为译文结果的所述译文建议。
根据本公开的一个或多个实施例,示例14提供了示例8-11的装置,所述建议模块,还用于响应于用户对快捷键的触发操作,将所述译文建议作为译文结果显示到所述编辑区域。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
Claims (10)
- 一种视频翻译方法,其特征在于,所述方法包括:将待翻译的视频的语音转换为文本;展示所述文本和所述文本的第一时间信息、第二时间信息及参考译文,所述第一时间信息为所述文本在所述视频中的起始时间,所述第二时间信息为所述文本在所述视频中的结束时间;响应于用户对所述文本或所述参考译文的操作,展示编辑区域,所述编辑区域支持所述用户输入译文;跟随所述用户在所述编辑区域的输入,提供来自所述参考译文的译文建议;其中,在检测到所述用户针对所述译文建议的确认操作的情况下,将所述译文建议作为译文结果显示到所述编辑区域;以及,在检测到所述用户针对所述译文建议的非确认操作的情况下,接收所述用户输入的不同于所述译文建议的译文,将所述用户输入的所述译文作为所述译文结果显示到所述编辑区域,根据所述用户输入的所述译文更新所述译文区域中的参考译文。
- 根据权利要求1所述的方法,其特征在于,所述展示所述文本和所述文本的第一时间信息、第二时间信息及参考译文,包括:根据所述文本在所述视频中对应的时刻信息和/或画面帧对所述文本进行分句,得到所述多个分句文本;针对每一所述分句文本,展示所述分句文本和所述分句文本的第一时间信息、第二时间信息及参考译文。
- 根据权利要求1所述的方法,其特征在于,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述方法还包括:展示拆分功能栏,所述拆分功能栏支持所述用户对所述分句文本进行拆分;响应于所述用户针对任一所述分句文本的拆分操作,将所述分句文本拆分成至少两句分句子文本,并针对每一所述分句子文本,关联显示所述分句子文本、所述分句子文本的第一时间信息、第二时间信息以及所述分句子文本的参考译文。
- 根据权利要求1所述的方法,其特征在于,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述方法还包括:展示合并功能栏,所述合并功能栏支持所述用户对所述分句文本进行合并;响应于所述用户针对任意相邻两个分句文本的合并操作,将所述相邻两个分句文本合并成一段新的分句文本,并针对所述新的分句文本,关联显示所述新的分句文本、所述新的分句文本的第一时间信息、第二时间信息以及所述新的分句文本的参考译文。
- 根据权利要求1-4任一项所述的方法,其特征在于,所述文本包括多个分句文本,每一所述分句文本在不同区域展示,所述方法还包括:展示播放功能栏,所述播放功能栏支持所述用户对所述分句文本对应的语音进行播放;响应于所述用户针对所述播放功能栏的操作,播放所述分句文本对应的语音。
- 根据权利要求1-4任一项所述的方法,其特征在于,所述提供来自所述参考译文的译文建议包括:在所述编辑区域以不同于已输入译文的显示方式来显示所述译文建议;所述响应于所述用户对所述译文建议的确认操作,将所述译文建议作为译文结果显示到所述编辑区域包括:响应于所述用户对所述译文建议的确认操作,以与已输入译文的显示方式相同的方式来在所述编辑区域内显示作为译文结果的所述译文建议。
- 根据权利要求1-4任一项所述的方法,其特征在于,所述响应于所述用户对所述译文建议的确认操作,将所述译文建议作为译文结果显示到所述编辑区域,包括:响应于用户对快捷键的触发操作,将所述译文建议作为译文结果显示到所述编辑区域。
- 一种视频翻译装置,其特征在于,所述装置包括:转换模块,用于将待翻译视频的语音转换为文本;展示模块,用于展示所述文本和所述文本的第一时间信息、第二时间信息及参考译文,所述第一时间信息为所述文本在所述视频中的起始时间,所述第二时间信息为所述文本在所述视频中的结束时间;展示模块,还用于响应于用户对所述文本或所述参考译文的操作,展示编辑区域,所述编辑区域支持所述用户输入译文;建议模块,用于跟随用户在所述编辑区域的输入,提供来自所述参考译文的译文建议;所述展示模块,还用于在检测到所述用户针对所述译文建议的确认操作的情况下,将所述译文建议作为译文结果显示到所述编辑区域;以及,在检测到所述用户针对所述译文建议的非确认操作的情况下,接收所述用户输入的不同于所述译文建议的译文,将所述用户输入的所述译文作为所述译文结果显示到所述编辑区域,根据所述用户输入的所述译文更新所述译文区域中的参考译文。
- 一种计算机可读介质,其上存储有计算机程序,其特征在于,该程序被处理装置执行时实现权利要求1-7中任一项所述方法的步骤。
- 一种电子设备,其特征在于,包括:存储装置,其上存储有计算机程序;处理装置,用于执行所述存储装置中的所述计算机程序,以实现权利要求1-7中任一项所述方法的步骤。
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022564506A JP7548602B2 (ja) | 2020-06-23 | 2021-06-22 | ビデオ翻訳方法、装置、記憶媒体及び電子機器 |
EP21830302.2A EP4170543A4 (en) | 2020-06-23 | 2021-06-22 | VIDEO TRANSLATION METHOD AND APPARATUS, STORAGE MEDIUM AND ELECTRONIC DEVICE |
KR1020227030540A KR20220127361A (ko) | 2020-06-23 | 2021-06-22 | 비디오 번역 방법 및 장치, 저장 매체 및 전자 디바이스 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010583177.4A CN111753558B (zh) | 2020-06-23 | 2020-06-23 | 视频翻译方法和装置、存储介质和电子设备 |
CN202010583177.4 | 2020-06-23 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/818,969 Continuation US11763103B2 (en) | 2020-06-23 | 2022-08-10 | Video translation method and apparatus, storage medium, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021259221A1 true WO2021259221A1 (zh) | 2021-12-30 |
Family
ID=72676904
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/101388 WO2021259221A1 (zh) | 2020-06-23 | 2021-06-22 | 视频翻译方法和装置、存储介质和电子设备 |
Country Status (6)
Country | Link |
---|---|
US (1) | US11763103B2 (zh) |
EP (1) | EP4170543A4 (zh) |
JP (1) | JP7548602B2 (zh) |
KR (1) | KR20220127361A (zh) |
CN (1) | CN111753558B (zh) |
WO (1) | WO2021259221A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114596882A (zh) * | 2022-03-09 | 2022-06-07 | 云学堂信息科技(江苏)有限公司 | 一种可实现对课程内容快速定位的剪辑方法 |
WO2023212920A1 (zh) * | 2022-05-06 | 2023-11-09 | 湖南师范大学 | 一种基于自建模板的多模态快速转写及标注系统 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111753558B (zh) * | 2020-06-23 | 2022-03-04 | 北京字节跳动网络技术有限公司 | 视频翻译方法和装置、存储介质和电子设备 |
KR20230124420A (ko) * | 2022-02-18 | 2023-08-25 | 에이아이링고 주식회사 | 번역된 콘텐츠의 편집 인터페이스 제공 방법 및 컴퓨터 프로그램 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150042771A1 (en) * | 2013-08-07 | 2015-02-12 | United Video Properties, Inc. | Methods and systems for presenting supplemental content in media assets |
CN105828101A (zh) * | 2016-03-29 | 2016-08-03 | 北京小米移动软件有限公司 | 生成字幕文件的方法及装置 |
CN107885729A (zh) * | 2017-09-25 | 2018-04-06 | 沈阳航空航天大学 | 基于双语片段的交互式机器翻译方法 |
CN111753558A (zh) * | 2020-06-23 | 2020-10-09 | 北京字节跳动网络技术有限公司 | 视频翻译方法和装置、存储介质和电子设备 |
Family Cites Families (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6549911B2 (en) * | 1998-11-02 | 2003-04-15 | Survivors Of The Shoah Visual History Foundation | Method and apparatus for cataloguing multimedia data |
US6782384B2 (en) * | 2000-10-04 | 2004-08-24 | Idiom Merger Sub, Inc. | Method of and system for splitting and/or merging content to facilitate content processing |
US7035804B2 (en) * | 2001-04-26 | 2006-04-25 | Stenograph, L.L.C. | Systems and methods for automated audio transcription, translation, and transfer |
JP2005129971A (ja) | 2002-01-28 | 2005-05-19 | Telecommunication Advancement Organization Of Japan | 半自動型字幕番組制作システム |
US7111044B2 (en) * | 2002-07-17 | 2006-09-19 | Fastmobile, Inc. | Method and system for displaying group chat sessions on wireless mobile terminals |
JP3999771B2 (ja) | 2004-07-06 | 2007-10-31 | 株式会社東芝 | 翻訳支援プログラム、翻訳支援装置、翻訳支援方法 |
JP2006166407A (ja) | 2004-11-09 | 2006-06-22 | Canon Inc | 撮像装置及びその制御方法 |
JP2007035056A (ja) | 2006-08-29 | 2007-02-08 | Ebook Initiative Japan Co Ltd | 翻訳情報生成装置、翻訳情報生成方法並びにコンピュータプログラム |
JPWO2009038209A1 (ja) * | 2007-09-20 | 2011-01-13 | 日本電気株式会社 | 機械翻訳システム、機械翻訳方法及び機械翻訳プログラム |
JP2010074482A (ja) | 2008-09-18 | 2010-04-02 | Toshiba Corp | 外国語放送編集システム、翻訳サーバおよび翻訳支援方法 |
US8843359B2 (en) * | 2009-02-27 | 2014-09-23 | Andrew Nelthropp Lauder | Language translation employing a combination of machine and human translations |
US20100332214A1 (en) * | 2009-06-30 | 2010-12-30 | Shpalter Shahar | System and method for network transmision of subtitles |
US20110246172A1 (en) * | 2010-03-30 | 2011-10-06 | Polycom, Inc. | Method and System for Adding Translation in a Videoconference |
GB2502944A (en) * | 2012-03-30 | 2013-12-18 | Jpal Ltd | Segmentation and transcription of speech |
EP2946279B1 (en) * | 2013-01-15 | 2019-10-16 | Viki, Inc. | System and method for captioning media |
US9183198B2 (en) * | 2013-03-19 | 2015-11-10 | International Business Machines Corporation | Customizable and low-latency interactive computer-aided translation |
CN103226947B (zh) * | 2013-03-27 | 2016-08-17 | 广东欧珀移动通信有限公司 | 一种基于移动终端的音频处理方法及装置 |
US9946712B2 (en) * | 2013-06-13 | 2018-04-17 | Google Llc | Techniques for user identification of and translation of media |
JP6327848B2 (ja) * | 2013-12-20 | 2018-05-23 | 株式会社東芝 | コミュニケーション支援装置、コミュニケーション支援方法およびプログラム |
US10169313B2 (en) * | 2014-12-04 | 2019-01-01 | Sap Se | In-context editing of text for elements of a graphical user interface |
US9772816B1 (en) * | 2014-12-22 | 2017-09-26 | Google Inc. | Transcription and tagging system |
CN104731776B (zh) * | 2015-03-27 | 2017-12-26 | 百度在线网络技术(北京)有限公司 | 翻译信息的提供方法及系统 |
JP6470097B2 (ja) * | 2015-04-22 | 2019-02-13 | 株式会社東芝 | 通訳装置、方法およびプログラム |
JP6471074B2 (ja) * | 2015-09-30 | 2019-02-13 | 株式会社東芝 | 機械翻訳装置、方法及びプログラム |
US9558182B1 (en) * | 2016-01-08 | 2017-01-31 | International Business Machines Corporation | Smart terminology marker system for a language translation system |
KR102495517B1 (ko) * | 2016-01-26 | 2023-02-03 | 삼성전자 주식회사 | 전자 장치, 전자 장치의 음성 인식 방법 |
JP2017151768A (ja) * | 2016-02-25 | 2017-08-31 | 富士ゼロックス株式会社 | 翻訳プログラム及び情報処理装置 |
CN108664201B (zh) | 2017-03-29 | 2021-12-28 | 北京搜狗科技发展有限公司 | 一种文本编辑方法、装置及电子设备 |
CN107943797A (zh) * | 2017-11-22 | 2018-04-20 | 语联网(武汉)信息技术有限公司 | 一种全原文参考的在线翻译系统 |
CN108259965B (zh) * | 2018-03-31 | 2020-05-12 | 湖南广播电视台广播传媒中心 | 一种视频剪辑方法和剪辑系统 |
KR102085908B1 (ko) | 2018-05-10 | 2020-03-09 | 네이버 주식회사 | 컨텐츠 제공 서버, 컨텐츠 제공 단말 및 컨텐츠 제공 방법 |
KR102382477B1 (ko) | 2018-08-29 | 2022-04-04 | 주식회사 아이팩토리 | 특허 문서 작성 장치, 방법, 컴퓨터 프로그램, 컴퓨터로 판독 가능한 기록매체, 서버 및 시스템 |
US11636273B2 (en) * | 2019-06-14 | 2023-04-25 | Netflix, Inc. | Machine-assisted translation for subtitle localization |
CN110489763B (zh) * | 2019-07-18 | 2023-03-10 | 深圳市轱辘车联数据技术有限公司 | 一种视频翻译方法及装置 |
US11301644B2 (en) * | 2019-12-03 | 2022-04-12 | Trint Limited | Generating and editing media |
US11580312B2 (en) * | 2020-03-16 | 2023-02-14 | Servicenow, Inc. | Machine translation of chat sessions |
US11545156B2 (en) * | 2020-05-27 | 2023-01-03 | Microsoft Technology Licensing, Llc | Automated meeting minutes generation service |
- 2020-06-23 CN CN202010583177.4A patent/CN111753558B/zh active Active
- 2021-06-22 WO PCT/CN2021/101388 patent/WO2021259221A1/zh unknown
- 2021-06-22 EP EP21830302.2A patent/EP4170543A4/en active Pending
- 2021-06-22 KR KR1020227030540A patent/KR20220127361A/ko unknown
- 2021-06-22 JP JP2022564506A patent/JP7548602B2/ja active Active
- 2022-08-10 US US17/818,969 patent/US11763103B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150042771A1 (en) * | 2013-08-07 | 2015-02-12 | United Video Properties, Inc. | Methods and systems for presenting supplemental content in media assets |
CN105828101A (zh) * | 2016-03-29 | 2016-08-03 | 北京小米移动软件有限公司 | 生成字幕文件的方法及装置 |
CN107885729A (zh) * | 2017-09-25 | 2018-04-06 | 沈阳航空航天大学 | 基于双语片段的交互式机器翻译方法 |
CN111753558A (zh) * | 2020-06-23 | 2020-10-09 | 北京字节跳动网络技术有限公司 | 视频翻译方法和装置、存储介质和电子设备 |
Non-Patent Citations (1)
Title |
---|
See also references of EP4170543A4 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114596882A (zh) * | 2022-03-09 | 2022-06-07 | 云学堂信息科技(江苏)有限公司 | 一种可实现对课程内容快速定位的剪辑方法 |
CN114596882B (zh) * | 2022-03-09 | 2024-02-02 | 云学堂信息科技(江苏)有限公司 | 一种可实现对课程内容快速定位的剪辑方法 |
WO2023212920A1 (zh) * | 2022-05-06 | 2023-11-09 | 湖南师范大学 | 一种基于自建模板的多模态快速转写及标注系统 |
Also Published As
Publication number | Publication date |
---|---|
JP2023522469A (ja) | 2023-05-30 |
US11763103B2 (en) | 2023-09-19 |
US20220383000A1 (en) | 2022-12-01 |
JP7548602B2 (ja) | 2024-09-10 |
CN111753558A (zh) | 2020-10-09 |
CN111753558B (zh) | 2022-03-04 |
EP4170543A4 (en) | 2023-10-25 |
KR20220127361A (ko) | 2022-09-19 |
EP4170543A1 (en) | 2023-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021259221A1 (zh) | 视频翻译方法和装置、存储介质和电子设备 | |
US11917344B2 (en) | Interactive information processing method, device and medium | |
WO2021259061A1 (zh) | 文档翻译方法和装置、存储介质和电子设备 | |
CN111898388B (zh) | 视频字幕翻译编辑方法、装置、电子设备及存储介质 | |
US11954455B2 (en) | Method for translating words in a picture, electronic device, and storage medium | |
CN111970577A (zh) | 字幕编辑方法、装置和电子设备 | |
WO2023029904A1 (zh) | 文本内容匹配方法、装置、电子设备及存储介质 | |
CN113010698B (zh) | 多媒体的交互方法、信息交互方法、装置、设备及介质 | |
CN113778419B (zh) | 多媒体数据的生成方法、装置、可读介质及电子设备 | |
WO2022105760A1 (zh) | 一种多媒体浏览方法、装置、设备及介质 | |
JP7548678B2 (ja) | オーディオとテキストとの同期方法、装置、読取可能な媒体及び電子機器 | |
CN111860000A (zh) | 文本翻译编辑方法、装置、电子设备及存储介质 | |
CN108491178B (zh) | 信息浏览方法、浏览器和服务器 | |
CN112163433B (zh) | 关键词汇的匹配方法、装置、电子设备及存储介质 | |
WO2022068494A1 (zh) | 搜索目标内容的方法、装置、电子设备及存储介质 | |
CN113132789B (zh) | 一种多媒体的交互方法、装置、设备及介质 | |
US20230140442A1 (en) | Method for searching target content, and electronic device and storage medium | |
WO2021161908A1 (ja) | 情報処理装置及び情報処理方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21830302 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2022564506 Country of ref document: JP Kind code of ref document: A |
NENP | Non-entry into the national phase |
Ref country code: DE |
ENP | Entry into the national phase |
Ref document number: 2021830302 Country of ref document: EP Effective date: 20230123 |