WO2024056022A1 - 字幕处理方法及装置 (Subtitle processing method and device) - Google Patents

字幕处理方法及装置 (Subtitle processing method and device) Download PDF

Info

Publication number
WO2024056022A1
WO2024056022A1 PCT/CN2023/118772
Authority
WO
WIPO (PCT)
Prior art keywords
text
subtitle
audio
text element
multimedia material
Prior art date
Application number
PCT/CN2023/118772
Other languages
English (en)
French (fr)
Inventor
黄雪航
黄展鹏
俞志云
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Priority to EP23820731.0A (published as EP4362451A1)
Priority to US18/543,836 (published as US20240119654A1)
Publication of WO2024056022A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/80 - 2D [Two Dimensional] animation, e.g. using sprites
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/20 - Drawing from basic elements, e.g. lines or circles
    • G06T 11/203 - Drawing of straight lines or curves

Definitions

  • Embodiments of the present disclosure relate to a subtitle processing method and device.
  • Subtitles in videos can assist in understanding the content of the video, so subtitles are often added when editing videos.
  • At present, subtitle text is usually input manually, or a subtitle recognition tool is used to recognize the corresponding audio to obtain the subtitle text. The subtitle text is then adjusted and segmented by repeatedly listening to the audio, producing a large number of text fragments that are synthesized with the video to add subtitles to it. For batch text scenarios such as subtitles, a user who wants a particular subtitle effect must repeatedly adjust the segmentation results, synthesize them, and preview the result, so subtitle editing in this way is very inefficient.
  • the present disclosure provides a subtitle processing method and device.
  • In a first aspect, an embodiment of the present disclosure provides a subtitle processing method, including: during editing of a multimedia material, performing speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text; matching each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip, and determining the material clips in the multimedia material that respectively match the text elements, wherein the time of the material clip matching a text element on the editing timeline is consistent with the time of the corresponding audio clip on the editing timeline; and synthesizing each text element with the material clips within the matching time range to obtain a target multimedia material with a word-by-word pop-up subtitle text animation effect.
  • In a second aspect, an embodiment of the present disclosure provides a subtitle processing device, including:
  • a speech recognition module configured to, during editing of a multimedia material, perform speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text;
  • a matching module configured to match each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip, and determine the material clips in the multimedia material that respectively match the text elements, wherein the time of the material clip matching a text element on the editing timeline is consistent with the time of the corresponding audio clip on the editing timeline; and
  • a subtitle synthesis module configured to synthesize each text element with the material clips within the matching time range to obtain a target multimedia material with a word-by-word pop-up subtitle text animation effect.
  • In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; the memory is configured to store computer program instructions, and the processor is configured to execute the computer program instructions so that the electronic device implements the subtitle processing method described in the first aspect.
  • In a fourth aspect, embodiments of the present disclosure provide a readable storage medium including computer program instructions; at least one processor of an electronic device executes the computer program instructions so that the electronic device implements the subtitle processing method described in the first aspect.
  • In a fifth aspect, embodiments of the present disclosure provide a computer program product; when an electronic device executes the computer program product, the electronic device implements the subtitle processing method described in the first aspect.
  • Figure 1 is a flow chart of a subtitle processing method provided by an embodiment of the present disclosure
  • Figure 2 is a flow chart of a subtitle processing method provided by another embodiment of the present disclosure.
  • Figure 3 is a flow chart of a subtitle processing method provided by another embodiment of the present disclosure.
  • FIGS. 4A to 4I are schematic diagrams of the human-computer interaction interface provided by the present disclosure.
  • FIG. 5 is a schematic structural diagram of a subtitle processing device provided by an embodiment of the present disclosure.
  • Subtitles can help users understand video content, and different subtitle effects can convey additional dimensions of meaning. For example, the corresponding text may appear only when a particular word is spoken in the video's audio; such effects are often used in the plot-interpretation category to represent a voice-over, and in the talking category to convey the speaker's confident, passionate emotion.
  • To achieve such effects, the user usually inputs the subtitles manually, splitting the subtitle text into individual characters, and then repeatedly listens to the speech to make adjustments.
  • Alternatively, the user can enter a complete sentence and use keyframe masks to make the text appear piece by piece. Either way, subtitle editing is inefficient, and operation on mobile devices is extremely inconvenient.
  • On this basis, embodiments of the present disclosure provide a subtitle processing method and device. The method includes: during editing of a multimedia material, performing speech recognition on the corresponding audio to obtain the subtitle text and the timestamp information of the audio clip corresponding to each text element in the subtitle text; determining, according to that timestamp information, the material clips in the multimedia material that match the text elements; and synthesizing each text element with the material clips within the matching time to obtain a target multimedia material with a word-by-word pop-up subtitle animation effect.
  • In the present disclosure, the starting moment of the time range of the video frames matching a text element coincides with the starting moment of the audio clip corresponding to that text element, which achieves the subtitle animation effect of the corresponding text appearing as each word is spoken.
  • In addition, a user instruction can trigger automatic generation of dynamic subtitles; the operation is simple, improves the user experience, and the disclosed method applies to many types of devices, giving it a wide application range.
  • the method provided by the present disclosure can be executed by an electronic device.
  • The electronic device can be, but is not limited to, a tablet computer, a mobile phone (such as a foldable-screen phone or a large-screen phone), a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), and so on; the present disclosure places no restriction on the specific type of electronic device.
  • FIG. 1 is a schematic flowchart of a subtitle processing method provided by an embodiment of the present disclosure. The following takes execution by an electronic device as an example: an editing application is installed on the electronic device, through which the user can edit multimedia materials. Referring to Figure 1, the method of this embodiment includes:
  • S101: During editing of a multimedia material, perform speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text.
  • The multimedia material can be a video material recorded by the user in real time, a previously edited video material, or a video material stored on the electronic device; it can also be an audio material, an image material, and so on. The present disclosure limits neither the type nor the number of multimedia materials; if there are several, they can be arranged in import order and treated as a whole.
  • The process of editing a multimedia material can be understood as recording in advance or importing a multimedia material or audio material that carries audio, or adding background music to a multimedia material (such as a video material or image material). Of course, the editing methods are not limited to these.
  • The subtitle text is obtained by text recognition of the audio corresponding to the multimedia material currently being edited. That audio can be the original audio included in the multimedia material or background music the user added to it; the background music can be an audio track in the application, such as a complete song, a partial fragment of a song, or a cut audio clip, which the present disclosure does not limit. When the multimedia material is itself an audio material, speech recognition can be performed on the material directly.
  • In some embodiments, the application can send the audio to a middle-end service through the electronic device; the middle-end service calls a subtitle recognition tool to perform text recognition on the audio and obtains the corresponding subtitle text and the timestamp information of the audio clip corresponding to each text element in the subtitle text, where the timestamp information may include the start moment and end moment of the audio clip.
  • For example, suppose the total duration of the audio corresponding to the multimedia material clip is 7 seconds, the subtitle text obtained by speech recognition of the audio is 我今天很开心呀 ('I am very happy today'), there are 7 text elements in total, and the audio clip corresponding to each text element lasts 1 second. The correspondence between each text element and the timestamp information of its audio clip is then as shown in Table 1 below:
  • Table 1: 我 (00:00-00:01), 今 (00:01-00:02), 天 (00:02-00:03), 很 (00:03-00:04), 开 (00:04-00:05), 心 (00:05-00:06), 呀 (00:06-00:07)
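  • For illustration only, the recognition result described above can be modeled as a list of text elements, each carrying the start and end of its audio clip on the editing timeline. The Python sketch below is an assumption made for exposition, not part of the disclosure; the type name TextElement is hypothetical.

      from dataclasses import dataclass

      @dataclass
      class TextElement:
          text: str     # one character for Chinese audio, or one word for languages like English
          start: float  # start of the corresponding audio clip, in seconds on the editing timeline
          end: float    # end of the corresponding audio clip, in seconds on the editing timeline

      # The subtitle text of Table 1: seven elements, one second each.
      subtitle_text = [
          TextElement("我", 0.0, 1.0), TextElement("今", 1.0, 2.0),
          TextElement("天", 2.0, 3.0), TextElement("很", 3.0, 4.0),
          TextElement("开", 4.0, 5.0), TextElement("心", 5.0, 6.0),
          TextElement("呀", 6.0, 7.0),
      ]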
  • The above example uses Chinese audio, so the text elements are in units of characters; if the audio is in another language, the text elements are in units of the corresponding words. For example, if the audio is English, the text elements are English words.
  • In some embodiments, the application may perform speech recognition in response to an instruction entered by the user. The instruction triggering speech recognition may include, but is not limited to, click, double-click, long-press, and slide operations. For example, when a page of the application provides an area/control for adding recognized subtitles to a multimedia material, the speech recognition instruction can be an operation received on that area/control.
  • S102: Match each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip, and determine the material clips in the multimedia material that respectively match the text elements; the time of the matching material clip on the editing timeline is consistent with the time of the corresponding audio clip on the editing timeline. If the multimedia material is an image/video material, the material clips matching text elements can be understood as image/video clips that include the image frames/video frames to be synthesized with the text elements; if the multimedia material is an audio material, the matching material clips can be understood as audio clips that include one or more speech pronunciation units to be synthesized with the text elements.
  • Since the subtitle processing method provided by the present disclosure aims to achieve the subtitle effect of the corresponding text appearing when a word is spoken, when the matching material clips are determined from the timestamp information of the audio clips corresponding to the text elements, the time of a text element's audio clip on the editing timeline is consistent with the time of its material clip on the editing timeline. Consistency on the editing timeline here can be understood as: the starting moment of the material clip matching a text element on the editing timeline coincides with the starting moment of the audio clip corresponding to that text element.
  • The moment at which a text element disappears from the subtitles can be flexible: it can disappear when its corresponding audio clip ends, when the sentence it belongs to (or a text fragment of specified length) reaches its end position, or after a preset duration following the end of its corresponding audio clip; the present disclosure does not limit this.
  • Accordingly, on the editing timeline, the end moment of the time to which the material clip matching a text element belongs can equal the end moment of the corresponding audio clip; in this way the subtitle text pops up word by word, and each earlier text element disappears as its corresponding audio clip reaches its end. Alternatively, the end moment of the time range of the video frames matching a text element can be later than the end moment of the corresponding audio clip; in this way the text elements appear one by one, and each earlier element remains for a period after its audio clip ends before disappearing. The speed at which the text elements switch depends on the speaking speed of the voice in the audio.
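  • Building on the hypothetical TextElement sketch above, the matching rule can be illustrated as follows: the display interval of a text element always starts with its audio clip, while its end depends on the chosen disappearance policy. The policy names and the helper below are illustrative assumptions, not the disclosure's API.

      def display_interval(elem: TextElement, policy: str = "with_audio",
                           hold: float = 0.5) -> tuple[float, float]:
          """Return the (start, end) time range of the material clip matched to a
          text element. The start always equals the start of the element's audio
          clip, so the text pops up exactly when the word is spoken."""
          if policy == "with_audio":  # disappear when the corresponding audio clip ends
              return (elem.start, elem.end)
          if policy == "hold":        # remain for a preset duration after the clip ends
              return (elem.start, elem.end + hold)
          raise ValueError(f"unknown policy: {policy}")

      for elem in subtitle_text:
          print(elem.text, display_interval(elem, policy="hold"))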
  • S103: Synthesize each text element with the material clips within the matching time range to obtain the target multimedia material with the word-by-word pop-up subtitle text animation effect. When synthesizing a text element with its matching material clip, a preset first subtitle animation style can be applied to the text element, so that subtitles automatically added to the multimedia material already carry the effect of the first subtitle animation style when generated; this meets the user's needs for subtitle effects and reduces later manual work. The first subtitle animation style may include one or more of an entry style, an exit style, and a loop style for the text elements.
  • Steps S102 and S103 can be implemented automatically by calling a dynamic subtitle resource package (which can also be called a subtitle animation resource package): the subtitle text and the timestamp information of each text element it includes are passed into the package, which applies the preset subtitle animation style to each text element in batch and overlays the styled text elements on the matching material clips, thereby adding to the multimedia material subtitles whose text pops up word by word in the first subtitle animation style.
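  • As a rough sketch of how such a resource package might work internally (the function names here are assumptions for illustration, not the actual resource-package API): the package walks the text elements once, applies one preset style to all of them in batch, and overlays each styled element on the material clip whose timeline range matches its audio clip.

      def apply_dynamic_subtitles(elements, style, overlay):
          """Batch-apply one subtitle animation style (S103) and overlay each
          styled text element on its matching material clip (S102)."""
          for elem in elements:
              start, end = display_interval(elem)  # matching time range on the timeline
              overlay(text=elem.text, style=style, start=start, end=end)

      # A stand-in overlay callback that just reports what would be rendered.
      apply_dynamic_subtitles(
          subtitle_text,
          style={"entry": "pop", "exit": "fade", "loop": None},  # first subtitle animation style
          overlay=lambda **kwargs: print(kwargs),
      )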
  • The method of this embodiment achieves the subtitle animation effect of the corresponding text appearing as each word is spoken. Moreover, a user instruction can trigger automatic generation of dynamic subtitles, so the operation is simple and the user experience improves. The method applies to many types of devices and has a wide application range; in batch text scenarios, even mobile devices with small screens can quickly add subtitles with a specified effect to multimedia material clips.
  • FIG. 2 is a flow chart of a subtitle processing method provided by another embodiment of the present disclosure. Referring to Figure 2, on the basis of the embodiment shown in Figure 1, the method of this embodiment further includes:
  • S104: In response to a text deletion instruction, delete the corresponding text elements from the subtitle text to obtain updated subtitle text. The remaining text elements and their timestamp information are kept, yielding the updated subtitle text and the timestamps of its text elements.
  • For example, suppose the subtitle text before deletion is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07) ('what a happy day'), where the brackets indicate the timestamp information of the audio clip corresponding to each text element. After the last text element 啊 is deleted, the updated subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06). Text elements at other positions are deleted in a similar way.
  • S105: In response to a text insertion instruction, insert new text elements into the subtitle text to obtain updated subtitle text. The insertion performed in this step adds new text elements without deleting existing ones. In some embodiments, different handling can be configured for different insertion positions: if the insertion position is in the middle or at the end of the subtitle text, the new text element is merged with the adjacent preceding text element and shares the timestamp of that element's audio clip; if the insertion position is at the very front of the subtitle text, the new text element is merged with the first text element of the subtitle text and shares the timestamp of the first element's audio clip.
  • For example, suppose that before insertion the subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07), where the brackets indicate the timestamp information of the audio clip corresponding to each text element.
  • Case 1: after the text element 的 is inserted after 真, the updated subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真的 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07); the merged element 真的 shares the timestamp information (00:02-00:04) of the audio clip originally corresponding to 真.
  • Case 2: after the new text element 哈哈 is inserted before 今, the updated subtitle text is: 哈哈今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07); the merged element 哈哈今 shares the timestamp (00:00-00:01) of the audio clip originally corresponding to 今. Both cases are sketched in code below.
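  • The two cases above can be sketched with the hypothetical TextElement type from before: an element inserted at the very front merges into the first element, and an element inserted anywhere else merges into the preceding element, in both cases sharing that element's audio clip timestamps. This is an illustrative assumption of one possible implementation, not the disclosure's code.

      def insert_text(elements, index, new_text):
          """Insert new_text before position `index`, merging it into the first
          element (front insertion) or into the preceding element (middle/end)."""
          merged = list(elements)
          if index == 0:
              host = merged[0]
              merged[0] = TextElement(new_text + host.text, host.start, host.end)
          else:
              host = merged[index - 1]
              merged[index - 1] = TextElement(host.text + new_text, host.start, host.end)
          return merged

      line = [TextElement("今", 0, 1), TextElement("天", 1, 2), TextElement("真", 2, 4),
              TextElement("开", 4, 5), TextElement("心", 5, 6), TextElement("啊", 6, 7)]
      print(insert_text(line, 3, "的")[2])    # Case 1: 真的 shares (2, 4)
      print(insert_text(line, 0, "哈哈")[0])  # Case 2: 哈哈今 shares (0, 1)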
  • S106: In response to a text replacement instruction, replace one or more text elements in the subtitle text with replacement text to obtain updated subtitle text. The timestamp information of the replacement text equals the timestamp information of the audio clips corresponding to the replaced text elements. In one replacement, the replacement text may include one or more text elements and is treated as a whole; the replaced text elements may likewise be one element or several consecutive elements.
  • For example, suppose that before replacement the subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07). After 开心 is replaced with 难过 and 啊 is replaced with 呀, the updated subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 难过 (00:04-00:06) 呀 (00:06-00:07); 难过 takes the combined timestamps (00:04-00:06) of the audio clips originally corresponding to 开 and 心, and 呀 takes the timestamp (00:06-00:07) of the audio clip originally corresponding to 啊. One or more of the above editing modes can be selected as needed.
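  • The replacement rule can be sketched in the same spirit, reusing the hypothetical types and the `line` list from the previous sketch: the replacement text is treated as one whole element spanning the combined audio clips of the replaced elements.

      def replace_text(elements, first, last, new_text):
          """Replace elements[first..last] (inclusive) with new_text as a single
          element spanning the replaced elements' combined audio clips."""
          span = TextElement(new_text, elements[first].start, elements[last].end)
          return elements[:first] + [span] + elements[last + 1:]

      updated = replace_text(line, 3, 4, "难过")   # 开 + 心 -> 难过 shares (4, 6)
      updated = replace_text(updated, 4, 4, "呀")  # 啊 -> 呀 shares (6, 7)
      print([(e.text, e.start, e.end) for e in updated])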
  • S107: Determine the material clips in the multimedia material that respectively match the text elements according to the timestamp information of the audio clips corresponding to the text elements in the updated subtitle text.
  • S108: Synthesize each text element with the material clips within the matching time, so as to re-add subtitles to the multimedia material.
  • Steps S107 and S108 are respectively similar to steps S102 and S103 of the embodiment shown in FIG. 1; reference may be made to the detailed description of that embodiment.
  • If implemented automatically by calling the dynamic subtitle resource package, the updated subtitle text and the timestamp information of each text element it includes are passed into the package again; the package re-applies the preset subtitle animation style to each text element in batch and overlays the styled text elements on the matching material clips, thereby re-adding subtitles to the multimedia material.
  • The method of this embodiment meets the user's need to adjust subtitle content while adding subtitles to multimedia materials; for the updated subtitle text, subtitles with the specified effect are generated automatically, which is convenient for users and improves the user experience.
  • FIG. 3 is a schematic flowchart of a subtitle processing method provided by another embodiment of the present disclosure. Referring to Figure 3, the method of this embodiment includes:
  • S301: During editing of a multimedia material, perform speech recognition on the corresponding audio to obtain the subtitle text and the timestamp information of the audio clip corresponding to each text element included in the subtitle text.
  • S302: Match each text element against the material units according to the timestamp information of its corresponding audio clip and determine the matching material clips, whose time on the editing timeline is consistent with that of the corresponding audio clips.
  • S303: Apply a specified first subtitle animation style to each text element in batch, and synthesize the styled text elements with the material clips within the matching time to obtain the target multimedia material whose subtitle text pops up word by word in the first subtitle animation style.
  • Steps S301 to S303 of this embodiment are respectively similar to steps S101 to S103 of the embodiment shown in FIG. 1; reference may be made to the detailed description of that embodiment, which is not repeated here. It should be noted that the first subtitle animation style can be understood as the application's default subtitle animation style.
  • S304: In response to a subtitle animation style switching instruction, apply the second subtitle animation style to each text element in batch, and synthesize the styled text elements with the material clips within the matching time to obtain the target multimedia material whose subtitle text pops up word by word in the second subtitle animation style. The application can provide the user with a subtitle animation style editing page on the electronic device; the page can display areas or controls corresponding to one or more selectable subtitle animation styles, and the user enters the style switching instruction by operating the area or control of the desired style.
  • If implemented automatically by calling the dynamic subtitle resource package, the updated subtitle text and the timestamp information of each of its text elements are passed into the package again; the package re-applies the user-specified second subtitle animation style to each text element in batch and overlays the styled text elements on the matching material clips, thereby re-adding subtitles to the multimedia material.
  • the method of this embodiment can meet the user's needs for later adjustment of subtitle effects, supports batch editing of subtitle animation styles, and has high subtitle processing efficiency.
  • Based on the foregoing, the present disclosure illustrates the subtitle processing method by example, with reference to the accompanying drawings and application scenarios, taking an electronic device as the executing device. For ease of description, the electronic device is a mobile phone on which a video editing application (application 1) is installed, and the multimedia material the user imports into application 1 is a video material.
  • Figures 4A-4I are schematic diagrams of human-computer interaction interfaces provided by embodiments of the present disclosure.
  • Application 1 can exemplarily display the user interface 11 as shown in Figure 4A on the mobile phone.
  • the user interface 11 is used to display a multimedia material editing page (hereinafter referred to as the editing page).
  • Application 1 performs a set of functions on the editing page, such as previewing the editing result of the multimedia material, adding background music to it, and adding filters, stickers, text, and so on.
  • The user interface 11 includes area a1, a preview area for the editing result of the multimedia material; it also includes area a2, in which the multimedia material and other clip materials added during editing can be displayed along the timeline.
  • the user interface 11 also includes an area a3, which can provide the user with multiple editing function entrances.
  • area a3 includes control 101, which is used to enter the text function collection page of application 1.
  • the text function collection page includes multiple controls, and the multiple controls respectively correspond to different text functions.
  • After application 1 receives an operation such as the user tapping control 101 in user interface 11 shown in Figure 4A, it can display user interface 12 as shown in FIG. 4B, which presents the text function collection page provided by application 1.
  • the text function collection page can provide users with entrances to various text functions. Users can enter the corresponding text function operation page through the entrance to add text content to multimedia materials.
  • the user interface 12 includes: area a4.
  • the area a4 includes entrances respectively corresponding to the new text function, text template function, subtitle recognition function, lyrics recognition function, sticker function and graffiti pen function.
  • the control 102 shown in the user interface 12 is the entrance corresponding to the subtitle recognition function.
  • After application 1 receives an operation such as the user tapping control 102 in user interface 12 shown in Figure 4B, it can display user interface 13 as shown in Figure 4C. User interface 13 presents the subtitle recognition panel provided by application 1.
  • The subtitle recognition panel can provide the user with recognition type options, a language selection entry, a switch for marking invalid segments, a switch for dynamic subtitles, and a switch for simultaneously clearing existing subtitles.
  • Here, dynamic subtitles denote the function of adding subtitles with the word-by-word pop-up animation effect to multimedia materials. Specifically, when the dynamic subtitle switch is off, the added subtitles appear with one sentence fragment per subtitle; when the switch is on, the added subtitles present the word-by-word pop-up effect, that is, the text elements of the subtitle text appear one by one, each displayed at the start of its corresponding audio clip.
  • In some embodiments, the user's selection can be remembered: when the subtitle recognition panel is opened, it shows the on/off state of dynamic subtitles from the user's last exit from subtitle recognition, which better matches the user's habits. When application 1 first releases the dynamic subtitle function, the switch may be off, as shown in user interface 13.
  • After application 1 receives the user's operation such as tapping the dynamic subtitle switch in user interface 13 shown in FIG. 4C, it displays user interface 14 shown in FIG. 4D, in which the dynamic subtitle switch is on.
  • the user interface 14 also includes a control 103, which is used to instruct starting speech recognition and adding subtitles with a word-by-word pop-up animation effect.
  • After application 1 responds to the user performing an operation such as tapping control 103 in user interface 14, it displays user interface 15 as shown in Figure 4E. In user interface 15 the subtitle recognition panel is closed, and prompt content, such as an animation and prompt text, is displayed in area a4 to remind the user that dynamic subtitles are currently being created.
  • To reduce the occlusion of the preview picture in area a1, area a4 can be located above area a1; it should be understood that area a4 can also be located elsewhere, which the present disclosure does not limit.
  • As described above, the user's operations on the dynamic subtitle switch and on control 103 trigger application 1 to perform speech recognition on the audio corresponding to the multimedia material and to automatically add dynamic subtitles with the word-by-word pop-up animation effect.
  • After the dynamic subtitles have been created, application 1 can display user interface 16 as shown in Figure 4F, in which area a4 can display prompt content, such as the prompt text "Recognition successful; subtitles generated automatically."
  • the user can click the preview play button to preview the subtitle effect in area a1. If it meets the user's expectations, the edited multimedia material can be exported as a target video for publishing or saving.
  • With the interactions of Figures 4A to 4F, the present disclosure provides the user with a dynamic subtitle switch in the pre-recognition stage, which is convenient to use; it also remembers the state of the switch from the user's last exit from the subtitle recognition panel, so the user need not set it again on the next use, further improving efficiency without extra operations.
  • To better meet user needs, application 1 also provides the ability to add dynamic subtitles, or modify the animation style of existing subtitles, at a later stage.
  • For example, on the basis of user interface 16, area a2 displays, along the timeline, the identifiers corresponding to the multimedia material and the subtitle text; operating (e.g., tapping) a subtitle text identifier displayed in area a2 can trigger re-editing of the subtitles.
  • After application 1 receives the user's tap on a text fragment contained in any subtitle in area a2 of user interface 16, it can display user interface 17 as shown in Figure 4G.
  • In user interface 17, a text box 104 corresponding to the subtitle text is displayed in area a1. Text box 104 contains the text content at the current preview position, which can be one or more sentences (i.e., text fragments). Area a1 can also display controls for operating on the text box, such as rotate and copy, and the user can enlarge or shrink the text box with a two-finger gesture; the size of the text elements in the box changes with the box size.
  • User interface 17 also includes area a5, which displays a subtitle editing function collection page providing entrances to multiple functions for editing the currently added subtitles, for example batch subtitle editing, subtitle splitting, subtitle copying, subtitle editing, subtitle deletion, fancy text, and subtitle animation styles.
  • the user interface 17 includes a control 105, which is used to enter the subtitle animation panel to add subtitle effects (including dynamic subtitle effects) to the current subtitles or to modify the subtitle animation style used for the current subtitles.
  • After application 1 receives the user's operation such as tapping control 105 in user interface 17, it displays user interface 18 as shown in FIG. 4H; user interface 18 includes area a6.
  • Area a6 displays the subtitle animation panel, which includes a tag 106 for setting animation styles as well as a font tag, a style tag, a fancy-text tag, a text template tag, and so on. In some embodiments, as shown in Figure 4H, entering the subtitle animation style panel can default to tag 106 and display its content; in other embodiments another tag can be focused, and application 1 displays the content of tag 106 after receiving the user's tap on it.
  • Referring to Figure 4H, area a6 also includes a dynamic subtitle switch 107; by operating switch 107, a subtitle effect in which the text elements are displayed one by one can be added to the current subtitles. In some embodiments, if the user already added dynamic subtitles in the pre-recognition stage, the switch can be shown as on; if the user did not use dynamic subtitles in that stage, it can be shown as off, and the user can toggle switch 107 in user interface 18 to on. In the embodiment shown in Figure 4H, switch 107 is off.
  • Area a6 further includes a tag 108 for setting the subtitle entry style, a tag 109 for setting the exit style, a tag 110 for setting the loop style, a tag 111 for setting the dynamic subtitle animation style, and area a7, which displays the content of whichever tag is currently focused. In some cases, while switch 107 is off, the content of any tag can be displayed by default; for example, user interface 18 shown in FIG. 4H displays the content corresponding to tag 108 by default.
  • When application 1 receives the user's operation (such as a tap) on dynamic subtitle switch 107 in user interface 18 and the switch turns from off to on, application 1 can display user interface 19 as shown in Figure 4I. In user interface 19, switch 107 is on and tag 111 is selected; area a7 displays one or more dynamic subtitle animation styles for the user to choose from.
  • The display identifiers of the multiple dynamic subtitle animation styles can be arranged from left to right, and the user can browse them by swiping the screen left or right. The default dynamic subtitle animation style of application 1 can be displayed in the first position from the left, so the user clearly knows which style application 1 uses by default.
  • Area a7 may also include a disable button 112, which may be placed at the far left of area a7 or, of course, at another position; the present disclosure does not limit this. When the user taps disable button 112 to turn the dynamic subtitle effect off, switch 107 switches to the off state.
  • Suppose the user taps the second dynamic subtitle animation style from the left in area a7; this amounts to entering a subtitle animation style switching instruction into application 1, which responds by applying that second dynamic subtitle style to each text element included in the subtitle text. The user can switch styles repeatedly until the subtitle effect meets expectations.
  • On the basis of user interface 18 in Figure 4H and user interface 19 in Figure 4I, area a5 also includes area a8, which displays a text editing box. Through the text editing box, the user can delete text elements from the subtitle text, insert new text, or replace original text elements; these operations amount to entering deletion, insertion, and replacement instructions into application 1. While the content in the text editing box is edited, the edited text is synchronously displayed in text box 104 in area a1, which helps the user preview the edited subtitle content and how it appears in the video frames of the multimedia material.
  • Through the embodiments of Figures 4F to 4I above, the dynamic subtitle switch and the dynamic subtitle animation style tag set in the subtitle animation style panel meet the user's needs to add dynamic subtitles at the later stage and to adjust the dynamic style those subtitles use.
  • FIG. 5 is a schematic structural diagram of a subtitle processing device provided by an embodiment of the present disclosure. Please refer to Figure 5.
  • the device 500 provided in this embodiment includes:
  • The speech recognition module 501 is used to perform, during editing of a multimedia material, speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text.
  • The matching module 502 is configured to match each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip, and to determine the material clips in the multimedia material that respectively match the text elements; the time of the material clip matching a text element on the editing timeline is consistent with the time of the corresponding audio clip on the editing timeline.
  • The subtitle synthesis module 503 is used to synthesize each text element with the material clips within the matching time range to obtain the target multimedia material with the word-by-word pop-up subtitle text animation effect.
  • In some embodiments, on the editing timeline, the starting moment of the time to which the material clip matching a text element belongs coincides with the starting moment of the corresponding audio clip; and the end moment of that time either coincides with, or is later than, the end moment of the audio clip corresponding to the text element.
  • In some embodiments, the subtitle synthesis module 503 is specifically configured to apply a specified first subtitle animation style to each text element in batch and to synthesize the styled text elements with the material clips within the matching time, obtaining the target multimedia material whose subtitle text pops up word by word in the first subtitle animation style.
  • the device 500 further includes: a subtitle text update module 504.
  • the subtitle text update module 504 is configured to respond to a text deletion instruction and delete corresponding text elements from the subtitle text to obtain updated subtitle text.
  • the matching module 502 is also configured to determine the material fragments in the multimedia material that match each of the text elements according to the timestamp information of the audio fragments corresponding to each text element in the updated subtitle text.
  • The subtitle synthesis module 503 is also used to synthesize each text element included in the updated subtitle text with the material clips within the matching time, so as to re-add subtitles with the word-by-word pop-up animation effect to the multimedia material.
  • The subtitle text update module 504 is also used to respond to a text insertion instruction and insert new text elements into the subtitle text to obtain updated subtitle text.
  • The matching module 502 is also configured to match the timestamp information of the audio clips corresponding to the text elements in the updated subtitle text against the material units in the multimedia material and determine the material clips that respectively match the text elements; the new text element is merged with an adjacent text element and shares the timestamp information of the adjacent element's audio clip.
  • The subtitle synthesis module 503 is also used to combine each text element included in the updated subtitle text with the material clips within the matching time range, so as to re-add subtitles with the word-by-word pop-up animation effect to the multimedia material.
  • In some embodiments, if the insertion position of the new text element is the very front of the subtitle text, the new text element is merged with the first text element in the subtitle text and shares the timestamp of the first element's audio clip; if the insertion position is in the middle or at the very end of the subtitle text, the new text element is merged with the adjacent preceding text element and shares the timestamp of the preceding element's audio clip.
  • The subtitle text update module 504 is also configured to respond to a text replacement instruction and replace one or more text elements in the subtitle text with replacement text to obtain updated subtitle text. Accordingly, the matching module 502 is also configured to determine the material clips that respectively match the text elements according to the timestamp information of the audio clips corresponding to the text elements in the updated subtitle text, where the replacement text corresponds to the timestamp information of the audio clips of the replaced text elements; and the subtitle synthesis module 503 is also used to synthesize each text element included in the updated subtitle text with the matching material clips, so as to re-add subtitles with the word-by-word pop-up animation effect to the multimedia material.
  • The subtitle synthesis module 503 is also configured to respond to a subtitle animation style switching instruction, apply a second subtitle animation style to each text element in batch, and synthesize the styled text elements with the material clips within the matching time range to obtain the target multimedia material whose subtitle text pops up word by word in the second subtitle animation style.
  • the audio corresponding to the multimedia material is the original audio included in the multimedia material or the background music added to the multimedia material.
  • the subtitle processing device provided in this embodiment can be used to execute the technical solution of any of the foregoing method embodiments. Its implementation principles and technical effects are similar. Please refer to the detailed description of the foregoing method embodiments. For the sake of simplicity, they will not be described again here.
  • Exemplarily, the present disclosure provides an electronic device, including one or more processors, a memory, and one or more computer programs, where the one or more computer programs are stored in the memory; when the one or more processors execute the one or more computer programs, the electronic device implements the subtitle processing method of the foregoing embodiments.
  • Exemplarily, the present disclosure provides a chip system applied to an electronic device that includes a memory and a sensor; the chip system includes a processor that, when running, executes the subtitle processing method of the foregoing embodiments.
  • Exemplarily, the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the electronic device implements the subtitle processing method of the foregoing embodiments.
  • the present disclosure provides a computer program product, which when run on a computer causes the computer to execute the subtitle processing method of the foregoing embodiments.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Circuits (AREA)

Abstract

The present disclosure relates to a subtitle processing method and device. The method includes: during editing of a multimedia material clip, performing speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element in the subtitle text; determining, according to the timestamp information of the audio clip corresponding to each text element, the material clips in the multimedia material clip that match the text elements; and synthesizing each text element with the material clips within the matching time to obtain a target multimedia material with a word-by-word pop-up subtitle text animation effect. The solution of the present disclosure achieves the subtitle animation effect of the corresponding text appearing when a word is spoken; in addition, a user instruction can trigger automatic generation of dynamic subtitles, the operation is simple, and the user experience is improved.

Description

Subtitle processing method and device
This application claims priority to Chinese Patent Application No. 202211117721.1, filed on September 14, 2022, the entire disclosure of which is incorporated herein by reference as part of this application.
Technical Field
Embodiments of the present disclosure relate to a subtitle processing method and device.
Background
Subtitles in a video help viewers understand the video content, so subtitles are often added when editing a video.
At present, subtitle text is usually input manually, or a subtitle recognition tool is used to recognize the corresponding audio to obtain the subtitle text; the subtitle text is then adjusted and segmented by repeatedly listening to the audio, producing a large number of text fragments that are synthesized with the video to add subtitles to it. For batch text scenarios such as subtitles, a user who wants a particular subtitle effect must repeatedly adjust, synthesize, and preview the segmentation results of the subtitle text, so editing subtitles in this way is very inefficient.
Summary
To solve the above technical problem, the present disclosure provides a subtitle processing method and device.
In a first aspect, an embodiment of the present disclosure provides a subtitle processing method, including:
during editing of a multimedia material, performing speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text;
matching each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip, and determining the material clips in the multimedia material that respectively match the text elements, wherein the time of the material clip matching a text element on the editing timeline is consistent with the time of the audio clip corresponding to that text element on the editing timeline; and
synthesizing each text element with the material clips within the matching time range to obtain a target multimedia material with a word-by-word pop-up subtitle text animation effect.
In a second aspect, an embodiment of the present disclosure provides a subtitle processing device, including:
a speech recognition module configured to, during editing of a multimedia material, perform speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text;
a matching module configured to match each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip and determine the material clips that respectively match the text elements, wherein the time of the material clip matching a text element on the editing timeline is consistent with the time of the audio clip corresponding to that text element on the editing timeline; and
a subtitle synthesis module configured to synthesize each text element with the material clips within the matching time range to obtain a target multimedia material with a word-by-word pop-up subtitle text animation effect.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, wherein the memory is configured to store computer program instructions and the processor is configured to execute the computer program instructions so that the electronic device implements the subtitle processing method of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a readable storage medium including computer program instructions; at least one processor of an electronic device executes the computer program instructions so that the electronic device implements the subtitle processing method of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product; when an electronic device executes the computer program product, the electronic device implements the subtitle processing method of the first aspect.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles.
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly introduced below; obviously, a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flow chart of a subtitle processing method provided by an embodiment of the present disclosure;
Figure 2 is a flow chart of a subtitle processing method provided by another embodiment of the present disclosure;
Figure 3 is a flow chart of a subtitle processing method provided by another embodiment of the present disclosure;
Figures 4A to 4I are schematic diagrams of human-computer interaction interfaces provided by the present disclosure; and
Figure 5 is a schematic structural diagram of a subtitle processing device provided by an embodiment of the present disclosure.
Detailed Description
For a clearer understanding of the above objects, features, and advantages of the present disclosure, its solutions are further described below. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features within them may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure can also be implemented in ways other than those described here; obviously, the embodiments in this specification are only some, not all, of the embodiments of the present disclosure.
Subtitles help users understand video content, and different subtitle effects can convey additional dimensions of meaning. For example, the corresponding text may appear only when a particular word is spoken in the video's audio; such effects are often used to represent a voice-over in the plot-interpretation category and to convey the speaker's confident, passionate emotion in the talking category. To achieve such effects, the user usually inputs the subtitles manually, splits the subtitle text into individual characters, and then repeatedly listens to the speech to make adjustments. Alternatively, the user can input a complete sentence and use keyframe masks to make the text appear piece by piece. Either way, subtitle editing is inefficient, and operation on mobile devices is extremely inconvenient.
On this basis, embodiments of the present disclosure provide a subtitle processing method and device. The method includes: during editing of a multimedia material, performing speech recognition on the corresponding audio to obtain the subtitle text and the timestamp information of the audio clip corresponding to each text element in the subtitle text; determining, according to that timestamp information, the material clips in the multimedia material clip that match the text elements; and synthesizing each text element with the material clips within the matching time to obtain a target multimedia material with a word-by-word pop-up subtitle animation effect. In the present disclosure, the start of the time range of the video frames matching a text element coincides with the start of the audio clip corresponding to that text element, achieving the subtitle animation effect of the corresponding text appearing as each word is spoken. In addition, a user instruction can trigger automatic generation of dynamic subtitles; the operation is simple, improves the user experience, and the method applies to many types of devices, giving it a wide application range.
The method provided by the present disclosure can be executed by an electronic device, which can be, but is not limited to, a tablet computer, a mobile phone (such as a foldable-screen phone or a large-screen phone), a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), and so on; the present disclosure places no restriction on the specific type of electronic device.
Figure 1 is a schematic flowchart of a subtitle processing method provided by an embodiment of the present disclosure. The following takes execution by an electronic device as an example; an editing application is installed on the electronic device, through which the user can edit multimedia materials. Referring to Figure 1, the method of this embodiment includes:
S101: During editing of a multimedia material, perform speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text.
The multimedia material can be a video material recorded by the user in real time, a previously edited video material, or a video material stored on the electronic device; it can also be an audio material, an image material, and so on. The present disclosure limits neither the type nor the number of multimedia materials; if there are several, they can be arranged in import order and treated as a whole.
The process of editing a multimedia material can be understood as recording in advance or importing a multimedia material or audio material that carries audio, or adding background music to a multimedia material (such as a video or image material). Of course, the editing methods are not limited to these.
The subtitle text is obtained by text recognition of the audio corresponding to the multimedia material currently being edited, where that audio can be the original audio included in the material or background music the user added to it; the background music can be an audio track in the application, such as a complete song, a partial fragment of a song, or a cut audio clip, which the present disclosure does not limit. When the multimedia material is itself an audio material, speech recognition can be performed on the material directly.
In some embodiments, the application can send the audio to a middle-end service through the electronic device; the middle-end service calls a subtitle recognition tool to perform text recognition on the audio and obtains the corresponding subtitle text and the timestamp information of the audio clip corresponding to each text element in the subtitle text, where the timestamp information can include the start and end moments of the audio clip.
For example, suppose the total duration of the audio corresponding to the multimedia material clip is 7 seconds, the subtitle text obtained by speech recognition of the audio is 我今天很开心呀 ('I am very happy today'), there are 7 text elements in total, and the audio clip corresponding to each text element lasts 1 second. The correspondence between each text element and the timestamp information of its audio clip is then as shown in Table 1:
Table 1: 我 (00:00-00:01), 今 (00:01-00:02), 天 (00:02-00:03), 很 (00:03-00:04), 开 (00:04-00:05), 心 (00:05-00:06), 呀 (00:06-00:07)
The above example uses Chinese audio, so the text elements are in units of characters; if the audio is in another language, the text elements are in units of the corresponding words. For example, if the audio is English, the text elements are English words.
In some embodiments, the application can perform speech recognition in response to an instruction entered by the user. The present disclosure does not limit how the instruction triggering speech recognition is implemented; it can include, but is not limited to, click, double-click, long-press, and slide operations. For example, when a page of the application provides an area/control for adding recognized subtitles to a multimedia material, the speech recognition instruction can be an operation received on that area/control.
S102: Match each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip, and determine the material clips in the multimedia material that respectively match the text elements, where the time of the matching material clip on the editing timeline is consistent with the time of the corresponding audio clip on the editing timeline.
If the multimedia material is an image/video material, the material clips matching text elements can be understood as image/video clips containing the image frames/video frames to be synthesized with the text elements. If the multimedia material is an audio material, the matching material clips can be understood as audio clips containing one or more speech pronunciation units to be synthesized with the text elements.
Since the subtitle processing method provided by the present disclosure aims to achieve the subtitle effect of the corresponding text appearing when a word is spoken, when the matching material clips are determined from the timestamp information of the audio clips corresponding to the text elements, the time of a text element's audio clip on the editing timeline is consistent with the time of its material clip on the editing timeline.
Consistency on the editing timeline here can be understood as: the starting moment of the material clip matching a text element on the editing timeline coincides with the starting moment of the audio clip corresponding to that text element.
The moment at which a text element disappears from the subtitles can be flexible: it can disappear when its corresponding audio clip ends, when the sentence it belongs to (or a text fragment of specified length) reaches its end position, or after a preset duration following the end of its audio clip; the present disclosure does not limit this.
Therefore, on the editing timeline, the end moment of the time to which the material clip matching a text element belongs can equal the end moment of the corresponding audio clip; in this way the subtitle text pops up word by word, and each earlier text element disappears as its audio clip reaches its end. Alternatively, the end moment of the time range of the video frames matching a text element can be later than the end moment of the corresponding audio clip; in this way the text elements appear one by one, and each earlier element remains for a period after its audio clip ends before disappearing.
The speed at which the text elements switch depends on the speaking speed of the voice in the audio.
S103: Synthesize each text element with the material clips within the matching time range to obtain the target multimedia material with the word-by-word pop-up subtitle text animation effect.
When synthesizing a text element with its matching material clip, a preset first subtitle animation style can be applied to the text element, so that subtitles automatically added to the multimedia material already carry the effect of the first subtitle animation style when generated; this meets the user's needs for subtitle effects and reduces later manual work. The first subtitle animation style can include one or more of an entry style, an exit style, and a loop style for the text elements.
Steps S102 and S103 can be implemented automatically by calling a dynamic subtitle resource package (which can also be called a subtitle animation resource package): the subtitle text and the timestamp information of each text element it includes are passed into the package, which applies the preset subtitle animation style to each text element in batch and overlays the styled text elements on the matching material clips, thereby adding to the multimedia material subtitles whose text pops up word by word in the first subtitle animation style.
The method of this embodiment achieves the subtitle animation effect of the corresponding text appearing as each word is spoken; moreover, a user instruction can trigger automatic generation of dynamic subtitles, the operation is simple, and the user experience improves. The method applies to many types of devices and has a wide application range; in batch text scenarios, even mobile devices with small screens can quickly add subtitles with a specified effect to multimedia material clips.
After subtitles are added to a multimedia material clip by the method of the embodiment shown in Figure 1, the user can further edit the content of the subtitle text, including but not limited to deleting text elements, inserting new text elements, and replacing text elements. Figure 2 is a flow chart of a subtitle processing method provided by another embodiment of the present disclosure. Referring to Figure 2, on the basis of the embodiment shown in Figure 1, the method of this embodiment further includes:
S104: In response to a text deletion instruction, delete the corresponding text elements from the subtitle text to obtain updated subtitle text.
When deleting text elements from the subtitle text, the remaining text elements and their timestamp information are kept, yielding the updated subtitle text and the timestamp information of each of its text elements.
For example, suppose the subtitle text before deletion is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07), where the brackets indicate the timestamp information of the audio clip corresponding to each text element.
After the last text element 啊 is deleted, the updated subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06), where the brackets indicate the timestamp information of the audio clip corresponding to each text element.
Text elements at other positions are deleted in a similar way.
S105: In response to a text insertion instruction, insert new text elements into the subtitle text to obtain updated subtitle text.
The insertion performed in this step adds new text elements without deleting existing ones. In some embodiments, different handling can be configured for different insertion positions. In some embodiments, if the insertion position is in the middle or at the end of the subtitle text, the new text element is merged with the adjacent preceding text element and shares the timestamp of that element's audio clip; if the insertion position is at the very front of the subtitle text, the new text element is merged with the first text element and shares the timestamp of the first element's audio clip.
For example, suppose that before insertion the subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07), where the brackets indicate the timestamp information of the audio clip corresponding to each text element.
Case 1: after the text element 的 is inserted after 真, the updated subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真的 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07).
By comparison, after the insertion, 真的 shares the timestamp information (00:02-00:04) of the audio clip originally corresponding to 真.
Case 2: after the new text element 哈哈 is inserted before 今, the updated subtitle text is: 哈哈今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07).
By comparison, after the insertion, 哈哈今 shares the timestamp (00:00-00:01) of the audio clip originally corresponding to 今.
S106: In response to a text replacement instruction, replace one or more text elements in the subtitle text with replacement text to obtain updated subtitle text.
During replacement, the timestamp information of the replacement text equals the timestamp information of the audio clips corresponding to the replaced text elements. In one replacement, the replacement text can include one or more text elements and can be treated as a whole; the replaced text elements can likewise be one element or several consecutive ones.
For example, suppose that before replacement the subtitle text is: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 开 (00:04-00:05) 心 (00:05-00:06) 啊 (00:06-00:07), where the brackets indicate the timestamp information of the audio clip corresponding to each text element.
Suppose 开心 is replaced with 难过 and 啊 with 呀; the updated subtitle text is then: 今 (00:00-00:01) 天 (00:01-00:02) 真 (00:02-00:04) 难过 (00:04-00:06) 呀 (00:06-00:07), where the brackets indicate the timestamp information of the audio clip corresponding to each text element.
By comparison, after replacement, 难过 takes the combined timestamps (00:04-00:06) of the audio clips originally corresponding to 开 and 心, and 呀 takes the timestamp (00:06-00:07) of the audio clip originally corresponding to 啊.
One or more of the above editing modes can be selected as needed when editing the subtitle text.
S107: Determine the material clips in the multimedia material that respectively match the text elements according to the timestamp information of the audio clips corresponding to the text elements in the updated subtitle text.
S108: Synthesize each text element with the material clips within the matching time, so as to re-add subtitles to the multimedia material.
Steps S107 and S108 are implemented similarly to steps S102 and S103 of the embodiment shown in Figure 1; reference may be made to the detailed description of that embodiment.
If implemented automatically by calling the dynamic subtitle resource package, the updated subtitle text and the timestamp information of each of its text elements are passed into the package again; the package re-applies the preset subtitle animation style to each text element in batch and overlays the styled text elements on the matching material clips, thereby re-adding subtitles to the multimedia material.
The method of this embodiment meets the user's need to adjust subtitle content while adding subtitles to a multimedia material; for the updated subtitle text, subtitles with the specified effect are generated automatically, which is convenient for users and improves the user experience.
After subtitles are added to a multimedia material by the method of the embodiment shown in Figure 1, the user can also adjust the subtitle animation style the subtitles currently use, to obtain a subtitle effect matching expectations. Figure 3 is a schematic flowchart of a subtitle processing method provided by another embodiment of the present disclosure. Referring to Figure 3, the method of this embodiment includes:
S301: During editing of a multimedia material, perform speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text.
S302: Match each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip, and determine the material clips that respectively match the text elements, where the time of the matching material clip on the editing timeline is consistent with the time of the corresponding audio clip on the editing timeline.
S303: Apply a specified first subtitle animation style to each text element in batch, and synthesize the styled text elements with the material clips within the matching time to obtain the target multimedia material whose subtitle text pops up word by word in the first subtitle animation style.
Steps S301 to S303 of this embodiment are respectively similar to steps S101 to S103 of the embodiment shown in Figure 1; reference may be made to the detailed description of that embodiment, which is not repeated here. It should be noted that the first subtitle animation style can be understood as the application's default subtitle animation style.
S304: In response to a subtitle animation style switching instruction, apply the second subtitle animation style to each text element in batch, and synthesize the styled text elements with the material clips within the matching time to obtain the target multimedia material whose subtitle text pops up word by word in the second subtitle animation style.
The application can provide the user with a subtitle animation style editing page on the electronic device; the page can display areas or controls corresponding to one or more selectable subtitle animation styles, and the user can enter the style switching instruction by operating the area or control of the desired style.
If implemented automatically by calling the dynamic subtitle resource package, the updated subtitle text and the timestamp information of each of its text elements are passed into the package again; the package re-applies the user-specified second subtitle animation style to each text element in batch and overlays the styled text elements on the matching material clips, thereby re-adding subtitles to the multimedia material.
The method of this embodiment meets the user's need to adjust subtitle effects afterwards, supports batch editing of subtitle animation styles, and offers high subtitle processing efficiency.
Based on the foregoing, the present disclosure illustrates the subtitle processing method by example, taking an electronic device as the executing device, with reference to the accompanying drawings and application scenarios. For ease of description, Figures 4A to 4I take as an example a mobile phone on which a video editing application (application 1) is installed; in addition, the multimedia material the user imports into application 1 is a video material.
Refer to Figures 4A to 4I, which are schematic diagrams of human-computer interaction interfaces provided by embodiments of the present disclosure.
Application 1 can, for example, display user interface 11 shown in Figure 4A on the mobile phone. User interface 11 displays the multimedia material editing page (hereinafter, the editing page), on which application 1 performs a set of functions such as previewing the editing result of the multimedia material, adding background music to it, and adding filters, stickers, text, and so on.
Referring to Figure 4A, user interface 11 includes area a1, a preview area for the editing result of the multimedia material; area a2, in which the multimedia material and other clip materials added during editing can be displayed along the timeline; and area a3, which can provide the user with entrances to multiple editing functions. For example, area a3 includes control 101, used to enter application 1's text function collection page, which includes multiple controls corresponding to different text functions.
For example, after application 1 receives an operation such as the user tapping control 101 in user interface 11 shown in Figure 4A, it can display user interface 12 shown in Figure 4B, which presents the text function collection page provided by application 1. This page can provide the user with entrances to various text functions, through which the user reaches the corresponding operation page to add text content to the multimedia material.
User interface 12 includes area a4, which contains entrances corresponding to a new-text function, a text template function, a subtitle recognition function, a lyrics recognition function, a sticker function, and a graffiti-pen function. Control 102 shown in user interface 12 is the entrance of the subtitle recognition function.
After application 1 receives an operation such as the user tapping control 102 in user interface 12 shown in Figure 4B, it can display user interface 13 shown in Figure 4C, which presents the subtitle recognition panel provided by application 1. The panel can provide the user with recognition type options, a language selection entry, a switch for marking invalid segments, a switch for dynamic subtitles, and a switch for simultaneously clearing existing subtitles.
Here, dynamic subtitles denote the function of adding to multimedia materials subtitles with the word-by-word pop-up animation effect. Specifically, when the dynamic subtitle switch is off, the added subtitles appear with one sentence fragment per subtitle; when the switch is on, the added subtitles present the word-by-word pop-up effect, that is, the text elements of the subtitle text appear one by one, each displayed at the start of its corresponding audio clip.
In some embodiments, the user's selection can be remembered, so that when the subtitle recognition panel is opened it shows the on/off state of dynamic subtitles from the user's last exit from subtitle recognition, which better matches the user's habits. When application 1 first releases the dynamic subtitle function, the switch can be off, as shown in user interface 13.
After application 1 receives the user's operation such as tapping the dynamic subtitle switch button in user interface 13 shown in Figure 4C, it displays user interface 14 shown in Figure 4D, in which the dynamic subtitle switch is on.
User interface 14 also includes control 103, used to instruct starting speech recognition and adding subtitles with the word-by-word pop-up animation effect. After application 1 responds to the user performing an operation such as tapping control 103 in user interface 14, it displays user interface 15 shown in Figure 4E, in which the subtitle recognition panel is closed and prompt content, such as an animation and prompt text, is displayed in area a4 to remind the user that dynamic subtitles are being created. To reduce the occlusion of the preview picture in area a1 by the prompt animation and text, area a4 can be located above area a1; it should be understood that area a4 can also be located elsewhere, which the present disclosure does not limit.
As described above, the user's operations on the dynamic subtitle switch and control 103 trigger application 1 to perform speech recognition on the audio corresponding to the multimedia material and to automatically add dynamic subtitles with the word-by-word pop-up animation effect.
After the dynamic subtitles have been created, application 1 can display user interface 16 shown in Figure 4F, in which area a4 can display prompt content, such as the prompt text "Recognition successful; subtitles generated automatically."
Afterwards, the user can tap the preview play button to preview the subtitle effect in area a1; if it meets expectations, the edited multimedia material can be exported as a target video for publishing or saving.
With the interactions of Figures 4A to 4F, the present disclosure provides the user with a dynamic subtitle switch in the pre-recognition stage, which is convenient to use; it also remembers the state of the switch from the user's last exit from the subtitle recognition panel, so no operation is needed on the next use, further improving efficiency without extra user operations.
To better meet user needs, application 1 also provides the function of adding dynamic subtitles, or modifying the animation style of existing subtitles, at a later stage.
For example, on the basis of user interface 16 shown in Figure 4F, area a2 displays, along the timeline, the identifiers corresponding to the multimedia material and the subtitle text; operating (e.g., tapping) a subtitle text identifier displayed in area a2 can trigger re-editing of the subtitles. After application 1 receives the user's tap on a text fragment contained in any subtitle in area a2 of user interface 16, it can display user interface 17 shown in Figure 4G.
In user interface 17, a text box 104 corresponding to the subtitle text is displayed in area a1. Text box 104 contains the text content at the current preview position, which can be one or more sentences (i.e., text fragments); area a1 can also display controls for operating on the text box, such as rotate and copy, and the user can enlarge or shrink the text box with a two-finger gesture, with the size of the text elements in the box changing accordingly. User interface 17 also includes area a5, which displays a subtitle editing function collection page providing entrances to multiple functions for editing the currently added subtitles, for example batch subtitle editing, subtitle splitting, subtitle copying, subtitle editing, subtitle deletion, fancy text, and subtitle animation styles. User interface 17 contains control 105, used to enter the subtitle animation panel to add subtitle effects (including dynamic subtitle effects) to the current subtitles or to modify the subtitle animation style they use.
After application 1 receives the user's operation such as tapping control 105 in user interface 17, it displays user interface 18 shown in Figure 4H, which includes area a6.
Area a6 displays the subtitle animation panel, which includes a tag 106 for setting animation styles as well as a font tag, a style tag, a fancy-text tag, a text template tag, and so on. In some embodiments, as shown in Figure 4H, entering the subtitle animation style panel can default to tag 106 and display its content; in other embodiments another tag can be focused, and application 1 displays the content of tag 106 after receiving the user's tap on it.
Referring to Figure 4H, area a6 also includes a dynamic subtitle switch 107; by operating switch 107, a subtitle effect in which the text elements are displayed one by one can be added to the current subtitles.
In some embodiments, if the user already added dynamic subtitles in the pre-recognition stage, the switch can be shown as on here; if the user did not use dynamic subtitles in that stage, it can be shown as off, and the user can toggle switch 107 in user interface 18 to on. In the embodiment shown in Figure 4H, switch 107 is off.
In addition, area a6 includes a tag 108 for setting the subtitle entry style, a tag 109 for setting the exit style, a tag 110 for setting the loop style, a tag 111 for setting the dynamic subtitle animation style, and area a7, which displays the content of whichever tag is currently focused. In some cases, while switch 107 is off, the content of any tag can be displayed by default; for example, user interface 18 shown in Figure 4H displays the content corresponding to tag 108 by default.
When application 1 receives the user's operation (such as a tap) on dynamic subtitle switch 107 in user interface 18 and the switch turns from off to on, application 1 can display user interface 19 shown in Figure 4I. Referring to Figure 4I, in user interface 19 switch 107 is on and tag 111 is selected; area a7 displays one or more selectable dynamic subtitle animation styles related to dynamic subtitles. The display identifiers of the styles can be arranged from left to right, and the user can browse them by swiping the screen left or right. The default dynamic subtitle animation style of application 1 can be displayed in the first position from the left, so the user clearly knows which style application 1 uses by default.
Area a7 can also include a disable button 112, which can be placed at the far left of area a7 or, of course, at another position; the present disclosure does not limit this. When the user taps disable button 112 to turn the dynamic subtitle effect off, switch 107 switches to the off state.
Suppose the user taps the second dynamic subtitle animation style from the left in area a7; this amounts to entering a subtitle animation style switching instruction into application 1, which responds by applying that second dynamic subtitle style to each text element included in the subtitle text. The user can switch styles repeatedly until the subtitle effect meets expectations.
On the basis of user interface 18 in Figure 4H and user interface 19 in Figure 4I, area a5 also includes area a8, which displays a text editing box. Through it the user can delete text elements from the subtitle text, insert new text, or replace original text elements; these operations amount to entering deletion, insertion, and replacement instructions into application 1. While the text content in the editing box of area a8 is edited, the edited content is synchronously shown in text box 104 in area a1, helping the user preview the edited subtitle content and its display effect in the video frames of the multimedia material clip.
Through the embodiments of Figures 4F to 4I above, setting the dynamic subtitle switch and the dynamic subtitle animation style tag in the subtitle animation style panel meets the user's needs to add dynamic subtitles at the later stage and to adjust the dynamic style those subtitles use.
It should be noted that the interface diagrams of Figures 4A to 4I above do not limit the subtitle processing method provided by the present disclosure; it should be understood that the styles and trigger modes of some controls, panels, and tags can be flexibly adjusted as needed.
Figure 5 is a schematic structural diagram of a subtitle processing device provided by an embodiment of the present disclosure. Referring to Figure 5, the device 500 of this embodiment includes:
a speech recognition module 501 configured to, during editing of a multimedia material, perform speech recognition on the audio corresponding to the multimedia material to obtain the subtitle text corresponding to the audio and the timestamp information of the audio clip corresponding to each text element included in the subtitle text;
a matching module 502 configured to match each text element against the material units in the multimedia material according to the timestamp information of its corresponding audio clip and determine the material clips that respectively match the text elements, wherein the time of the material clip matching a text element on the editing timeline is consistent with the time of the audio clip corresponding to that text element on the editing timeline; and
a subtitle synthesis module 503 configured to synthesize each text element with the material clips within the matching time range to obtain a target multimedia material with a word-by-word pop-up subtitle text animation effect.
In some embodiments, on the editing timeline, the starting moment of the time to which the material clip matching a text element belongs coincides with the starting moment of the corresponding audio clip; and, on the editing timeline, the end moment of that time either coincides with, or is later than, the end moment of the audio clip corresponding to the text element.
In some embodiments, the subtitle synthesis module 503 is specifically configured to apply a specified first subtitle animation style to each text element in batch and synthesize the styled text elements with the matching material clips, obtaining the target multimedia material whose subtitle text pops up word by word in the first subtitle animation style.
Optionally, the device 500 further includes a subtitle text update module 504.
In some embodiments, the subtitle text update module 504 is configured to respond to a text deletion instruction and delete the corresponding text elements from the subtitle text to obtain updated subtitle text.
Correspondingly, the matching module 502 is further configured to determine the material clips matching each text element according to the timestamp information of the audio clips corresponding to the text elements in the updated subtitle text, and the subtitle synthesis module 503 is further configured to synthesize each text element included in the updated subtitle text with the matching material clips, so as to re-add subtitles with the word-by-word pop-up animation effect to the multimedia material.
In some embodiments, the subtitle text update module 504 is further configured to respond to a text insertion instruction and insert new text elements into the subtitle text to obtain updated subtitle text.
Correspondingly, the matching module 502 is further configured to match the timestamp information of the audio clips corresponding to the text elements in the updated subtitle text against the material units and determine the matching material clips, where a new text element is merged with an adjacent text element and shares the timestamp information of the adjacent element's audio clip; and the subtitle synthesis module 503 is further configured to combine each text element of the updated subtitle text with the material clips within the matching time range, so as to re-add subtitles with the word-by-word pop-up animation effect to the multimedia material.
In some embodiments, if the insertion position of a new text element is the very front of the subtitle text, the new text element is merged with the first text element and shares the timestamp of the first element's audio clip; if the insertion position is in the middle or at the very end, the new text element is merged with the adjacent preceding text element and shares the timestamp of that element's audio clip.
In some embodiments, the subtitle text update module 504 is further configured to respond to a text replacement instruction and replace one or more text elements in the subtitle text with replacement text to obtain updated subtitle text.
Correspondingly, the matching module 502 is further configured to determine the matching material clips according to the timestamp information of the audio clips corresponding to the text elements in the updated subtitle text, where the replacement text corresponds to the timestamp information of the audio clips of the replaced text elements; and the subtitle synthesis module 503 is further configured to synthesize each text element of the updated subtitle text with the matching material clips, so as to re-add subtitles with the word-by-word pop-up animation effect to the multimedia material.
In some embodiments, the subtitle synthesis module 503 is further configured to respond to a subtitle animation style switching instruction, apply a second subtitle animation style to each text element in batch, and synthesize the styled text elements with the material clips within the matching time range to obtain the target multimedia material whose subtitle text pops up word by word in the second subtitle animation style.
In some embodiments, the audio corresponding to the multimedia material is the original audio included in the multimedia material or background music added to it.
The subtitle processing device provided by this embodiment can be used to execute the technical solution of any of the foregoing method embodiments; its implementation principles and technical effects are similar, and reference can be made to the detailed descriptions of the foregoing method embodiments, which for brevity are not repeated here.
Exemplarily, the present disclosure provides an electronic device including one or more processors, a memory, and one or more computer programs, where the one or more computer programs are stored in the memory; when the one or more processors execute the one or more computer programs, the electronic device implements the subtitle processing method of the foregoing embodiments.
Exemplarily, the present disclosure provides a chip system applied to an electronic device that includes a memory and a sensor; the chip system includes a processor that, when running, executes the subtitle processing method of the foregoing embodiments.
Exemplarily, the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the electronic device implements the subtitle processing method of the foregoing embodiments.
Exemplarily, the present disclosure provides a computer program product; when the computer program product runs on a computer, the computer executes the subtitle processing method of the foregoing embodiments.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between them. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
The above are only specific implementations of the present disclosure, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

  1. A subtitle processing method, comprising:
    during editing of a multimedia material, performing speech recognition on audio corresponding to the multimedia material to obtain subtitle text corresponding to the audio and timestamp information of an audio clip corresponding to each text element included in the subtitle text;
    matching each of the text elements against material units in the multimedia material according to the timestamp information of the audio clip corresponding to each text element, and determining material clips in the multimedia material that respectively match the text elements, wherein a time of the material clip matching the text element on an editing timeline is consistent with a time of the audio clip corresponding to the text element on the editing timeline; and
    synthesizing each of the text elements with the material clips within the matching time to obtain a target multimedia material having a word-by-word pop-up subtitle text animation effect.
  2. The method according to claim 1, wherein, on the editing timeline, a starting moment of the time to which the material clip matching the text element belongs is consistent with a starting moment of the audio clip corresponding to the text element; and, on the editing timeline, an end moment of the time to which the material clip matching the text element belongs is consistent with an end moment of the audio clip corresponding to the text element, or is later than the end moment of the audio clip corresponding to the text element.
  3. The method according to claim 1 or 2, wherein synthesizing each of the text elements with the material clips within the matching time to obtain the target multimedia material having the word-by-word pop-up subtitle text animation effect comprises:
    applying a specified first subtitle animation style to each of the text elements in batch, and synthesizing the text elements having the first subtitle animation style with the material clips within the matching time to obtain the target multimedia material whose subtitle text pops up word by word in the first subtitle animation style.
  4. The method according to any one of claims 1-3, further comprising:
    in response to a text deletion instruction, deleting corresponding text elements from the subtitle text to obtain updated subtitle text;
    determining, according to timestamp information of audio clips corresponding to the text elements in the updated subtitle text, material clips in the multimedia material that respectively match the text elements; and
    synthesizing each of the text elements included in the updated subtitle text with the material clips within the matching time, so as to re-add, to the multimedia material, subtitles having the word-by-word pop-up subtitle text animation effect.
  5. The method according to any one of claims 1-4, further comprising:
    in response to a text insertion instruction, inserting a new text element into the subtitle text to obtain updated subtitle text;
    matching timestamp information of audio clips corresponding to the text elements in the updated subtitle text against the material units in the multimedia material, and determining material clips in the multimedia material that respectively match the text elements, wherein the new text element is merged with an adjacent text element and shares the timestamp information of the audio clip corresponding to the adjacent text element; and
    combining each of the text elements included in the updated subtitle text with the material clips within the matching time range, so as to re-add, to the multimedia material, subtitles having the word-by-word pop-up subtitle text animation effect.
  6. The method according to claim 5, wherein, if an insertion position of the new text element is the very front of the subtitle text, the new text element is merged with the first text element in the subtitle text and shares the timestamp of the audio clip corresponding to the first text element;
    if the insertion position of the new text element is a middle or end position of the subtitle text, the new text element is merged with the adjacent preceding text element and shares the timestamp of the audio clip corresponding to the preceding text element.
  7. The method according to any one of claims 1-6, further comprising:
    in response to a text replacement instruction, replacing one or more text elements in the subtitle text with replacement text to obtain updated subtitle text;
    determining, according to timestamp information of audio clips corresponding to the text elements in the updated subtitle text, material clips in the multimedia material that respectively match the text elements, wherein the replacement text corresponds to the timestamp information of the audio clips corresponding to the replaced text elements; and
    synthesizing each of the text elements included in the updated subtitle text with the material clips within the matching time, so as to re-add, to the multimedia material, subtitles having the word-by-word pop-up subtitle text animation effect.
  8. The method according to claim 3, further comprising:
    in response to a subtitle animation style switching instruction, applying a second subtitle animation style to each of the text elements in batch, and synthesizing the text elements having the second subtitle animation style with the material clips within the matching time range to obtain a target multimedia material whose subtitle text pops up word by word in the second subtitle animation style.
  9. The method according to any one of claims 1 to 8, wherein the audio corresponding to the multimedia material is original audio included in the multimedia material or background music added to the multimedia material.
  10. A subtitle processing device, comprising:
    a speech recognition module configured to, during editing of a multimedia material, perform speech recognition on audio corresponding to the multimedia material to obtain subtitle text corresponding to the audio and timestamp information of an audio clip corresponding to each text element included in the subtitle text;
    a matching module configured to match each of the text elements against material units in the multimedia material according to the timestamp information of the audio clip corresponding to each text element and determine material clips in the multimedia material that respectively match the text elements, wherein a time of the material clip matching the text element on an editing timeline is consistent with a time of the audio clip corresponding to the text element on the editing timeline; and
    a subtitle synthesis module configured to synthesize each of the text elements with the material clips within the matching time range to obtain a target multimedia material having a word-by-word pop-up subtitle text animation effect.
  11. An electronic device, comprising a memory and a processor, wherein
    the memory is configured to store computer program instructions; and
    the processor is configured to execute the computer program instructions, so that the electronic device implements the subtitle processing method according to any one of claims 1 to 9.
  12. A readable storage medium, comprising computer program instructions, wherein
    at least one processor of an electronic device executes the computer program instructions, so that the electronic device implements the subtitle processing method according to any one of claims 1 to 9.
  13. A computer program product, wherein an electronic device executes the computer program product, so that the electronic device implements the subtitle processing method according to any one of claims 1 to 9.
PCT/CN2023/118772 2022-09-14 2023-09-14 Subtitle processing method and device (字幕处理方法及装置) WO2024056022A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP23820731.0A EP4362451A1 (en) 2022-09-14 2023-09-14 Subtitle processing method and device
US18/543,836 US20240119654A1 (en) 2022-09-14 2023-12-18 Subtitle processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211117721.1A CN117749965A (zh) 2022-09-14 2022-09-14 字幕处理方法及装置
CN202211117721.1 2022-09-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/543,836 Continuation US20240119654A1 (en) 2022-09-14 2023-12-18 Subtitle processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2024056022A1 (zh)

Family

ID=89473228

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/118772 WO2024056022A1 (zh) 2022-09-14 2023-09-14 字幕处理方法及装置

Country Status (4)

Country Link
US (1) US20240119654A1 (zh)
EP (1) EP4362451A1 (zh)
CN (1) CN117749965A (zh)
WO (1) WO2024056022A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108419113A (zh) * 2018-05-24 2018-08-17 广州酷狗计算机科技有限公司 Subtitle display method and apparatus
CN109246472A (zh) * 2018-08-01 2019-01-18 平安科技(深圳)有限公司 Video playback method and apparatus, terminal device, and storage medium
CN109257547A (zh) * 2018-09-21 2019-01-22 南京邮电大学 Subtitle generation method for Chinese online audio and video
CN111010614A (zh) * 2019-12-26 2020-04-14 北京奇艺世纪科技有限公司 Method, apparatus, server, and medium for displaying live-streaming subtitles
CN115209211A (zh) * 2022-09-13 2022-10-18 北京达佳互联信息技术有限公司 Subtitle display method and apparatus, electronic device, storage medium, and program product


Also Published As

Publication number Publication date
US20240119654A1 (en) 2024-04-11
CN117749965A (zh) 2024-03-22
EP4362451A1 (en) 2024-05-01

Similar Documents

Publication Publication Date Title
US20180286459A1 (en) Audio processing
KR20230042523A (ko) 멀티미디어 데이터의 처리 방법, 생성 방법 및 관련 기기
RU2627096C2 (ru) Способы изготовления прототипов мультимедиа-презентаций, устройства для изготовления прототипов мультимедиа-презентаций, способы использования устройств для изготовления прототипов мультимедиа-презентаций (варианты)
WO2021258821A1 (zh) 视频编辑方法、装置、终端及存储介质
CN110636365B (zh) 视频字符添加方法、装置、电子设备及存储介质
US20200143839A1 (en) Automatic video editing using beat matching detection
JP2001197366A (ja) 画像合成方法及び画像合成プログラムを記録した記録媒体
CN112102841A (zh) 一种音频编辑方法、装置和用于音频编辑的装置
WO2023061414A1 (zh) 一种文件生成方法、装置及电子设备
JP2023529571A (ja) オーディオとテキストとの同期方法、装置、読取可能な媒体及び電子機器
CN112040142B (zh) 用于移动终端上的视频创作的方法
WO2022206198A1 (zh) 一种音频和文本的同步方法、装置、设备以及介质
WO2024078514A1 (zh) 投屏方法、装置、电子设备和存储介质
WO2024041514A1 (zh) 视频播放方法、装置和电子设备
WO2024056022A1 (zh) 字幕处理方法及装置
WO2023179539A1 (zh) 视频编辑方法、装置及电子设备
WO2023093809A1 (zh) 文件编辑处理方法、装置和电子设备
EP4099711A1 (en) Method and apparatus and storage medium for processing video and timing of subtitles
WO2021057908A1 (zh) 即时译文显示方法、装置、移动终端和计算机存储介质
JP7119857B2 (ja) 編集プログラム、編集方法および編集装置
CN114793286A (zh) 基于虚拟形象的视频编辑方法和系统
CN112837668A (zh) 一种语音处理方法、装置和用于处理语音的装置
JP6367733B2 (ja) 画面制御装置、画面制御方法及び画面制御プログラム
WO2024099280A1 (zh) 视频编辑方法、装置、电子设备以及存储介质
JPH08115335A (ja) マルチメディア処理装置

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2023820731

Country of ref document: EP

Effective date: 20231218