WO2022217944A1 - Method for binding subtitle with audio source, and apparatus - Google Patents

Method for binding subtitle with audio source, and apparatus

Info

Publication number
WO2022217944A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
segment
subtitle
audio source
positional relationship
Prior art date
Application number
PCT/CN2021/135470
Other languages
French (fr)
Chinese (zh)
Inventor
陈圣宾
Original Assignee
北京达佳互联信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2022217944A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44012 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)

Abstract

Provided in the present application are a method for binding a subtitle with an audio source, an apparatus, an electronic device, a computer storage medium, and a computer program product, which comprise: determining a target audio source segment in a target video, and identifying and obtaining a target subtitle segment via the target audio source segment; determining a relative positional relationship between a subtitle starting position of the target subtitle segment and an audio source starting position of the target audio source segment; and if an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, preserving the relative positional relationship.

Description

Method and apparatus for binding subtitle and audio source
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on, and claims priority to, Chinese patent application No. 202110402833.0, filed on April 14, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The embodiments of the present application relate to the field of computer technologies, and in particular to a method, an apparatus, an electronic device, a computer storage medium, and a computer program product for binding a subtitle and an audio source.
BACKGROUND
In video editing scenarios, a video object usually contains multiple audio source segments, such as the original sound of the video main track, imported dubbing, and background music. A corresponding subtitle segment can also be configured for each audio source segment, so that the video expresses its content more clearly.
In the related art, a video object has a video main track, which reflects the playback timing of the content of the whole video object. At present, the audio source head of each audio source segment in the video object can be bound to the moment on the video main track that corresponds to that audio source head, and the subtitle head of a subtitle segment can be bound to the moment on the video main track that corresponds to that subtitle head. In addition, to achieve a good video expression effect, each audio source segment needs to be aligned in playback timing with its corresponding subtitle segment.
SUMMARY
Embodiments of the present application provide a method, an apparatus, an electronic device, a computer storage medium, and a computer program product for binding a subtitle and an audio source.
In a first aspect, an embodiment of the present application provides a method for binding a subtitle and an audio source, the method including:
determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;
determining a relative positional relationship between a subtitle head position of the target subtitle segment and an audio source head position of the target audio source segment; and
keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.
In some embodiments, keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment and/or the target subtitle segment includes:
triggering a subtitle processing operation in a case where the adjusted target audio source segment and the target subtitle segment have no overlapping part; and
binding the target subtitle segment to the main track of the target video in response to a subtitle processing operation that retains the subtitle.
In some embodiments, the cases in which the adjusted target audio source segment and the target subtitle segment have no overlapping part include: the target audio source segment has been deleted; or the subtitle head position and/or the audio source head position has been changed; or the target audio source segment has been split into multiple audio source sub-segments and the sub-segment located at the head has been deleted.
In some embodiments, the method further includes:
detecting the adjusted new subtitle head position and audio source head position in a case where the adjusted target audio source segment and the target subtitle segment have an overlapping part; and
updating the relative positional relationship according to the new subtitle head position and audio source head position, and keeping the updated relative positional relationship unchanged.
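The overlap test that separates the two cases above can be sketched as follows. This is only an illustration under stated assumptions, not the application's implementation: `Segment`, `overlaps`, and `after_adjustment` are hypothetical names, positions are taken to be seconds on the shared timeline, and segments that merely touch at an endpoint are assumed to have no overlapping part.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    start: float  # head position on the shared timeline, in seconds
    end: float    # tail position on the shared timeline, in seconds


def overlaps(a: Segment, b: Segment) -> bool:
    # Assumption: touching endpoints do not count as an overlapping part.
    return a.start < b.end and b.start < a.end


def after_adjustment(audio: Optional[Segment], subtitle: Segment) -> dict:
    """Decide what happens once an adjustment operation has finished.

    `audio` is None when the target audio source segment was deleted.
    """
    if audio is None or not overlaps(audio, subtitle):
        # No overlapping part: trigger the subtitle processing operation
        # (for example, retain the subtitle and bind it to the main track).
        return {"action": "trigger_subtitle_processing"}
    # Overlapping part: re-detect both head positions and update the relation.
    return {"action": "update_relation", "offset": subtitle.start - audio.start}
```

Representing the relative positional relationship as a single signed offset between the two head positions is a design choice made for this sketch; the text itself does not prescribe a representation.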
In some embodiments, the method further includes:
deleting the part of the target subtitle segment that exceeds the boundary in a case where the target subtitle segment is moved beyond the boundary of the main track of the target video.
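A minimal sketch of this boundary trimming, assuming positions are seconds on the main track's timeline. The function name `trim_to_main_track` is hypothetical, and returning `None` for a subtitle that lies entirely outside the main track is an assumption; the text only states that the part beyond the boundary is deleted.

```python
from typing import Optional, Tuple


def trim_to_main_track(
    sub_start: float,
    sub_end: float,
    track_start: float,
    track_end: float,
) -> Optional[Tuple[float, float]]:
    """Clip a subtitle segment to the main track and return the kept range."""
    kept_start = max(sub_start, track_start)
    kept_end = min(sub_end, track_end)
    if kept_start >= kept_end:
        # Assumption: nothing is left when the subtitle is fully outside.
        return None
    return kept_start, kept_end
```

For a main track spanning 0 to 10 seconds, a subtitle moved to the range 8 to 12 seconds would keep only 8 to 10 seconds.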
In some embodiments, the target audio source segment is placed on an audio source track for display; the target subtitle segment is placed on a subtitle track for display; the audio source track, the subtitle track, and the main track of the target video share the same timing; and the target audio source segment is bound to the main track.
In some embodiments, keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment and/or the target subtitle segment includes:
in a case where an adjustment operation that changes the subtitle head position and/or the audio source head position is performed, updating the relative positional relationship according to the changed subtitle head position and audio source head position, and keeping the updated relative positional relationship unchanged.
In some embodiments, keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment includes:
in a case where an adjustment operation that moves the target audio source segment as a whole is performed, moving the target subtitle segment as a whole to follow the target audio source segment, and keeping the relative positional relationship unchanged.
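The follow-the-audio behaviour can be illustrated with a small sketch. It assumes the relative positional relationship is stored as the signed difference between the two head positions; `Segment`, `Binding`, and `move_audio` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # head position on the shared timeline, in seconds
    end: float    # tail position on the shared timeline, in seconds

    def moved_to(self, new_start: float) -> "Segment":
        # Move the whole segment while keeping its duration.
        return Segment(new_start, new_start + (self.end - self.start))


class Binding:
    """Binds a subtitle head to an audio source head by a fixed offset."""

    def __init__(self, audio: Segment, subtitle: Segment) -> None:
        self.audio = audio
        self.subtitle = subtitle
        # Assumed representation of the relative positional relationship.
        self.offset = subtitle.start - audio.start

    def move_audio(self, new_start: float) -> None:
        # Moving the audio as a whole drags the subtitle along with it,
        # so the stored offset stays valid without being recomputed.
        self.audio = self.audio.moved_to(new_start)
        self.subtitle = self.subtitle.moved_to(new_start + self.offset)
```

With an audio segment at 10 to 20 seconds and a subtitle at 12 to 18 seconds, moving the audio head to 30 seconds places the subtitle at 32 to 38 seconds, so the 2-second offset is preserved.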
In some embodiments, keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment includes:
in a case where an adjustment operation that replaces the target audio source segment is performed, determining a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, and keeping the new relative positional relationship unchanged.
In some embodiments, keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment includes:
in a case where a speed change operation is performed on the target audio source segment according to a preset speed change value, performing a speed change operation on the target subtitle segment according to the speed change value, and keeping the relative positional relationship unchanged.
In some embodiments, performing the speed change operation on the target subtitle segment according to the speed change value includes:
in a case where a first duration of the target audio source segment before the speed change is greater than a second duration of the target subtitle segment, performing the speed change operation on the target subtitle segment according to the preset speed change value; and
in a case where the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment, performing the speed change operation, according to the preset speed change value, on the part of the target subtitle segment that has the first duration.
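The two duration cases can be sketched as below. Several points are assumptions rather than statements from the text: a speed change value `speed` divides a duration (2.0 halves it), the "part that has the first duration" is taken to be the leading part of the subtitle, equal durations are treated like the first case, and the head-to-head offset is left as it was. `Segment` and `change_speed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    start: float     # head position on the shared timeline, in seconds
    duration: float  # length in seconds


def change_speed(audio: Segment, subtitle: Segment, speed: float) -> Tuple[Segment, Segment]:
    """Apply a speed change to the audio and propagate it to the subtitle."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    offset = subtitle.start - audio.start  # relation to keep unchanged
    new_audio = Segment(audio.start, audio.duration / speed)
    if audio.duration >= subtitle.duration:
        # First case: the whole subtitle is speed-changed.
        new_sub_duration = subtitle.duration / speed
    else:
        # Second case: only the part as long as the audio is speed-changed;
        # the remainder of the subtitle keeps its original length.
        remainder = subtitle.duration - audio.duration
        new_sub_duration = audio.duration / speed + remainder
    new_subtitle = Segment(new_audio.start + offset, new_sub_duration)
    return new_audio, new_subtitle
```

For example, at a speed change value of 2, an 8-second audio segment with a 4-second subtitle yields a 4-second audio segment and a 2-second subtitle, while a 4-second audio segment with a 6-second subtitle yields a 2-second audio segment and a 4-second subtitle (2 seconds speed-changed plus 2 seconds untouched).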
In some embodiments, keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment includes:
in a case where an adjustment operation that splits the target audio source segment is performed, obtaining multiple audio source sub-segments from the split; and
establishing a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of a target audio source sub-segment, and keeping the new relative positional relationship unchanged, where the target audio source sub-segment is the sub-segment located at the head among all the audio source sub-segments.
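A short sketch of splitting and rebinding, assuming positions are seconds on the shared timeline and the relation is stored as a head-to-head offset. `Segment`, `split_audio`, and `rebind_to_head` are hypothetical names; cut points outside the segment are ignored by assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float  # head position on the shared timeline, in seconds
    end: float    # tail position on the shared timeline, in seconds


def split_audio(audio: Segment, cut_points: List[float]) -> List[Segment]:
    """Split an audio source segment at the given timeline positions."""
    # Assumption: cut points outside the segment (or on its ends) are ignored.
    cuts = sorted({p for p in cut_points if audio.start < p < audio.end})
    bounds = [audio.start, *cuts, audio.end]
    return [Segment(a, b) for a, b in zip(bounds, bounds[1:])]


def rebind_to_head(subtitle: Segment, pieces: List[Segment]) -> Tuple[Segment, float]:
    """Bind the subtitle to the sub-segment located at the head."""
    head = min(pieces, key=lambda seg: seg.start)
    return head, subtitle.start - head.start
```

Because the head sub-segment starts where the original segment started, the offset value is the same immediately after the split; the rebinding matters for later edits, since the subtitle now follows only the head sub-segment.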
In a second aspect, an embodiment of the present application provides an apparatus for binding a subtitle and an audio source, the apparatus including:
a recognition module, configured to determine a target audio source segment in a target video, and obtain a target subtitle segment by recognition from the target audio source segment;
a binding module, configured to determine a relative positional relationship between a subtitle head position of the target subtitle segment and an audio source head position of the target audio source segment; and
a maintaining module, configured to keep the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.
In some embodiments, the maintaining module includes:
a triggering submodule, configured to trigger a subtitle processing operation in a case where the adjusted target audio source segment and the target subtitle segment have no overlapping part; and
a binding submodule, configured to bind the target subtitle segment to the main track of the target video in response to a subtitle processing operation that retains the subtitle.
In some embodiments, the cases in which the adjusted target audio source segment and the target subtitle segment have no overlapping part include: the target audio source segment has been deleted; or the subtitle head position and/or the audio source head position has been changed; or the target audio source segment has been split into multiple audio source sub-segments and the sub-segment located at the head has been deleted.
In some embodiments, the apparatus further includes:
a detection module, configured to detect the adjusted new subtitle head position and audio source head position in a case where the adjusted target audio source segment and the target subtitle segment have an overlapping part; and
an updating module, configured to update the relative positional relationship according to the new subtitle head position and audio source head position, and keep the updated relative positional relationship unchanged.
In some embodiments, the apparatus further includes:
a deletion module, configured to delete the part of the target subtitle segment that exceeds the boundary in a case where the target subtitle segment is moved beyond the boundary of the main track of the target video.
In some embodiments, the target audio source segment is placed on an audio source track for display; the target subtitle segment is placed on a subtitle track for display; the audio source track, the subtitle track, and the main track of the target video share the same timing; and the target audio source segment is bound to the main track.
In some embodiments, the maintaining module includes:
an update submodule, configured to, in a case where an adjustment operation that changes the subtitle head position and/or the audio source head position is performed, update the relative positional relationship according to the changed subtitle head position and audio source head position, and keep the updated relative positional relationship unchanged.
In some embodiments, the maintaining module includes:
a moving submodule, configured to, in a case where an adjustment operation that moves the target audio source segment as a whole is performed, move the target subtitle segment as a whole to follow the target audio source segment, and keep the relative positional relationship unchanged.
In some embodiments, the maintaining module includes:
a replacement submodule, configured to, in a case where an adjustment operation that replaces the target audio source segment is performed, determine a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, and keep the new relative positional relationship unchanged.
In some embodiments, the maintaining module includes:
a speed change submodule, configured to, in a case where a speed change operation is performed on the target audio source segment according to a preset speed change value, perform a speed change operation on the target subtitle segment according to the speed change value, and keep the relative positional relationship unchanged.
In some embodiments, the speed change submodule includes:
a first speed change unit, configured to perform the speed change operation on the target subtitle segment according to the preset speed change value in a case where a first duration of the target audio source segment before the speed change is greater than a second duration of the target subtitle segment; and
a second speed change unit, configured to perform the speed change operation, according to the preset speed change value, on the part of the target subtitle segment that has the first duration in a case where the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment.
In some embodiments, the maintaining module includes:
a splitting submodule, configured to obtain multiple audio source sub-segments from the split in a case where an adjustment operation that splits the target audio source segment is performed; and
a cropping submodule, configured to establish a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of a target audio source sub-segment, and keep the new relative positional relationship unchanged, where the target audio source sub-segment is the sub-segment located at the head among all the audio source sub-segments.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory for storing instructions executable by a processor, where the processor is configured to execute the instructions to implement the above binding of a subtitle and an audio source.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above binding of a subtitle and an audio source.
In a fifth aspect, an embodiment of the present application further provides a computer program product, including a computer program that, when executed by a processor, implements the above binding of a subtitle and an audio source.
In the embodiments of the present application, the method includes: determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment; determining a relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment; and keeping the relative positional relationship unchanged in a case where an adjustment operation is performed on the target audio source segment and/or the target subtitle segment. The present application binds the relative positional relationship between the head positions of the target subtitle segment and the target audio source segment, so that editing of the main track is isolated from editing of the subtitle and the audio source. Editing operations on the main track therefore do not affect the alignment between the subtitle and the audio source, which reduces the chance of the audio source and the subtitle becoming misaligned.
BRIEF DESCRIPTION OF THE DRAWINGS
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the embodiments. The drawings are only for the purpose of illustrating the embodiments and are not to be considered as limiting the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a flowchart of the steps of a method for binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 2 is an interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 3 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 4 is a flowchart of the steps of a method for binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 5 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 6 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 7 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 8 is an interface diagram of a subtitle processing operation provided by an embodiment of the present application;
FIG. 9 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 10 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 11 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 12 is another interface diagram of binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 13 is a block diagram of an apparatus for binding a subtitle and an audio source provided by an embodiment of the present application;
FIG. 14 is a logical block diagram of an electronic device according to an embodiment of the present application; and
FIG. 15 is a logical block diagram of an electronic device according to another embodiment of the present application.
具体实施方式Detailed ways
下面将参照附图更详细地描述本申请的示例性实施例。虽然附图中显示了本申请的示例性实施例,然而应当理解,可以以各种形式实现本申请而不应被这里阐述的实施例所限制。相反,提供这些实施例是为了能够更透彻地理解本申请,并且能够将本申请的范围完整的传达给本领域的技术人员。Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the application will be more thoroughly understood, and will fully convey the scope of the application to those skilled in the art.
图1是本申请实施例提供的一种字幕与音源的绑定方法的步骤流程图,该方法可以由服务器、处理器、车载设备、移动设备、计算设备等执行。如图1所示,该方法可以包括:FIG. 1 is a flowchart of steps of a method for binding subtitles and audio sources provided by an embodiment of the present application. The method may be executed by a server, a processor, a vehicle-mounted device, a mobile device, a computing device, and the like. As shown in Figure 1, the method may include:
步骤101,确定目标视频中的目标音源片段,并由所述目标音源片段识别得到目标字幕片段。Step 101: Determine the target audio source segment in the target video, and identify the target subtitle segment from the target audio source segment.
视频中通常可以包括一段或多段音源片段,音源片段是指视频中出现的声音片段,是一种音色资源,音源片段的类型可以包括主轨原声音源、画中画原声音源、插入音乐/录音/配音音源等,其中,主轨原声音源是视频的原声声音内容;画中画原声音源是插入该视频的一段画中画视频的原声声音内容;插入音乐/录音/配音音源是指额外插入该视频的音乐/录音/配音。A video can usually include one or more audio clips. Audio clips refer to the sound clips that appear in the video and are a kind of timbre resource. The types of audio clips can include main track original sound source, picture-in-picture original sound source, inserted music / recording / Dubbing sound source, etc., where the main track original sound source is the original sound content of the video; the picture-in-picture original sound source is the original sound content of a picture-in-picture video inserted into the video; inserting a music/recording/dubbing sound source refers to additionally inserting the video music/recording/dubbing.
在本申请实施例中,可以通过对目标视频内容的分析,提取其中的目标音源片段,目标音源片段可以为目标视频所包含的所有音源片段中的任一片段,在提取得到目标音源片段后,可以将目标音源片段进行语音识别,得到对应的文本,识别得到的文本内容可以作为由目标音源片段识别得到的目标字幕片段,在播放目标视频的过程中,该目标字幕片段即可作为目标音源片段的展示字幕。In the embodiment of the present application, the target audio clip can be extracted by analyzing the content of the target video, and the target audio clip can be any clip of all the audio clips included in the target video. After the target audio clip is extracted, The target sound source segment can be subjected to speech recognition to obtain the corresponding text, and the recognized text content can be used as the target subtitle segment identified by the target audio source segment. In the process of playing the target video, the target subtitle segment can be used as the target audio source segment. display subtitles.
例如,在一个视频包含两部分音源片段,一部分为视频原声配乐,另一部分为插入的解说配音的情况下,针对这两部分音源片段,可以通过语音识别得到原声配乐对应的字幕片段,以及解说配音得到的字幕片段。For example, if a video contains two parts of audio source clips, one part is the original soundtrack of the video, and the other part is the inserted narration dubbing, for the two parts of the audio source clips, the subtitle clips corresponding to the original soundtrack can be obtained through speech recognition, as well as the narration dubbing. The resulting subtitle clip.
Step 102: Determine the relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment.
In practical applications, a video has a fixed playback duration, from which a corresponding playback timeline can be obtained, for example a timeline running from the start point at 0 minutes 0 seconds to the end point at 10 minutes 30 seconds. A main video track can be constructed on the basis of the playback timeline. The main track is composed of the video frames of the video and can display the video as a stream of frames in playback order. The user can operate on the main track to conveniently view, select, and edit content at different positions in the video.
In addition, to achieve a good playback effect, the target audio source segment and the corresponding target subtitle segment need to be aligned. A common case of misalignment is that the audio source and the on-screen subtitles are offset from each other, for example lyrics offset from the music. Alignment here does not only mean that the moment at which a word is pronounced in the audio source coincides exactly with the moment at which the corresponding text is displayed in the subtitle. For example, when translated subtitles for a foreign language are played, the two languages differ in length: where the subtitle lasts longer than the audio source, the subtitle can be displayed first and the corresponding audio source played afterwards; where the subtitle is shorter than the audio source, the audio source can be played first and the corresponding subtitle displayed afterwards.
In the related art, both the audio source segment and the subtitle segment are bound to the main track of the video. Ordinary editing operations performed by the user on the main track therefore inevitably affect the subtitle segment, which greatly increases the probability of misalignment between the audio source and the subtitle.
In the embodiments of the present application, to reduce the probability that a subtitle becomes misaligned with its corresponding audio source, the target subtitle segment can be bound to the corresponding target audio source segment. Operations on the main track then do not affect the alignment between the subtitle and the audio source, which reduces the probability of misalignment between them.
In some embodiments, the target subtitle segment has a subtitle head position and the target audio source segment has an audio source head position. The subtitle head position can be understood as the moment on the main track corresponding to the start point of the target subtitle segment, and the audio source head position can be understood as the moment on the main track corresponding to the start point of the target audio source segment. In the embodiments of the present application, binding the target subtitle segment to the corresponding target audio source segment can be implemented by binding the subtitle head position of the target subtitle segment to the audio source head position of the target audio source segment and keeping the relative positional relationship between the two head positions unchanged.
For example, refer to FIG. 2, which shows an interface diagram for binding subtitles and audio sources provided by an embodiment of the present application. Where two target audio source segments A and B are identified in the target video, speech recognition is performed on target audio source segments A and B to obtain target subtitle segment a corresponding to target audio source segment A and target subtitle segment b corresponding to target audio source segment B. Target audio source segments A and B can be displayed in audio source track 10, and target subtitle segments a and b can be displayed in subtitle track 20. The audio source track, the subtitle track, and the main track of the target video share the same timeline.
To explain target audio source segment A in advance, target subtitle segment a can be displayed as soon as the target video starts playing. The binding of target audio source segment A and target subtitle segment a can then be: determining the relative positional relationship between the audio source head position (00:10) of target audio source segment A and the subtitle head position (00:00) of target subtitle segment a, and keeping that relative positional relationship unchanged, so that the two remain aligned. Where target audio source segment B and target subtitle segment b require strict word-to-sound alignment, the binding of target audio source segment B and target subtitle segment b can be: determining the relative positional relationship between the audio source head position (00:50) of target audio source segment B and the subtitle head position (00:50) of target subtitle segment b, and keeping that relative positional relationship unchanged, so that the two remain aligned.
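The relative positional relationship in the FIG. 2 example can be illustrated as a signed offset between the two head positions. The following is a minimal sketch only: the function names `to_seconds` and `relative_offset`, and the choice to express the relationship as "subtitle head minus audio source head" in seconds, are assumptions made for illustration and are not specified by the application.

```python
def to_seconds(stamp: str) -> int:
    """Convert an "mm:ss" timeline stamp into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_offset(subtitle_head: str, audio_head: str) -> int:
    """Signed offset (subtitle head minus audio source head), in seconds.

    Hypothetical representation of the "relative positional relationship".
    """
    return to_seconds(subtitle_head) - to_seconds(audio_head)


# Segment A / subtitle a: subtitle shown 10 s before the audio starts.
offset_a = relative_offset("00:00", "00:10")
# Segment B / subtitle b: strict alignment, both heads at 00:50.
offset_b = relative_offset("00:50", "00:50")
```

Under this representation, keeping the relationship unchanged amounts to keeping `offset_a` and `offset_b` fixed while other edits take place.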
Step 103: When an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, keep the relative positional relationship unchanged.
In the embodiments of the present application, after the relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment has been bound, subsequent adjustment operations on the target audio source segment and the target subtitle segment do not affect the bound relative positional relationship as long as they do not adjust the subtitle head position or the audio source head position. Likewise, when adjustments other than head-position adjustments are later made to the target audio source segment or the target subtitle segment according to actual needs, the relative positional relationship can be kept unchanged, so that the subtitle and the audio source remain aligned.
In the related art, for example, where the audio source segment is not a main-track original audio source and operations such as speed change or cropping are performed on the main track, the subtitle segment corresponding to the audio source segment also undergoes the speed change or cropping, whereas the audio source segment, not being a main-track original audio source, does not change with it. This results in severe misalignment between the audio source and the subtitle.
Refer to FIG. 2 and FIG. 3, where FIG. 3 shows another interface diagram for binding subtitles and audio sources provided by an embodiment of the present application. Where target audio source segments A and B are both non-main-track original audio sources and the user applies a 2x speed change to region 31 of the main track 30 of the video, target audio source segment A does not follow the speed change because it is a non-main-track original audio source. Because target subtitle segment a is no longer bound to the main track but is bound by head position to target audio source segment A, target subtitle segment a does not follow the speed change either. The head position of target subtitle segment a thus stays bound to that of target audio source segment A, the relative positional relationship between the head positions remains unchanged, and alignment is achieved. In addition, where target audio source segment A is a main-track original audio source, target audio source segment A also undergoes the 2x speed change in response to the user applying a 2x speed change to region 31 of the main track 30. In this case, the embodiments of the present application can further apply a synchronized speed change to target subtitle segment a at the same 2x value, according to the speed change performed on target audio source segment A, so that alignment is achieved.
For target audio source segment B, where the user crops region 32 of the main track 30 of the video, target audio source segment B is not cropped because it is a non-main-track original audio source. Because target subtitle segment b is no longer bound to the main track but is bound by head position to target audio source segment B, target subtitle segment b is not cropped either. The head position of target subtitle segment b thus stays bound to that of target audio source segment B, the relative positional relationship between the head positions remains unchanged, and alignment is achieved. In addition, where target audio source segment B is a main-track original audio source and the user crops region 32 of the main track 30, in the related art the part of target subtitle segment b corresponding to region 32 would also be cropped, so that part of the subtitle information would be lost. In the embodiments of the present application, although the part of target audio source segment B corresponding to region 32 is cropped, the part of target subtitle segment b corresponding to region 32 is not, which avoids losing part of the subtitle information and preserves the integrity of the subtitle.
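The speed-change behaviour described for FIG. 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the application's implementation: the `Clip` class and `apply_main_track_speed` function are hypothetical, the edited main-track region is assumed to cover the whole clip, the head positions are assumed not to move, and all durations are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    head: float      # head position on the shared timeline, in seconds
    duration: float  # length in seconds


def apply_main_track_speed(audio: Clip, subtitle: Clip, factor: float,
                           audio_is_main_track_original: bool) -> None:
    """Propagate a main-track speed change to a bound audio/subtitle pair.

    The subtitle follows its bound audio source, not the main track:
    it is rescaled only when the audio source itself is rescaled.
    """
    if audio_is_main_track_original:
        audio.duration /= factor
        subtitle.duration /= factor
    # Inserted (non-main-track) audio: neither clip changes.


# Inserted audio: a 2x main-track speed change leaves both clips untouched.
inserted_audio = Clip(head=10.0, duration=20.0)
inserted_subtitle = Clip(head=0.0, duration=30.0)
apply_main_track_speed(inserted_audio, inserted_subtitle, 2.0, False)

# Main-track original audio: both clips are rescaled by the same factor.
original_audio = Clip(head=10.0, duration=20.0)
original_subtitle = Clip(head=0.0, duration=30.0)
apply_main_track_speed(original_audio, original_subtitle, 2.0, True)
```

In both cases the difference between the two head positions is the same before and after the edit, which is the property the binding is meant to preserve.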
In summary, the method for binding a subtitle with an audio source provided by the embodiments of the present application includes: determining a target audio source segment in a target video and recognizing a target subtitle segment from the target audio source segment; determining the relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment; and, when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged. The present application binds the relative positional relationship between the head positions of the target subtitle segment and the target audio source segment, so that editing of the main track is isolated from editing of the subtitle and the audio source. Editing operations on the main track do not affect the alignment between the subtitle and the audio source, which reduces the probability of misalignment between them.
FIG. 4 is a flowchart of the steps of a method for binding a subtitle with an audio source provided by an embodiment of the present application. The method may be executed by a server, a processor, a vehicle-mounted device, a mobile device, a computing device, or the like. As shown in FIG. 4, the method may include:
Step 201: Determine a target audio source segment in a target video, and recognize a target subtitle segment from the target audio source segment.
For this step, refer to step 101 above; details are not repeated here.
Step 202: Determine the relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment.
For this step, refer to step 102 above; details are not repeated here.
Step 203: When an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, keep the relative positional relationship unchanged.
For this step, refer to step 103 above; details are not repeated here.
In one implementation of some embodiments, step 203 may include:
Sub-step 2031: Where the adjusted target audio source segment and the target subtitle segment have no overlapping part, trigger a subtitle processing operation.
In the embodiments of the present application, the user can adjust the position, length, and other aspects of the target audio source segment and/or the target subtitle segment according to the desired alignment between subtitle and audio source, thereby changing the relative positional relationship between the target audio source segment as a whole and the target subtitle segment as a whole. After the adjustment operation is completed, if the target audio source segment and the target subtitle segment have no overlapping part, they can be regarded as completely misaligned, and a subtitle processing operation is triggered. The subtitle processing operation is used either to delete the target subtitle segment, which avoids the effect of a misaligned subtitle, or to retain the target subtitle segment for now so that later editing operations can reuse it.
In some embodiments, the cases in which the adjusted target audio source segment and the target subtitle segment have no overlapping part include: a case in which the target audio source segment has been deleted; a case in which the subtitle head position and/or the audio source head position has been changed; and a case in which the target audio source segment has been split into multiple audio source sub-segments and the sub-segment located at the head has been deleted.
For the case in which the target audio source segment has been deleted, refer to FIG. 5, which shows another interface diagram for binding subtitles and audio sources provided by an embodiment of the present application. Starting from the state shown in FIG. 2, when target audio source segment A is deleted, target subtitle segment a and target audio source segment A have no overlapping part after the deletion. A subtitle processing operation for target subtitle segment a can then be triggered, which handles the misaligned subtitle segment where deleting the entire audio source segment leaves the subtitle and the audio source completely misaligned.
For the case in which the subtitle head position and/or the audio source head position has been changed, refer to FIG. 6, which shows another interface diagram for binding subtitles and audio sources provided by an embodiment of the present application. Starting from the state shown in FIG. 2, the head positions of target audio source segment B and target subtitle segment b are adjusted: the head position of target audio source segment B is moved to the position corresponding to 01:20, and the head position of target subtitle segment b is moved to the position corresponding to 00:40. After the adjustment, target subtitle segment b and target audio source segment B have no overlapping part. A subtitle processing operation for target subtitle segment b can then be triggered, which handles the misaligned subtitle segment where adjusting the head positions of the audio source and the subtitle leaves them completely misaligned.
For the case in which the target audio source segment has been split into multiple audio source sub-segments and the sub-segment located at the head has been deleted, refer to FIG. 7, which shows another interface diagram for binding subtitles and audio sources provided by an embodiment of the present application. Starting from the state shown in FIG. 2, target audio source segment A is first split into three audio source sub-segments, and the sub-segment at the head position is then deleted. After this adjustment, target subtitle segment a is considered to have lost the audio source head position with which its binding was established, so the adjusted target audio source segment A and target subtitle segment a are treated as having no overlapping part. A subtitle processing operation for target subtitle segment a can then be triggered, which handles the misaligned subtitle segment where the subtitle and the audio source are completely misaligned.
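The three no-overlap cases above can be sketched as a single check. This is an illustrative sketch only: the names `has_overlap` and `needs_subtitle_processing`, the `(head, end)` tuple representation, and all end times are assumptions, since the application gives only head positions for FIG. 5 to FIG. 7.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (head, end) on the shared timeline, in seconds


def has_overlap(audio: Span, subtitle: Span) -> bool:
    """True when the two half-open intervals share any time."""
    return audio[0] < subtitle[1] and subtitle[0] < audio[1]


def needs_subtitle_processing(audio: Optional[Span], subtitle: Span,
                              head_sub_segment_deleted: bool = False) -> bool:
    """Decide whether the subtitle processing operation should be triggered."""
    # Case 1: the whole audio source segment was deleted.
    # Case 3: the head sub-segment was deleted, so the bound head is gone.
    if audio is None or head_sub_segment_deleted:
        return True
    # Case 2: heads were moved so far apart that nothing overlaps.
    return not has_overlap(audio, subtitle)


fig5 = needs_subtitle_processing(None, (0.0, 25.0))
fig6 = needs_subtitle_processing((80.0, 110.0), (40.0, 70.0))  # 01:20 vs 00:40
fig7 = needs_subtitle_processing((20.0, 30.0), (0.0, 25.0),
                                 head_sub_segment_deleted=True)
still_aligned = needs_subtitle_processing((10.0, 30.0), (0.0, 25.0))
```

Note that the FIG. 7 case is flagged explicitly rather than derived from the intervals, because the text treats loss of the bound head position as "no overlap" even if a remaining sub-segment still coincides with the subtitle in time.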
Sub-step 2032: In response to a subtitle processing operation that retains the subtitle, bind the target subtitle segment to the main track of the target video.
In the embodiments of the present application, in response to a subtitle processing operation that retains the subtitle, the target subtitle segment can be bound to the main track of the target video so that the retained target subtitle segment can be processed later. In some embodiments, the subtitle head position of the target subtitle segment can be bound to the moment on the main track of the target video that corresponds to that subtitle head position, which temporarily retains the target subtitle segment. For example, for the state shown in FIG. 6 above, the subtitle head position (00:40) of the adjusted target subtitle segment b can be bound to the position on the main track corresponding to 00:40.
Further, after the target subtitle segment is bound to the main track of the target video, subsequent processing of the target subtitle segment may include the following scenarios. In one scenario, the user wants to replace a male-voice audio source segment with a machine-voice audio source segment. The user can delete the entire male-voice audio source segment and first bind its corresponding subtitle segment to the main track. Once the machine-voice audio source segment has been generated, it is inserted at the position previously occupied by the male-voice audio source segment, so that the subtitle segment re-establishes its alignment with the machine-voice audio source segment. In another scenario, the audio source segment can be deleted and only its corresponding subtitle segment kept, bound to the main track, so that during playback the content of that segment is conveyed as text only.
It should be noted that, in response to a subtitle processing operation that deletes the subtitle, the target subtitle segment can be deleted. That is, where the user considers the misaligned target subtitle segment to be of no further use, the user can perform the subtitle processing operation that deletes the subtitle, so that the target subtitle segment is deleted and the interference caused by the misaligned target subtitle segment is avoided.
In the embodiments of the present application, the triggered subtitle processing operation can be presented in the form of an interface. For example, refer to FIG. 8, which shows an interface diagram of a subtitle processing operation provided by an embodiment of the present application. The interface includes a control for carrying out the subtitle processing operation, which contains the prompt text "Subtitles have been recognized from this audio. Delete them together?", a "Delete recognized subtitles and audio" button, and a "Delete audio only" button. When the user triggers the "Delete recognized subtitles and audio" button, the audio and its corresponding subtitles are deleted together. When the user triggers the "Delete audio only" button, only the audio is deleted and the subtitles corresponding to the audio are bound to the main track.
In addition, in practical applications, the subsequent subtitle processing algorithm is divided into a part that processes speech-recognized subtitles and a part that processes manually added subtitles. To avoid conflicts between the two parts, in the embodiments of the present application the attribute of a subtitle segment recognized from an audio source segment can be set by default to the speech-recognized subtitle attribute, so that the part of the algorithm that processes speech-recognized subtitles handles only subtitle segments having that attribute. When a subtitle segment is bound to the main track, the audio source segment corresponding to it can be regarded as deleted or severely misaligned with it. The attribute of the subtitle segment can then be changed to the manually added subtitle attribute, that is, the subtitle segment is treated as a subtitle added manually by the user, so that the part of the algorithm that processes manually added subtitles handles only subtitle segments having the manually added subtitle attribute.
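The retain/delete choice and the accompanying attribute change can be sketched together. This is a hypothetical model for illustration: the `Origin` enum, the `Subtitle` class, its `bound_to` field, and the `process_subtitle` function are all invented names, and the application does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    SPEECH_RECOGNITION = "speech_recognition"  # default for recognized subtitles
    MANUAL = "manual"                          # treated as user-added


@dataclass
class Subtitle:
    head: float                                # head position, in seconds
    text: str
    origin: Origin = Origin.SPEECH_RECOGNITION
    bound_to: str = "audio"                    # "audio" or "main_track"


def process_subtitle(subtitle: Subtitle, keep: bool) -> Optional[Subtitle]:
    """Apply the user's choice from the subtitle processing prompt."""
    if not keep:
        return None  # "delete recognized subtitles and audio"
    # "Delete audio only": re-anchor the subtitle at the same moment on the
    # main track and hand it over to the manual-subtitle processing path.
    subtitle.bound_to = "main_track"
    subtitle.origin = Origin.MANUAL
    return subtitle


kept = process_subtitle(Subtitle(head=40.0, text="example line"), keep=True)
dropped = process_subtitle(Subtitle(head=40.0, text="example line"), keep=False)
```

Keeping the origin as an explicit attribute lets each processing path filter on it, which is one way to avoid the conflict between the two parts of the algorithm that the text describes.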
Further, the user can also save the current adjustment state of the subtitle segments and audio source segments as an old draft. In that case, all target subtitle segments can be bound to the main track and an old draft file created. Because all target subtitle segments are bound to the main track, the old draft file contains only the subtitle information and the main track information and does not contain the audio source information, which has a large file size. This saves storage resources.
In some embodiments, step 203 may further include:
Sub-step 2033: Where the adjusted target audio source segment and the target subtitle segment have an overlapping part, detect the new subtitle head position and the new audio source head position after the adjustment.
In the embodiments of the present application, the user can adjust the position, length, and other aspects of the target audio source segment and/or the target subtitle segment according to the desired alignment between subtitle and audio source, thereby changing the relative positional relationship between the target audio source segment as a whole and the target subtitle segment as a whole. After the adjustment operation is completed, if the target audio source segment and the target subtitle segment have an overlapping part, they can be regarded as being in the alignment state chosen by the user, and the new subtitle head position and the new audio source head position after the adjustment can be detected.
Sub-step 2034: Update the relative positional relationship according to the new subtitle head position and the new audio source head position, and keep the updated relative positional relationship unchanged.
In this step, the original relative positional relationship can be updated on the basis of the new subtitle head position and the new audio source head position, and the updated relative positional relationship is then kept unchanged, which satisfies the user's need to adjust the alignment state of the subtitle and the audio source.
For example, referring to FIG. 2, before the adjustment operation the head positions of target audio source segment A and target subtitle segment a both coincide with 00:00. After the adjustment operation, the audio source head position of target audio source segment A is at 00:10 and the subtitle head position of target subtitle segment a is at 00:00. In this case, according to the adjusted result, the relative positional relationship between the new audio source head position (00:10) of target audio source segment A and the new subtitle head position (00:00) of target subtitle segment a is determined, and that relative positional relationship is kept unchanged.
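Sub-steps 2033 and 2034 can be sketched as a function that recomputes the stored relationship only when the adjusted segments still overlap. This is a minimal sketch under assumptions: the function name `updated_offset`, the `(head, end)` tuples, the signed-offset representation, and all end times are hypothetical, since the example above gives only head positions.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (head, end) on the shared timeline, in seconds


def updated_offset(audio: Span, subtitle: Span) -> Optional[float]:
    """Return the new relative offset (subtitle head minus audio head).

    Returns None when the adjusted segments no longer overlap, in which
    case the subtitle processing operation would be triggered instead.
    """
    overlap = audio[0] < subtitle[1] and subtitle[0] < audio[1]
    if not overlap:
        return None
    return subtitle[0] - audio[0]


# Before the adjustment: both heads at 00:00.
before = updated_offset((0.0, 20.0), (0.0, 25.0))
# After the adjustment: audio head at 00:10, subtitle head still at 00:00.
after = updated_offset((10.0, 30.0), (0.0, 25.0))
# Heads moved so far apart that nothing overlaps (as in FIG. 6).
no_overlap = updated_offset((80.0, 110.0), (40.0, 70.0))
```

The value returned for `after` replaces the previously stored relationship and is the one kept unchanged from then on.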
In some embodiments, the target audio source segment is placed on an audio source track for display and the target subtitle segment is placed on a subtitle track for display. The audio source track, the subtitle track, and the main track of the target video share the same timeline, and the target audio source segment is bound to the main track.
Referring to FIG. 3, in the embodiments of the present application, when the target video is edited, the editing interface may include three operable adjustment tracks: the main track 30 of the target video, the audio source track 10, and the subtitle track 20. The main track 30 can display the target video as a stream of frames in playback order, and the user can operate on the main track 30 to conveniently view, select, and edit content at different positions in the target video. The audio source track 10 carries and displays audio source segments, and the user can adjust audio source segments on the audio source track 10. The subtitle track 20 carries and displays subtitle segments, and the user can adjust subtitle segments on the subtitle track 20. The three operable adjustment tracks give the user a rich set of ways to make adjustments.
Further, the target audio source segment can be bound to the main track 30. In some embodiments, the audio source head position of the target audio source segment can be bound to the moment on the main track 30 of the target video that corresponds to that audio source head position, which binds the target audio source segment to the main track.
In addition, where the target audio source segment is bound to the main track, adjustment operations on the main track also affect the length and position of the target audio source segment. Where the target audio source segment is a main-track original audio source, the target audio source segment changes accordingly when adjustment operations such as cropping, speed change, or deletion are performed on the region of the main track corresponding to the target audio source segment. Where the target audio source segment is a non-main-track original audio source, the target audio source segment does not change when such adjustment operations are performed on the corresponding region of the main track, as long as the operation does not change the moment on the main track that corresponds to the head position of the target audio source segment. For example, if the tail of the region of the main track corresponding to the target audio source segment is cropped, the target audio source segment does not change, whereas if the head of that region is cropped, the target audio source segment is deleted.
In some embodiments, in another implementation of the embodiments of the present application, step 203 may include:
Sub-step 2035: when an adjustment operation that changes the subtitle head position and/or the audio source head position is performed, update the relative positional relationship according to the changed subtitle head position and audio source head position, and keep the updated relative positional relationship unchanged.
In the embodiments of the present application, the user may adjust the head position of the target audio source segment and/or the target subtitle segment according to how the subtitle and the audio source need to be aligned, thereby changing the relative positional relationship between the whole target audio source segment and the whole target subtitle segment. After the adjustment operation ends, the relative positional relationship may be updated according to the changed subtitle head position and audio source head position, and the updated relationship is then kept unchanged, which meets the user's need to adjust the alignment between subtitle and audio source.
For example, referring to FIG. 2, before the adjustment operation the head positions of the target audio source segment A and the target subtitle segment a both coincide with time 00:00. If, after the adjustment operation, the audio source head position of the target audio source segment A is at time 00:10 and the subtitle head position of the target subtitle segment a is at time 00:00, the relative positional relationship between the new audio source head position (00:10) and the new subtitle head position (00:00) is determined from the adjusted result and is then kept unchanged.
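The update in sub-step 2035 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `Binding` class, the `rebind_after_head_change` function, and the choice to express the relative positional relationship as a signed offset in seconds are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """Hypothetical record of a subtitle-to-audio head binding (times in seconds)."""
    audio_head: float
    subtitle_head: float

    @property
    def offset(self) -> float:
        # Assumed encoding of the relative positional relationship:
        # subtitle head minus audio source head.
        return self.subtitle_head - self.audio_head


def rebind_after_head_change(new_audio_head: float, new_subtitle_head: float) -> Binding:
    # Sub-step 2035: recompute the relationship from the changed head
    # positions; the returned binding is the one kept from then on.
    return Binding(new_audio_head, new_subtitle_head)


# FIG. 2 example: both heads start at 00:00, then the audio head moves to 00:10.
before = Binding(audio_head=0.0, subtitle_head=0.0)
after = rebind_after_head_change(new_audio_head=10.0, new_subtitle_head=0.0)
```

Under this encoding the offset changes from 0 to -10 seconds, meaning the subtitle now starts ten seconds before its audio source segment.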
In addition, after the adjustment operation ends, it may also be determined whether the adjusted target audio source segment and the target subtitle segment overlap, and a corresponding operation is performed according to the result. For the specific logic, refer to the description of sub-steps 2031 to 2032 above, which is not repeated here.
In some embodiments, in another implementation of the embodiments of the present application, step 203 may include:
Sub-step 2036: when an adjustment operation that moves the target audio source segment as a whole is performed, move the target subtitle segment as a whole along with the target audio source segment, and keep the relative positional relationship unchanged.
In this step, refer to FIG. 2 and FIG. 9, where FIG. 9 shows another interface for binding subtitles and audio sources provided by an embodiment of the present application. When the adjustment operation swaps the positions of the target audio source segments A and B in FIG. 2 as a whole, FIG. 9 shows the state after the swap: the target subtitle segments a and b move as a whole with their respective target audio source segments, and the relative positional relationships remain unchanged. As a result, when the user moves an audio source segment as a whole, the subtitle segment corresponding to it moves along with it, which saves the user the time of realigning the audio source segment and the subtitle segment.
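A minimal sketch of sub-step 2036 follows. The function name and the sample times are hypothetical (the figures do not state exact timestamps for this case); the only behavior taken from the text is that the subtitle head follows the audio source head so that their offset is preserved.

```python
from typing import Tuple


def move_audio_with_subtitle(
    audio_head: float, subtitle_head: float, new_audio_head: float
) -> Tuple[float, float]:
    """Move an audio source segment as a whole and carry its subtitle along.

    Times are in seconds. Returns (new_audio_head, new_subtitle_head).
    """
    # The relative positional relationship that must survive the move.
    offset = subtitle_head - audio_head
    return new_audio_head, new_audio_head + offset


# Hypothetical numbers: the subtitle starts 2 s after its audio segment,
# and the audio segment is dragged from 0 s to 30 s.
moved = move_audio_with_subtitle(audio_head=0.0, subtitle_head=2.0, new_audio_head=30.0)
```

Here `moved` is `(30.0, 32.0)`: the subtitle still starts 2 seconds after the audio source segment.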
In some embodiments, in another implementation of the embodiments of the present application, step 203 may include:
Sub-step 2037: when an adjustment operation that replaces the target audio source segment is performed, determine a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, and keep the new relative positional relationship unchanged.
In the embodiments of the present application, the target audio source segment may also be replaced as a whole. A new relative positional relationship is then determined between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, which keeps the subtitle aligned with the audio source after the replacement.
It should be noted that the replacement target audio source segment may have the same position and duration as the segment it replaces, or its position and/or duration may differ; the embodiments of the present application do not limit this. After the replacement operation ends, it may also be determined whether the replacement target audio source segment and the target subtitle segment overlap, and a corresponding operation is performed according to the result. For the specific logic, refer to the description of sub-steps 2031 to 2032 above, which is not repeated here.
In some embodiments, in another implementation of the embodiments of the present application, step 203 may include:
Sub-step 2038: when a speed change operation is performed on the target audio source segment according to a preset speed change value, perform a speed change operation on the target subtitle segment according to the same speed change value, and keep the relative positional relationship unchanged.
Speed change is a common function in video and audio editing. By selecting a speed change value, the user can play the video or audio faster or slower at the corresponding rate. For example, when a speed change value of 2x is selected, the video or audio is played at double speed, which halves its duration.
In the embodiments of the present application, when a speed change operation is performed on the target audio source segment according to a preset speed change value, the same speed change value may be applied to the target subtitle segment, and the relative positional relationship between the head position of the target audio source segment and the head position of the target subtitle segment is kept unchanged. For example, when the target audio source segment is sped up with a speed change value of 2x, the target subtitle segment may also be sped up with a speed change value of 2x, so that the subtitle stays aligned with the audio source after the speed change.
Further, the target audio source segment is bound to the main track of the target video. When the target audio source segment is the original sound of the main track, a speed change operation may be performed on the region of the main track corresponding to the target audio source segment, or directly on the target audio source segment in the audio source track; either changes the speed of the target audio source segment. When the target audio source segment is not the original sound of the main track, a speed change operation on the main track has no corresponding effect on the target audio source segment, so the speed change operation may be performed directly on the target audio source segment in the audio source track.
In some embodiments, sub-step 2038 may include:
Sub-step A1: when the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment, perform the speed change operation on the target subtitle segment according to the preset speed change value.
In the embodiments of the present application, the speed change logic may be further refined by comparing the first duration of the target audio source segment before the speed change with the second duration of the target subtitle segment. In some embodiments, when the first duration is greater than the second duration, applying the same speed change value to both segments, whether to speed them up or slow them down, leaves the target audio source segment longer than the target subtitle segment after the speed change. A target audio source segment that is longer than its target subtitle segment gives a better playback effect and better matches users' viewing habits. Therefore, when the first duration is greater than the second duration, the speed change operation may be performed on the target subtitle segment according to the preset speed change value.
For example, refer to FIG. 10, which shows another interface for binding subtitles and audio sources provided by an embodiment of the present application, with the target audio source segments C and D and the target subtitle segments c and d before the speed change. For the target audio source segment C and the target subtitle segment c, the first duration of segment C before the speed change is greater than the second duration of segment c, so after the target audio source segment C is changed with a speed change value of 2x, the target subtitle segment c may follow with the same 2x speed change.
Sub-step A2: when the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment, perform the speed change operation, according to the preset speed change value, on the part of the target subtitle segment that falls within the first duration.
In the embodiments of the present application, when the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment, applying the same speed change value to the whole target subtitle segment, whether to speed it up or slow it down, leaves the target subtitle segment longer than the target audio source segment after the speed change, which greatly increases the probability that the target subtitle segment is too long. An overly long target subtitle segment is more likely to overlap other audio source segments; such overlap degrades the playback effect and conflicts with users' viewing habits. Therefore, when the first duration is less than the second duration, the speed change operation may be applied only to the part of the target subtitle segment that falls within the first duration, while the rest of the target subtitle segment keeps its original playback speed, thereby reducing the probability that the target subtitle segment overlaps other audio source segments.
For example, refer to FIG. 10, which shows the target audio source segments C and D and the target subtitle segments c and d before the speed change. For the target audio source segment D and the target subtitle segment d, the first duration of segment D before the speed change is less than the second duration of segment d. After the target audio source segment D is changed with a speed change value of 2x, the part of the target subtitle segment d corresponding to the first duration (00:50-01:30) may follow with the same 2x speed change, while the part of segment d outside the first duration (01:30-01:40) keeps its original playback speed (1x).
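The duration rule of sub-steps A1 and A2 can be sketched as below. This is an illustrative reading under stated assumptions: the function name is hypothetical, the heads of the two segments are assumed to coincide (as the 00:50 timestamps in the FIG. 10 example suggest), and the case of equal durations, which the text does not address, is treated like A1.

```python
def subtitle_duration_after_speed_change(
    audio_duration: float, subtitle_duration: float, rate: float
) -> float:
    """Return the subtitle duration (seconds) after a speed change by `rate`.

    `audio_duration` is the first duration (audio, before the change) and
    `subtitle_duration` is the second duration (subtitle, before the change).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if audio_duration >= subtitle_duration:
        # Sub-step A1: the whole subtitle follows the speed change.
        # (Equality is an assumption; the text only covers 'greater than'.)
        return subtitle_duration / rate
    # Sub-step A2: only the part within the first duration is changed;
    # the remainder keeps its original 1x speed.
    changed_part = audio_duration / rate
    unchanged_part = subtitle_duration - audio_duration
    return changed_part + unchanged_part


# FIG. 10, segment D: audio spans 00:50-01:30 (40 s), subtitle 00:50-01:40 (50 s).
d_after = subtitle_duration_after_speed_change(40.0, 50.0, 2.0)
# Hypothetical A1 case: audio 40 s, subtitle 30 s.
c_after = subtitle_duration_after_speed_change(40.0, 30.0, 2.0)
```

For segment d the result is 30 seconds (20 seconds of sped-up subtitle plus the 10-second remainder at 1x), compared with 25 seconds if the whole subtitle had been sped up.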
In some embodiments, in another implementation of the embodiments of the present application, step 203 may include:
Sub-step 2039: when an adjustment operation that divides the target audio source segment is performed, obtain multiple audio source sub-segments.
Sub-step 20310: establish a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of a target audio source sub-segment, and keep the new relative positional relationship unchanged, where the target audio source sub-segment is the sub-segment located at the head among all the audio source sub-segments.
In the embodiments of the present application, the user may also divide the target audio source segment as needed to obtain multiple audio source sub-segments. After the division, the subtitle head position of the target subtitle segment is bound to the audio source head position of the target audio source sub-segment located at the head. Because the target subtitle segment is obtained by speech recognition of the whole target audio source segment, dividing the target audio source segment does not break its integrity, so the subtitle head position can be bound to the audio source head position of the head sub-segment. This preserves the head binding between the target subtitle segment and the divided target audio source segment after the division operation. For example, refer to FIG. 11, which shows another interface for binding subtitles and audio sources provided by an embodiment of the present application. When a division operation splits the target audio source segment A into audio source sub-segment 1, audio source sub-segment 2, and audio source sub-segment 3, the target subtitle segment a is bound, after the division, to the audio source head position of audio source sub-segment 1, which is at the head.
In addition, if audio source sub-segment 2 and/or audio source sub-segment 3, which are not at the head, are deleted after the division operation, the target subtitle segment a remains bound to the audio source head position of audio source sub-segment 1 at the head.
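Sub-steps 2039 and 20310 can be illustrated with the sketch below. The helper names, the representation of a segment as a `(start, end)` pair in seconds, and the sample cut points are assumptions made for this example; FIG. 11 does not state where segment A is cut.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def split_segment(start: float, end: float, cut_points: Sequence[float]) -> List[Span]:
    """Divide [start, end) at the given cut points into consecutive sub-segments."""
    cuts = sorted(cut_points)
    if any(not (start < c < end) for c in cuts):
        raise ValueError("cut points must lie strictly inside the segment")
    points = [start, *cuts, end]
    return list(zip(points[:-1], points[1:]))


def head_binding_offset(sub_segments: Sequence[Span], subtitle_head: float) -> float:
    """Offset between the subtitle head and the head of the earliest sub-segment."""
    head_start = min(s for s, _ in sub_segments)
    return subtitle_head - head_start


# Hypothetical: segment A spans 0-30 s and is cut at 10 s and 20 s.
pieces = split_segment(0.0, 30.0, [10.0, 20.0])
offset_all = head_binding_offset(pieces, subtitle_head=0.0)
# Deleting sub-segments 2 and 3 leaves the binding to sub-segment 1 intact.
offset_head_only = head_binding_offset(pieces[:1], subtitle_head=0.0)
```

Both offsets are equal because the binding depends only on the head sub-segment, which is why deleting the non-head sub-segments does not disturb it.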
In some embodiments, after step 201, the method may further include:
Step 204: when the target subtitle segment is moved beyond the boundary of the main track of the target video, delete the part of the target subtitle segment that lies beyond the boundary.
In this step, refer to FIG. 12, which shows another interface for binding subtitles and audio sources provided by an embodiment of the present application. If the target subtitle segment a is moved as a whole on the subtitle track 20 so that part of it lies beyond the boundary of the main track 30 (time 00:00), the part of segment a beyond the boundary may be deleted. If the whole of segment a lies beyond the boundary, the entire target subtitle segment a may be deleted. This interaction gives the user a convenient way to delete subtitles and improves the user experience.
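Step 204 amounts to clipping the subtitle's time span to the main track. The sketch below assumes a subtitle is a `(start, end)` pair in seconds and that the main track runs from `track_start` to `track_end`; the function name and sample values are hypothetical, and the sketch says nothing about which subtitle text is kept, only which time span survives.

```python
from typing import Optional, Tuple


def clip_to_main_track(
    sub_start: float, sub_end: float, track_start: float, track_end: float
) -> Optional[Tuple[float, float]]:
    """Drop the part of a subtitle span outside the main track.

    Returns the surviving (start, end) span, or None if the whole
    subtitle lies beyond the boundary and is therefore deleted.
    """
    start = max(sub_start, track_start)
    end = min(sub_end, track_end)
    if start >= end:
        return None  # nothing left inside the main track
    return (start, end)


# Hypothetical: main track 0-60 s; the subtitle is dragged to -3..5 s.
partly_outside = clip_to_main_track(-3.0, 5.0, 0.0, 60.0)
# Dragged fully past the 00:00 boundary.
fully_outside = clip_to_main_track(-10.0, -2.0, 0.0, 60.0)
```

The first call keeps the span from 0 to 5 seconds; the second returns `None`, which corresponds to deleting the whole subtitle segment.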
In summary, the method for binding subtitles and audio sources provided by the embodiments of the present application includes: determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment; determining the relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment; and, when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged. The present application binds the relative positional relationship between the head positions of the target subtitle segment and the target audio source segment, so that editing of the main track is isolated from editing of the subtitles and audio sources. Editing operations on the main track therefore do not affect the alignment between subtitle and audio source, which reduces the probability that they become misaligned.
FIG. 13 is a block diagram of an apparatus for binding subtitles and audio sources provided by an embodiment of the present application. As shown in FIG. 13, the apparatus includes an identification module 301, a binding module 302, and a maintaining module 303.
The identification module 301 is configured to determine a target audio source segment in a target video, and to obtain a target subtitle segment by recognition from the target audio source segment;
The binding module 302 is configured to determine the relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment;
The maintaining module 303 is configured to keep the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.
In one implementation, the maintaining module includes:
A triggering submodule, configured to trigger a subtitle processing operation when the adjusted target audio source segment and the target subtitle segment do not overlap;
A binding submodule, configured to bind the target subtitle segment to the main track of the target video in response to a subtitle processing operation that retains the subtitle.
In one implementation, the cases in which the adjusted target audio source segment and the target subtitle segment do not overlap include: the target audio source segment has been deleted; the subtitle head position and/or the audio source head position has been changed; or the target audio source segment has been divided into multiple audio source sub-segments and the sub-segment located at the head has been deleted.
In one implementation, the apparatus further includes:
A detection module, configured to detect the new subtitle head position and audio source head position after the adjustment when the adjusted target audio source segment and the target subtitle segment overlap;
An updating module, configured to update the relative positional relationship according to the new subtitle head position and audio source head position, and to keep the updated relative positional relationship unchanged.
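The overlap check that separates the triggering submodule from the detection and updating modules can be sketched as follows. This is an assumption-laden illustration: overlap is taken to mean that two half-open time spans intersect, a deleted audio source segment is represented by `None`, and the function names and returned action labels are hypothetical.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def has_overlap(audio: Span, subtitle: Span) -> bool:
    # Two half-open spans intersect when each starts before the other ends.
    return audio[0] < subtitle[1] and subtitle[0] < audio[1]


def after_adjustment(
    audio: Optional[Span], subtitle: Span
) -> Tuple[str, Optional[float]]:
    """Decide what happens once an adjustment operation has ended.

    Returns an action label and, when the binding is updated, the new
    offset (subtitle head minus audio source head).
    """
    if audio is None or not has_overlap(audio, subtitle):
        # No overlap: trigger the subtitle processing operation.
        return ("trigger_subtitle_processing", None)
    # Overlap: detect the new heads and update the relationship.
    return ("update_relationship", subtitle[0] - audio[0])


still_overlapping = after_adjustment((10.0, 20.0), (0.0, 15.0))
audio_deleted = after_adjustment(None, (0.0, 15.0))
moved_apart = after_adjustment((20.0, 30.0), (0.0, 15.0))
```

In the first case the binding is refreshed with an offset of -10 seconds; in the other two the subtitle no longer overlaps any part of its audio source segment, so the subtitle processing operation is triggered instead.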
In one implementation, the apparatus further includes:
A deletion module, configured to delete the part of the target subtitle segment that lies beyond the boundary of the main track of the target video when the target subtitle segment is moved beyond that boundary.
In one implementation, the target audio source segment is displayed on an audio source track and the target subtitle segment is displayed on a subtitle track; the audio source track, the subtitle track, and the main track of the target video share the same timeline; and the target audio source segment is bound to the main track.
In one implementation, the maintaining module includes:
An updating submodule, configured to, when an adjustment operation that changes the subtitle head position and/or the audio source head position is performed, update the relative positional relationship according to the changed subtitle head position and audio source head position, and keep the updated relative positional relationship unchanged.
In one implementation, the maintaining module includes:
A moving submodule, configured to, when an adjustment operation that moves the target audio source segment as a whole is performed, move the target subtitle segment as a whole along with the target audio source segment, and keep the relative positional relationship unchanged.
In one implementation, the maintaining module includes:
A replacement submodule, configured to, when an adjustment operation that replaces the target audio source segment is performed, determine a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, and keep the new relative positional relationship unchanged.
In one implementation, the maintaining module includes:
A speed change submodule, configured to, when a speed change operation is performed on the target audio source segment according to a preset speed change value, perform a speed change operation on the target subtitle segment according to the same speed change value, and keep the relative positional relationship unchanged.
In one implementation, the speed change submodule includes:
A first speed change unit, configured to perform the speed change operation on the target subtitle segment according to the preset speed change value when the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment;
A second speed change unit, configured to perform the speed change operation, according to the preset speed change value, on the part of the target subtitle segment that falls within the first duration when the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment.
In one implementation, the maintaining module includes:
A division submodule, configured to obtain multiple audio source sub-segments when an adjustment operation that divides the target audio source segment is performed;
A cropping submodule, configured to establish a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of a target audio source sub-segment, and to keep the new relative positional relationship unchanged, where the target audio source sub-segment is the sub-segment located at the head among all the audio source sub-segments.
In summary, the apparatus for binding subtitles and audio sources provided by the embodiments of the present application includes: an identification module, configured to determine a target audio source segment in a target video and to obtain a target subtitle segment by recognition from the target audio source segment; a binding module, configured to determine the relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the target audio source segment; and a maintaining module, configured to keep the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment. The present application binds the relative positional relationship between the head positions of the target subtitle segment and the target audio source segment, so that editing of the main track is isolated from editing of the subtitles and audio sources. Editing operations on the main track therefore do not affect the alignment between subtitle and audio source, which reduces the probability that they become misaligned.
FIG. 14 is a block diagram of an electronic device 600 according to an exemplary embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 14, the electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio source component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the electronic device 600, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 that execute instructions to perform all or some of the steps of the method described above. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the electronic device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, multimedia, and the like. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 606 supplies power to the various components of the electronic device 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.
The multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). When the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the electronic device 600 is in an operating mode, such as a shooting mode or a multimedia mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 600 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors that provide status assessments of various aspects of the electronic device 600. For example, the sensor component 614 can detect the open/closed state of the electronic device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600. The sensor component 614 can also detect a change in position of the electronic device 600 or of one of its components, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in its temperature. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the electronic device 600 and other devices. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, a carrier network (for example 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the method for binding a subtitle with an audio source provided by the embodiments of the present application.
In an exemplary embodiment, a non-transitory computer storage medium including instructions is also provided, such as the memory 604 including instructions, where the instructions are executable by the processor 620 of the electronic device 600 to perform the method described above. For example, the non-transitory storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
FIG. 15 is a block diagram of an electronic device 700 according to an exemplary embodiment. For example, the electronic device 700 may be provided as a server. Referring to FIG. 15, the electronic device 700 includes a processing component 722, which further includes one or more processors, and memory resources represented by a memory 732 for storing instructions executable by the processing component 722, such as application programs. An application program stored in the memory 732 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 722 is configured to execute the instructions to perform the method for binding a subtitle with an audio source provided by the embodiments of the present application.
The electronic device 700 may also include a power component 726 configured to perform power management of the electronic device 700, a wired or wireless network interface 750 configured to connect the electronic device 700 to a network, and an input/output (I/O) interface 758. The electronic device 700 may operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The embodiments of the present application further provide a computer program product including a computer program which, when executed by a processor, implements the method for binding a subtitle with an audio source described above.
All embodiments of the present disclosure may be implemented alone or in combination with other embodiments, and all such implementations fall within the scope of protection claimed by the present disclosure.
Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the application disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common general knowledge or customary technical means in the art that are not disclosed in the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application being indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (38)

  1. A method for binding a subtitle with an audio source, comprising:
    determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;
    determining a relative positional relationship between a subtitle head position of the target subtitle segment and an audio source head position of the target audio source segment; and
    keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.
  2. The method according to claim 1, wherein keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment comprises:
    triggering a subtitle processing operation when the adjusted target audio source segment and the target subtitle segment have no overlapping part; and
    in response to a subtitle processing operation of retaining the subtitle, binding the target subtitle segment to a main track of the target video.
  3. The method according to claim 2, wherein the case in which the adjusted target audio source segment and the target subtitle segment have no overlapping part comprises: a case in which the target audio source segment has been deleted; or a case in which the subtitle head position and/or the audio source head position has been changed; or a case in which the target audio source segment has been split into a plurality of audio source sub-segments and the audio source sub-segment located at the head among the plurality of audio source sub-segments has been deleted.
  4. The method according to claim 2, wherein the method further comprises:
    detecting a new subtitle head position and a new audio source head position after the adjustment when the adjusted target audio source segment and the target subtitle segment have an overlapping part; and
    updating the relative positional relationship according to the new subtitle head position and the new audio source head position, and keeping the updated relative positional relationship unchanged.
  5. The method according to claim 1, wherein the method further comprises:
    when the target subtitle segment is moved beyond a boundary of a main track of the target video, deleting the part of the target subtitle segment that exceeds the boundary.
  6. The method according to claim 1, wherein the target audio source segment is placed on an audio source track for display; the target subtitle segment is placed on a subtitle track for display; the audio source track, the subtitle track, and a main track of the target video use the same time sequence; and the target audio source segment is bound to the main track.
  7. The method according to claim 1, wherein keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment comprises:
    when an adjustment operation of changing the subtitle head position and/or the audio source head position is performed, updating the relative positional relationship according to the changed subtitle head position and audio source head position, and keeping the updated relative positional relationship unchanged.
  8. The method according to claim 1, wherein keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment comprises:
    when an adjustment operation of moving the target audio source segment as a whole is performed, moving the target subtitle segment as a whole along with the target audio source segment, and keeping the relative positional relationship unchanged.
  9. The method according to claim 1, wherein keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment comprises:
    when an adjustment operation of replacing the target audio source segment is performed, determining a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, and keeping the new relative positional relationship unchanged.
  10. The method according to claim 1, wherein keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment comprises:
    when a speed-change operation is performed on the target audio source segment according to a preset speed-change value, performing a speed-change operation on the target subtitle segment according to the speed-change value, and keeping the relative positional relationship unchanged.
  11. The method according to claim 10, wherein performing the speed-change operation on the target subtitle segment according to the speed-change value comprises:
    when a first duration of the target audio source segment before the speed change is greater than a second duration of the target subtitle segment, performing the speed-change operation on the target subtitle segment according to the preset speed-change value; and
    when the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment, performing the speed-change operation, according to the preset speed-change value, on the part of the target subtitle segment that spans the first duration.
  12. The method according to claim 1, wherein keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment comprises:
    when an adjustment operation of splitting the target audio source segment is performed, obtaining a plurality of audio source sub-segments from the split; and
    establishing a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of a target audio source sub-segment, and keeping the new relative positional relationship unchanged, wherein the target audio source sub-segment is the audio source sub-segment located at the head among all the audio source sub-segments.
  13. An apparatus for binding a subtitle with an audio source, comprising:
    an identification module configured to determine a target audio source segment in a target video, and to obtain a target subtitle segment by recognition from the target audio source segment;
    a binding module configured to determine a relative positional relationship between a subtitle head position of the target subtitle segment and an audio source head position of the target audio source segment; and
    a maintaining module configured to keep the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.
  14. The apparatus according to claim 13, wherein the maintaining module comprises:
    a triggering submodule configured to trigger a subtitle processing operation when the adjusted target audio source segment and the target subtitle segment have no overlapping part; and
    a binding submodule configured to bind the target subtitle segment to a main track of the target video in response to a subtitle processing operation of retaining the subtitle.
  15. The apparatus according to claim 14, wherein the case in which the adjusted target audio source segment and the target subtitle segment have no overlapping part comprises: a case in which the target audio source segment has been deleted; or a case in which the subtitle head position and/or the audio source head position has been changed; or a case in which the target audio source segment has been split into a plurality of audio source sub-segments and the audio source sub-segment located at the head among the plurality of audio source sub-segments has been deleted.
  16. The apparatus according to claim 14, wherein the apparatus further comprises:
    a detection module configured to detect a new subtitle head position and a new audio source head position after the adjustment when the adjusted target audio source segment and the target subtitle segment have an overlapping part; and
    an updating module configured to update the relative positional relationship according to the new subtitle head position and the new audio source head position, and to keep the updated relative positional relationship unchanged.
  17. The apparatus according to claim 13, wherein the apparatus further comprises:
    a deletion module configured to delete, when the target subtitle segment is moved beyond a boundary of a main track of the target video, the part of the target subtitle segment that exceeds the boundary.
  18. The apparatus according to claim 13, wherein the target audio source segment is placed on an audio source track for display; the target subtitle segment is placed on a subtitle track for display; the audio source track, the subtitle track, and a main track of the target video use the same time sequence; and the target audio source segment is bound to the main track.
  19. The apparatus according to claim 13, wherein the maintaining module comprises:
    an updating submodule configured to, when an adjustment operation of changing the subtitle head position and/or the audio source head position is performed, update the relative positional relationship according to the changed subtitle head position and audio source head position, and keep the updated relative positional relationship unchanged.
  20. The apparatus according to claim 13, wherein the maintaining module comprises:
    a moving submodule configured to, when an adjustment operation of moving the target audio source segment as a whole is performed, move the target subtitle segment as a whole along with the target audio source segment, and keep the relative positional relationship unchanged.
  21. The apparatus according to claim 13, wherein the maintaining module comprises:
    a replacement submodule configured to, when an adjustment operation of replacing the target audio source segment is performed, determine a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, and keep the new relative positional relationship unchanged.
  22. The apparatus according to claim 13, wherein the maintaining module comprises:
    a speed-change submodule configured to, when a speed-change operation is performed on the target audio source segment according to a preset speed-change value, perform a speed-change operation on the target subtitle segment according to the speed-change value, and keep the relative positional relationship unchanged.
  23. The apparatus according to claim 22, wherein the speed-change submodule comprises:
    a first speed-change unit configured to perform the speed-change operation on the target subtitle segment according to the preset speed-change value when a first duration of the target audio source segment before the speed change is greater than a second duration of the target subtitle segment; and
    a second speed-change unit configured to perform the speed-change operation, according to the preset speed-change value, on the part of the target subtitle segment that spans the first duration when the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment.
  24. The apparatus according to claim 13, wherein the maintaining module comprises:
    a splitting submodule configured to obtain a plurality of audio source sub-segments from the split when an adjustment operation of splitting the target audio source segment is performed; and
    a cropping submodule configured to establish a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of a target audio source sub-segment, and to keep the new relative positional relationship unchanged, wherein the target audio source sub-segment is the audio source sub-segment located at the head among all the audio source sub-segments.
  25. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the following steps:
    determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;
    determining a relative positional relationship between a subtitle head position of the target subtitle segment and an audio source head position of the target audio source segment; and
    keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.
  26. The electronic device according to claim 25, wherein the processor is further configured to execute the instructions to implement the following steps:
    triggering a subtitle processing operation when the adjusted target audio source segment and the target subtitle segment have no overlapping part; and
    in response to a subtitle processing operation of retaining the subtitle, binding the target subtitle segment to a main track of the target video.
  27. The electronic device according to claim 26, wherein the case in which the adjusted target audio source segment and the target subtitle segment have no overlapping part comprises: a case in which the target audio source segment has been deleted; or a case in which the subtitle head position and/or the audio source head position has been changed; or a case in which the target audio source segment has been split into a plurality of audio source sub-segments and the audio source sub-segment located at the head among the plurality of audio source sub-segments has been deleted.
  28. The electronic device according to claim 26, wherein the processor is further configured to execute the instructions to implement the following steps:
    detecting a new subtitle head position and a new audio source head position after the adjustment when the adjusted target audio source segment and the target subtitle segment have an overlapping part; and
    updating the relative positional relationship according to the new subtitle head position and the new audio source head position, and keeping the updated relative positional relationship unchanged.
  29. The electronic device according to claim 25, wherein the processor is further configured to execute the instructions to implement the following step:
    when the target subtitle segment is moved beyond a boundary of a main track of the target video, deleting the part of the target subtitle segment that exceeds the boundary.
  30. The electronic device according to claim 25, wherein the target audio source segment is placed on an audio source track for display; the target subtitle segment is placed on a subtitle track for display; the audio source track, the subtitle track, and a main track of the target video use the same time sequence; and the target audio source segment is bound to the main track.
  31. The electronic device according to claim 25, wherein the processor is further configured to execute the instructions to implement the following step:
    when an adjustment operation of changing the subtitle head position and/or the audio source head position is performed, updating the relative positional relationship according to the changed subtitle head position and audio source head position, and keeping the updated relative positional relationship unchanged.
  32. The electronic device according to claim 25, wherein the processor is further configured to execute the instructions to implement the following step:
    when an adjustment operation of moving the target audio source segment as a whole is performed, moving the target subtitle segment as a whole along with the target audio source segment, and keeping the relative positional relationship unchanged.
  33. The electronic device according to claim 25, wherein the processor is further configured to execute the instructions to implement the following step:
    when an adjustment operation of replacing the target audio source segment is performed, determining a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of the replacement target audio source segment, and keeping the new relative positional relationship unchanged.
  34. The electronic device according to claim 25, wherein the processor is further configured to execute the instructions to implement the following step:
    when a speed-change operation is performed on the target audio source segment according to a preset speed-change value, performing a speed-change operation on the target subtitle segment according to the speed-change value, and keeping the relative positional relationship unchanged.
  35. The electronic device according to claim 34, wherein the processor is further configured to execute the instructions to implement the following steps:
    in a case where a first duration of the target audio source segment before the speed change is greater than a second duration of the target subtitle segment, performing the speed change operation on the target subtitle segment according to the preset speed change value;
    in a case where the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment, performing the speed change operation, according to the preset speed change value, on the portion of the target subtitle segment that spans the first duration.
  36. The electronic device according to claim 25, wherein the processor is further configured to execute the instructions to implement the following steps:
    in a case where an adjustment operation that splits the target audio source segment is performed, obtaining a plurality of audio source sub-segments;
    establishing a new relative positional relationship between the subtitle head position of the target subtitle segment and the audio source head position of a target audio source sub-segment, and keeping the new relative positional relationship unchanged, wherein the target audio source sub-segment is the audio source sub-segment located at the head position among all the audio source sub-segments.
  37. A computer storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the following steps:
    determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;
    determining a relative positional relationship between a subtitle head position of the target subtitle segment and an audio source head position of the target audio source segment;
    in a case where an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged.
  38. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;
    determining a relative positional relationship between a subtitle head position of the target subtitle segment and an audio source head position of the target audio source segment;
    in a case where an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged.
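
The adjustment operations recited in claims 31 to 36 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `Segment`, `Binding`, `offset`, `set_heads`, `move_audio`, `replace_audio`, `split_audio`, and `change_speed` are hypothetical, as are the assumptions that positions are measured in seconds on one shared timeline, that a speed change value of 2.0 halves a duration, and that equal durations are handled like the "greater than" case of claim 35 (the claims do not address that case).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: float      # head position on the shared timeline, in seconds (assumed unit)
    duration: float   # length in seconds


class Binding:
    """Hypothetical helper that keeps subtitle.start - audio.start constant."""

    def __init__(self, audio: Segment, subtitle: Segment) -> None:
        self.audio = audio
        self.subtitle = subtitle
        # Relative positional relationship between the two head positions.
        self.offset = subtitle.start - audio.start

    def set_heads(self, audio_start: Optional[float] = None,
                  subtitle_start: Optional[float] = None) -> None:
        # Claim 31: a head position changed, so the relationship is updated.
        if audio_start is not None:
            self.audio.start = audio_start
        if subtitle_start is not None:
            self.subtitle.start = subtitle_start
        self.offset = self.subtitle.start - self.audio.start

    def move_audio(self, new_start: float) -> None:
        # Claim 32: the subtitle follows the audio, the offset is unchanged.
        self.audio.start = new_start
        self.subtitle.start = new_start + self.offset

    def replace_audio(self, new_audio: Segment) -> None:
        # Claim 33: bind the subtitle to the replacement segment.
        self.audio = new_audio
        self.offset = self.subtitle.start - new_audio.start

    def split_audio(self, at: float) -> List[Segment]:
        # Claim 36: split, then bind the subtitle to the head sub-segment.
        a = self.audio
        if not a.start < at < a.start + a.duration:
            raise ValueError("split point must lie strictly inside the audio segment")
        head = Segment(a.start, at - a.start)
        tail = Segment(at, a.start + a.duration - at)
        self.audio = head
        self.offset = self.subtitle.start - head.start
        return [head, tail]

    def change_speed(self, rate: float) -> None:
        # Claims 34 and 35: apply the same speed change value to the subtitle.
        if rate <= 0:
            raise ValueError("rate must be positive")
        first = self.audio.duration       # audio duration before the change
        second = self.subtitle.duration   # subtitle duration
        if first >= second:
            # The whole subtitle is speed-changed.
            self.subtitle.duration = second / rate
        else:
            # Only the part spanning the first duration is speed-changed.
            self.subtitle.duration = first / rate + (second - first)
        self.audio.duration = first / rate
        # The head offset itself is left untouched.
        self.subtitle.start = self.audio.start + self.offset


# Usage: the subtitle starts 0.5 s after the audio and stays there.
binding = Binding(Segment(2.0, 6.0), Segment(2.5, 4.0))
binding.move_audio(10.0)
binding.change_speed(2.0)
print(binding.subtitle)
```

Storing a single `offset` value, rather than recomputing positions per operation, is one simple way to express "keeping the relative positional relationship unchanged"; other representations would satisfy the claim language equally well.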
PCT/CN2021/135470 2021-04-14 2021-12-03 Method for binding subtitle with audio source, and apparatus WO2022217944A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110402833.0 2021-04-14
CN202110402833.0A CN113259776B (en) 2021-04-14 2021-04-14 Binding method and device of caption and sound source

Publications (1)

Publication Number Publication Date
WO2022217944A1 (en) 2022-10-20

Family

ID=77220790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/135470 WO2022217944A1 (en) 2021-04-14 2021-12-03 Method for binding subtitle with audio source, and apparatus

Country Status (2)

Country Link
CN (1) CN113259776B (en)
WO (1) WO2022217944A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113259776B (en) * 2021-04-14 2022-11-22 北京达佳互联信息技术有限公司 Binding method and device of caption and sound source
CN116193195A (en) * 2023-02-23 2023-05-30 北京奇艺世纪科技有限公司 Video processing method, device, processing equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104104986A (en) * 2014-07-29 2014-10-15 小米科技有限责任公司 Audio frequency and subtitle synchronizing method and device
WO2017191397A1 (en) * 2016-05-03 2017-11-09 Orange Method and device for synchronising subtitles
CN108401192A (en) * 2018-04-25 2018-08-14 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
CN109246472A (en) * 2018-08-01 2019-01-18 平安科技(深圳)有限公司 Video broadcasting method, device, terminal device and storage medium
CN109413475A (en) * 2017-05-09 2019-03-01 北京嘀嘀无限科技发展有限公司 Method of adjustment, device and the server of subtitle in a kind of video
US20190250803A1 (en) * 2018-02-09 2019-08-15 Nedelco, Inc. Caption rate control
CN111836062A (en) * 2020-06-30 2020-10-27 北京小米松果电子有限公司 Video playing method and device and computer readable storage medium
CN111901538A (en) * 2020-07-23 2020-11-06 北京字节跳动网络技术有限公司 Subtitle generating method, device and equipment and storage medium
CN113259776A (en) * 2021-04-14 2021-08-13 北京达佳互联信息技术有限公司 Binding method and device of caption and sound source

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101808202B (en) * 2009-02-18 2013-09-04 联想(北京)有限公司 Method, system and computer for realizing sound-and-caption synchronization in video file
CN102630017B (en) * 2012-04-10 2014-03-19 中兴通讯股份有限公司 Method and system for synchronizing mobile multi-media broadcasting and subtitles
CN112287128B (en) * 2020-10-23 2024-01-12 北京百度网讯科技有限公司 Method and device for editing multimedia file, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113259776B (en) 2022-11-22
CN113259776A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
EP3817395A1 (en) Video recording method and apparatus, device, and readable storage medium
US9786326B2 (en) Method and device of playing multimedia and medium
CN107396177B (en) Video playing method, device and storage medium
RU2666966C2 (en) Audio playback control method and device
WO2022217944A1 (en) Method for binding subtitle with audio source, and apparatus
WO2020015334A1 (en) Video processing method and apparatus, terminal device, and storage medium
US10212386B2 (en) Method, device, terminal device, and storage medium for video effect processing
CN104639977B (en) The method and device that program plays
WO2018095252A1 (en) Video recording method and device
WO2019206243A1 (en) Material display method, terminal, and computer storage medium
US20220248083A1 (en) Method and apparatus for video playing
WO2022142871A1 (en) Video recording method and apparatus
US11580742B2 (en) Target character video clip playing method, system and apparatus, and storage medium
US20220084313A1 (en) Video processing methods and apparatuses, electronic devices, storage mediums and computer programs
WO2017054354A1 (en) Information processing method and device
CN108769769B (en) Video playing method and device and computer readable storage medium
CN111147942A (en) Video playing method and device, electronic equipment and storage medium
CN112102841A (en) Audio editing method and device for audio editing
WO2022160699A1 (en) Video processing method and video processing apparatus
WO2022105341A1 (en) Video data processing method and apparatus, computer storage medium, and electronic device
CN112584208B (en) Video browsing editing method and system based on artificial intelligence
CN113905192A (en) Subtitle editing method and device, electronic equipment and storage medium
CN106060253B (en) Information presentation method and device
CN113364999B (en) Video generation method and device, electronic equipment and storage medium
CN110809184A (en) Video processing method, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21936810

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21936810

Country of ref document: EP

Kind code of ref document: A1