WO2022217944A1 - Method for binding subtitle with audio source, and apparatus

Info

Publication number: WO2022217944A1
Application number: PCT/CN2021/135470
Authority: WO
Inventors: 陈圣宾
Original assignee: 北京达佳互联信息技术有限公司
Priority date: 2021-04-14
Filing date: 2021-12-03
Publication date: 2022-10-20
Also published as: CN113259776B; CN113259776A

Abstract

Provided in the present application are a method for binding a subtitle with an audio source, an apparatus, an electronic device, a computer storage medium, and a computer program product, which comprise: determining a target audio source segment in a target video, and identifying and obtaining a target subtitle segment via the target audio source segment; determining a relative positional relationship between a subtitle starting position of the target subtitle segment and an audio source starting position of the target audio source segment; and if an adjustment operation is performed on the target audio source segment and/or the target subtitle segment, preserving the relative positional relationship.

Description

Method and device for binding subtitle and audio source

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on a Chinese patent application with an application date of April 14, 2021 and an application number of 202110402833.0, and claims the priority of the Chinese patent application, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The embodiments of the present application relate to the field of computer technologies, and in particular, to a method, apparatus, electronic device, computer storage medium, and computer program product for binding subtitles and audio sources.

BACKGROUND

In video editing scenarios, a video object usually contains multiple audio source segments, such as the original sound source of the video main track, imported dubbing, and dubbed audio sources, and a corresponding subtitle segment can be configured for each audio source segment so as to achieve a clearer and more expressive video effect.

In the related art, a video object has a video main track, and the video main track reflects the playback timing of the content of the entire video object. An audio source segment is bound to its corresponding moment on the video main track, and the subtitle header of a subtitle segment is bound to the moment corresponding to that subtitle header on the video main track. In addition, to achieve a better video expression effect, the audio source segment and its corresponding subtitle segment need to be aligned in playback timing.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a method, apparatus, electronic device, computer storage medium, and computer program product for binding subtitles and audio sources.

In a first aspect, an embodiment of the present application provides a method for binding subtitles and audio sources, and the method includes:

determining a target audio source segment in a target video, and identifying a target subtitle segment from the target audio source segment;

determining a relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment; and

keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.

In some embodiments, maintaining the relative positional relationship unchanged in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment includes:

In the case that there is no overlap between the adjusted target audio source segment and the target subtitle segment, trigger the subtitle processing operation;

In response to a subtitle processing operation that retains the subtitle, the target subtitle segment is bound to the main track of the target video.

In some embodiments, the case where there is no overlap between the adjusted target audio source segment and the target subtitle segment includes: a case where the target audio source segment is deleted; a case where the subtitle header position and/or the audio source header position is changed; or a case where the target audio source segment is divided into multiple audio source sub-segments and the audio source sub-segment located at the head of the multiple audio source sub-segments is deleted.

In some embodiments, the method further includes:

In the case that there is an overlap between the adjusted target audio source segment and the target subtitle segment, detecting the adjusted new subtitle header position and audio source header position;

The relative positional relationship is updated according to the new subtitle header position and the audio source header position, and the updated relative positional relationship is kept unchanged.

In some embodiments, the method further includes:

When the target subtitle segment is moved beyond the boundary of the main track of the target video, the portion of the target subtitle segment that exceeds the boundary is deleted.
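The boundary handling described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `clip_to_main_track`, the use of seconds as the time unit, and the assumption that the main track spans `[0, track_length]` are all hypothetical.

```python
from typing import Optional, Tuple


def clip_to_main_track(
    start: float, duration: float, track_length: float
) -> Optional[Tuple[float, float]]:
    """Return (start, duration) of the part of a subtitle segment that lies
    inside the main track [0, track_length], or None if nothing remains.

    Hypothetical helper: times are in seconds on the shared timeline.
    """
    end = start + duration
    clipped_start = max(start, 0.0)
    clipped_end = min(end, track_length)
    if clipped_end <= clipped_start:
        return None  # the whole subtitle segment lies outside the main track
    return clipped_start, clipped_end - clipped_start


# A 20 s subtitle moved to 10:20 on a 10:30 (630 s) main track loses its last 10 s.
print(clip_to_main_track(620.0, 20.0, 630.0))  # (620.0, 10.0)
```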

In some embodiments, the target audio source segment is displayed on an audio source track; the target subtitle segment is displayed on a subtitle track; the audio source track, the subtitle track and the main track of the target video use the same timing sequence; and the target audio source segment is bound to the main track.

In the case of performing an adjustment operation that changes the subtitle header position and/or the audio source header position, the relative positional relationship is updated according to the changed subtitle header position and audio source header position, and the updated relative positional relationship is kept unchanged.

In some embodiments, maintaining the relative positional relationship unchanged in the case of performing an adjustment operation on the target sound source segment includes:

In the case of performing the adjustment operation of moving the position of the target audio source segment as a whole, the position of the target subtitle segment is moved as a whole following the target audio source segment, and the relative positional relationship is kept unchanged.

In the case of performing the adjustment operation of replacing the target audio source segment, a new relative positional relationship is determined between the subtitle header position of the target subtitle segment and the audio source header position of the replaced target audio source segment, and the new relative positional relationship is kept unchanged.

In the case of performing a speed change operation on the target audio source segment according to a preset speed change value, the target subtitle segment is subjected to a speed change operation according to the speed change value, and the relative positional relationship is kept unchanged.

In some embodiments, performing a speed change operation on the target subtitle segment according to the speed change value includes:

In the case that the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment, the speed change operation is performed on the target subtitle segment according to the preset speed change value.

In the case that the first duration of the target audio source segment before the speed change is smaller than the second duration of the target subtitle segment, the speed change operation is performed on the part of the target subtitle segment with the first duration according to the preset speed change value.
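The two duration cases above can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name is hypothetical, the speed change value is treated as a playback-rate multiplier (2.0 halves a duration), and the equal-duration case, which the text does not address, is grouped with the first case.

```python
def subtitle_duration_after_speed_change(
    audio_duration: float, subtitle_duration: float, speed: float
) -> float:
    """Return the subtitle duration in seconds after the speed change.

    audio_duration is the "first duration" (audio source segment before the
    speed change) and subtitle_duration is the "second duration".
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if audio_duration >= subtitle_duration:
        # The audio is at least as long as the subtitle:
        # the whole subtitle segment is speed-changed.
        return subtitle_duration / speed
    # The subtitle is longer: only a part of length audio_duration is
    # speed-changed, and the rest keeps its original duration.
    changed = audio_duration / speed
    unchanged = subtitle_duration - audio_duration
    return changed + unchanged


print(subtitle_duration_after_speed_change(30.0, 20.0, 2.0))  # 10.0
print(subtitle_duration_after_speed_change(20.0, 30.0, 2.0))  # 20.0
```

Which part of the longer subtitle is speed-changed (for example, the part overlapping the audio) is not spelled out here, so the sketch only tracks total duration.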

In the case of performing the adjustment operation of dividing the target sound source segment, obtain a plurality of divided sound source sub-segments;

Establish a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of a target audio source sub-segment, and keep the new relative positional relationship unchanged, where the target audio source sub-segment is the audio source sub-segment located at the head position among all the audio source sub-segments.
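The division case can be sketched as follows. The `Segment` class, `split_audio`, and `rebind_to_head` are hypothetical names introduced only for illustration, and the segment durations are assumed values; the sketch shows the subtitle being re-bound to whichever sub-segment sits at the head.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float     # head position on the shared timeline, in seconds
    duration: float  # length in seconds


def split_audio(audio: Segment, cut_points: List[float]) -> List[Segment]:
    """Divide an audio source segment at absolute timeline positions."""
    end = audio.start + audio.duration
    if any(not (audio.start < c < end) for c in cut_points):
        raise ValueError("cut points must lie strictly inside the segment")
    bounds = [audio.start] + sorted(cut_points) + [end]
    return [Segment(s, e - s) for s, e in zip(bounds, bounds[1:])]


def rebind_to_head(subtitle: Segment, subs: List[Segment]) -> Tuple[Segment, float]:
    """Bind the subtitle to the head sub-segment; return it with the new offset."""
    head = min(subs, key=lambda s: s.start)
    return head, subtitle.start - head.start


subs = split_audio(Segment(10.0, 30.0), [20.0, 30.0])
head, offset = rebind_to_head(Segment(0.0, 40.0), subs)
print(head.start, offset)  # 10.0 -10.0
```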

In a second aspect, an embodiment of the present application provides an apparatus for binding subtitles and audio sources, and the apparatus includes:

an identification module, configured to determine a target sound source segment in the target video, and identify a target subtitle segment by the target sound source segment;

a binding module, configured to determine the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio segment;

The maintaining module is configured to maintain the relative positional relationship unchanged when the adjustment operation on the target audio source segment and/or the target subtitle segment is performed.

In some embodiments, the maintaining module includes:

a triggering submodule, configured to trigger a subtitle processing operation when there is no overlap between the adjusted target audio source segment and the target subtitle segment;

The binding submodule is configured to bind the target subtitle segment with the main track of the target video in response to the subtitle processing operation of the retained subtitle.

In some embodiments, the apparatus further includes:

The detection module is configured to detect the adjusted new subtitle head position and the audio source head position when there is an overlap between the adjusted target audio source segment and the target subtitle segment;

The updating module is configured to update the relative positional relationship according to the new subtitle header position and the audio source header position, and keep the updated relative positional relationship unchanged.

In some embodiments, the apparatus further includes:

The deletion module is configured to delete the part of the target subtitle segment beyond the boundary in the case of moving the target subtitle segment beyond the boundary of the main track of the target video.

In some embodiments, the target audio source segment is displayed on an audio source track; the target subtitle segment is displayed on a subtitle track; the audio source track, the subtitle track and the main track of the target video use the same timing sequence; and the target audio source segment is bound to the main track.

In some embodiments, the maintaining module includes:

The update sub-module is configured to, when an adjustment operation that changes the subtitle header position and/or the audio source header position is performed, update the relative positional relationship according to the changed subtitle header position and audio source header position, and keep the updated relative positional relationship unchanged.

In some embodiments, the maintaining module includes:

The moving sub-module is configured to, when an adjustment operation that moves the position of the target audio source segment as a whole is performed, move the position of the target subtitle segment as a whole following the target audio source segment, and keep the relative positional relationship unchanged.

In some embodiments, the maintaining module includes:

The replacement submodule is configured to, when the adjustment operation of replacing the target audio source segment is performed, determine a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the replaced target audio source segment, and keep the new relative positional relationship unchanged.

In some embodiments, the maintaining module includes:

The variable speed sub-module is configured to, when a speed change operation is performed on the target audio source segment according to a preset speed change value, perform a speed change operation on the target subtitle segment according to the speed change value and keep the relative positional relationship unchanged.

In some embodiments, the variable speed sub-module includes:

The first speed changing unit is configured to perform the speed change operation on the target subtitle segment according to the preset speed change value when the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment;

The second speed changing unit is configured to, when the first duration of the target audio source segment before the speed change is smaller than the second duration of the target subtitle segment, perform the speed change operation, according to the preset speed change value, on the part of the target subtitle segment that has the first duration.

In some embodiments, the maintaining module includes:

A segmentation sub-module, configured to obtain a plurality of divided audio source sub-segments when performing an adjustment operation of dividing the target audio source segment;

The cropping submodule is configured to establish a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of a target audio source sub-segment, and keep the new relative positional relationship unchanged, where the target audio source sub-segment is the audio source sub-segment located at the head position among all the audio source sub-segments.

In a third aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to execute the instructions to implement the method for binding subtitles and audio sources.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium. When the instructions in the computer-readable storage medium are executed by the processor of an electronic device, the electronic device can perform the method for binding subtitles and audio sources.

In a fifth aspect, an embodiment of the present application further provides a computer program product, including a computer program, which realizes the binding of the subtitles and the audio source when the computer program is executed by the processor.

The embodiments of the present application include: determining the target audio source segment in the target video, and identifying the target subtitle segment from the target audio source segment; determining the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment; and keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment. The present application binds the relative positional relationship between the header positions of the target subtitle segment and the target audio source segment, so that the editing process of the main track and the editing process of the subtitles and audio source are isolated from each other. Editing operations on the main track therefore do not affect the alignment between the subtitles and the audio source, which reduces the chance of misalignment between the audio source and the subtitles.

BRIEF DESCRIPTION OF THE DRAWINGS

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the embodiments. The drawings are for illustrative purposes only and are not to be considered limiting of the application. Also, the same components are denoted by the same reference numerals throughout the drawings. In the drawings:

FIG. 1 is a flowchart of steps of a method for binding subtitles and audio sources provided by an embodiment of the present application;

FIG. 2 is a binding interface diagram of a subtitle and audio source provided by an embodiment of the present application;

FIG. 3 is another binding interface diagram of subtitles and audio sources provided by an embodiment of the present application;

FIG. 4 is a flowchart of steps of a method for binding subtitles and audio sources provided by an embodiment of the present application;

FIG. 5 is another binding interface diagram of subtitles and audio sources provided by an embodiment of the present application;

FIG. 6 is another binding interface diagram of subtitles and audio sources provided by an embodiment of the present application;

FIG. 7 is another binding interface diagram of subtitles and audio sources provided by an embodiment of the present application;

FIG. 8 is an interface diagram of a subtitle processing operation provided by an embodiment of the present application;

FIG. 9 is another binding interface diagram of subtitles and audio sources provided by an embodiment of the present application;

FIG. 10 is another interface diagram of binding between subtitles and audio sources provided by an embodiment of the present application;

FIG. 11 is another interface diagram for binding subtitles and audio sources provided by an embodiment of the present application;

FIG. 12 is another binding interface diagram of subtitles and audio sources provided by an embodiment of the present application;

FIG. 13 is a block diagram of an apparatus for binding subtitles and audio sources provided by an embodiment of the present application;

FIG. 14 is a logical block diagram of an electronic device according to an embodiment of the present application; and

FIG. 15 is a logical block diagram of an electronic device according to another embodiment of the present application.

DETAILED DESCRIPTION

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the application will be more thoroughly understood, and will fully convey the scope of the application to those skilled in the art.

FIG. 1 is a flowchart of steps of a method for binding subtitles and audio sources provided by an embodiment of the present application. The method may be executed by a server, a processor, a vehicle-mounted device, a mobile device, a computing device, and the like. As shown in Figure 1, the method may include:

Step 101: Determine the target audio source segment in the target video, and identify the target subtitle segment from the target audio source segment.

A video usually includes one or more audio source segments. An audio source segment is a sound clip that appears in the video and is a kind of sound resource. Types of audio source segments can include the main track original sound source, the picture-in-picture original sound source, and inserted music/recording/dubbing sound sources. The main track original sound source is the original sound content of the video; the picture-in-picture original sound source is the original sound content of a picture-in-picture video inserted into the video; and an inserted music/recording/dubbing sound source is music, a recording, or dubbing additionally inserted into the video.

In the embodiment of the present application, the target audio source segment can be extracted by analyzing the content of the target video, and it can be any one of the audio source segments included in the target video. After the target audio source segment is extracted, speech recognition can be performed on it to obtain the corresponding text, and the recognized text can be used as the target subtitle segment identified from the target audio source segment. While the target video is playing, the target subtitle segment can be displayed as the subtitle of the target audio source segment.

For example, if a video contains two audio source segments, one being the original soundtrack of the video and the other being inserted narration dubbing, speech recognition can be applied to both to obtain the subtitle segment corresponding to the original soundtrack and the subtitle segment corresponding to the narration dubbing.

Step 102: Determine the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment.

In practical applications, a video has a fixed playback duration, and the corresponding playback timing can be obtained from the playback duration; for example, the video plays from a starting point of 0 minutes 0 seconds to an end point of 10 minutes 30 seconds. The video also has a video main track, which is composed of the video frames of the video and can display the video as a frame sequence stream according to the playback timing of the video. Users can operate on the video main track to conveniently view, select and edit the content at different positions of the video.

In addition, to achieve a better playback effect, the target audio source segment and the corresponding target subtitle segment need to be aligned. A common case of misalignment is that the audio source and the subtitles shown in the picture are out of step, such as lyrics that do not match the music. Alignment here does not only mean that the moment at which a word is pronounced in the audio source must completely coincide with the moment at which the corresponding text is displayed in the subtitle. When the subtitle lasts longer than the audio source, the subtitle can be displayed first and the corresponding audio source played afterwards; when the subtitle is shorter than the audio source, the audio source can be played first and the corresponding subtitle displayed afterwards.

In the related art, both the audio source segment and the subtitle segment are bound to the main track of the video. As a result, normal editing operations performed by the user on the main track inevitably affect the subtitle segment, which greatly increases the probability of misalignment between the audio source and the subtitles.

In the embodiment of the present application, to reduce the probability that the subtitles and the corresponding audio source become misaligned, the target subtitle segment can be bound to the corresponding target audio source segment, so that operations on the main track do not affect the alignment between the subtitles and the audio source. This reduces the chance of misalignment between the audio source and the subtitles.

In some embodiments, the target subtitle segment has a subtitle header position, and the target audio source segment has an audio source header position. The subtitle header position can be understood as the time on the main track corresponding to the starting point of the target subtitle segment, and the audio source header position can be understood as the time on the main track corresponding to the starting point of the target audio source segment. The binding between the target subtitle segment and the corresponding target audio source segment can be achieved by binding the subtitle header position of the target subtitle segment to the audio source header position of the target audio source segment and keeping the relative positional relationship between the two header positions unchanged.

For example, FIG. 2 shows a binding interface diagram of a subtitle and an audio source provided by an embodiment of the present application. In the case where two target audio source segments A and B are identified in the target video, speech recognition is performed on the two target audio source segments A and B to obtain a target subtitle segment a corresponding to the target audio source segment A and a target subtitle segment b corresponding to the target audio source segment B. The target audio source segments A and B can be displayed in the audio source track 10, and the target subtitle segments a and b can be displayed in the subtitle track 20. The audio source track, the subtitle track and the main track of the target video use the same timing sequence.

To explain the target audio source segment A in advance, the target subtitle segment a can be displayed as soon as the target video starts playing. The binding between the target audio source segment A and the target subtitle segment a can then be: determining the relative positional relationship between the audio source header position (00:10) of the target audio source segment A and the subtitle header position (00:00) of the target subtitle segment a, and keeping this relative positional relationship unchanged, so that the two stay aligned. For the target audio source segment B and the target subtitle segment b, which require strict phonetic alignment, the binding can be: determining the relative positional relationship between the audio source header position (00:50) of the target audio source segment B and the subtitle header position (00:50) of the target subtitle segment b, and keeping this relative positional relationship unchanged, so that the two stay aligned.
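The FIG. 2 example can be worked through numerically. The sketch below assumes that the relative positional relationship is represented as a signed offset in seconds (subtitle header position minus audio source header position); that representation, the `Segment` class, and the segment durations are illustrative assumptions, while the header positions come from the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float     # header position on the shared timeline, in seconds
    duration: float  # assumed value; durations are not given in the example


def relative_offset(subtitle: Segment, audio: Segment) -> float:
    """Subtitle header position minus audio source header position."""
    return subtitle.start - audio.start


audio_a, subtitle_a = Segment(10.0, 30.0), Segment(0.0, 40.0)   # 00:10 and 00:00
audio_b, subtitle_b = Segment(50.0, 20.0), Segment(50.0, 20.0)  # 00:50 and 00:50

offset_a = relative_offset(subtitle_a, audio_a)
offset_b = relative_offset(subtitle_b, audio_b)
print(offset_a, offset_b)  # -10.0 0.0
```

A negative offset means the subtitle appears before the audio starts (segment a), and a zero offset corresponds to the strictly aligned case (segment b).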

Step 103: In the case of performing the adjustment operation on the target audio source segment and/or the target subtitle segment, keep the relative positional relationship unchanged.

In this embodiment of the present application, after the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment is bound, subsequent adjustment operations on the target audio source segment and the target subtitle segment do not affect the binding of this relative positional relationship as long as they do not adjust the subtitle header position or the audio source header position. When the target audio source segment and the target subtitle segment are adjusted according to actual needs in ways other than changing the header positions, the relative positional relationship can likewise be kept unchanged, so that the subtitles and the audio source remain aligned.
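One way to picture keeping the relationship unchanged is to store the offset once at binding time and re-derive the subtitle position from it whenever the audio source segment moves. The sketch below is an assumption-laden illustration: `Binding`, `bind`, and `move_audio` are hypothetical names, and the durations are invented values.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float     # header position on the shared timeline, in seconds
    duration: float


@dataclass
class Binding:
    audio: Segment
    subtitle: Segment
    offset: float    # subtitle.start - audio.start, fixed at binding time


def bind(audio: Segment, subtitle: Segment) -> Binding:
    return Binding(audio, subtitle, subtitle.start - audio.start)


def move_audio(binding: Binding, new_audio_start: float) -> None:
    """Move the audio segment as a whole; the subtitle follows by the stored offset."""
    binding.audio.start = new_audio_start
    binding.subtitle.start = new_audio_start + binding.offset


binding = bind(Segment(10.0, 30.0), Segment(0.0, 40.0))
move_audio(binding, 25.0)
print(binding.subtitle.start)  # 15.0
```

Because the subtitle position depends only on the audio segment and the stored offset, edits to the main track never enter the calculation, which is the isolation the text describes.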

In the related art, for example, when the audio source segment is not the main track original sound source and operations such as speed change and trimming are performed on the main track, the subtitle segment corresponding to the audio source segment also undergoes the speed change or trimming, whereas the audio source segment itself, being a non-main-track original sound source, does not follow the change. This results in serious misalignment between the audio source and the subtitles.

Referring to FIG. 2 and FIG. 3, where FIG. 3 shows another binding interface diagram of subtitles and audio sources provided by an embodiment of the present application, suppose the target audio source segments A and B are both non-main-track original sound sources and the user applies a 2x speed change to the area 31 in the main track 30 of the video. Because the target audio source segment A is a non-main-track original sound source, it does not follow the speed change. Because the target subtitle segment a is not bound to the main track but is bound by header position to the target audio source segment A, the target subtitle segment a does not follow the speed change either. The relative positional relationship between the two header positions therefore remains unchanged, and alignment is preserved. In addition, in the case where the target audio source segment A is the main track original sound source, when the user applies a 2x speed change to the area 31 in the main track 30 of the video, the target audio source segment A also follows the 2x speed change. In the embodiment of the present application, the target subtitle segment a can then be speed-changed synchronously by the same 2x speed change value, following the speed change operation performed on the target audio source segment A, so that alignment is preserved.

For the target audio source segment B, suppose the user trims the area 32 in the main track 30 of the video. Because the target audio source segment B is a non-main-track original sound source, it does not follow the trimming. Because the target subtitle segment b is not bound to the main track but is bound by header position to the target audio source segment B, the target subtitle segment b does not follow the trimming either. The relative positional relationship between the two header positions therefore remains unchanged, and alignment is preserved. In addition, in the case where the target audio source segment B is the main track original sound source, when the user trims the area 32 in the main track 30 of the video, the related art also trims the part of the target subtitle segment b corresponding to the area 32, so that part of the subtitle information is lost. In the embodiment of the present application, although the part of the target audio source segment B corresponding to the area 32 is trimmed, the part of the target subtitle segment b corresponding to the area 32 is not trimmed, which avoids the loss of subtitle information and preserves the integrity of the subtitles.

To sum up, a method for binding subtitles and audio sources provided by an embodiment of the present application includes: determining a target audio source segment in a target video, and identifying the target subtitle segment from the target audio source segment; determining the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment; and keeping the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment. The present application binds the relative positional relationship between the header positions of the target subtitle segment and the target audio source segment, so that the editing process of the main track and the editing process of the subtitles and audio source are isolated from each other. Editing operations on the main track therefore do not affect the alignment between the subtitles and the audio source, which reduces the chance of misalignment between the audio source and the subtitles.

FIG. 4 is a flowchart of steps of a method for binding subtitles and audio sources provided by an embodiment of the present application. The method may be executed by a server, a processor, a vehicle-mounted device, a mobile device, a computing device, and the like. As shown in FIG. 4, the method may include:

Step 201: Determine the target audio source segment in the target video, and identify the target subtitle segment from the target audio source segment.

For this step, reference may be made to the foregoing step 101, which will not be repeated here.

Step 202: Determine the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment.

For this step, reference may be made to the foregoing step 102, which will not be repeated here.

Step 203: In the case of performing the adjustment operation on the target audio source segment and/or the target subtitle segment, keep the relative positional relationship unchanged.

For this step, reference may be made to the foregoing step 103, which will not be repeated here.
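Steps 202 and 203 can be illustrated with a minimal sketch. It assumes that positions are expressed in seconds on a shared timeline and that the relative positional relationship is stored as a single head-to-head offset; the names `Segment`, `head_offset` and `restore_binding` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A clip on a track; times are seconds on the shared timeline (assumption)."""
    head: float      # head (start) position
    duration: float  # length of the clip


def head_offset(subtitle: Segment, audio: Segment) -> float:
    """Step 202: subtitle head position relative to the audio source head position."""
    return subtitle.head - audio.head


def restore_binding(subtitle: Segment, audio: Segment, offset: float) -> None:
    """Step 203: after an adjustment, place the subtitle so the offset still holds."""
    subtitle.head = audio.head + offset


audio = Segment(head=10.0, duration=30.0)
subtitle = Segment(head=12.0, duration=20.0)
offset = head_offset(subtitle, audio)

audio.head = 50.0                         # an adjustment moves the audio source head
restore_binding(subtitle, audio, offset)  # the subtitle head follows
```

Storing only the head offset, rather than a main-track timestamp, is what lets main-track edits leave the subtitle and audio source alignment untouched in this sketch.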

In some embodiments, in one implementation, step 203 may include:

Sub-step 2031: In the case that there is no overlap between the adjusted target audio source segment and the target subtitle segment, trigger a subtitle processing operation.

In this embodiment of the present application, the user can adjust the position, length, etc. of the target audio source segment and/or the target subtitle segment based on the alignment requirements of the subtitle and audio source, thereby changing the relationship between the entire target audio source segment and the entire target subtitle segment. After the adjustment operation is completed, if there is no overlap between the target audio source segment and the target subtitle segment, the two can be considered completely misplaced, and the subtitle processing operation is triggered. The subtitle processing operation is used either to delete the target subtitle segment, so as to avoid the influence of misplaced subtitles, or to retain the target subtitle segment, so that it can be reused by subsequent editing operations.

For the case where the target audio source segment is deleted, refer to FIG. 5, which shows another interface diagram for binding subtitles and audio sources provided by the embodiment of the present application. Starting from the state shown in FIG. 2, when the operation of deleting the target audio source segment A is performed, there is no overlap between the target subtitle segment a and the deleted target audio source segment A. At this time, the subtitle processing operation for the target subtitle segment a can be triggered. In this way, the misplaced subtitle segment is processed in the case that deleting the entire audio source segment leaves the subtitle and the audio source completely misplaced.

For the case where the subtitle header position and/or the audio source header position is changed, refer to FIG. 6, which shows another interface diagram for binding subtitles and audio sources provided by the embodiment of the present application. Starting from the state shown in FIG. 2, when the operation of adjusting the head positions of the target audio source segment B and the target subtitle segment b is performed, the head position of the target audio source segment B is adjusted to the position corresponding to time 01:20, and the head position of the target subtitle segment b is adjusted to the position corresponding to time 00:40. After the adjustment, there is therefore no overlap between the target subtitle segment b and the target audio source segment B. At this time, the subtitle processing operation for the target subtitle segment b can be triggered. In this way, the misplaced subtitle segment is processed in the case that adjusting the head positions of the audio source and the subtitle leaves them completely misplaced.

For the case where the target audio source segment is divided into multiple audio source sub-segments and the sub-segment located at the head is deleted, refer to the corresponding figure, which shows another interface diagram for binding subtitles and audio sources. Starting from the state shown in FIG. 2, the target audio source segment A is first divided into three audio source sub-segments, and the sub-segment at the head position is then deleted. After this adjustment, the target subtitle segment a is considered to lack the audio source head position with which its binding relationship was established, so the adjusted target audio source segment A and the target subtitle segment a are treated as having no overlap. At this time, the subtitle processing operation for the target subtitle segment a can be triggered, so that the misplaced subtitle segment is processed in the case that the subtitle and the audio source are completely misplaced.
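The trigger condition of sub-step 2031 can be sketched as an interval test. The sketch assumes that each segment is a `(head, duration)` pair in seconds and that a deleted audio source segment is represented by `None`; the function names and the 30-second durations used below are illustrative assumptions, since the application gives only the head positions for FIG. 6.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (head, duration) in seconds; an assumed representation


def has_overlap(audio: Optional[Span], subtitle: Span) -> bool:
    """True when the audio source segment and the subtitle segment share any time."""
    if audio is None:          # the audio source segment was deleted
        return False
    a_head, a_dur = audio
    s_head, s_dur = subtitle
    return a_head < s_head + s_dur and s_head < a_head + a_dur


def needs_subtitle_processing(audio: Optional[Span], subtitle: Span) -> bool:
    """Sub-step 2031: trigger the subtitle processing operation when nothing overlaps."""
    return not has_overlap(audio, subtitle)


# FIG. 6 style case: audio head at 01:20 (80 s), subtitle head at 00:40 (40 s).
fig6_triggered = needs_subtitle_processing((80.0, 30.0), (40.0, 30.0))
# FIG. 5 style case: the audio source segment has been deleted.
fig5_triggered = needs_subtitle_processing(None, (0.0, 30.0))
```

The head sub-segment deletion case is treated in the text as a loss of the binding anchor rather than a purely temporal gap, so in this sketch it would have to be signalled by passing `None` for the audio source segment.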

Sub-step 2032: In response to a subtitle processing operation that retains the subtitle, bind the target subtitle segment to the main track of the target video.

In the embodiment of the present application, in response to a subtitle processing operation that retains the subtitle, the target subtitle segment can be bound to the main track of the target video so that the retained target subtitle segment can be processed later. In some embodiments, the subtitle header position of the target subtitle segment can be bound to the moment on the main track of the target video that corresponds to the subtitle header position, thereby temporarily retaining the target subtitle segment. For example, for the state shown in FIG. 6, the subtitle header position (00:40) of the adjusted target subtitle segment b can be bound to the position corresponding to time 00:40 on the main track.

Further, after the target subtitle segment is bound to the main track of the target video, subsequent processing of the target subtitle segment may include the following scenarios. In one scenario, the user wants to replace a male voice source segment with a machine voice source segment. The user can delete the entire male voice source segment and bind the corresponding subtitle segment to the main track, then wait for the machine voice source segment to be generated and insert it at the position of the original male voice source segment, so that the subtitle segment re-establishes the alignment relationship with the machine voice source segment. In another scenario, the audio source segment can be deleted and only the corresponding subtitle segment kept bound to the main track, so that the content of the segment is presented only as text during playback.

It should be noted that, in response to a subtitle processing operation that deletes the subtitle, the target subtitle segment may be deleted. That is, when the user considers the misplaced target subtitle segment to be of no further use, the subtitle processing operation of deleting the subtitle can be performed, so as to delete the target subtitle segment and avoid the interference caused by it.

In this embodiment of the present application, the triggered subtitle processing operation may be provided in the form of an interface. For example, refer to FIG. 8, which shows an interface diagram of a subtitle processing operation provided by the embodiment of the present application. The interface includes controls for carrying out the subtitle processing operation: the reminder text "This audio recognizes subtitles, delete them together?", a "Delete recognized subtitles and audio" button, and a "Delete audio only" button. When the user triggers the "Delete recognized subtitles and audio" button, the audio and the corresponding subtitles are deleted together. When the user triggers the "Delete audio only" button, only the audio is deleted, and the subtitles corresponding to the audio are bound to the main track.

In addition, in practical applications, the subsequent subtitle processing algorithm is divided into a part that processes speech recognition subtitles and a part that processes manually added subtitles. To avoid conflicts between the two parts, in the embodiment of the present application, the attribute of a subtitle segment recognized from an audio source segment can be set to the speech recognition subtitle attribute by default, so that the speech recognition part of the algorithm processes only subtitle segments with that attribute. When a subtitle segment is bound to the main track, it can be considered that the audio source segment corresponding to the subtitle segment has been deleted or is seriously misplaced. At this time, the attribute of the subtitle segment can be changed to the manually added subtitle attribute, that is, the subtitle segment is treated as a subtitle manually added by the user, so that the manually added subtitle part of the algorithm processes only subtitle segments with the manually added subtitle attribute.
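The two outcomes of the subtitle processing operation, together with the attribute change described above, can be sketched as follows. All names here (`SubtitleKind`, `Choice`, `Subtitle`, `process_subtitle`) are hypothetical, and the representation of a binding as a string plus a main-track timestamp is an assumption made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SubtitleKind(Enum):
    SPEECH_RECOGNITION = "speech_recognition"  # default for recognized subtitles
    MANUAL = "manual"                          # treated as manually added


class Choice(Enum):
    DELETE_SUBTITLES_AND_AUDIO = 1  # "Delete recognized subtitles and audio"
    DELETE_AUDIO_ONLY = 2           # "Delete audio only"


@dataclass
class Subtitle:
    head: float                                  # subtitle header position, seconds
    kind: SubtitleKind = SubtitleKind.SPEECH_RECOGNITION
    bound_to: str = "audio"                      # "audio" or "main_track"
    main_track_time: Optional[float] = None      # set once bound to the main track


def process_subtitle(sub: Subtitle, choice: Choice) -> Optional[Subtitle]:
    """Return None when the subtitle is deleted, else the retained subtitle."""
    if choice is Choice.DELETE_SUBTITLES_AND_AUDIO:
        return None
    # Retain: bind the subtitle head to the same moment on the main track
    # and treat the subtitle as manually added from now on.
    sub.bound_to = "main_track"
    sub.main_track_time = sub.head
    sub.kind = SubtitleKind.MANUAL
    return sub


kept = process_subtitle(Subtitle(head=40.0), Choice.DELETE_AUDIO_ONLY)
removed = process_subtitle(Subtitle(head=40.0), Choice.DELETE_SUBTITLES_AND_AUDIO)
```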

Further, the user can also save the adjustment state of the subtitle segments and audio source segments at the current moment as an old draft. At this time, all target subtitle segments can be bound to the main track, and an old draft file can be created. Because the subtitle segments are bound to the main track, the old draft file contains only the subtitle information and the main track information and does not contain the audio source information, which has a large file size, thus saving storage resources.

In some embodiments, step 203 may further include:

Sub-step 2033: In the case that there is an overlap between the adjusted target audio source segment and the target subtitle segment, detect the adjusted new subtitle header position and audio source header position.

In this embodiment of the present application, the user can adjust the position, length, etc. of the target audio source segment and/or the target subtitle segment based on the alignment requirements of the subtitle and audio source, thereby changing the relationship between the entire target audio source segment and the entire target subtitle segment. After the adjustment operation, if there is an overlap between the target audio source segment and the target subtitle segment, the two can be considered to be in the alignment state chosen by the user, and the new subtitle header position and audio source header position after the adjustment can be detected.

Sub-step 2034: Update the relative positional relationship according to the new subtitle header position and audio source header position, and keep the updated relative positional relationship unchanged.

In this step, the original relative positional relationship can be updated based on the new subtitle header position and audio source header position, and the updated relative positional relationship can be kept unchanged, so as to satisfy the user's need to adjust the alignment state of the subtitles and the audio source.

For example, referring to FIG. 2, before the adjustment operation, the head positions of the target audio source segment A and the target subtitle segment a both coincide with time 00:00. If, after the adjustment operation, the audio source header position of the target audio source segment A is at time 00:10 and the subtitle header position of the target subtitle segment a is at time 00:00, then the relative positional relationship between the new audio source header position (00:10) of the target audio source segment A and the new subtitle header position (00:00) of the target subtitle segment a is determined from the adjusted result, and this relative positional relationship is kept unchanged.
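The update in sub-steps 2033 and 2034 can be sketched with the numbers of this example. The class `HeadBinding` and its methods are hypothetical names, and the relative positional relationship is again assumed to be a single offset in seconds; the later move to 30 seconds is an invented follow-up used only to show that the updated offset is then preserved.

```python
from dataclasses import dataclass, field


@dataclass
class HeadBinding:
    """Binding between an audio source head and a subtitle head (seconds)."""
    audio_head: float
    subtitle_head: float
    offset: float = field(init=False)

    def __post_init__(self) -> None:
        self.offset = self.subtitle_head - self.audio_head

    def rebind(self, audio_head: float, subtitle_head: float) -> None:
        """Sub-steps 2033/2034: detect the new heads and update the offset."""
        self.audio_head = audio_head
        self.subtitle_head = subtitle_head
        self.offset = subtitle_head - audio_head

    def move_audio(self, new_audio_head: float) -> None:
        """A later move of the audio source keeps the updated offset unchanged."""
        self.audio_head = new_audio_head
        self.subtitle_head = new_audio_head + self.offset


binding = HeadBinding(audio_head=0.0, subtitle_head=0.0)  # both heads at 00:00
binding.rebind(audio_head=10.0, subtitle_head=0.0)        # audio head moved to 00:10
updated_offset = binding.offset
binding.move_audio(30.0)                                  # hypothetical later move
```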

Referring to FIG. 3, in this embodiment of the present application, when a target video is edited, the editing interface may include three operable adjustment tracks: a main track 30, an audio source track 10 and a subtitle track 20 of the target video. The main track 30 can display the target video as a frame sequence stream in the playback order of the target video, and the user can operate the main track 30 to conveniently view, select and edit the content of the target video at different positions. The audio source track 10 is used to carry and display audio source segments, and the user can adjust the audio source segments on the audio source track 10. The subtitle track 20 is used to carry and display subtitle segments, and the user can adjust the subtitle segments on the subtitle track 20. The three operable adjustment tracks provide users with rich adjustment and interaction methods.

Further, the target audio source segment can be bound to the main track 30. In some embodiments, the audio source header position of the target audio source segment can be bound to the moment on the main track 30 of the target video that corresponds to the audio source header position, so as to bind the target audio source segment to the main track.

In addition, when the target audio source segment is bound to the main track, an adjustment operation on the main track also affects the length and position of the target audio source segment. In the case where the target audio source segment is the original sound source of the main track, when adjustment operations such as trimming, speed change and deletion are performed on the region of the main track corresponding to the target audio source segment, the target audio source segment changes accordingly. In the case where the target audio source segment is not the original sound source of the main track, when trimming, speed change, deletion and other adjustment operations are performed on the region of the main track corresponding to the target audio source segment, the target audio source segment does not change as long as the adjustment operation does not change the moment on the main track to which its head position corresponds. For example, if the tail of the region of the main track corresponding to the target audio source segment is trimmed, the target audio source segment does not change, whereas if the head of that region is trimmed, the target audio source segment is deleted.
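The rule for an audio source segment that is not the main track original sound source can be sketched as a check on the bound head moment. The sketch assumes `(head, end)` spans in seconds and a hypothetical function name `after_main_track_trim`; it deliberately ignores any shift of later content that a real editor might apply after trimming, which the text does not describe.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (head, end) in seconds on the main track; assumed


def after_main_track_trim(audio: Span, trimmed: Span) -> Optional[Span]:
    """Effect of trimming a main-track region on a non-original audio segment.

    The segment is deleted when the main-track moment its head is bound to
    is trimmed away; otherwise it is left unchanged.
    """
    audio_head, _ = audio
    trim_start, trim_end = trimmed
    if trim_start <= audio_head < trim_end:
        return None  # the bound head moment no longer exists
    return audio


segment = (50.0, 90.0)
tail_trimmed = after_main_track_trim(segment, (80.0, 90.0))  # tail of the region
head_trimmed = after_main_track_trim(segment, (50.0, 60.0))  # head of the region
```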

In some embodiments, in another implementation manner of the embodiments of the present application, step 203 may include:

Sub-step 2035: In the case of performing the adjustment operation of changing the subtitle header position and/or the audio source header position, update the relative positional relationship according to the changed subtitle header position and audio source header position, and keep the updated relative positional relationship unchanged.

In this embodiment of the present application, the user may adjust the head position of the target audio source segment and/or the target subtitle segment based on the alignment requirements of the subtitle and the audio source, thereby changing the relative positional relationship between the entire target audio source segment and the entire target subtitle segment. After the adjustment operation is completed, the relative positional relationship can be updated according to the changed subtitle header position and audio source header position, and the updated relative positional relationship can be kept unchanged, so as to satisfy the user's need to adjust the alignment state of the subtitles and the audio source.

In addition, after the adjustment operation is completed, it is also possible to judge whether there is an overlap between the adjusted target audio source segment and the target subtitle segment, and to perform corresponding operations according to the judgment result. For the specific logic, refer to the description of sub-step 2031 and sub-step 2032 above, which will not be repeated here.

Sub-step 2036: In the case of performing the adjustment operation of moving the position of the target audio source segment as a whole, move the position of the target subtitle segment as a whole along with the target audio source segment, and keep the relative positional relationship unchanged.

In this step, refer to FIG. 2 and FIG. 9, where FIG. 9 shows another interface diagram for binding subtitles and audio sources provided by the embodiment of the present application. When the adjustment operation exchanges the positions of the target audio source segments A and B in FIG. 2 as a whole, FIG. 9 shows the state after the exchange. At this time, the target subtitle segments a and b move as a whole with their corresponding target audio source segments, and the relative positional relationships remain unchanged. The effect is that, when the user moves an audio source segment as a whole, the subtitle segment corresponding to that audio source segment moves along with it, which saves the user the time of re-aligning the audio source segment and the subtitle segment.
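The exchange of two audio source segments, with each subtitle following its own audio source, can be sketched as below. The class `BoundPair`, the function `swap` and all head positions are illustrative assumptions; the application does not give the exact positions shown in FIG. 2 and FIG. 9.

```python
from dataclasses import dataclass


@dataclass
class BoundPair:
    """An audio source segment and its bound subtitle segment (heads in seconds)."""
    audio_head: float
    subtitle_head: float

    @property
    def offset(self) -> float:
        return self.subtitle_head - self.audio_head

    def move_to(self, new_audio_head: float) -> None:
        """Sub-step 2036: move the audio as a whole; the subtitle follows."""
        offset = self.offset
        self.audio_head = new_audio_head
        self.subtitle_head = new_audio_head + offset


def swap(first: BoundPair, second: BoundPair) -> None:
    """Exchange the head positions of two audio source segments."""
    first_head, second_head = first.audio_head, second.audio_head
    first.move_to(second_head)
    second.move_to(first_head)


pair_a = BoundPair(audio_head=0.0, subtitle_head=0.0)    # assumed positions
pair_b = BoundPair(audio_head=50.0, subtitle_head=55.0)  # assumed positions
swap(pair_a, pair_b)
```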

Sub-step 2037: In the case of performing the adjustment operation of replacing the target audio source segment, determine a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the replaced target audio source segment, and keep the new relative positional relationship unchanged.

In this embodiment of the present application, the entire target audio source segment may be replaced, and a new relative positional relationship is determined between the subtitle header position of the target subtitle segment and the audio source header position of the replaced target audio source segment, thereby ensuring alignment between the subtitles and the replaced audio source.

It should be noted that the replaced target audio source segment may have the same position and duration as the target audio source segment before replacement, or it may differ in position and/or duration; this embodiment of the present application does not limit this. After the replacement operation is completed, it is also possible to judge whether there is an overlap between the replaced target audio source segment and the target subtitle segment, and to perform corresponding operations according to the judgment result. For the specific logic, refer to the description of sub-step 2031 and sub-step 2032 above, which will not be repeated here.

Sub-step 2038: In the case of performing a speed change operation on the target audio source segment according to a preset speed change value, perform a speed change operation on the target subtitle segment according to the speed change value, and keep the relative positional relationship unchanged.

Speed change is a common function in video/audio editing scenarios. By selecting a speed change value, the user can speed up or slow down the video/audio by the ratio corresponding to the speed change value. For example, in response to the selection of a speed change value of 2 times, the video/audio may be played at 2 times the speed, so that its duration is shortened by half.

In the embodiment of the present application, in the case of performing a speed change operation on the target audio source segment according to a preset speed change value, the target subtitle segment can be subjected to a speed change operation according to the same speed change value, and the relative positional relationship between the head position of the target audio source segment and the head position of the target subtitle segment is kept unchanged. For example, when the target audio source segment is played at a speed change value of 2 times, the target subtitle segment can also be played at a speed change value of 2 times, so that the subtitles remain aligned with the audio source after the speed change operation.

Further, the target audio source segment is bound to the main track of the target video. When the target audio source segment is the main track original sound source, a speed change operation can be performed on the region of the main track corresponding to the target audio source segment, so that the target audio source segment changes speed; a speed change operation can also be performed directly on the target audio source segment on the audio source track. When the target audio source segment is not the main track original sound source, a speed change operation on the main track does not produce a corresponding speed change in the target audio source segment, so the speed change operation can be performed directly on the target audio source segment on the audio source track.

In some embodiments, sub-step 2038 may include:

Sub-step A1: In the case that the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment, perform a speed change operation on the target subtitle segment according to the preset speed change value.

In the embodiments of the present application, the speed change logic may be optimized by further comparing the first duration of the target audio source segment before the speed change with the second duration of the target subtitle segment. When the first duration is greater than the second duration, whether the target audio source segment and the target subtitle segment are sped up or slowed down by the same speed change value, the duration of the target audio source segment after the speed change remains greater than the duration of the target subtitle segment after the speed change. An audio source segment that is longer than its subtitle segment provides a better playback effect and better matches the user's viewing habits. Therefore, when the first duration is greater than the second duration, the target subtitle segment may be subjected to a speed change operation according to the preset speed change value.

For example, refer to FIG. 10, which shows another interface diagram for binding subtitles and audio sources provided by an embodiment of the present application. FIG. 10 shows the target audio source segments C and D and the target subtitle segments c and d before the speed change. For the target audio source segment C and the target subtitle segment c, since the first duration of the target audio source segment C before the speed change is greater than the second duration of the target subtitle segment c, after the target audio source segment C is changed according to the speed change value of 2 times, the target subtitle segment c can also follow the 2 times speed change.

Sub-step A2: In the case where the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment, perform a speed change operation, according to the preset speed change value, only on the part of the target subtitle segment that corresponds to the first duration.

In the embodiment of the present application, in the case where the first duration of the target audio source segment before the speed change is less than the second duration of the target subtitle segment, if the whole target subtitle segment were sped up or slowed down by the same speed change value as the target audio source segment, the duration of the target subtitle segment after the speed change would be greater than the duration of the target audio source segment after the speed change. This greatly increases the probability that the target subtitle segment is too long, and an overlong target subtitle segment is more likely to overlap with other audio source segments. Such overlap reduces the playback effect and conflicts with the user's viewing habits. Therefore, when the first duration is less than the second duration, the speed change operation can be performed, according to the preset speed change value, only on the part of the target subtitle segment corresponding to the first duration, while the part of the target subtitle segment beyond the first duration keeps its original playback speed, thereby reducing the chance that the target subtitle segment overlaps with other audio source segments.

For example, FIG. 10 shows the target audio source segments C and D and the target subtitle segments c and d before the speed change. For the target audio source segment D and the target subtitle segment d, since the first duration of the target audio source segment D before the speed change is less than the second duration of the target subtitle segment d, after the target audio source segment D is changed according to the speed change value of 2 times, the part of the target subtitle segment d corresponding to the first duration (00:50-01:30) can also follow the 2 times speed change, while the part of the target subtitle segment d beyond the first duration (01:30-01:40) keeps its original playback speed (1 times) unchanged.
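Sub-steps A1 and A2 can be sketched as one duration calculation. The function name is hypothetical, durations are in seconds, and the equal-duration case, which the text does not address, is treated like sub-step A1 as an assumption. The values for segment D follow the example above (audio 00:50-01:30, subtitle 00:50-01:40); the values for segment C are invented.

```python
def subtitle_duration_after_speed_change(
    audio_duration: float, subtitle_duration: float, speed: float
) -> float:
    """Duration of the subtitle segment after the audio is changed by `speed`."""
    if speed <= 0:
        raise ValueError("speed change value must be positive")
    if audio_duration >= subtitle_duration:
        # Sub-step A1: the whole subtitle segment follows the speed change.
        # (Equal durations are handled here by assumption.)
        return subtitle_duration / speed
    # Sub-step A2: only the part covered by the audio changes speed;
    # the remainder keeps its original playback speed.
    return audio_duration / speed + (subtitle_duration - audio_duration)


# Segment D: audio lasts 40 s, subtitle lasts 50 s, speed change value 2.
duration_d = subtitle_duration_after_speed_change(40.0, 50.0, 2.0)
# Segment C (assumed values): audio lasts 40 s, subtitle lasts 30 s.
duration_c = subtitle_duration_after_speed_change(40.0, 30.0, 2.0)
```

For segment D the 40-second part shrinks to 20 seconds and the remaining 10 seconds are unchanged, giving 30 seconds instead of the 25 seconds a uniform 2 times change would produce.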

Sub-step 2039: In the case of performing the adjustment operation of dividing the target sound source segment, obtain a plurality of divided sound source sub-segments.

Sub-step 20310: Establish a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source sub-segment, and keep the new relative positional relationship unchanged, where the target audio source sub-segment is the audio source sub-segment at the head position among all audio source sub-segments.

In the embodiment of the present application, the user can also divide the target audio source segment according to actual needs to obtain multiple audio source sub-segments. After the division operation, the subtitle header position of the target subtitle segment is bound to the audio source header position of the target audio source sub-segment that is at the head position. Since the target subtitle segment is obtained by speech recognition of the entire target audio source segment, even if the target audio source segment is divided, the integrity of the target audio source segment will not be destroyed. The embodiment of the application can therefore bind the subtitle header position of the target subtitle segment to the audio source header position of the head sub-segment, so as to preserve the head binding relationship between the target subtitle segment and the divided target audio source segment after the division operation. For example, refer to FIG. 11, which shows another interface diagram for binding subtitles and audio sources provided by an embodiment of the present application. In the case where the division operation divides the target audio source segment A into audio source sub-segment 1, audio source sub-segment 2 and audio source sub-segment 3, the target subtitle segment a is bound to the audio source header position of the audio source sub-segment 1 at the head.

In addition, after the division operation, if the audio source sub-segment 2 and/or the audio source sub-segment 3, which are not at the head, are deleted, the target subtitle segment a is still bound to the audio source sub-segment 1 at the head.
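The division case can be sketched by splitting a span and looking up the sub-segment that still starts at the original head. The names `split` and `binding_anchor`, the `(head, end)` representation and the cut points are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (head, end) in seconds; an assumed representation


def split(span: Span, cut_points: List[float]) -> List[Span]:
    """Sub-step 2039: divide a segment at the given cut points."""
    head, end = span
    inner = sorted(set(p for p in cut_points if head < p < end))
    points = [head] + inner + [end]
    return list(zip(points[:-1], points[1:]))


def binding_anchor(original_head: float, sub_segments: List[Span]) -> Optional[float]:
    """Sub-step 20310: head position the subtitle stays bound to.

    Returns None when the head sub-segment has been deleted, which is the
    situation that triggers the subtitle processing operation.
    """
    for head, _ in sub_segments:
        if head == original_head:
            return head
    return None


sub_segments = split((0.0, 60.0), [20.0, 40.0])     # three sub-segments
anchor_all = binding_anchor(0.0, sub_segments)
anchor_head_only = binding_anchor(0.0, sub_segments[:1])  # sub-segments 2 and 3 deleted
anchor_head_gone = binding_anchor(0.0, sub_segments[1:])  # sub-segment 1 deleted
```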

In some embodiments, after step 201, the method may further include:

Step 204: In the case that the target subtitle segment is moved beyond the boundary of the main track of the target video, delete the part of the target subtitle segment that exceeds the boundary.

In this step, refer to FIG. 12, which shows another interface diagram for binding subtitles and audio sources provided by the embodiment of the present application. For the target subtitle segment a, when the target subtitle segment a is moved as a whole on the subtitle track 20 so that part of it lies beyond the boundary of the main track 30 (time 00:00), the part of the target subtitle segment a beyond the boundary can be deleted. In the case that the entire target subtitle segment a is moved beyond the boundary, the entire target subtitle segment a can be deleted. Through this interaction, the embodiments of the present application provide a convenient way of deleting subtitles, which improves user experience.
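Step 204 can be sketched as clipping the subtitle span to the main track span. The function name and the `(head, end)` representation in seconds are assumptions, and the 120-second main track length is invented; returning `None` for a subtitle that lies wholly outside the main track reflects the reading that the entire segment is then deleted.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (head, end) in seconds; an assumed representation


def clip_to_main_track(subtitle: Span, main_track: Span) -> Optional[Span]:
    """Delete the part of a subtitle segment that lies beyond the main track."""
    head = max(subtitle[0], main_track[0])
    end = min(subtitle[1], main_track[1])
    if head >= end:
        return None  # nothing remains inside the main track
    return (head, end)


main_track = (0.0, 120.0)
partly_outside = clip_to_main_track((-10.0, 20.0), main_track)  # dragged past 00:00
fully_outside = clip_to_main_track((-30.0, -5.0), main_track)
```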

FIG. 13 is a block diagram of an apparatus for binding subtitles and audio sources provided by an embodiment of the present application. As shown in FIG. 13 , the apparatus includes an identification module 301 , a binding module 302 , and a holding module 303 .

The identification module 301 is configured to determine the target audio clip in the target video, and identify the target subtitle clip from the target audio clip;

The binding module 302 is configured to determine the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment;

The maintaining module 303 is configured to maintain the relative positional relationship unchanged when the adjustment operation on the target audio source segment and/or the target subtitle segment is performed.

In an implementation manner, the holding module includes:

In an achievable manner, the case where there is no overlap between the adjusted target audio source segment and the target subtitle segment includes: the case where the target audio source segment is deleted; the case where the subtitle header position and/or the audio source header position is changed; or the case where the target audio source segment is divided to obtain multiple audio source sub-segments and the audio source sub-segment located at the head of the multiple audio source sub-segments is deleted.

In one implementation, the apparatus further includes:

In an implementation manner, the target audio source segment is placed on the audio source track for display; the target subtitle segment is placed on the subtitle track for display; the audio source track, the subtitle track and the main track of the target video are displayed using the same timing; and the target audio source segment is bound to the main track.

In an implementation manner, the holding module includes:

In an achievable manner, the transmission sub-module includes:

In an implementation manner, the holding module includes:

To sum up, an apparatus for binding subtitles and audio sources provided by an embodiment of the present application includes: an identification module configured to determine a target audio source segment in a target video and identify the target subtitle segment from the target audio source segment; a binding module configured to determine the relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the target audio source segment; and a maintaining module configured to keep the relative positional relationship unchanged in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment. The present application binds the relative positional relationship between the head positions of the target subtitle segment and the target audio source segment, so that the editing of the main track and the editing of the subtitle and audio source are isolated from each other. An editing operation on the main track therefore does not affect the alignment between the subtitles and the audio source, which reduces the chance of misalignment between them.

FIG. 14 is a block diagram of an electronic device 600 according to an exemplary embodiment. For example, electronic device 600 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.

Referring to FIG. 14, the electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

The processing component 602 generally controls the overall operation of the electronic device 600, such as operations associated with display, phone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or some of the steps of the methods described above. Additionally, processing component 602 may include one or more modules that facilitate interaction between processing component 602 and other components. For example, processing component 602 may include a multimedia module to facilitate interaction between multimedia component 608 and processing component 602.

Memory 604 is used to store various types of data to support operation of the electronic device 600. Examples of such data include instructions for any application or method operating on electronic device 600, contact data, phonebook data, messages, pictures, multimedia, and the like. Memory 604 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

Power component 606 provides power to various components of electronic device 600. Power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 600.

Multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). In the case where the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action. In some embodiments, the multimedia component 608 includes a front-facing camera and/or a rear-facing camera. When the electronic device 600 is in an operation mode, such as a shooting mode or a multimedia mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 610 is used for outputting and/or inputting audio signals. For example, the audio component 610 includes a microphone (MIC) for receiving external audio signals when the electronic device 600 is in an operation mode, such as a calling mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in memory 604 or transmitted via communication component 616. In some embodiments, the audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.

Sensor component 614 includes one or more sensors for providing status assessments of various aspects of electronic device 600. For example, the sensor component 614 can detect the open/closed state of the electronic device 600 and the relative positioning of components, such as the display and the keypad of the electronic device 600. The sensor component 614 can also detect a change in the position of the electronic device 600 or of one of its components, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and changes in the temperature of the electronic device 600. Sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. Sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

Communication component 616 is used to facilitate wired or wireless communication between electronic device 600 and other devices. Electronic device 600 may access wireless networks based on communication standards, such as WiFi, carrier networks (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, electronic device 600 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, and is used to implement the method for binding a subtitle and an audio source provided by the embodiment of the present application.

In an exemplary embodiment, there is also provided a non-transitory computer storage medium including instructions, such as a memory 604 including instructions, executable by the processor 620 of the electronic device 600 to perform the method described above. For example, the non-transitory storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

FIG. 15 is a block diagram of an electronic device 700 according to an exemplary embodiment. For example, the electronic device 700 may be provided as a server. Referring to FIG. 15, the electronic device 700 includes a processing component 722, which further includes one or more processors, and a memory resource, represented by memory 732, for storing instructions executable by the processing component 722, such as application programs. An application program stored in memory 732 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 722 is configured to execute the instructions to perform the method for binding subtitles and audio sources provided by the embodiments of the present application.

The electronic device 700 may also include a power component 726 configured to perform power management of the electronic device 700, a wired or wireless network interface 750 configured to connect the electronic device 700 to a network, and an input/output (I/O) interface 758. Electronic device 700 may operate based on an operating system stored in memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

Embodiments of the present application further provide a computer program product including a computer program, wherein the computer program, when executed by a processor, implements the method for binding subtitles and audio sources.

All the embodiments of the present disclosure can be implemented independently or in combination with other embodiments, all of which fall within the scope of protection claimed by the present disclosure.

Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present application that follow the general principles of the present application and include common knowledge or conventional techniques in the art not disclosed by this disclosure. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the application being indicated by the following claims.

It is to be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

A method for binding subtitles and audio sources, comprising:

determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;

determining a relative positional relationship between a subtitle header position of the target subtitle segment and an audio source header position of the target audio source segment; and

in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged.
The method according to claim 1, wherein, in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment, maintaining the relative positional relationship unchanged, comprising:

in the case that there is no overlap between the adjusted target audio source segment and the target subtitle segment, triggering a subtitle processing operation; and

in response to a subtitle processing operation that retains the subtitle, binding the target subtitle segment to the main track of the target video.
The method according to claim 2, wherein the case where there is no overlap between the adjusted target audio source segment and the target subtitle segment includes: the case where the target audio source segment is deleted; the case where the subtitle header position and/or the audio source header position is changed; or the case where the target audio source segment is divided into multiple audio source sub-segments and the audio source sub-segment located at the head of the multiple audio source sub-segments is deleted.
The method of claim 2, wherein the method further comprises:

In the case that there is an overlap between the adjusted target audio source segment and the target subtitle segment, detecting the adjusted new subtitle header position and audio source header position;

The relative positional relationship is updated according to the new subtitle header position and the audio source header position, and the updated relative positional relationship is kept unchanged.
The method of claim 1, wherein the method further comprises:

When the target subtitle segment is moved beyond the boundary of the main track of the target video, the portion of the target subtitle segment that exceeds the boundary is deleted.
The method according to claim 1, wherein the target audio source segment is placed on an audio source track for display; the target subtitle segment is placed on a subtitle track for display; the audio source track, the subtitle track and the main track of the target video use the same timing; and the target audio source segment is bound to the main track.
The method according to claim 1, wherein, in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment, maintaining the relative positional relationship unchanged, comprising:

in the case of performing an adjustment operation for changing the subtitle header position and/or the audio source header position, updating the relative positional relationship according to the changed subtitle header position and audio source header position, and keeping the updated relative positional relationship unchanged.
The method according to claim 1, wherein, in the case of performing an adjustment operation on the target sound source segment, maintaining the relative positional relationship unchanged, comprising:

In the case of performing the adjustment operation of moving the position of the target audio source segment as a whole, the position of the target subtitle segment is moved as a whole following the target audio source segment, and the relative positional relationship is kept unchanged.
The method according to claim 1, wherein, in the case of performing an adjustment operation on the target sound source segment, maintaining the relative positional relationship unchanged, comprising:

in the case of performing an adjustment operation of replacing the target audio source segment, determining a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the replaced target audio source segment, and keeping the new relative positional relationship unchanged.
The method according to claim 1, wherein, in the case of performing an adjustment operation on the target sound source segment, maintaining the relative positional relationship unchanged, comprising:

In the case of performing a speed change operation on the target audio source segment according to a preset speed change value, the target subtitle segment is subjected to a speed change operation according to the speed change value, and the relative positional relationship is kept unchanged.
The method according to claim 10, wherein the performing a speed change operation on the target subtitle segment according to the speed change value comprises:

In the case that the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment, perform a speed change operation on the target subtitle segment according to the preset speed change value;

in the case that the first duration of the target audio source segment before the speed change is smaller than the second duration of the target subtitle segment, performing a speed change operation according to the preset speed change value on the part of the target subtitle segment corresponding to the first duration.
The method according to claim 1, wherein, in the case of performing an adjustment operation on the target sound source segment, maintaining the relative positional relationship unchanged, comprising:

In the case of performing the adjustment operation of dividing the target sound source segment, obtain a plurality of divided sound source sub-segments;

establishing a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of a target audio source sub-segment, and keeping the new relative positional relationship unchanged, wherein the target audio source sub-segment is the audio source sub-segment at the head position among all the audio source sub-segments.
A device for binding subtitles and audio sources, comprising:

an identification module, configured to determine a target audio source segment in a target video, and to obtain a target subtitle segment by recognition from the target audio source segment;

a binding module, configured to determine a relative positional relationship between a subtitle header position of the target subtitle segment and an audio source header position of the target audio source segment; and

a maintaining module, configured to keep the relative positional relationship unchanged when an adjustment operation is performed on the target audio source segment and/or the target subtitle segment.
The apparatus of claim 13, wherein the maintaining module comprises:

a triggering submodule, configured to trigger a subtitle processing operation when there is no overlap between the adjusted target audio source segment and the target subtitle segment; and

a binding submodule, configured to bind the target subtitle segment to the main track of the target video in response to a subtitle processing operation that retains the subtitle.
The apparatus according to claim 14, wherein the case where there is no overlap between the adjusted target audio source segment and the target subtitle segment includes: the case where the target audio source segment is deleted; the case where the subtitle header position and/or the audio source header position is changed; or the case where the target audio source segment is divided into multiple audio source sub-segments and the audio source sub-segment located at the head of the multiple audio source sub-segments is deleted.
The apparatus of claim 14, wherein the apparatus further comprises:

The detection module is configured to detect the adjusted new subtitle head position and the audio source head position when there is an overlap between the adjusted target audio source segment and the target subtitle segment;

The updating module is configured to update the relative positional relationship according to the new subtitle header position and the audio source header position, and keep the updated relative positional relationship unchanged.
The apparatus of claim 13, wherein the apparatus further comprises:

The deletion module is configured to delete the part of the target subtitle segment beyond the boundary in the case of moving the target subtitle segment beyond the boundary of the main track of the target video.
The apparatus according to claim 13, wherein the target audio source segment is placed on an audio source track for presentation; the target subtitle segment is placed on a subtitle track for presentation; the audio source track, the subtitle track and the main track of the target video use the same timing; and the target audio source segment is bound to the main track.
The apparatus of claim 13, wherein the maintaining module comprises:

an update sub-module, configured to, when an adjustment operation of changing the subtitle header position and/or the audio source header position is performed, update the relative positional relationship according to the changed subtitle header position and audio source header position, and keep the updated relative positional relationship unchanged.
The apparatus of claim 13, wherein the maintaining module comprises:

a moving sub-module, configured to, when an adjustment operation of moving the position of the target audio source segment as a whole is performed, move the position of the target subtitle segment as a whole following the target audio source segment, and keep the relative positional relationship unchanged.
The apparatus of claim 13, wherein the maintaining module comprises:

a replacement submodule, configured to, when an adjustment operation of replacing the target audio source segment is performed, determine a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the replaced target audio source segment, and keep the new relative positional relationship unchanged.
The apparatus of claim 13, wherein the maintaining module comprises:

a variable speed sub-module, configured to, when a speed change operation is performed on the target audio source segment according to a preset speed change value, perform a speed change operation on the target subtitle segment according to the speed change value, and keep the relative positional relationship unchanged.
The apparatus of claim 22, wherein the variable speed sub-module comprises:

a first speed change unit, configured to perform a speed change operation on the target subtitle segment according to the preset speed change value when the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment; and

a second speed change unit, configured to, when the first duration of the target audio source segment before the speed change is smaller than the second duration of the target subtitle segment, perform a speed change operation according to the preset speed change value on the part of the target subtitle segment corresponding to the first duration.
The apparatus of claim 13, wherein the maintaining module comprises:

a segmentation sub-module, configured to obtain a plurality of divided audio source sub-segments when an adjustment operation of dividing the target audio source segment is performed; and

a cropping submodule, configured to establish a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of a target audio source sub-segment, and keep the new relative positional relationship unchanged, wherein the target audio source sub-segment is the audio source sub-segment at the head position among all the audio source sub-segments.
An electronic device comprising:

processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the following steps:

determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;

determining a relative positional relationship between a subtitle header position of the target subtitle segment and an audio source header position of the target audio source segment; and

in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged.
The electronic device of claim 25, wherein the processor is further configured to execute the instructions to implement the following steps:

in the case that there is no overlap between the adjusted target audio source segment and the target subtitle segment, triggering a subtitle processing operation; and

in response to a subtitle processing operation that retains the subtitle, binding the target subtitle segment to the main track of the target video.
The electronic device according to claim 26, wherein the case where there is no overlap between the adjusted target audio source segment and the target subtitle segment includes: the case where the target audio source segment is deleted; the case where the subtitle header position and/or the audio source header position is changed; or the case where the target audio source segment is divided into multiple audio source sub-segments and the audio source sub-segment located at the head of the multiple audio source sub-segments is deleted.
The electronic device of claim 26, wherein the processor is further configured to execute the instructions to implement the steps of:

In the case that there is an overlap between the adjusted target audio source segment and the target subtitle segment, detecting the adjusted new subtitle header position and audio source header position;

The relative positional relationship is updated according to the new subtitle header position and the audio source header position, and the updated relative positional relationship is kept unchanged.
The electronic device of claim 25, wherein the processor is further configured to execute the instructions to:

When the target subtitle segment is moved beyond the boundary of the main track of the target video, the portion of the target subtitle segment that exceeds the boundary is deleted.
The electronic device according to claim 25, wherein the target audio source segment is placed on an audio source track for presentation; the target subtitle segment is placed on a subtitle track for presentation; the audio source track, the subtitle track and the main track of the target video use the same timing; and the target audio source segment is bound to the main track.
The electronic device of claim 25, wherein the processor is further configured to execute the instructions to:

in the case of performing an adjustment operation for changing the subtitle header position and/or the audio source header position, update the relative positional relationship according to the changed subtitle header position and audio source header position, and keep the updated relative positional relationship unchanged.
The electronic device of claim 25, wherein the processor is further configured to execute the instructions to:

In the case of performing the adjustment operation of moving the position of the target audio source segment as a whole, the position of the target subtitle segment is moved as a whole following the target audio source segment, and the relative positional relationship is kept unchanged.
The electronic device of claim 25, wherein the processor is further configured to execute the instructions to:

in the case of performing an adjustment operation of replacing the target audio source segment, determine a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of the replaced target audio source segment, and keep the new relative positional relationship unchanged.
The electronic device of claim 25, wherein the processor is further configured to execute the instructions to:

In the case of performing a speed change operation on the target audio source segment according to a preset speed change value, the target subtitle segment is subjected to a speed change operation according to the speed change value, and the relative positional relationship is kept unchanged.
The electronic device of claim 34, wherein the processor is further configured to execute the instructions to:

In the case that the first duration of the target audio source segment before the speed change is greater than the second duration of the target subtitle segment, the speed change operation is performed on the target subtitle segment according to the preset speed change value;

In the case that the first duration of the target audio source segment before the speed change is smaller than the second duration of the target subtitle segment, the speed change operation is performed on the part of the target subtitle segment with the first duration according to the preset speed change value.
The electronic device of claim 25, wherein the processor is further configured to execute the instructions to:

in the case of performing an adjustment operation of dividing the target audio source segment, obtain a plurality of divided audio source sub-segments; and

establish a new relative positional relationship between the subtitle header position of the target subtitle segment and the audio source header position of a target audio source sub-segment, and keep the new relative positional relationship unchanged, wherein the target audio source sub-segment is the audio source sub-segment at the head position among all the audio source sub-segments.
A computer storage medium, wherein the instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the following steps:

determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;

determining a relative positional relationship between a subtitle header position of the target subtitle segment and an audio source header position of the target audio source segment; and

in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged.
A computer program product comprising a computer program, wherein the computer program implements the following steps when executed by a processor:

determining a target audio source segment in a target video, and obtaining a target subtitle segment by recognition from the target audio source segment;

determining a relative positional relationship between a subtitle header position of the target subtitle segment and an audio source header position of the target audio source segment; and

in the case of performing an adjustment operation on the target audio source segment and/or the target subtitle segment, keeping the relative positional relationship unchanged.