CN107517406B - Video editing and translating method - Google Patents

Video editing and translating method

Info

Publication number
CN107517406B
CN107517406B CN201710781783.5A
Authority
CN
China
Prior art keywords
video
time node
specific time
segments
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710781783.5A
Other languages
Chinese (zh)
Other versions
CN107517406A (en)
Inventor
郑丽华 (Zheng Lihua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Language Networking (Wuhan) Information Technology Co Ltd
Original Assignee
Language Networking (Wuhan) Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Language Networking (Wuhan) Information Technology Co Ltd
Priority to CN201710781783.5A
Publication of CN107517406A
Application granted
Publication of CN107517406B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)
  • Studio Circuits (AREA)

Abstract

The invention provides a video clipping and translation method. The video clipping method can effectively separate the storyline segments from the non-storyline segments of a video, so that the video can be clipped and recombined into a film or television work with a more compact plot, better suited to overseas audiences. The video translation method removes the need to convert the sound file into a text file before translating, and allows the portions of the video that need no translation to be skipped, greatly reducing the translation workload while ensuring translation quality.

Description

Video editing and translating method
Technical Field
The invention belongs to the field of video processing, and particularly relates to a video editing and translating method.
Background
With the improvement in production quality of video works such as mainland TV dramas and films, these works are not only broadcast to great success at home but have also gradually drawn the attention of the Hong Kong, Macau, and overseas markets, and related film and television works have been introduced into those markets.
However, surveys show that many works whose domestic ratings repeatedly set new records decline steadily after entering overseas markets. There are two main reasons. First, the translation quality of the works is poor and fails to convey their substance. Second, most hit TV dramas broadcast on the mainland carry too many non-story scenes, which makes the episodes drawn out and, for viewers in Hong Kong, Macau, Taiwan, and overseas markets who care mainly about the story, weakens the appetite to keep watching.
Therefore, how to raise the translation quality of film and television works, and how to clip them so as to highlight their storyline and suit the tastes of overseas audiences, are problems in urgent need of solution.
Disclosure of Invention
Highlighting the storyline of film and television works to suit the tastes of overseas viewers first requires understanding what attracts those viewers to a TV drama. Through investigation, the inventor noted the following: for domestic audiences, the excellence of a film or television work lies not only in the story it tells, but also in the historical background, artistic atmosphere, and human feeling the work conveys.
For overseas audiences, however, cultural differences mean that much of that historical background, artistic atmosphere, and human feeling cannot be perceived; it instead slows the progression of the story and makes the work feel drawn out. Video works introduced overseas therefore require a certain amount of editing that emphasizes their storyline and keeps the plot compact, so as to maintain their appeal to the audience.
Based on the above facts, the present invention provides a video clipping method. With this method, a video can be segmented automatically into a plurality of video sub-segments; the sub-segments that highlight the story and keep the story line coherent are then selected and combined, yielding a film or television work that suits the tastes of overseas audiences.
In combination with this clipping process, the invention also provides a video translation method that ensures translation quality.
Specifically, the video clipping method provided by the invention comprises the following steps.
First, the video to be clipped is imported. The video may be a single film or an episode of a television series; taking a series as an example, one episode is imported at a time.
Then, the imported video is segmented to obtain a plurality of video sub-segments.
Specifically, a video segmentation algorithm identifies and cuts out the leader (opening credits) and the trailer (closing credits), so that the video is divided into at least three parts: the leader, the trailer, and the main video content between them.
This step can be implemented with most existing video segmentation algorithms, and the invention is not limited in this regard.
The essence of the technical solution of the invention lies in the segmentation, clipping, and combination of the main video content.
Various algorithms for segmenting this content also exist in the prior art. Most of them, however, rely on attributes of the video itself, such as picture recognition, scene recognition, or character recognition, and their output is typically a continuous picture sequence belonging to one scene. They take no account of whether a segmented scene carries the story or merely artistic atmosphere; to a great extent, a single segmented scene may contain both narrative and artistic content, because a character's story scene is usually accompanied by a corresponding artistic scene, and existing algorithms usually group the two into one segmentable scene picture.
Conventional segmentation methods therefore cannot distinguish storyline segments from artistic-atmosphere segments, and cannot achieve the desired clipping effect. The video segmentation and clipping method provided by the invention solves this problem.
The video clipping method provided by the invention can separate the story segments and the non-story segments of the main content. Unlike existing video segmentation algorithms, it detects the sound stream in the video to obtain a number of time nodes and automatically cuts the video file at those nodes, producing a plurality of video sub-segments; the story segments and non-story segments are then identified, and the story segments are combined and clipped into the edited video work.
Specifically, for the main video content, the sound stream within it is identified, and the first, second, third, and fourth specific time nodes of the sound stream are detected:
the first specific time node is the time node at which the sound stream is first detected in the main content;
the second specific time node is a time node in the main content after which pictures continue to play for a first preset time period but no sound stream is detected;
the third specific time node is a time node at which the sound stream is detected again after a second specific time node;
the fourth specific time node is the time node at which the sound stream is detected for the last time in the video file.
After all of the first, second, third, and fourth specific time nodes have been detected, the video file is cut into a plurality of video sub-segments according to these nodes.
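As a concrete illustration, the node detection described above can be prototyped with a simple energy-based silence detector. The sketch below is an assumption of one possible implementation, not the patent's reference code: the frame size, energy threshold, and min_silence value are all illustrative, and a production system would use a proper voice-activity detector tuned to dialogue.

```python
import numpy as np

def detect_time_nodes(samples: np.ndarray, sr: int,
                      energy_thresh: float = 0.01,
                      min_silence: float = 5.0):
    """Return (first, pairs, last): the first and fourth specific time nodes,
    and a list of (second_node, third_node) tuples marking silent gaps."""
    frame = int(sr * 0.05)                        # 50 ms analysis frames
    n = len(samples) // frame
    energy = np.array([np.mean(samples[i*frame:(i+1)*frame] ** 2)
                       for i in range(n)])
    voiced = energy > energy_thresh               # True where dialogue is heard
    times = np.arange(n) * 0.05                   # frame start times (seconds)

    voiced_idx = np.flatnonzero(voiced)
    if voiced_idx.size == 0:
        return None, [], None
    first = times[voiced_idx[0]]                  # first specific time node
    last = times[voiced_idx[-1]]                  # fourth specific time node

    pairs = []
    gap_start = None
    for i in range(voiced_idx[0], voiced_idx[-1] + 1):
        if not voiced[i] and gap_start is None:
            gap_start = times[i]                  # candidate second node
        elif voiced[i] and gap_start is not None:
            if times[i] - gap_start >= min_silence:
                pairs.append((gap_start, times[i]))  # (second, third) nodes
            gap_start = None
    return first, pairs, last
```

A silent stretch shorter than min_silence is treated as an ordinary pause within a dialogue, which matches the "first preset time period" condition in the definitions above.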
The inventor has noted that although various video segmentation algorithms exist in the prior art, most of them segment the video based on attributes of the video itself, such as picture recognition, scene recognition, or character recognition, and the segments they produce often have incomplete sound streams. Yet the story of a film or television work is carried largely by the characters' dialogue. To achieve the clipping effect described above, the integrity of the sound stream should therefore be considered first when clipping, and for this reason the inventor creatively proposes to segment the video by means of the sound stream.
On the other hand, a video file contains a large number of scenes without dialogue. These non-dialogue scenes contribute little to the story of the work. To keep the story progressing at a compact pace, they can be partly retained or even deleted entirely when the work is clipped.
This part of the footage should therefore be separated out on its own. Conventional video segmentation algorithms, however, cannot completely separate non-dialogue scenes from dialogue scenes, because they attend mainly to the continuity of the scene pictures.
The segmentation algorithm provided by the invention solves this problem.
For example, from the process of obtaining the first, second, third, and fourth specific time nodes described above, it can be seen that the interval from the first specific time node to the next second specific time node is a dialogue scene with sound, that is, a story segment, whereas in the interval from a given second specific time node to the next third specific time node no sound stream is detected even though pictures are still playing; that portion of the video usually belongs to a non-story segment.
At this point, the main video content has been split into a plurality of video sub-segments. As noted above, these sub-segments can be classified as story segments or non-story segments according to which time nodes bound them.
The story segments are then selected, clipped, and combined into the edited video work. The resulting work has a stronger storyline and a more compact plot, making it better suited to overseas audiences.
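Continuing the hedged sketch above, the classification of sub-segments by their bounding nodes can be illustrated as follows; first, pairs, and last are the values returned by the earlier detect_time_nodes sketch, shown here with made-up example times.

```python
# Illustrative only: label the intervals between successive specific time
# nodes as story (dialogue) or non-story (silent) segments.
first, pairs, last = 3.2, [(41.7, 50.1), (88.0, 96.5)], 130.4  # example nodes

def build_segments(first, pairs, last):
    """Return (start, end, is_story) intervals between successive nodes."""
    segments = []
    cursor = first
    for second, third in pairs:
        segments.append((cursor, second, True))   # dialogue: story segment
        segments.append((second, third, False))   # silent gap: non-story
        cursor = third
    segments.append((cursor, last, True))         # closing dialogue segment
    return segments

story_only = [(s, e) for s, e, is_story in build_segments(first, pairs, last)
              if is_story]
print(story_only)  # [(3.2, 41.7), (50.1, 88.0), (96.5, 130.4)]
```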
It should be understood that the sound stream in the present invention refers to the character dialogue heard in the video. A video generally contains several kinds of sound: character dialogue, background music that renders the setting, and various ambient sounds such as bird calls, wind, and water. For the storyline, however, only the character dialogue matters; the other kinds of sound have little effect on the story's progression. When clipping, the editor therefore focuses on the sound segments related to character dialogue.
Accordingly, recognizing the sound stream in the invention means recognizing the character dialogue in the video.
Meanwhile, based on the above segmentation algorithm, the invention also provides a video translation method. It comprises a video segmentation step in which the segmentation method above divides the video to be translated into a plurality of video sub-segments; the sub-segments that need translation are then selected from among them and translated.
A video sub-segment that needs translation is one that contains sound to be translated.
Translating a video in these steps avoids having to convert the video's sound into a text file, and only some of the video segments need translating, which reduces the translation workload. Moreover, unlike the traditional approach of converting the video's sound into plain text and then translating the text, the translator here works with the relevant scene in view, so the translation quality is higher.
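As a sketch of this workflow (the file names and the presence of ffmpeg on the PATH are assumptions), the dialogue-bearing sub-segments can be exported as individual clips and handed to translators with full audiovisual context:

```python
import subprocess

story_only = [(3.2, 41.7), (50.1, 88.0)]  # example dialogue intervals (seconds)

def export_clip(src: str, start: float, end: float, dst: str) -> None:
    # -ss/-to placed after -i decode from the start, which is slower but
    # avoids the timestamp quirks of input-side seeking; -c copy cuts on
    # keyframes, so re-encode instead if frame accuracy matters.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ss", str(start),
                    "-to", str(end), "-c", "copy", dst], check=True)

for i, (start, end) in enumerate(story_only):
    export_clip("episode01.mp4", start, end, f"translate_{i:03d}.mp4")
```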
Advantages of the invention
With the video clipping method provided by the invention, the storyline segments and non-storyline segments of a video can be effectively separated, so that the video can be clipped and recombined into a film or television work with a more compact plot, better suited to overseas audiences. With the video translation method provided by the invention, there is no need to convert the sound file into a text file before translating, and the portions of the video that need no translation can be skipped, greatly reducing the translation workload while ensuring translation quality.
Drawings
FIG. 1 is a schematic diagram showing the result of the video segmentation method of the present invention
FIG. 2 is a schematic diagram of the selection of the second and third specific time nodes of the present invention
FIG. 3 is a schematic diagram of an alternative selection of the second and third specific time nodes of the present invention
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
In FIG. 1, for the main content (1) of the video, the sound stream (2) within it is identified, and the first specific time node (20), second specific time nodes (21), third specific time nodes (22), and fourth specific time node (23) of the sound stream are detected.
The first specific time node (20) is the time point at which the sound stream is first detected in the video file; this point is usually detected shortly after the main content (1) begins playing.
It is understood that a single video file has only one first specific time node (20).
A second specific time node (21) is a time node after which the video file continues to show playing pictures for a first preset time period, but no sound stream is detected.
The main content usually holds several dialogue scenes, separated by long picture transitions or other silent scenes. In the period after one dialogue ends and before the next begins, no sound stream is detected.
Thus, a time node is defined as a second specific time node (21) of the invention if a sound stream can be detected before it but no sound stream is detected within a certain time period after it.
This certain time period is the preset time period. Its length can be chosen according to attributes of the main content such as its dialogue rhythm and shooting style.
For example, if the video has a fast conversational rhythm, with characters speaking quickly, exchanges following one another closely, and few pauses, the preset time period may be set to a shorter value, for example 1 to 8 seconds; conversely, if the shooting style dictates a slow dialogue rhythm, slow speech, and long pauses, the preset time period may be set somewhat longer, for example 5 to 10 seconds.
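For illustration, the pacing observations above might map to concrete preset values like the following; the names and exact numbers are assumptions within the stated ranges, to be passed as the silence threshold of a detector such as the earlier sketch.

```python
# Hypothetical presets for the "first preset time period" (seconds).
SILENCE_PRESETS = {
    "fast_dialogue": 2.0,   # quick exchanges, short pauses (1-8 s range)
    "slow_dialogue": 8.0,   # slow pacing, long pauses (5-10 s range)
}
min_silence = SILENCE_PRESETS["fast_dialogue"]
```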
The second specific time node (21) defined in the invention can therefore also be understood as the time point at which a scene's dialogue ends.
A third specific time node (22) is a point at which the sound stream is detected again after a second specific time node (21).
As described above, after one dialogue ends, no sound stream is detected for a certain time; after that, the next dialogue begins. The start of the next dialogue is the third specific time node (22) defined by the invention.
It is understood that a single video file may have more than one second specific time node (21) and third specific time node (22). In FIG. 1, identical reference numerals denote identical features; as the figure shows, a plurality of second specific time nodes (21) and third specific time nodes (22) can be detected, although they are not labeled one by one.
The fourth specific time node (23) is the time point at which the sound stream is detected for the last time in the video file. It will be appreciated that a single video file has only one fourth specific time node (23).
After all of the first specific time node (20), the second specific time nodes (21), the third specific time nodes (22), and the fourth specific time node (23) have been detected, the video file is cut into a plurality of video sub-segments.
Referring to FIG. 1, the segmentation method of the invention divides the video into the following segments:
Segment 1: first specific time node (20) to second specific time node (21);
Segment 2: second specific time node (21) to third specific time node (22);
……
The remaining segments are not listed one by one.
FIG. 2 schematically shows the detection of a plurality of second specific time nodes and third specific time nodes according to the invention.
As mentioned above, a complete piece of main content yields only one first specific time node and one fourth specific time node, usually located near the beginning and the end of the video; the second and third specific time nodes, however, usually occur in plurality.
In FIG. 2, the four-pointed stars represent second specific time nodes and the five-pointed stars represent third specific time nodes. At the second specific time node (211) a dialogue scene ends, and at the immediately following third specific time node (311) the next dialogue scene starts. Between the second specific time node (211) and the third specific time node (311), the video is still playing but contains no dialogue.
The video can then be sliced into a plurality of sub-segments according to these specific time nodes (211, 311, 212, 312, 213, ……):
Segment 1: from node 211 to node 311;
Segment 2: from node 311 to node 212;
Segment 3: from node 212 to node 312;
Segment 4: from node 312 to node 213;
……
and so on.
After the segments are obtained, the video is clipped and combined. In the embodiment of FIG. 2, the segments ultimately combined are segment 2 and segment 4, while segment 1 and segment 3 may be discarded.
FIG. 2 represents a relatively compact clipping scheme, because each second specific time node (211, 212, 213) is chosen to be exactly the point at which a scene's dialogue ends, and every silent segment is deleted.
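A hedged sketch of this final combination step follows: the kept story segments (segment 2 and segment 4 in the example) are concatenated with ffmpeg's concat demuxer. The clip file names are assumptions.

```python
import subprocess

kept = ["clip_002.mp4", "clip_004.mp4"]  # story segments retained after clipping
with open("concat.txt", "w") as f:
    f.writelines(f"file '{name}'\n" for name in kept)

# Stream-copy concatenation; all clips must share codecs and parameters.
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "concat.txt", "-c", "copy", "combined.mp4"], check=True)
```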
In the actual clipping and combining process, however, the transition between the combined segments also needs to be considered: after one dialogue ends, there should be some transition before the next begins.
To this end, FIG. 3 schematically shows an alternative clipping scheme that preserves transitions.
In FIG. 3, the main content contains two dialogue scenes (11) and (12); scene (11) starts at time node (301) and ends at time node (201), and scene (12) starts at time node (302) and ends at time node (202).
Under the scheme of FIG. 2, time node (201) would be defined as the second specific time node.
To guarantee the transition between scene (11) and scene (12), however, FIG. 3 does not define time node (201) as the second specific time node; instead, an intermediate time point after node (201) and before node (302) is taken as the second specific time node, such as node (200) in FIG. 3.
With the scheme of FIG. 3, a certain amount of transition footage is kept between the clipped video segments, so the clipped video does not feel as abrupt to the audience.
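The FIG. 3 variant can be sketched as a small adjustment to where the cut is placed within the silent gap; the halfway weighting below is an illustrative choice, since the text only requires some intermediate point between the end of one dialogue and the start of the next.

```python
def transition_node(dialogue_end: float, next_dialogue_start: float,
                    keep_fraction: float = 0.5) -> float:
    """Place the second specific time node part-way through the silent gap,
    preserving some transition footage after the dialogue ends."""
    return dialogue_end + keep_fraction * (next_dialogue_start - dialogue_end)

cut = transition_node(dialogue_end=41.7, next_dialogue_start=50.1)  # -> 45.9
```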
It should be understood that the earlier definition of the second specific time node (a time node in the main content after which pictures continue to play for a first preset time period but no sound stream is detected) already covers both the situation of FIG. 2 and that of FIG. 3.

Claims (5)

1. A sound-stream-based video clipping method for clipping a video containing a sound stream, the method comprising the steps of:
(1) importing the video to be clipped;
(2) segmenting the imported video to obtain a plurality of video sub-segments, specifically comprising:
for the imported video, identifying and cutting out a leader portion and a trailer portion, so that the video is divided into at least three parts: the leader portion, the trailer portion, and a main video content portion other than the leader and the trailer;
the method is characterized in that:
for the main video content portion, identifying a sound stream therein and detecting a first specific time node, a second specific time node, a third specific time node, and a fourth specific time node of the sound stream, wherein:
the first specific time node is the time node at which a sound stream is first detected in the main video content;
the second specific time node is a time node in the main video content after which pictures play for a first preset time period but no sound stream is detected;
the third specific time node is a time node at which the sound stream is detected again after the second specific time node;
the fourth specific time node is the time node at which a sound stream is detected in the video file for the last time;
and, after all of the first, second, third, and fourth specific time nodes have been detected, cutting the main video content portion into a plurality of video sub-segments according to these nodes.
2. The method of claim 1, wherein slicing the imported video into a plurality of video sub-segments further comprises: dividing the plurality of video sub-segments into story segments and non-story segments according to their differing time node composition, a story segment being a video sub-segment that contains a character dialogue scene and a non-story segment being a video sub-segment that contains no character dialogue scene.
3. The method of claim 2, further comprising: selecting, clipping, and combining the story segments.
4. A video translation method for translating a video file containing a sound stream, the method comprising a video clipping step in which the video clipping method of any one of claims 1 to 3 divides the video to be translated into a plurality of video sub-segments; and selecting, from the plurality of video sub-segments, the video sub-segments that need translation and translating them.
5. The translation method according to claim 4, wherein a video sub-segment that needs translation contains sound to be translated.
CN201710781783.5A 2017-09-05 2017-09-05 Video editing and translating method Active CN107517406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710781783.5A CN107517406B (en) 2017-09-05 2017-09-05 Video editing and translating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710781783.5A CN107517406B (en) 2017-09-05 2017-09-05 Video editing and translating method

Publications (2)

Publication Number Publication Date
CN107517406A CN107517406A (en) 2017-12-26
CN107517406B true CN107517406B (en) 2020-02-14

Family

ID=60724853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710781783.5A Active CN107517406B (en) 2017-09-05 2017-09-05 Video editing and translating method

Country Status (1)

Country Link
CN (1) CN107517406B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108391064A (en) * 2018-02-11 2018-08-10 北京秀眼科技有限公司 A kind of video clipping method and device
CN112153462B (en) * 2019-06-26 2023-02-14 腾讯科技(深圳)有限公司 Video processing method, device, terminal and storage medium
CN110933485A (en) * 2019-10-21 2020-03-27 天脉聚源(杭州)传媒科技有限公司 Video subtitle generating method, system, device and storage medium
CN111654761A (en) * 2020-04-20 2020-09-11 威比网络科技(上海)有限公司 Video processing method, system, device and medium based on online education
CN111666446B (en) * 2020-05-26 2023-07-04 珠海九松科技有限公司 Method and system for judging automatic video editing material of AI
CN113225618A (en) * 2021-05-06 2021-08-06 阿里巴巴新加坡控股有限公司 Video editing method and device
CN114025232A (en) * 2021-10-22 2022-02-08 上海硬通网络科技有限公司 Video material cutting method and device, terminal equipment and readable storage medium
CN114501132B (en) * 2021-12-24 2024-03-12 北京达佳互联信息技术有限公司 Resource processing method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167360A (en) * 2013-02-21 2013-06-19 中国对外翻译出版有限公司 Method for achieving multilingual subtitle translation
CN104883607A (en) * 2015-06-05 2015-09-02 广东欧珀移动通信有限公司 Video screenshot or clipping method, video screenshot or clipping device and mobile device
CN104915433A (en) * 2015-06-24 2015-09-16 宁波工程学院 Method for searching for film and television video
CN105704538A (en) * 2016-03-17 2016-06-22 广东小天才科技有限公司 Method and system for generating audio and video subtitles
CN106911961A (en) * 2017-02-22 2017-06-30 北京小米移动软件有限公司 Multimedia data playing method and device
CN106921842A (en) * 2015-12-28 2017-07-04 南宁富桂精密工业有限公司 Make video recording Play System and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI518675B (en) * 2013-08-15 2016-01-21 中華電信股份有限公司 A method for segmenting videos and audios into clips using speaker recognition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167360A (en) * 2013-02-21 2013-06-19 中国对外翻译出版有限公司 Method for achieving multilingual subtitle translation
CN104883607A (en) * 2015-06-05 2015-09-02 广东欧珀移动通信有限公司 Video screenshot or clipping method, video screenshot or clipping device and mobile device
CN104915433A (en) * 2015-06-24 2015-09-16 宁波工程学院 Method for searching for film and television video
CN106921842A (en) * 2015-12-28 2017-07-04 南宁富桂精密工业有限公司 Make video recording Play System and method
CN105704538A (en) * 2016-03-17 2016-06-22 广东小天才科技有限公司 Method and system for generating audio and video subtitles
CN106911961A (en) * 2017-02-22 2017-06-30 北京小米移动软件有限公司 Multimedia data playing method and device

Also Published As

Publication number Publication date
CN107517406A (en) 2017-12-26

Similar Documents

Publication Publication Date Title
CN107517406B (en) Video editing and translating method
US8279343B2 (en) Summary content generation device and computer program
CN111460219B (en) Video processing method and device and short video platform
US20030086692A1 (en) Special reproduction control information describing method, special reproduction control information creating apparatus and method therefor, and video reproduction apparatus and method therefor
JP2000350159A (en) Video image edit system
US8325276B2 (en) System and method for real-time video content sharing with synchronization via closed-caption metadata
US20060136226A1 (en) System and method for creating artificial TV news programs
US10657379B2 (en) Method and system for using semantic-segmentation for automatically generating effects and transitions in video productions
JP2005514841A (en) Method and apparatus for segmenting multi-mode stories to link multimedia content
JP2008294584A (en) Digest reproducing apparatus and method
WO2006126391A1 (en) Contents processing device, contents processing method, and computer program
US7149365B2 (en) Image information summary apparatus, image information summary method and image information summary processing program
CN113347489B (en) Video clip detection method, device, equipment and storage medium
JP2002125199A (en) Frame information description method, frame information generating device and method, video reproducing device and method, and recording medium
CN107688792B (en) Video translation method and system
JP2020522193A (en) Temporal placement of rebuffering events
JP5096259B2 (en) Summary content generation apparatus and summary content generation program
CN107562737B (en) Video segmentation method and system for translation
JP2008103802A (en) Image compositing device
CN116017088A (en) Video subtitle processing method, device, electronic equipment and storage medium
JP2007519321A (en) Method and circuit for creating a multimedia summary of an audiovisual data stream
JP4019945B2 (en) Summary generation apparatus, summary generation method, summary generation program, and recording medium recording the program
JP5033653B2 (en) Video recording / reproducing apparatus and video reproducing apparatus
WO2024034401A1 (en) Video editing device, video editing program, and video editing method
AU745436B2 (en) Automated visual image editing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant