CN118366071A - Film segmentation method, device, computer equipment and computer readable storage medium


Info

Publication number
CN118366071A
Authority
CN
China
Prior art keywords
segmented
video
image
shot
frame
Prior art date
Legal status
Pending
Application number
CN202410274433.XA
Other languages
Chinese (zh)
Inventor
柯凡晖
李圳
郭尚锋
Current Assignee
Shenzhen Coocaa Network Technology Co Ltd
Original Assignee
Shenzhen Coocaa Network Technology Co Ltd
Application filed by Shenzhen Coocaa Network Technology Co Ltd filed Critical Shenzhen Coocaa Network Technology Co Ltd
Priority to CN202410274433.XA priority Critical patent/CN118366071A/en
Publication of CN118366071A publication Critical patent/CN118366071A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to video data processing, and more particularly to a method, apparatus, computer device, and computer readable storage medium for segmenting a film. In the film segmentation method, each frame image of a continuous target video is acquired together with its corresponding time code information and shot boundary category; the target video is sliced by shot according to the time code information and the shot boundary categories to obtain the segmented video under each shot; features are extracted from each segmented video to obtain segmented features, and all segmented features are arranged in the temporal order of the segmented videos to obtain the fusion sequence feature; a trained classifier classifies the plot features of the fusion sequence feature to obtain the segmentation time corresponding to each plot, and the target video is segmented according to these segmentation times to obtain segmented videos. The film is thus segmented by plot automatically, which improves film segmentation efficiency.

Description

Film segmentation method, device, computer equipment and computer readable storage medium
Technical Field
The present invention relates to the field of video data processing, and in particular, to a film segmentation method, apparatus, computer device, and computer readable storage medium.
Background
With the rapid development of multimedia technology, long videos (such as movies, TV series, and variety shows) have become an integral part of people's daily life. These long videos are typically composed of a series of plots, each of which represents a particular stage of a story or an independent event; for example, an urban drama may include plots such as a gathering of friends after work, a business trip, and a rest at home. Splitting one long video into several short videos according to the story line is of great significance in fields such as short video analysis, stream pushing, and automatic video editing. At present, a long video is split by plot by manually distinguishing each plot in the long video and then splitting the video with an editing tool; this operation process is relatively cumbersome and consumes a great deal of manpower to distinguish and split each plot.
Therefore, how to automatically divide the film according to the story line and improve the film dividing efficiency is a problem to be solved.
Disclosure of Invention
Based on the above, a film segmentation method, a device, a computer device and a computer readable storage medium are provided to solve the problem of how to automatically segment films according to a story line and improve the film segmentation efficiency.
In a first aspect, an embodiment of the present invention provides a film segmentation method, including the steps of:
acquiring each frame of image in continuous target video and corresponding time code information and shot boundary category;
Performing shot slicing on the target video according to the time code information and the shot boundary category to obtain sliced video under each shot;
extracting the characteristics of each segmented video to obtain segmented characteristics, and arranging all segmented characteristics according to the time sequence of each segmented video to obtain fusion sequence characteristics;
and classifying the scenario features of the fusion sequence features by using a trained classifier to obtain the segmentation time corresponding to each scenario, and segmenting the target video according to the segmentation time to obtain a segmented video.
In a second aspect, an embodiment of the present invention provides a film dividing apparatus, including:
the acquisition module is used for acquiring each frame of image in the continuous target video, the corresponding time code information and the shot boundary category;
The shot segmentation module is used for performing shot segmentation on the target video according to the time code information and the shot boundary category to obtain segmented video under each shot;
The feature extraction module is used for extracting features of each segmented video to obtain segmented features, and arranging all segmented features according to the time sequence of each segmented video to obtain fusion sequence features;
and the plot segmentation module is used for classifying plot characteristics of the fusion sequence characteristics by using a trained classifier to obtain segmentation time corresponding to each plot, and segmenting the target video according to the segmentation time to obtain segmented video.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the film segmentation method of the first aspect described above when the processor executes the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the film segmentation method of the first aspect described above.
Different from the solutions in the prior art, the present invention achieves the following technical effects. In the film segmentation method, each frame image of the target video is acquired together with its corresponding time code information and shot boundary category; the target video is sliced by shot according to the time code information and the shot boundary categories to obtain the segmented video under each shot; features are extracted from each segmented video to obtain segmented features, and all segmented features are arranged in the temporal order of the segmented videos to obtain the fusion sequence feature; a classifier classifies the plot features of the fusion sequence feature to obtain the segmentation time corresponding to each plot, and the target video is segmented according to these segmentation times to obtain segmented videos. Once the target video is acquired, it can be segmented automatically by story line through this processing flow without consuming a large amount of human resources, which improves segmentation efficiency; segmenting the target video based on the time code information, shot boundary categories, segmented features, and classifier also improves segmentation accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application environment of a film segmentation method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a film segmentation method according to a second embodiment of the present invention;
Fig. 3 is an overall relationship diagram of a film segmentation method according to a second embodiment of the present invention;
Fig. 4 is a flowchart of a film segmentation method according to a third embodiment of the present invention;
Fig. 5 is a flowchart of a film segmentation method according to a fourth embodiment of the present invention;
Fig. 6 is a flowchart of a film segmentation method according to a fifth embodiment of the present invention;
Fig. 7 is a flowchart of a film segmentation method according to a sixth embodiment of the present invention;
Fig. 8 is a flowchart of a film segmentation method according to a seventh embodiment of the present invention;
Fig. 9 is a flowchart of a film segmentation method according to an eighth embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a film dividing apparatus according to a ninth embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a computer device according to a tenth embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the sequence numbers of the steps in the following embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.
The film segmentation method provided by the embodiment of the present application can be applied to an application environment as shown in fig. 1, in which a server communicates with a client: the server provides the film segmentation service, and the client triggers a film segmentation task on the server. The client includes, but is not limited to, a handheld computer, a desktop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cloud computing device, a personal digital assistant (PDA), and the like. The computer device corresponding to the server may be implemented by an independent server or by a server cluster formed by a plurality of servers.
As shown in fig. 2, a flow chart of a film segmentation method according to a second embodiment of the present invention may include the following steps:
step S201: and acquiring each frame of image in the continuous target video and corresponding time code information and shot boundary categories.
In this embodiment, the target video may refer to the video to be segmented, the image may refer to an image frame in the target video, the time code information may refer to the playing time information of each frame image, and the shot boundary category may refer to the shot boundary category of each frame image, where the shot boundary category includes two categories: shot boundary point and non-shot boundary point.
For example, each frame image in the target video can be read frame by frame with a video processing tool to obtain the data structure corresponding to each frame image; the corresponding time code information is extracted by parsing that data structure, and each extracted frame image is assigned a shot boundary category either by manual frame-by-frame labelling or by a corresponding shot boundary classification model.
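For illustration only, the following is a minimal Python sketch of this step, assuming the OpenCV library (cv2) is available; the function name read_frames_with_timecodes and the classify_frame callback are hypothetical placeholders for the video processing tool and the shot boundary classification described above, not part of the present disclosure.

```python
# Illustrative sketch: read a video frame by frame, record each frame's time code,
# and attach a shot boundary category. `classify_frame` is a hypothetical stand-in
# for manual labelling or a trained shot boundary classification model.
import cv2

def read_frames_with_timecodes(video_path, classify_frame):
    """Return a list of (frame, timecode_in_seconds, shot_boundary_category)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Playback position of the current frame, converted from milliseconds.
            timecode = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
            frames.append((frame, timecode, classify_frame(frame)))
    finally:
        cap.release()
    return frames
```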
Step S202: and performing shot slicing on the target video according to the time code information and the shot boundary category to obtain sliced video under each shot.
In this embodiment, the sliced video may refer to a video obtained by slicing the target video by shot.
In the process of obtaining the sliced video under each shot, all images whose shot boundary category is shot boundary point are acquired together with their corresponding time code information; the time code information of these shot boundary point images is used as the slicing time, and the target video is sliced by shot accordingly to obtain the sliced video under each shot.
Step S203: and respectively extracting the characteristics of each segmented video to obtain segmented characteristics, and arranging all segmented characteristics according to the time sequence of each segmented video to obtain fusion sequence characteristics.
In this embodiment, the slicing feature may refer to a multi-mode fusion feature of the slicing video, and the fusion sequence feature may refer to a feature sequence obtained by arranging the slicing features.
In the process of obtaining the fusion sequence feature, each segmented video is first preprocessed, for example by cropping, scaling, and denoising, and features are extracted from the preprocessed segmented video by machine learning techniques to obtain the segmented feature corresponding to each segmented video; the time code information of each segmented video is then determined from the time code information of the first frame image in that segmented video; finally, all segmented features are ordered according to the time code information of their segmented videos to obtain the fusion sequence feature.
Step S204: and classifying the scenario features of the fusion sequence features by using the trained classifier to obtain the segmentation time corresponding to each scenario, and segmenting the target video according to the segmentation time to obtain the segmented video.
In this embodiment, the classifier may refer to a machine learning model for classifying the episode feature of the segmented feature, the segmentation time may refer to the segmentation time of each episode, and the segmented video may refer to a video obtained by segmenting the target video according to the episode.
In the process of obtaining the segmented videos, each segmented feature in the fusion sequence feature is first input into the trained classifier, which classifies the segmented features to obtain the plot boundary category of each segmented feature, and the plot boundary category of each segmented video is obtained from the plot boundary category of its segmented feature; the segmentation time corresponding to each plot is then determined from the time code information of the frame images in the segmented videos and the plot boundary categories of the segmented videos; finally, the target video is segmented according to the segmentation times to obtain the segmented videos.
For example, fig. 3 shows an overall relationship diagram of the film segmentation method provided in the second embodiment of the present invention, in which the target video is an urban drama. The process of segmenting the target video by plot may be as follows. 1) All image frames and their corresponding time code information in the target video are acquired; the acquired image frames are arranged into an image frame sequence in temporal order; the image frames in the sequence are divided into image frame groups by a preset first sliding window; shot boundary classification is performed on each image frame in each group by a trained shot boundary classifier to obtain the shot boundary category of each image frame; and the target video is sliced by shot according to the shot boundary categories and the time code information, yielding the sliced videos under shot A, shot B, shot C, shot D, shot E, and shot F. 2) Image features are extracted from each sliced video by a trained image feature extractor to obtain segmented image features; audio features are extracted from each sliced video by a trained audio feature extractor to obtain segmented audio features; the image features and audio features of each sliced video are fused by a trained feature fusion device to obtain the fused segmented features; all segmented features are arranged in temporal order to obtain the fusion sequence feature; the segmented features in the fusion sequence feature are divided into segmented feature groups by a preset second sliding window; plot boundary classification is performed on each segmented feature in each group by a trained classifier to obtain the plot boundary category of each sliced video; and the target video is segmented by plot according to the time code information and the plot boundary categories of the sliced videos, yielding the video under each plot, that is, either finer small-plot videos such as working at the office station, a meeting in the meeting room, resting in the bedroom, and watching television in the living room, or coarser large-plot videos such as being at work and being at home.
In this embodiment, each frame image in the target video is acquired together with its corresponding time code information and shot boundary category; the target video is sliced by shot according to the time code information and the shot boundary categories to obtain the sliced video under each shot; features are extracted from each sliced video to obtain segmented features, and all segmented features are arranged in the temporal order of the sliced videos to obtain the fusion sequence feature; a classifier classifies the plot features of the fusion sequence feature to obtain the segmentation time corresponding to each plot, and the target video is segmented according to these segmentation times to obtain segmented videos. Once the target video is acquired, it can be segmented automatically by story line through this processing flow without consuming a large amount of human resources, which improves segmentation efficiency; segmenting the target video based on the time code information, shot boundary categories, segmented features, and classifier also improves segmentation accuracy.
As shown in fig. 4, a flowchart of a film segmentation method according to a third embodiment of the present invention is shown, and the step S201 of obtaining the shot boundary category of each frame image in the continuous target video in the second embodiment may include the following steps:
step S401: and arranging all the frame images according to the time code information corresponding to each frame image to obtain an image frame sequence.
Step S402: and carrying out grouping processing on the image frame sequence through a preset first sliding window to obtain an image frame group.
In this embodiment, the image frame sequence may refer to a sequence obtained by arranging each frame of image, the first sliding window may refer to a preset window for grouping the image frame sequence, and the image frame group may refer to an image frame set obtained by dividing the image frame group in units of groups.
In the process of obtaining the image frame groups, the time code information of all frame images is acquired, and all frame images are arranged in increasing order of time code to obtain the image frame sequence; the preset first sliding window is then slid over the image frame sequence with a given sliding step, and the image frame sequence is grouped to obtain the image frame groups.
For example, if the preset first sliding window is W, the size of the first sliding window is size_w, the sliding step length is stride, W_j is the j-th image frame group, and f_i is the i-th image of the image frame sequence, then the j-th image frame group may be expressed as W_j = {f_i | (j-1)×stride + 1 ≤ i ≤ (j-1)×stride + size_w}.
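As an illustration only, a short Python sketch of this sliding-window grouping is given below; the helper name group_with_sliding_window and its parameters are assumptions introduced here rather than terms from the present disclosure, and the window is assumed to advance by stride items at a time.

```python
# Illustrative sketch of grouping an ordered frame sequence with a sliding window
# of size `size_w` that advances by `stride` items at a time (stride >= 1).
def group_with_sliding_window(items, size_w, stride):
    """Return the list of groups W_j taken from the ordered sequence `items`."""
    groups = []
    start = 0
    while start < len(items):
        groups.append(items[start:start + size_w])
        if start + size_w >= len(items):
            break  # the last window already reaches the end of the sequence
        start += stride
    return groups
```

For instance, group_with_sliding_window(list(range(10)), size_w=4, stride=2) yields [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]].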
Step S403: and aiming at any image frame group, carrying out shot boundary classification on each frame of image in the image frame group through a trained shot boundary classifier to obtain the shot boundary class of each frame of image.
Step S404: and obtaining the shot boundary category of each frame image in the continuous target video according to the shot boundary category of each frame image in all the image frame groups.
In this embodiment, the shot boundary classifier may refer to a machine learning model for performing shot boundary classification on image frames.
In the process of obtaining the shot boundary category of each frame image in the target video, each frame image in the image frame group is input into a trained shot boundary classifier, and the shot boundary classifier determines the shot boundary category of each frame image by judging whether each frame image is a shot boundary point, so that the shot boundary category of each frame image in all the image frame groups is obtained.
For example, let label_j be the shot boundary classification result of the j-th image frame group, classifier_frame(·) be the shot boundary classifier, f_i be the i-th frame image in the j-th image frame group, and Z_i be the shot boundary classification result of the i-th frame image in the j-th image frame group. The classification result of each frame image in the j-th image frame group may then be expressed as Z_i = classifier_frame(f_i) and label_j = {Z_1, Z_2, ..., Z_size_w}, where Z_i = 1 indicates that the i-th frame image in the j-th image frame group is a shot boundary point, and Z_i = 0 indicates that the i-th frame image in the j-th image frame group is not a shot boundary point.
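A minimal Python sketch of this per-group classification follows, reusing the group_with_sliding_window helper from the previous sketch; classifier_frame is a hypothetical trained model returning 1 for a shot boundary point and 0 otherwise, and is only a placeholder for the shot boundary classifier described above.

```python
# Illustrative sketch: apply a per-frame shot boundary classifier to every frame of
# every image frame group, producing label_j = {Z_1, Z_2, ...} for each group W_j.
def classify_shot_boundaries(frame_groups, classifier_frame):
    """Return, for each group W_j, the list of 0/1 labels Z_i of its frames."""
    return [[classifier_frame(frame) for frame in group] for group in frame_groups]
```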
In this embodiment, all frame images are arranged according to the time code information corresponding to each frame image to obtain an image frame sequence, the image frame sequence is grouped by the first sliding window to obtain image frame groups, and each frame image in all image frame groups is classified by the shot boundary classifier to obtain the shot boundary category of each frame image in the target video. Because each frame image is classified automatically by a trained shot boundary classifier, the time and cost of manual processing are saved, and the efficiency of shot boundary classification of each frame image is improved.
As shown in fig. 5, a flowchart of a film segmentation method according to a fourth embodiment of the present invention is provided, in the second embodiment, in step S202, according to time code information and a shot boundary category, shot segmentation is performed on a target video to obtain segmented video under each shot, and the method may include the following steps:
step S501: and traversing the image frame sequence, and determining all target images with the shot boundary categories as shot boundary points.
Step S502: and determining the slicing time corresponding to each lens according to the time code information of each frame of target image.
Step S503: and performing lens slicing on the target video according to the slicing time to obtain sliced video under each lens.
In this embodiment, the shot boundary point may refer to a switching point between two shots, the target image may refer to an image with a shot boundary class as the shot boundary point, and the slicing time may refer to the slicing end time of each shot.
In the process of obtaining the segmented video under each shot, the image frame sequence is traversed in temporal order to determine all target images that are shot boundary points; these target images include the first frame image and the last frame image of the target video, as well as the images that are shot boundary points between them. The time code information of the target images other than the first frame image is acquired and used as the slicing time corresponding to each shot, and the target video is sliced by shot according to these slicing times to obtain the segmented video under each shot.
For example, suppose the target video contains ten frame images whose time codes are S_1, S_2, S_3, S_4, S_5, S_6, S_7, S_8, S_9, and S_10, and the first, third, sixth, eighth, and tenth frame images are shot boundary points, so the target video is divided into four shots. The time code information of the first, third, sixth, eighth, and tenth frame images, namely S_1, S_3, S_6, S_8, and S_10, is acquired; the time code S_3 of the third frame image is taken as the end time of the first shot, the time code S_6 of the sixth frame image as the end time of the second shot, the time code S_8 of the eighth frame image as the end time of the third shot, and the time code S_10 of the tenth frame image as the end time of the fourth shot. The target video is then sliced according to the end time of each shot: the first segmented video runs from S_1 to S_3, the second from S_3 to S_6, the third from S_6 to S_8, and the fourth from S_8 to S_10.
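For illustration only, a small Python sketch of deriving the per-shot intervals from the boundary time codes is given below; the function name shot_intervals and the example time code values are assumptions introduced here.

```python
# Illustrative sketch: given the sorted time codes of all shot-boundary-point frames
# (including the first and last frame of the video), build one (start, end) interval
# per shot, where each shot ends at the next boundary time code.
def shot_intervals(boundary_timecodes):
    """E.g. [S1, S3, S6, S8, S10] -> [(S1, S3), (S3, S6), (S6, S8), (S8, S10)]."""
    return list(zip(boundary_timecodes[:-1], boundary_timecodes[1:]))

# With boundary time codes at 0.0, 3.0, 6.0, 8.0 and 10.0 seconds this yields four
# shots: (0.0, 3.0), (3.0, 6.0), (6.0, 8.0) and (8.0, 10.0).
```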
In this embodiment, the image frame sequence is traversed to determine all target images that are shot boundary points, the slicing time corresponding to each shot is determined according to the time code information of each target image, and the target video is sliced by shot according to the slicing times to obtain the sliced video under each shot. Because the time code information of the shot boundary point images is used directly as the slicing end time of each shot, the shot slicing procedure is simplified and the efficiency of shot slicing is improved.
As shown in fig. 6, a flow chart of a film segmentation method according to a fifth embodiment of the present invention is shown, in the second embodiment, feature extraction is performed on each segmented video in step S203 to obtain segmented features, which may include the following steps:
step S601: and respectively extracting the image characteristics of each segmented video through the trained image characteristic extractor to obtain segmented image characteristics.
In this embodiment, the image feature extractor may refer to a machine learning model for extracting the image features of each shot, and the segmented image feature may refer to the image feature of each shot.
In the process of obtaining the segmented image features, the image frame sequence of each shot is sampled at equal intervals to obtain the image frame sub-sequence of each shot; the image frame sub-sequence is input into the trained image feature extractor, which extracts the image features of each frame image in the sub-sequence to obtain the image feature of each shot.
For example, let S_k be the k-th shot and f_i the i-th frame image in the k-th shot, so the image frame sequence of the k-th shot is S_k = {f_1, f_2, ..., f_I}. Sampling this sequence at equal intervals, taking one image every h frames, gives the image frame sub-sequence S'_k = {f_1, f_{1+h}, ..., f_{1+m×h}}, where m = ⌊(I-1)/h⌋, so that the image frame sub-sequence of the k-th shot has length m + 1. If the image feature extractor is featureExtractor_image(·), the segmented image feature of the k-th shot is extracted as F_k,image = featureExtractor_image(S'_k).
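For illustration, a minimal Python sketch of this equal-interval sampling and feature extraction step is shown below; extract_slice_image_features and image_feature_extractor are hypothetical names, and the extractor (for example a pretrained CNN backbone) is not specified by the present disclosure.

```python
# Illustrative sketch: sample the shot's ordered frame list every h frames and run
# the sampled frames through an image feature extractor to obtain F_{k,image}.
# `image_feature_extractor` is a hypothetical trained model passed in by the caller.
def extract_slice_image_features(shot_frames, h, image_feature_extractor):
    """Return the segmented image feature of one shot from its frame list S_k."""
    sampled = shot_frames[::h]  # S'_k = {f_1, f_{1+h}, f_{1+2h}, ...}
    return image_feature_extractor(sampled)
```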
Step S602: and respectively extracting the audio characteristics of each segmented video through the trained audio characteristic extractor to obtain segmented audio characteristics.
In this embodiment, the audio feature extractor may refer to a machine learning model for extracting audio features for each shot, and the sliced audio features may refer to audio features for each shot.
In the process of obtaining the segmented audio features, the audio data of each shot are arranged in temporal order and split at fixed time intervals; the split audio data are input into the trained audio feature extractor in temporal order, and the extractor extracts audio features from the audio data to obtain the audio feature of each shot.
For example, if the audio feature extractor is featureExtractor_audio(·) and the audio of the k-th shot is A_k, the segmented audio feature of the k-th shot is extracted as F_k,audio = featureExtractor_audio(A_k).
Step S603: and respectively carrying out feature fusion on the segmented image features and the segmented audio features of each segmented video through the trained feature fusion device to obtain segmented features.
In this embodiment, the feature fusion device may refer to a machine learning model for fusing the audio features and image features of each shot, and the segmented feature may refer to the fused feature of the audio features and image features of each shot.
In the process of obtaining the segmented features, the audio feature and the image feature of each shot are input into the trained feature fusion device, which fuses them to obtain the segmented feature of each shot. For example, if the feature fusion device is featureFuser(·), the segmented feature fusion of the k-th shot may be expressed as F_k = featureFuser(F_k,image, F_k,audio).
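A minimal Python sketch of one possible fusion is given below; the present disclosure describes a trained feature fusion device, so the simple concatenation used here (assuming NumPy feature vectors) is only a hypothetical placeholder, and fuse_slice_features is a name introduced for illustration.

```python
# Illustrative sketch: fuse one shot's image feature and audio feature into a single
# segmented feature. A trained fusion model could replace this plain concatenation.
import numpy as np

def fuse_slice_features(f_image, f_audio):
    """Return the fused segmented feature F_k for one shot."""
    return np.concatenate([np.asarray(f_image).ravel(), np.asarray(f_audio).ravel()])
```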
In this embodiment, image features are extracted from the segmented video by the image feature extractor to obtain segmented image features, audio features are extracted from the segmented video by the audio feature extractor to obtain segmented audio features, and the segmented image features and segmented audio features of the segmented video are fused by the feature fusion device to obtain the segmented features. Because the image features and audio features of each shot are fused, the multi-modal features of the shot are aggregated, which improves the accuracy of plot segmentation based on the segmented features.
As shown in fig. 7, a flowchart of a film segmentation method according to a sixth embodiment of the present invention may further include the following steps before arranging all the segment features according to the time sequence of each segment video in step S203 in the second embodiment:
step S701: and acquiring time code information corresponding to each frame of image in each segmented video.
Step S702: and determining the time code information of each segmented video according to the time code information corresponding to each frame of image in each segmented video.
Step S703: and determining the time sequence of each piece of video according to the time code information of each piece of video.
In this embodiment, the time code information corresponding to each frame image in each segmented video is acquired, the time code information corresponding to the first frame image in each segmented video is used as the time code information of that segmented video, and all segmented videos are arranged by time code to obtain the time sequence of each segmented video.
For example, suppose the target video is divided into three segmented videos, and the time codes corresponding to the first frame images of the first, second, and third segmented videos are S_1, S_3, and S_6 respectively. Then the time codes of the first, second, and third segmented videos are S_1, S_3, and S_6; since S_1 < S_3 < S_6, the time sequence of the segmented videos is determined to be the first segmented video, then the second, then the third.
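As a small illustration, the following Python sketch orders segmented videos by the time code of their first frame; here each segmented video is assumed to be a list of (frame, timecode) pairs, and order_segmented_videos is a hypothetical helper name.

```python
# Illustrative sketch: use the time code of each segmented video's first frame as the
# video's time code and sort the segmented videos by it to obtain their time sequence.
def order_segmented_videos(segmented_videos):
    """segmented_videos: list of videos, each a list of (frame, timecode) pairs."""
    return sorted(segmented_videos, key=lambda video: video[0][1])
```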
In this embodiment, the time code information corresponding to each frame image in each segmented video is acquired, the time code information of each segmented video is determined from the time code information of its frame images, and the time sequence of each segmented video is determined from the time code information of each segmented video. The segmented videos can therefore be arranged in the correct order, which facilitates the subsequent plot segmentation of the segmented videos.
As shown in fig. 8, a flow chart of a film segmentation method according to a seventh embodiment of the present invention is shown, in the second embodiment, in step S204, a trained classifier is used to classify scenario features of a fusion sequence feature, so as to obtain a segmentation time corresponding to each scenario, and the method may include the following steps:
Step S801: and grouping the fusion sequence features through a preset second sliding window to obtain a fragment feature group.
In this embodiment, the second sliding window may be a preset window for grouping the features of the fusion sequence, and the group of slice features may be a set of slice features obtained by dividing the group into units.
And in the process of obtaining the segmentation feature set, sliding in the fusion sequence features through a preset second sliding window in a sliding step length, and grouping the fusion sequence features to obtain the segmentation feature set.
For example, if the preset second sliding window is W', the size of the second sliding window is size'_w, the sliding step length is stride', W_j' is the j'-th segmented feature group, and f_i' is the i'-th segmented feature of the fusion sequence feature, then the j'-th segmented feature group may be expressed as W_j' = {f_i' | (j'-1)×stride' + 1 ≤ i' ≤ (j'-1)×stride' + size'_w}.
step S802: and aiming at any segmented feature group, classifying the scenario features of each segmented feature in the segmented feature group through a trained classifier to obtain the scenario boundary category of each segmented feature.
Step S803: and obtaining the plot boundary category of each piece of video according to the plot boundary category of each piece of feature in all piece of feature groups.
In this embodiment, the scenario boundary category may refer to a scenario boundary category of each shot, where the scenario boundary category may include two categories of scenario boundary points and non-scenario boundary points.
In the process of obtaining the plot boundary category of each sliced video, each sliced feature in the sliced feature group is input into a trained classifier, and the classifier determines the plot boundary category of each sliced video by judging whether the sliced feature corresponding to each sliced video is a plot boundary point or not.
For example, let label_j' be the plot boundary classification result of the j'-th segmented feature group, classifier_shot(·) be the plot boundary classifier, f_i' be the i'-th segmented feature in the j'-th segmented feature group, and Z'_i' be the plot boundary classification result of the i'-th segmented feature in the j'-th segmented feature group. The classification result of each segmented feature in the j'-th segmented feature group may then be expressed as Z'_i' = classifier_shot(f_i') and label_j' = {Z'_1, Z'_2, ..., Z'_size'_w}, where Z'_i' = 1 indicates that the i'-th segmented feature in the j'-th segmented feature group is a plot boundary point, and Z'_i' = 0 indicates that the i'-th segmented feature in the j'-th segmented feature group is not a plot boundary point.
Step S804: and obtaining the corresponding segmentation time of each plot according to the time code information and the plot boundary category of each segmented video.
In the process of obtaining the segmentation time corresponding to each plot, the segmented videos whose plot boundary category is plot boundary point are acquired together with their time code information, and the time code information of these segmented videos is used as the segmentation time corresponding to each plot.
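For illustration only, a short Python sketch of this step is given below; the grouping with the second sliding window can reuse the group_with_sliding_window helper from the earlier sketch, and classifier_shot and episode_split_times are hypothetical names standing in for the trained classifier and this step of the method.

```python
# Illustrative sketch: classify each segmented feature as plot boundary point (1) or
# not (0) with a hypothetical trained classifier, and take the time codes of the
# boundary-point segmented videos as the segmentation time of each plot.
def episode_split_times(fused_sequence, classifier_shot):
    """fused_sequence: ordered list of (segmented_feature, timecode) pairs."""
    split_times = []
    for segmented_feature, timecode in fused_sequence:
        if classifier_shot(segmented_feature) == 1:  # 1 = plot boundary point
            split_times.append(timecode)
    return split_times
```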
In this embodiment, the fusion sequence feature is grouped by the preset second sliding window to obtain segmented feature groups, each segmented feature in the groups is classified by the classifier to obtain the plot boundary category of each segmented feature, the plot boundary category of each segmented video is obtained from the plot boundary category of its segmented feature, and the segmentation time corresponding to each plot is obtained from the time code information and plot boundary category of each segmented video. Because each segmented feature is classified automatically by a trained classifier to obtain the plot boundary category of the corresponding segmented video, the time and cost of manual processing are saved, and the efficiency of plot boundary classification of each segmented video is improved.
As shown in fig. 9, a flow chart of a film segmentation method according to an eighth embodiment of the present invention is shown, and in step S804 of the seventh embodiment, the obtaining the segmentation time corresponding to each scenario according to the time code information and the scenario boundary category of each segmented video may include the following steps:
step S901: and traversing the fusion sequence characteristics, and determining all target segmented videos with the plot boundary categories as plot boundary points.
Step S902: and obtaining the segmentation time corresponding to each plot according to the time code information of each target segmented video.
In this embodiment, a scenario boundary point may refer to a switching point between two scenarios, a target slice video may refer to a slice video with a scenario boundary category being a scenario boundary point, and a split time may refer to a split end time of each scenario.
In the process of obtaining the segmentation time corresponding to each plot, the fusion sequence feature is traversed in temporal order to determine all target segmented videos that are plot boundary points; these include the first segmented video and the last segmented video of the target video, as well as the segmented videos that are plot boundary points between them. The time code information of the target segmented videos other than the first one is acquired and used as the segmentation end time corresponding to each plot, and the target video is segmented by plot according to these end times to obtain the video under each plot; for details, refer to the content of step S503.
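As a final illustration, the sketch below cuts the target video file at the obtained segmentation times, assuming the ffmpeg command line tool is installed; the present disclosure does not prescribe a particular cutting tool, and cut_video, the output file pattern, and the stream-copy options are assumptions introduced here.

```python
# Illustrative sketch: cut the target video into per-plot files at the given sorted
# segmentation times (in seconds), which should include the start and end of the video.
import subprocess

def cut_video(input_path, split_times, output_pattern="plot_{:02d}.mp4"):
    """Write one output file per consecutive pair of segmentation times."""
    for index, (start, end) in enumerate(zip(split_times[:-1], split_times[1:])):
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-i", input_path,
             "-t", str(end - start), "-c", "copy", output_pattern.format(index)],
            check=True,
        )
```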
In this embodiment, the fusion sequence feature is traversed, all target segmented videos whose plot boundary category is plot boundary point are determined, and the segmentation time corresponding to each plot is obtained from the time code information of each target segmented video. Because the time code information of the plot boundary point segmented videos is used directly as the segmentation time of each plot, the plot segmentation procedure is simplified and the plot segmentation efficiency is improved.
As shown in fig. 10, a schematic structural diagram of a film segmentation apparatus according to a ninth embodiment of the present application is provided, where the film segmentation apparatus corresponds to the film segmentation method in the foregoing embodiment one by one, and the film segmentation apparatus includes an obtaining module 1001, a lens segmentation module 1002, a feature extraction module 1003, and a scenario segmentation module 1004. The functional modules are described in detail as follows:
an obtaining module 1001, configured to obtain each frame image in the continuous target video, corresponding time code information, and a shot boundary class;
the shot slicing module 1002 is configured to slice the target video according to the time code information and the shot boundary category, to obtain a slice video under each shot;
the feature extraction module 1003 is configured to perform feature extraction on each segmented video to obtain segmented features, and arrange all the segmented features according to a time sequence of each segmented video to obtain a fusion sequence feature;
And the scenario segmentation module 1004 is configured to classify the scenario features of the fusion sequence features by using a trained classifier, obtain a segmentation time corresponding to each scenario, and segment the target video according to the segmentation time to obtain a segmented video.
Optionally, the acquiring module 1001 includes:
The image arrangement unit is used for arranging all the frame images according to the time code information corresponding to each frame image to obtain an image frame sequence;
The image grouping unit is used for grouping the image frame sequence through a preset first sliding window to obtain an image frame group;
The first image classification unit is used for carrying out shot boundary classification on each frame of image in the image frame group through a trained shot boundary classifier for any image frame group to obtain the shot boundary class of each frame of image;
And the second image classification unit is used for obtaining the shot boundary category of each frame image in the continuous target video according to the shot boundary category of each frame image in all the image frame groups.
Optionally, the shot slicing module 1002 includes:
the image traversing unit is used for traversing the image frame sequence and determining all target images with the shot boundary categories as shot boundary points;
the slicing time determining unit is used for determining the slicing time corresponding to each lens according to the time code information of each frame of target image;
And the shot acquisition unit is used for performing shot slicing on the target video according to the slicing time to obtain sliced video under each shot.
Optionally, the feature extraction module 1003 includes:
The image feature extraction unit is used for extracting the image features of each segmented video through the trained image feature extractor to obtain segmented image features;
the audio feature extraction unit is used for extracting the audio features of each segmented video through the trained audio feature extractor to obtain segmented audio features;
and the feature fusion unit is used for carrying out feature fusion on the segmented image features and the segmented audio features of each segmented video through the trained feature fusion device to obtain segmented features.
Optionally, the film splitting apparatus further includes:
the time code acquisition module is used for acquiring time code information corresponding to each frame of image in each segmented video;
The time code determining module is used for determining the time code information of each segmented video according to the time code information corresponding to each frame of image in each segmented video;
And the time sequence determining module is used for determining the time sequence of each segmented video according to the time code information of each segmented video.
Optionally, the scenario segmentation module 1004 includes:
The characteristic grouping unit is used for grouping the fusion sequence characteristics through a preset second sliding window to obtain a fragment characteristic group;
the first plot classification unit is used for classifying plot characteristics of each of the partitioned characteristic groups according to any partitioned characteristic group through the trained classifier to obtain plot boundary categories of each partitioned characteristic;
the second plot classification unit is used for obtaining the plot boundary category of each segmented video according to the plot boundary category of each segmented feature in all segmented feature groups;
And the plot determining unit is used for obtaining the segmentation time corresponding to each plot according to the time code information of each segmented video and the plot boundary category.
Optionally, the above-mentioned division time determining unit includes:
The feature traversing subunit is used for traversing the fusion sequence features and determining all target segmented videos with the plot boundary category as plot boundary points;
And the dividing time determining subunit is used for obtaining the dividing time corresponding to each plot according to the time code information of each target segmented video.
For specific limitations of the film segmentation apparatus, reference may be made to the limitations of the film segmentation method hereinabove, and no further description is given here. The respective modules in the film dividing apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
Fig. 11 is a schematic structural diagram of a computer device according to a tenth embodiment of the present invention. As shown in fig. 11, the computer device of this embodiment includes: at least one processor (only one is shown in fig. 11), a memory, and a computer program stored in the memory and executable on the at least one processor, wherein the processor, when executing the computer program, implements the steps of any of the film segmentation method embodiments described above.
The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 11 is merely an example of a computer device and is not intended to be limiting, and that a computer device may include more or fewer components than shown, or may combine certain components, or different components, such as may also include a network interface, a display screen, an input device, and the like.
The processor may be a central processing unit (CPU), or it may be another general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory includes a readable storage medium, an internal memory, and the like, where the internal memory may be the memory of the computer device, and the internal memory provides an environment for the execution of the operating system and computer readable instructions in the readable storage medium. The readable storage medium may be a hard disk of the computer device; in other embodiments it may be an external storage device of the computer device, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the memory may also include both an internal storage unit and an external storage device of the computer device. The memory is used to store the operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiment may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and are not used to limit the protection scope of the present invention. For the specific working process of the units and modules in the above apparatus, reference may be made to the corresponding process in the foregoing method embodiments, which is not described herein again.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such an understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, and the computer program may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, according to legislation and patent practice, computer readable media may not be electrical carrier signals and telecommunications signals.
The present invention may also be implemented as a computer program product for implementing all or part of the steps of the method embodiments described above, when the computer program product is run on a computer device, causing the computer device to execute the steps of the method embodiments described above.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical aspects of the present invention, not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may be modified or some technical features may be replaced with other technical solutions, and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and all the modifications or replacements are included in the protection scope of the present invention.

Claims (10)

1. A film segmentation method, characterized in that the film segmentation method comprises the steps of:
acquiring each frame of image in continuous target video and corresponding time code information and shot boundary category;
Performing shot slicing on the target video according to the time code information and the shot boundary category to obtain sliced video under each shot;
extracting the characteristics of each segmented video to obtain segmented characteristics, and arranging all segmented characteristics according to the time sequence of each segmented video to obtain fusion sequence characteristics;
and classifying the scenario features of the fusion sequence features by using a trained classifier to obtain the segmentation time corresponding to each scenario, and segmenting the target video according to the segmentation time to obtain a segmented video.
2. A film segmentation method according to claim 1, wherein said acquiring shot boundary categories for each frame of image in successive target video comprises:
arranging all the frame images according to the time code information corresponding to each frame image to obtain an image frame sequence;
Grouping the image frame sequences through a preset first sliding window to obtain an image frame group;
for any image frame group, performing shot boundary classification on each frame image in the image frame group through a trained shot boundary classifier to obtain a shot boundary class of each frame image;
And obtaining the shot boundary category of each frame image in the continuous target video according to the shot boundary category of each frame image in all the image frame groups.
3. The film segmentation method according to claim 2, wherein the performing of shot slicing on the target video according to the time code information and the shot boundary category to obtain a segmented video under each shot comprises:
traversing the image frame sequence, and determining all target images whose shot boundary category is a shot boundary point;
determining the slicing time corresponding to each shot according to the time code information of each target image;
and performing shot slicing on the target video according to the slicing times to obtain a segmented video under each shot.
4. The film segmentation method according to claim 1, wherein the performing of feature extraction on each segmented video to obtain segmented features comprises:
extracting image features of each segmented video through a trained image feature extractor to obtain segmented image features;
extracting audio features of each segmented video through a trained audio feature extractor to obtain segmented audio features;
and performing feature fusion on the segmented image features and the segmented audio features of each segmented video through a trained feature fusion device to obtain the segmented features.
5. The film segmentation method according to claim 1, further comprising, before the arranging of all the segmented features according to the time sequence of each segmented video:
acquiring the time code information corresponding to each frame image in each segmented video;
determining the time code information of each segmented video according to the time code information corresponding to each frame image in the segmented video;
and determining the time sequence of each segmented video according to the time code information of each segmented video.
6. The film segmentation method according to claim 5, wherein the classifying of the plot features using the trained classifier to obtain the segmentation time corresponding to each plot comprises:
grouping the fusion sequence features through a preset second sliding window to obtain segmented feature groups;
for any segmented feature group, classifying the plot features of each segmented feature in the segmented feature group through the trained classifier to obtain the plot boundary category of each segmented feature;
obtaining the plot boundary category of each segmented video according to the plot boundary categories of each segmented feature in all the segmented feature groups;
and obtaining the segmentation time corresponding to each plot according to the time code information of each segmented video and the plot boundary category.
7. The film segmentation method according to claim 6, wherein the obtaining of the segmentation time corresponding to each plot according to the time code information of each segmented video and the plot boundary category comprises:
traversing the fusion sequence features, and determining all target segmented videos whose plot boundary category is a plot boundary point;
and obtaining the segmentation time corresponding to each plot according to the time code information of each target segmented video.
8. A film segmentation apparatus, characterized in that the film segmentation apparatus comprises:
an acquisition module, configured to acquire each frame image in a continuous target video together with its corresponding time code information and shot boundary category;
a shot segmentation module, configured to perform shot slicing on the target video according to the time code information and the shot boundary categories to obtain a segmented video under each shot;
a feature extraction module, configured to perform feature extraction on each segmented video to obtain segmented features, and to arrange all the segmented features according to the time sequence of each segmented video to obtain fusion sequence features;
and a plot segmentation module, configured to classify the plot features of the fusion sequence features by using a trained classifier to obtain a segmentation time corresponding to each plot, and to segment the target video according to the segmentation times to obtain segmented videos.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the film segmentation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the film segmentation method according to any one of claims 1 to 7.
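
The following sketches are illustrative only and are not part of the claims. First, the sliding-window shot-boundary step of claims 2 and 3 can be outlined as below; `boundary_classifier`, the window size, and the 0/1 boundary labels are assumptions introduced purely for the example, and any trained frame-level shot-boundary model could stand behind the callable.

```python
# Illustrative sketch of the shot-slicing steps in claims 2-3; not the claimed
# implementation. `boundary_classifier`, the window size and the 0/1 labels are
# assumptions made here for the example.
from typing import Callable, List, Sequence, Tuple


def shot_cut_times(frames: Sequence,
                   timecodes: Sequence[float],
                   boundary_classifier: Callable[[Sequence], List[int]],
                   window: int = 16) -> List[float]:
    """Group the ordered frame sequence with a first sliding window, classify the
    shot-boundary category of every frame, and return the time codes of the
    frames whose category is "shot boundary point" (label 1)."""
    labels: List[int] = []
    for start in range(0, len(frames), window):
        labels.extend(boundary_classifier(frames[start:start + window]))
    labels = labels[:len(frames)]
    return [timecodes[i] for i, lab in enumerate(labels) if lab == 1]


def slice_intervals(duration: float, cut_times: List[float]) -> List[Tuple[float, float]]:
    """Turn boundary time codes into (start, end) intervals, one per shot."""
    edges = [0.0] + sorted(t for t in cut_times if 0.0 < t < duration) + [duration]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1) if edges[i + 1] > edges[i]]
```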
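
The per-segment feature extraction and fusion of claim 4 can be sketched in the same hedged way; the three callables are placeholders for a trained image feature extractor, audio feature extractor, and feature fusion component, and the concatenation fallback is only the simplest possible choice for the example.

```python
# Illustrative sketch of the per-segment feature extraction and fusion in claim 4.
# The callables are placeholders for trained components; no concrete model is implied.
from typing import Callable, List, Sequence


def concat_fuse(image_feat: List[float], audio_feat: List[float]) -> List[float]:
    """Simplest possible fusion for the sketch: concatenate the two feature vectors."""
    return list(image_feat) + list(audio_feat)


def extract_segment_features(segments: Sequence,
                             image_extractor: Callable,
                             audio_extractor: Callable,
                             fuse: Callable = concat_fuse) -> List:
    """Return one fused segmented feature per segmented video, in input order."""
    fused = []
    for segment in segments:
        image_feat = image_extractor(segment)   # segmented image feature
        audio_feat = audio_extractor(segment)   # segmented audio feature
        fused.append(fuse(image_feat, audio_feat))
    return fused
```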
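
Finally, the ordering and plot-boundary classification of claims 5-7 can be illustrated as follows; `plot_classifier`, the second-window size, and the 0/1 labels are again assumptions for the sketch, not the claimed method. The returned times could then be handed to any ordinary video trimming tool to produce the final segmented videos.

```python
# Illustrative sketch of claims 5-7: order segmented features by time code,
# classify plot-boundary categories over a second sliding window, and read off
# the segmentation time of every segment marked as a plot boundary point.
from typing import Callable, List, Sequence, Tuple


def plot_cut_times(segments: Sequence[Tuple[float, object]],
                   plot_classifier: Callable[[List[object]], List[int]],
                   window: int = 8) -> List[float]:
    # Claim 5: determine the time sequence of the segmented videos from their
    # time code information (here each segment is a (start_time, feature) pair).
    ordered = sorted(segments, key=lambda seg: seg[0])
    start_times = [t for t, _ in ordered]
    features = [f for _, f in ordered]

    # Claim 6: group the fusion sequence features with a second sliding window
    # and classify the plot-boundary category of each segmented feature.
    labels: List[int] = []
    for i in range(0, len(features), window):
        labels.extend(plot_classifier(features[i:i + window]))
    labels = labels[:len(features)]

    # Claim 7: the segmentation times are the time codes of the segments whose
    # plot-boundary category is "plot boundary point" (label 1 here).
    return [start_times[i] for i, lab in enumerate(labels) if lab == 1]
```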
CN202410274433.XA 2024-03-11 2024-03-11 Film segmentation method, device, computer equipment and computer readable storage medium Pending CN118366071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410274433.XA CN118366071A (en) 2024-03-11 2024-03-11 Film segmentation method, device, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410274433.XA CN118366071A (en) 2024-03-11 2024-03-11 Film segmentation method, device, computer equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN118366071A true CN118366071A (en) 2024-07-19

Family

ID=91884721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410274433.XA Pending CN118366071A (en) 2024-03-11 2024-03-11 Film segmentation method, device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN118366071A (en)

Similar Documents

Publication Publication Date Title
US12112539B2 (en) Video processing method, electronic device and storage medium
CN109117777B (en) Method and device for generating information
CN110234037B (en) Video clip generation method and device, computer equipment and readable medium
CN111327945B (en) Method and apparatus for segmenting video
CN111967302B (en) Video tag generation method and device and electronic equipment
US20220172476A1 (en) Video similarity detection method, apparatus, and device
CN108353208B (en) Optimizing media fingerprint retention to improve system resource utilization
CN110751224B (en) Training method of video classification model, video classification method, device and equipment
US8270684B2 (en) Automatic media sharing via shutter click
CN112153483B (en) Information implantation area detection method and device and electronic equipment
CN113382279B (en) Live broadcast recommendation method, device, equipment, storage medium and computer program product
CN110390033A (en) Training method, device, electronic equipment and the storage medium of image classification model
CN112860943A (en) Teaching video auditing method, device, equipment and medium
CN111651636A (en) Video similar segment searching method and device
CN111314732A (en) Method for determining video label, server and storage medium
CN110688524A (en) Video retrieval method and device, electronic equipment and storage medium
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN111209431A (en) Video searching method, device, equipment and medium
CN110502664A (en) Video tab indexes base establishing method, video tab generation method and device
CN113361462B (en) Method and device for video processing and caption detection model
CN112925905B (en) Method, device, electronic equipment and storage medium for extracting video subtitles
CN112434178A (en) Image classification method and device, electronic equipment and storage medium
CN111444819A (en) Cutting frame determining method, network training method, device, equipment and storage medium
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN111444364B (en) Image detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination