CN111428589A - Identification method and system for transition - Google Patents

Identification method and system for transition

Info

Publication number
CN111428589A
CN111428589A
Authority
CN
China
Prior art keywords
transition
type
video
training
video clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010165457.3A
Other languages
Chinese (zh)
Other versions
CN111428589B (en)
Inventor
王灿进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhua Fusion Media Technology Development Beijing Co ltd
Xinhua Zhiyun Technology Co ltd
Original Assignee
Xinhua Zhiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhua Zhiyun Technology Co ltd filed Critical Xinhua Zhiyun Technology Co ltd
Priority to CN202010165457.3A priority Critical patent/CN111428589B/en
Publication of CN111428589A publication Critical patent/CN111428589A/en
Application granted granted Critical
Publication of CN111428589B publication Critical patent/CN111428589B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Abstract

The invention discloses a method and a system for identifying transitions, wherein the identification method comprises the following steps: acquiring a video to be detected, and traversing it with a preset sliding window to obtain first video clips; performing transition recognition on the first video clips based on a preset transition recognition model, and extracting the first video clips in which a transition is recognized to obtain second video clips; predicting the transition type of each second video clip based on a preset type prediction model; and determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval. Compared with existing transition recognition methods based on color analysis, the invention recognizes gradual transitions in the video to be detected with deep learning, so it can overcome mistaken recognition under adverse conditions such as lens shake and defocus, and improves accuracy.

Description

Identification method and system for transition
Technical Field
The invention relates to the field of image recognition, in particular to a method and a system for recognizing a gradual transition.
Background
The method and apparatus for identifying shot cuts with application number CN201610687298.7 proposes: extracting key frames of a video to be detected at equal intervals, dividing the key frames into a plurality of sub-regions, and judging whether a shot cut exists by calculating the weighted distance between the color or brightness histograms of the sub-regions of different key frames;
the video shot cut detection method and device based on frame-difference clustering with application number CN201410831291.9 proposes: calculating the gray-value difference between every two images of three continuous frames to generate a three-dimensional vector, mapping the three-dimensional vector to a point in a spatial coordinate system with a clustering device, and setting a radius parameter to generate a ball, points falling inside the ball being identified as shot cuts;
as can be seen from the above, existing transition detection is realized through color analysis, but such detection is poorly suited to identifying and locating gradual transitions that change frame by frame, and is susceptible to shooting quality, for example mistaking lens shake for a transition.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method and a system for identifying a transition.
In order to solve the above technical problem, the invention adopts the following technical scheme:
a method for identifying a gradual transition comprises the following steps:
acquiring a video to be detected, and traversing the video to be detected by using a preset sliding window to obtain a first video clip;
carrying out transition recognition on the first video clip based on a preset transition recognition model, and extracting the first video clip with the transition recognized to obtain a second video clip;
predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
and determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval.
In one implementation, the transition recognition model is obtained as follows:
collecting sample video clips, judging whether each sample video clip contains a transition, taking the sample video clips containing a transition as training positive samples, and taking those not containing a transition as training negative samples;
and training with the training positive samples and the training negative samples to obtain the transition recognition model, which is used for recognizing whether an input first video clip contains a transition.
In one implementation, the type prediction model is obtained as follows:
labeling the training positive samples based on the transition type to generate prediction training data;
and training with the prediction training data to obtain the type prediction model, whose input is a second video clip and whose output is the transition type of the second video clip or a feature of the second video clip.
In one implementation, when the output of the type prediction model is a feature, predicting the transition type of each second video clip based on the preset type prediction model specifically comprises the following steps:
taking the feature output by the type prediction model as a first feature;
and performing similarity matching between the first feature and the second features in a preset transition feature library to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking that transition type as the transition type of the corresponding second video clip.
as an implementable manner, the specific steps of performing similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, determining a transition type of the first feature according to the matching result, and using the transition type as a transition type corresponding to a corresponding second video clip are as follows:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquiring an optimal characteristic distance value, judging whether the optimal characteristic distance value meets a preset check condition, taking a transition type of a second characteristic corresponding to the optimal characteristic distance value as a transition type of the first characteristic when the optimal characteristic distance value meets the preset check condition, and judging whether the transition type of the first characteristic is transition-free if not.
In one implementation, the specific steps of determining the transition interval in the video to be detected based on the transition type of each second video clip and determining the transition type of the transition interval are:
generating a prediction result array according to the transition types, wherein the prediction result array represents the transition type of each first video clip;
and eliminating the entries whose transition type is no transition from the prediction result array to obtain at least one continuous interval, taking each continuous interval as a transition interval, and extracting the most frequent transition type within the interval as the transition type of that transition interval.
The invention also provides a system for identifying transitions, which comprises:
a preprocessing module, used for acquiring a video to be detected and traversing it with a preset sliding window to obtain first video clips;
a transition recognition module, used for performing transition recognition on the first video clips based on a preset transition recognition model and extracting the first video clips in which a transition is recognized to obtain second video clips;
a type prediction module, used for predicting the transition type of each second video clip based on a preset type prediction model;
and a positioning identification module, used for determining the transition intervals in the video to be detected based on the transition type of each second video clip and determining the transition type of each transition interval.
In one implementation, the system further comprises a model training module, which comprises a first data processing unit, a second data processing unit, a recognition model training unit and a prediction model training unit;
the first data processing unit is used for collecting sample video clips, judging whether each sample video clip contains a transition, taking those containing a transition as training positive samples and those not containing a transition as training negative samples;
the recognition model training unit is used for training with the training positive samples and training negative samples to obtain the transition recognition model, which is used for recognizing whether an input first video clip contains a transition;
the second data processing unit is used for labeling the training positive samples based on transition type to generate prediction training data;
and the prediction model training unit is used for training with the prediction training data to obtain the type prediction model, whose input is a second video clip and whose output is the transition type of the second video clip or a feature of the second video clip.
In one implementation, the positioning identification module is configured to:
generate a prediction result array according to the transition types, wherein the prediction result array represents the transition type of each first video clip;
and eliminate the entries whose transition type is no transition from the prediction result array to obtain at least one continuous interval, take each continuous interval as a transition interval, and extract the most frequent transition type within the interval as the transition type of that transition interval.
The invention also proposes a computer-readable storage medium in which a computer program is stored which, when executed by a processor, carries out the steps of any of the methods described above.
Thanks to the above technical scheme, the invention has the following notable technical effects:
1. The invention uses deep learning to preset a transition recognition model for recognizing whether an input video contains a transition, together with a type prediction model for further predicting the corresponding transition type; transitions can therefore not only be recognized effectively, but their specific type can also be determined, which meets the needs of post-processing personnel who generate or otherwise work with transitions.
2. The invention extracts the features of each second video clip and computes the similarity between the obtained first feature and the second features in a preset transition feature library, thereby determining the transition type of the second video clip; this identifies the transition type effectively, and the transition feature library can be conveniently expanded according to actual needs to extend the set of recognizable transition types.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic workflow diagram of the method for identifying a gradual transition according to the present invention;
FIG. 2 is a schematic block diagram showing the connection of modules of the system for identifying a transition in embodiment 3;
FIG. 3 is a block diagram of the module connection of the model training module 500 of FIG. 2;
FIG. 4 is a block diagram showing the connection of the type prediction module 300 according to embodiment 4.
In the figure:
100 is the preprocessing module, 200 the transition recognition module, 300 the type prediction module, 310 the feature extraction unit, 320 the similarity matching unit, 400 the positioning identification module, 500 the model training module, 510 the first data processing unit, 520 the second data processing unit, 530 the recognition model training unit, and 540 the prediction model training unit.
Detailed Description
The present invention will be described in further detail below with reference to embodiments, which illustrate the invention and are not to be construed as limiting it.
Embodiment 1, a method for identifying a gradual transition, as shown in fig. 1, includes the following steps:
s100, acquiring a video to be detected, and traversing the video to be detected by using a preset sliding window to obtain a first video clip;
s200, transition recognition is carried out on the first video clip based on a preset transition recognition model, and the first video clip with the identified transition is extracted to obtain a second video clip; that is, the first video segment containing the transition is screened out using the transition recognition model.
S300, predicting the transition type of each second video clip based on a preset type prediction model to obtain the transition type of each second video clip;
the transition types include no transition and specific transition types (fade in and fade out, checkerboard, zoom, wipe, jaggy, etc.); that is, the first video segment including the transition obtained by filtering in step S200 is further predicted, whether it includes the transition is determined, and a specific transition type is predicted.
S400, determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval.
Existing transition recognition methods usually judge whether a shot cut occurs from image grayscale data or color difference data, so their recognition of gradual transitions, whose color/grayscale changes only gradually from frame to frame, is poor; moreover, such methods depend on preset judgment thresholds, and a dual-threshold scheme is usually required for gradual transitions.
In this embodiment, deep learning is used to preset a transition recognition model for recognizing whether an input video contains a gradual transition, together with a type prediction model for further predicting the corresponding transition type. Not only can gradual transitions thus be recognized effectively, but their specific type can also be determined, which, beyond the recognition requirement itself, further supports downstream work of post-processing personnel, such as special-effect design or application analysis of gradual transitions.
In the step S100, the specific steps of obtaining a video to be detected, traversing the video to be detected by using a preset sliding window, and obtaining a first video clip include:
decoding a video to be detected to obtain a video sequence with the total length of L;
the length N and the step length S of the sliding window are configured according to actual needs, the constructed sliding window is utilized to traverse the video sequence, and a segment C consisting of N video frames, namely a first video segment, is taken out from the video sequence each time.
Those skilled in the relevant art may configure the length N and the step size S of the sliding window according to their actual needs.
In this embodiment, the video to be detected is segmented in advance and transition recognition is then performed on each first video clip; on the one hand this improves recognition accuracy, and on the other hand it makes it convenient to locate the transition intervals in the video to be detected.
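As an illustrative aid (not part of the original disclosure), the segmentation of step S100 can be sketched in Python as follows; the in-memory frame representation and the example values N = 16, S = 8 are assumptions, since the patent leaves both parameters to be configured according to actual needs:

```python
# Sketch of step S100: sliding-window traversal of a decoded video.
# N (window length) and S (step) are illustrative values only.
from typing import List, Sequence

def sliding_window_clips(frames: Sequence, N: int = 16, S: int = 8) -> List[Sequence]:
    """Return the first video clips: windows of N consecutive frames,
    taken every S frames from a video sequence of total length L."""
    L = len(frames)
    return [frames[i:i + N] for i in range(0, L - N + 1, S)]
```

With this definition the number of clips is ((L - N)//S) + 1, which matches the size of the prediction result array RES described in step S401 below.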
Further, the method for acquiring the transition recognition model in step S200 includes:
a1, collecting sample video clips, judging whether the sample video clips contain transitions, taking the sample video clips containing the transitions as training positive samples, and taking the sample video clips not containing the transitions as training negative samples;
and A2, training by using the training positive samples and the training negative samples to obtain a transition recognition model, wherein the transition recognition model is used for recognizing whether the input first video clip is a transition.
The transition recognition model is a binary classification model; its output indicates whether a transition is present.
In this embodiment, an initial transition recognition model may be built from any video classification network and trained with the training positive and negative samples to obtain a number of intermediate models (the models output when the loss curve levels off); the intermediate model with the highest accuracy is selected as the transition recognition model. Those skilled in the art may freely choose any video classification network capable of binary classification of video clips, for example a publicly known two-stream network or 3D CNN network; a minimal sketch of such a classifier follows.
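The sketch below assumes PyTorch and a toy 3D-CNN backbone; the actual network, layer sizes and training procedure are not fixed by the patent:

```python
# Hypothetical 3D-CNN binary classifier for transition recognition;
# the patent permits any video classification network (two-stream, 3D CNN, ...).
import torch
import torch.nn as nn

class TransitionRecognizer(nn.Module):
    def __init__(self, num_classes: int = 2):  # transition / no transition
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, N, H, W) -- N frames of an RGB first video clip
        x = self.features(clip).flatten(1)
        return self.classifier(x)
```

The same architecture with num_classes set to the number of gradual transition types can stand in for the multi-class type prediction model of step S300.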
Further, the sample video clips in step A1 are collected as follows:
collecting an original video (e.g., a video downloaded from the internet) containing transitions, labeling a start frame, an end frame and a corresponding transition type of the transition in the original video, and obtaining a gradual transition sample.
Randomly sampling N consecutive frames from the training video to obtain a clip C_train, namely the sample video clip;
and calculating the overlap ratio between the sample video clip and the corresponding transition sample; when the overlap ratio reaches a preset overlap-ratio threshold, the sample video clip is judged a training positive sample, otherwise a training negative sample.
The overlap ratio is calculated as follows: the sample video clip consists of a first non-overlapping segment L1 plus the overlapping segment L2, and the corresponding transition sample consists of a second non-overlapping segment L3 plus the same overlapping segment L2; the overlap ratio is the ratio of the overlapping segment L2 to the total length (L1 + L2 + L3) jointly covered by the sample video clip and the transition sample, that is, overlap ratio = L2/(L1 + L2 + L3).
Those skilled in the relevant art may set the overlap-ratio threshold according to actual needs; in this embodiment the threshold is 50%, i.e., a sample video clip whose calculated overlap ratio reaches 50% is judged a training positive sample.
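Representing a sampled clip and a labeled transition as frame-index intervals, the positive/negative decision can be sketched as follows (the interval representation and half-open convention are assumptions for illustration):

```python
# Overlap-ratio check for labeling a sampled clip as a training positive sample.
# Intervals are (start, end) frame indices, end exclusive (an assumed convention).
def is_positive_sample(clip, transition, threshold: float = 0.5) -> bool:
    c0, c1 = clip
    t0, t1 = transition
    overlap = max(0, min(c1, t1) - max(c0, t0))      # L2
    total = (c1 - c0) + (t1 - t0) - overlap          # L1 + L2 + L3
    return overlap / total >= threshold              # L2 / (L1 + L2 + L3)
```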
Further, an original video can also be generated according to a preset composition rule; the specific method is:
acquiring a first shot segment and a second shot segment, taking the last frame of the first shot segment as the start frame of the transition, and taking the first frame of the second shot segment as the end frame of the transition;
and acquiring a transition generation rule comprising a transition length and a transition type, generating fill frames based on the rule, and inserting the fill frames between the start frame and the end frame to obtain the original video. The gradual transition types include, but are not limited to, fade-in/fade-out, checkerboard, zoom, wipe and sawtooth.
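As a concrete illustration of fill-frame generation, the fade-in/fade-out type can be produced by linear alpha blending; the blending formula is a common choice assumed here, not one prescribed by the patent:

```python
# Hypothetical fill-frame generation for a fade (cross-dissolve) transition
# between the last frame of shot 1 and the first frame of shot 2.
import numpy as np

def make_fade_frames(start_frame: np.ndarray, end_frame: np.ndarray,
                     transition_length: int) -> list:
    frames = []
    for k in range(1, transition_length + 1):
        alpha = k / (transition_length + 1)
        blended = (1 - alpha) * start_frame + alpha * end_frame
        frames.append(blended.astype(start_frame.dtype))
    return frames
```

Other rule-generated types (checkerboard, wipe, etc.) would replace the per-frame blend with the corresponding spatial mask.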
Further, sample video clips containing no transition can be used directly as training negative samples.
To enhance the stability and generalization capability of the transition recognition model, the training negative samples should be sufficiently diverse, covering complex scenes such as conventional shooting, fast camera movement, defocus, dim light and strong light, so as to effectively reduce false alarms on such difficult scenes.
Further, the method for obtaining the type prediction model in step S300 is:
a3, labeling the training positive sample based on the transition type to generate prediction training data;
That is, the transition type of the gradual transition sample corresponding to a training positive sample is used as the transition type of that positive sample; in other words, training positive samples are labeled only with the gradual transition type, such as fade-in/fade-out, checkerboard, zoom, wipe or sawtooth.
And A4, training by using the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output is the transition type of the second video segment.
The type prediction model is a multi-classification model, and the output result is a gradual transition type.
In this embodiment, an initial type prediction model may be built from any video classification network and trained with the prediction training data to obtain a number of intermediate models (the models output when the loss curve levels off); the intermediate model with the highest accuracy is selected as the type prediction model. Those skilled in the art may freely choose any video classification network capable of multi-class classification of video clips, for example a publicly available two-stream network or 3D CNN network.
Note that, when training the transition recognition model and the type prediction model, classification accuracy can be improved by methods such as cascaded grouping and increasing the error weight of hard samples. Cascaded grouping first divides the input into several coarse classes with obvious class boundaries and then subdivides each coarse class into several fine classes; increasing the weight of hard samples assigns a larger error weight during training to samples that are misclassified. Both are existing conventional techniques and are therefore not described in detail in this specification.
By training the transition recognition model and the type prediction model, the invention realizes transition recognition with machine-learning techniques, which effectively avoids mistaken recognition of transitions under adverse conditions such as lens shake and defocus, and effectively improves the accuracy and recall of transition recognition.
Further, the step S400 of determining the transition interval in the video to be detected based on the transition type of each second video segment includes the specific steps of:
s401, generating a prediction result array according to the transition types, wherein the prediction result array represents the transition types of the first video segments, namely, for the video sequence with the length of L, the prediction result array RES comprises ((L-N)/S) +1 transition types.
In this embodiment, the transition type corresponding to the training positive sample is labeled in an encoding manner (for example, one-hot encoding may be adopted), and at this time, the type prediction model outputs the encoding corresponding to the predicted transition type.
For example, the non-transition is recorded as 0, and each transition type is coded sequentially (fade-in and fade-out is recorded as 1, checkerboard is recorded as 2, and zoom is recorded as 3);
and sequentially acquiring the codes of the transition types corresponding to the first video segments, and generating a prediction result array RES, wherein each data in the prediction result array represents the transition type of the first video segment corresponding to the data one by one.
Note that, since the second video clips are exactly those first video clips that contain a transition, the prediction result array RES can be generated from the transition types of the second video clips alone.
S402, eliminating the data with the transition type of no transition from the prediction result array, obtaining at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
That is, traverse the prediction result array RES from front to back until the first non-zero position i, then continue searching until the first subsequent zero position j, a zero indicating that the transition type of the corresponding first video clip is no transition; positions i to j-1 form a transition interval, and merging the first video clips corresponding to positions i to j-1 yields the corresponding gradual transition segment.
Counting the values RES[i] to RES[j-1], the most frequent transition type is taken as the transition type of the transition interval; for example, if i = 5 and j = 9 with RES[i] to RES[j-1] equal to [3, 3, 2, 3], the transition type of the interval is the type coded 3 (zoom).
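Steps S401-S402 amount to scanning RES for maximal runs of non-zero codes and taking a majority vote within each run; a sketch under that reading:

```python
# Sketch of steps S401-S402: locate transition intervals in the prediction
# result array RES (0 = no transition) and take the most frequent non-zero
# code in each interval as the interval's transition type.
from collections import Counter
from typing import List, Tuple

def locate_transitions(res: List[int]) -> List[Tuple[int, int, int]]:
    intervals, i, n = [], 0, len(res)
    while i < n:
        if res[i] == 0:
            i += 1
            continue
        j = i
        while j < n and res[j] != 0:      # scan forward to the first zero position j
            j += 1
        transition_type = Counter(res[i:j]).most_common(1)[0][0]
        intervals.append((i, j - 1, transition_type))
        i = j
    return intervals

# Example from the text: RES = [0,0,0,0,0,3,3,2,3,0] -> [(5, 8, 3)]  (zoom)
```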
As can be seen from the above, the present embodiment can locate the start and end positions of the gradual transition in the video to be detected, and can also identify the specific type of the gradual transition.
Embodiment 2: the output of the type prediction model of embodiment 1 is changed from "the transition type of the second video segment" to "a feature of the second video segment"; the rest is the same as embodiment 1.
That is, in this embodiment the type prediction model is changed from a multi-class classifier into a feature extractor. Since the multi-class model consists of a feature extractor followed by a layer that computes, from the extracted features, the probability that the input second video clip belongs to each transition type, it suffices to remove the softmax layer from the type prediction model of embodiment 1.
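In code, stripping the head can look like the following, reusing the illustrative backbone sketched earlier for the recognition model (an assumption; any network exposing its penultimate features would do):

```python
# Sketch: reuse a trained type prediction model as a feature extractor by
# keeping only its backbone and dropping the classification (softmax) head.
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, trained_model: nn.Module):
        super().__init__()
        self.backbone = trained_model.features  # everything before the head

    def forward(self, clip):
        # Output: the first feature FEA of the input second video clip.
        return self.backbone(clip).flatten(1)
```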
In this embodiment, the step S300 of predicting the transition type of each second video segment based on a preset type prediction model includes the specific steps of:
taking the feature output by the type prediction model as the first feature (FEA);
and performing similarity matching between the first feature (FEA) and the second features in a preset transition feature library (FEA-LIB) to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking that transition type as the transition type of the corresponding second video clip.
In this embodiment, the transition feature library (FEA-LIB) is obtained by inputting the gradual transition samples of embodiment 1 into the type prediction model, taking the feature obtained for each gradual transition sample as a second feature, and building the transition feature library (FEA-LIB) from these second features.
Furthermore, the transition feature library (FEA-LIB) can undergo category expansion and data expansion by collecting transition samples according to actual needs.
The method of category expansion is: collecting transition samples of the categories to be added, inputting them into the type prediction model to obtain the corresponding features, and adding those features to the transition feature library (FEA-LIB) as second features, thereby extending the set of transition categories.
The method of data expansion is: collecting additional gradual transition samples according to actual needs and extracting their features as second features.
The transition feature library (FEA-LIB) can also be extended by adding a first feature to it as a second feature whenever the transition type of that first feature is a gradual transition type.
Further, the specific steps of performing similarity matching between the first feature and the second features in the preset transition feature library to obtain a matching result, determining the transition type of the first feature from the matching result, and taking that transition type as the transition type of the corresponding second video clip are as follows:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquiring the optimal feature distance value and judging whether it meets a preset check condition; if so, the transition type of the second feature corresponding to the optimal feature distance value is taken as the transition type of the first feature; otherwise, the transition type of the first feature is determined as no transition.
The feature distance value may be the cosine distance value itself or (1 - cosine distance value).
When the feature distance value is the cosine distance value, the optimal feature distance value is the maximum feature distance value, and the check condition is that this value is greater than a preset distance threshold (e.g., 0.5).
When the feature distance value is (1 - cosine distance value), the optimal feature distance value is the minimum feature distance value, and the check condition is that this value is smaller than a preset distance threshold (e.g., 0.5).
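A sketch of this matching step, assuming the feature distance value is the cosine distance value itself (the "greater than 0.5" case) and that FEA-LIB is held as a list of (second feature, transition type) pairs:

```python
# Sketch of matching a first feature FEA against the transition feature
# library FEA-LIB; returns the matched transition type code, or 0 for
# no transition when the check condition fails.
import numpy as np

def match_transition(fea: np.ndarray, library, threshold: float = 0.5) -> int:
    best_value, best_type = -1.0, 0
    for second_fea, t_type in library:
        cos = float(np.dot(fea, second_fea) /
                    (np.linalg.norm(fea) * np.linalg.norm(second_fea)))
        if cos > best_value:                 # keep the optimal distance value
            best_value, best_type = cos, t_type
    return best_type if best_value > threshold else 0
```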
In this embodiment, the features of the second video clips are extracted and the transition type of each second video clip is determined from the obtained features; since the transition feature library can be expanded according to actual needs, the set of recognizable transition types can be extended as needed in actual use, giving strong extensibility and a wide application range.
Embodiment 3: a system for identifying a gradual transition, as shown in FIG. 2, comprises a preprocessing module 100, a transition recognition module 200, a type prediction module 300, a positioning identification module 400 and a model training module 500;
the preprocessing module 100 is configured to acquire a video to be detected, and traverse the video to be detected by using a preset sliding window to acquire a first video clip;
the transition recognition module 200 is configured to perform transition recognition on the first video segment based on a preset transition recognition model, and further configured to extract the first video segment with a transition recognized, so as to obtain a second video segment;
the type prediction module 300 is configured to predict a transition type of each second video segment based on a preset type prediction model, and obtain the transition type of each second video segment;
the positioning and identifying module 400 is configured to determine a transition interval in the video to be detected based on the transition type of each second video segment, and determine the transition type of the transition interval.
As shown in FIG. 3, the model training module 500 includes a first data processing unit 510, a second data processing unit 520, a recognition model training unit 530, and a prediction model training unit 540;
the first data processing unit 510 is configured to collect a sample video clip, determine whether the sample video clip contains a transition, use the sample video clip containing the transition as a training positive sample, and use the sample video clip not containing the transition as a training negative sample;
the identification model training unit 530 is configured to train with the training positive sample and the training negative sample to obtain a transition identification model, where the transition identification model is configured to identify whether the input first video segment is a transition;
the second data processing unit 520 is configured to label the training positive sample based on a transition type to generate predicted training data;
the prediction model training unit 540 is configured to train with the prediction training data to obtain a type prediction model, where an input of the type prediction model is a second video segment, and an output of the type prediction model is a transition type of the second video segment.
Further, the positioning identification module 400 is configured to:
generating a prediction result array according to the transition type, wherein the prediction result array represents the transition type of each first video segment;
and eliminating the data with the transition type of no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
This embodiment is an embodiment of the apparatus corresponding to embodiment 1, and since it is basically similar to the embodiment of the method (embodiment 1), the description is relatively simple, and for the relevant points, refer to the partial description of the embodiment of the method (embodiment 1).
Embodiment 4: the prediction model training unit 540 of embodiment 3 is modified to "train with the prediction training data to obtain the type prediction model, whose input is a second video segment and whose output is a feature of the second video segment"; the rest is the same as embodiment 3.
As shown in fig. 4, the type prediction module 300 in the present embodiment includes a feature extraction unit 310 and a similarity matching unit 320:
the feature extraction unit 310 is configured to perform feature extraction on the input second video segment by using a type prediction model to obtain a first feature;
the similarity matching unit 320 is configured to perform similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, and is further configured to determine a transition type of the first feature according to the matching result, and use the transition type as a transition type corresponding to a corresponding second video clip;
further, the similarity matching unit 320 is configured to:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquire the optimal feature distance value and judge whether it meets a preset check condition; if so, take the transition type of the second feature corresponding to the optimal feature distance value as the transition type of the first feature; otherwise, determine the transition type of the first feature as no transition.
This embodiment is an embodiment of an apparatus corresponding to embodiment 2, and since it is basically similar to the method embodiment (embodiment 2), the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment (embodiment 2).
Embodiment 5 is a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of embodiment 1 or embodiment 2.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
In addition, it should be noted that the specific embodiments described in the present specification may differ in the shape of the components, the names of the components, and the like. All equivalent or simple changes of the structure, the characteristics and the principle of the invention which are described in the patent conception of the invention are included in the protection scope of the patent of the invention. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.

Claims (10)

1. A method for identifying a gradual transition is characterized by comprising the following steps:
acquiring a video to be detected, and traversing the video to be detected by using a preset sliding window to obtain a first video clip;
carrying out transition recognition on the first video clip based on a preset transition recognition model, and extracting the first video clip with the transition recognized to obtain a second video clip;
predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
and determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval.
2. The identification method of gradual transition according to claim 1, wherein the transition identification model is obtained by:
collecting a sample video clip, judging whether the sample video clip contains transition, taking the sample video clip containing the transition as a training positive sample, and taking the sample video clip not containing the transition as a training negative sample;
and training by using the training positive sample and the training negative sample to obtain a transition recognition model, wherein the transition recognition model is used for recognizing whether the input first video clip is a transition or not.
3. The method for identifying gradual transitions according to claim 2, wherein the type prediction model is obtained by:
labeling the training positive sample based on the transition type to generate prediction training data;
and training by utilizing the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output is the transition type of the second video segment or the characteristic of the second video segment.
4. The method for identifying gradual transition as claimed in claim 3, wherein when the output of the type prediction model is a feature, the step of predicting the transition type of each second video segment based on the preset type prediction model to obtain the transition type of each second video segment comprises:
taking the feature output by the type prediction model as a first feature;
and performing similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking the transition type as the transition type corresponding to the corresponding second video clip.
5. The gradual transition recognition method according to claim 4, wherein the specific steps of performing similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, determining a transition type of the first feature according to the matching result, and using the transition type as a transition type corresponding to a corresponding second video segment are as follows:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquiring an optimal feature distance value and judging whether the optimal feature distance value meets a preset check condition; when it does, taking the transition type of the second feature corresponding to the optimal feature distance value as the transition type of the first feature; otherwise, determining the transition type of the first feature as no transition.
6. The method for identifying gradual transition according to any one of claims 1 to 5, wherein the specific steps of determining the transition interval in the video to be detected based on the transition type of each second video segment and determining the transition type of the transition interval are as follows:
generating a prediction result array according to the transition type, wherein the prediction result array represents the transition type of each first video segment;
and eliminating the data with the transition type of no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
7. A system for identifying a gradual transition, comprising:
a preprocessing module, used for acquiring a video to be detected and traversing the video to be detected by utilizing a preset sliding window to acquire a first video clip;
the transition recognition module is used for carrying out transition recognition on the first video clip based on a preset transition recognition model and extracting the first video clip with the transition recognized to obtain a second video clip;
the type prediction module is used for predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
and the positioning identification module is used for determining a transition interval in the video to be detected based on the transition type of each second video clip and determining the transition type of the transition interval.
8. The system for identifying gradual transitions of claim 7, further comprising a model building module comprising a first data processing unit, a second data processing unit, an identification model training unit, and a prediction model training unit;
the first data processing unit is used for collecting sample video clips, judging whether the sample video clips contain transitions or not, taking the sample video clips containing the transitions as training positive samples, and taking the sample video clips not containing the transitions as training negative samples;
the identification model training unit is used for training by using the training positive sample and the training negative sample to obtain a transition identification model, and the transition identification model is used for identifying whether the input first video clip is a transition or not;
the second data processing unit is used for labeling the training positive sample based on the transition type to generate prediction training data;
and the prediction model training unit is used for training by utilizing the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output of the type prediction model is the transition type of the second video segment or the characteristic of the second video segment.
9. The system for identifying gradual transitions as claimed in claim 7 or 8, wherein the location identification module is configured to:
generating a prediction result array according to the transition type, wherein the prediction result array represents the transition type of each first video segment;
and eliminating the data with the transition type of no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202010165457.3A 2020-03-11 2020-03-11 Gradual transition identification method and system Active CN111428589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010165457.3A CN111428589B (en) 2020-03-11 2020-03-11 Gradual transition identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010165457.3A CN111428589B (en) 2020-03-11 2020-03-11 Gradual transition identification method and system

Publications (2)

Publication Number Publication Date
CN111428589A true CN111428589A (en) 2020-07-17
CN111428589B CN111428589B (en) 2023-05-30

Family

ID=71547708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010165457.3A Active CN111428589B (en) 2020-03-11 2020-03-11 Gradual transition identification method and system

Country Status (1)

Country Link
CN (1) CN111428589B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439482A (en) * 2022-11-09 2022-12-06 荣耀终端有限公司 Transition detection method and related equipment thereof
CN117132925A (en) * 2023-10-26 2023-11-28 成都索贝数码科技股份有限公司 Intelligent stadium method and device for sports event

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN110263729A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of method of shot boundary detector, model training method and relevant apparatus
CN110830734A (en) * 2019-10-30 2020-02-21 新华智云科技有限公司 Abrupt change and gradual change lens switching identification method
CN110856042A (en) * 2019-11-18 2020-02-28 腾讯科技(深圳)有限公司 Video playing method and device, computer readable storage medium and computer equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN110263729A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of method of shot boundary detector, model training method and relevant apparatus
CN110830734A (en) * 2019-10-30 2020-02-21 新华智云科技有限公司 Abrupt change and gradual change lens switching identification method
CN110856042A (en) * 2019-11-18 2020-02-28 腾讯科技(深圳)有限公司 Video playing method and device, computer readable storage medium and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dai Xiaowen; Wei Zhiqiang; Gou Xiantai: "Research on video shot transition detection algorithm based on improved BEMD" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439482A (en) * 2022-11-09 2022-12-06 荣耀终端有限公司 Transition detection method and related equipment thereof
CN117132925A (en) * 2023-10-26 2023-11-28 成都索贝数码科技股份有限公司 Intelligent stadium method and device for sports event
CN117132925B (en) * 2023-10-26 2024-02-06 成都索贝数码科技股份有限公司 Intelligent stadium method and device for sports event

Also Published As

Publication number Publication date
CN111428589B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN111611847B (en) Video motion detection method based on scale attention hole convolution network
CN113065474B (en) Behavior recognition method and device and computer equipment
CN109063611B (en) Face recognition result processing method and device based on video semantics
CN110795595A (en) Video structured storage method, device, equipment and medium based on edge calculation
CN107358141B (en) Data identification method and device
CN110298297A (en) Flame identification method and device
CN110414367B (en) Time sequence behavior detection method based on GAN and SSN
CN112329656B (en) Feature extraction method for human action key frame in video stream
CN107341508B (en) Fast food picture identification method and system
CN112733660B (en) Method and device for splitting video strip
CN110991397B (en) Travel direction determining method and related equipment
WO2014193220A2 (en) System and method for multiple license plates identification
CN110853074A (en) Video target detection network system for enhancing target by utilizing optical flow
CN111428589B (en) Gradual transition identification method and system
CN115062186B (en) Video content retrieval method, device, equipment and storage medium
CN116030396B (en) Accurate segmentation method for video structured extraction
CN110309720A (en) Video detecting method, device, electronic equipment and computer-readable medium
CN114041165A (en) Video similarity detection method, device and equipment
CN111027555A (en) License plate recognition method and device and electronic equipment
CN113553952A (en) Abnormal behavior recognition method and device, equipment, storage medium and program product
CN116095363A (en) Mobile terminal short video highlight moment editing method based on key behavior recognition
CN115713731A (en) Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN111832351A (en) Event detection method and device and computer equipment
CN115424253A (en) License plate recognition method and device, electronic equipment and storage medium
CN114937248A (en) Vehicle tracking method and device for cross-camera, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221215

Address after: Room 430, cultural center, 460 Wenyi West Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant after: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd.

Applicant after: Xinhua fusion media technology development (Beijing) Co.,Ltd.

Address before: Room 430, cultural center, 460 Wenyi West Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant before: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd.

GR01 Patent grant