CN111428589B - Gradual transition identification method and system - Google Patents

Gradual transition identification method and system

Info

Publication number
CN111428589B
CN111428589B (application CN202010165457.3A)
Authority
CN
China
Prior art keywords
transition
type
video
video segment
training
Prior art date
Legal status
Active
Application number
CN202010165457.3A
Other languages
Chinese (zh)
Other versions
CN111428589A (en)
Inventor
王灿进
Current Assignee
Xinhua Fusion Media Technology Development Beijing Co ltd
Xinhua Zhiyun Technology Co ltd
Original Assignee
Xinhua Fusion Media Technology Development Beijing Co ltd
Xinhua Zhiyun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xinhua Fusion Media Technology Development Beijing Co ltd, Xinhua Zhiyun Technology Co ltd filed Critical Xinhua Fusion Media Technology Development Beijing Co ltd
Priority to CN202010165457.3A
Publication of CN111428589A
Application granted
Publication of CN111428589B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a gradual transition identification method and system. The identification method comprises the following steps: acquiring a video to be detected and traversing it with a preset sliding window to obtain first video segments; performing transition recognition on the first video segments based on a preset transition recognition model and extracting the first video segments in which a transition is recognized to obtain second video segments; predicting the transition type of each second video segment based on a preset type prediction model; and determining the transition intervals in the video to be detected, together with their transition types, based on the transition types of the second video segments. Compared with existing transition identification methods based on color analysis, the method of the invention identifies gradual transitions in the video to be detected with deep learning, so that false identifications under adverse conditions such as lens shake and virtual focus are overcome and accuracy is improved.

Description

Gradual transition identification method and system
Technical Field
The invention relates to the field of image recognition, in particular to a gradual transition recognition method and system.
Background
Application CN201610687298.7, "A method and apparatus for identifying shot switching", proposes: extracting key frames from the video to be detected at equal intervals, dividing each key frame into several subregions, and judging whether a shot switch exists by calculating weighted distances between the color or brightness histograms of the subregions of different key frames.
Application CN201410831291.9, "Method and device for detecting video shot switching based on frame-difference clustering", proposes: calculating the gray-value differences between each pair of frames among three consecutive frames to generate a three-dimensional vector, mapping the vector to a point in a spatial coordinate system with a clustering device, setting a radius parameter to generate a sphere, and treating the points falling inside the sphere as shot switches.
As can be seen from the above, transition detection is nowadays realized by color analysis. Such detection methods are poorly suited to identifying and locating gradual transitions, which change from frame to frame, and are easily affected by shooting quality, for example mistaking lens shake for a gradual transition.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a gradual transition identification method and system.
To solve the above technical problems, the invention adopts the following technical solutions:
A gradual transition identification method comprises the following steps:
acquiring a video to be detected, traversing the video to be detected by using a preset sliding window, and acquiring a first video segment;
performing transition recognition on the first video segment based on a preset transition recognition model, and extracting the first video segment with the recognized transition to obtain a second video segment;
predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
and determining a transition interval in the video to be detected based on the transition type of each second video segment, and determining the transition type of the transition interval.
As an embodiment, the method for acquiring the transition identification model includes:
collecting a sample video fragment, judging whether the sample video fragment contains transition, taking the sample video fragment containing the transition as a training positive sample, and taking the sample video fragment not containing the transition as a training negative sample;
and training by utilizing the training positive sample and the training negative sample to obtain a transition identification model, wherein the transition identification model is used for identifying whether the input first video segment is in transition.
As one embodiment, the method for obtaining the type prediction model includes:
labeling the training positive samples based on transition types to generate predicted training data;
and training by utilizing the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output is the transition type of the second video segment or the characteristics of the second video segment.
As an implementation manner, when the output of the type prediction model is a feature, the specific steps of predicting the transition type of each second video segment based on the preset type prediction model to obtain the transition type of each second video segment are as follows:
taking the feature output by the type prediction model as a first feature;
performing similarity matching between the first feature and the second features in a preset transition feature library to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking the transition type as the transition type of the corresponding second video segment.
As an implementation manner, the specific steps of performing similarity matching between the first feature and the second features in the preset transition feature library to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking it as the transition type of the corresponding second video segment are as follows:
respectively calculating cosine distances between the first feature and each second feature, and generating a feature distance value according to the cosine distances;
and acquiring an optimal characteristic distance value, judging whether the optimal characteristic distance value meets a preset verification condition, taking a transition type of a second characteristic corresponding to the optimal characteristic distance value as a transition type of the first characteristic when the optimal characteristic distance value meets the preset verification condition, and otherwise, judging that the transition type of the first characteristic is no transition.
As an implementation manner, the specific steps of determining the transition interval in the video to be detected based on the transition type of each second video segment, and determining the transition type of the transition interval are as follows:
generating a prediction result array according to the obtained transition types, wherein the prediction result array represents the transition types of the first video segments;
and eliminating the data with the transition type of no transition from the prediction result array, obtaining at least one continuous interval, taking the continuous interval as a transition interval, and extracting the transition type with the largest number in the transition interval as the transition type of the transition interval.
The invention also provides a gradual transition identification system, which comprises:
the preprocessing module is used for acquiring a video to be detected and traversing the video to be detected by utilizing a preset sliding window to acquire a first video segment;
the transition identification module is used for carrying out transition identification on the first video segment based on a preset transition identification model, and extracting the first video segment with the transition identified, so as to obtain a second video segment;
the type prediction module is used for predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
the positioning identification module is used for determining a transition zone in the video to be detected based on the transition type of each second video segment and determining the transition type of the transition zone.
As an embodiment, the method further comprises a model building module, wherein the model building module comprises a first data processing unit, a second data processing unit, an identification model training unit and a prediction model training unit;
the first data processing unit is used for collecting sample video fragments, judging whether the sample video fragments contain transition, taking the sample video fragments containing the transition as training positive samples, and taking the sample video fragments not containing the transition as training negative samples;
the recognition model training unit is used for obtaining a transition recognition model through training by utilizing the training positive sample and the training negative sample, and the transition recognition model is used for recognizing whether the input first video segment is in transition;
the second data processing unit is used for marking the training positive sample based on the transition type and generating predicted training data;
the prediction model training unit is used for obtaining a type prediction model through training by using the prediction training data, wherein the input of the type prediction model is a second video segment, and the output of the type prediction model is the transition type of the second video segment or the characteristics of the second video segment.
As one implementation, the location identification module is configured to:
generating a prediction result array according to the obtained transition types, wherein the prediction result array represents the transition types of the first video segments;
and eliminating the data with the transition type of no transition from the prediction result array, obtaining at least one continuous interval, taking the continuous interval as a transition interval, and extracting the transition type with the largest number in the transition interval as the transition type of the transition interval.
The invention also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods described above.
The invention has the remarkable technical effects due to the adoption of the technical scheme:
1. The invention uses a deep learning technique: a transition recognition model is preset to recognize whether the input video contains a gradual transition, and a type prediction model is preset to further predict the corresponding transition type. Gradual transitions can therefore be identified effectively and their specific type determined, which meets the gradual transition identification requirement and also serves the needs that post-processing personnel derive from gradual transitions.
2. The invention extracts features of the second video segments and performs similarity calculation between the obtained first features and the second features in a preset transition feature library to determine the transition type of each second video segment. This identifies gradual transitions effectively and makes it convenient to extend the set of transition types by extending the transition features according to actual requirements.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1 is a schematic workflow diagram of a gradual transition identification method of the present invention;
FIG. 2 is a schematic diagram showing the module connections of the gradual transition identification system in embodiment 3;
FIG. 3 is a schematic diagram of a module connection of the model training module 500 of FIG. 2;
fig. 4 is a schematic diagram of module connection of the type prediction module 300 in embodiment 4.
In the figure:
100 is a preprocessing module, 200 is a transition identification module, 300 is a type prediction module, 310 is a feature extraction unit, 320 is a similarity matching unit, 400 is a positioning identification module, 500 is a model training module, 510 is a first data processing unit, 520 is a second data processing unit, 530 is a recognition model training unit, and 540 is a prediction model training unit.
Detailed Description
The present invention will be described in further detail with reference to the following examples, which are illustrative of the present invention and are not intended to limit the present invention thereto.
Embodiment 1, a gradual transition identification method, as shown in fig. 1, includes the following steps:
s100, acquiring a video to be detected, traversing the video to be detected by using a preset sliding window, and acquiring a first video segment;
s200, carrying out transition recognition on the first video segment based on a preset transition recognition model, and extracting the first video segment with the recognized transition to obtain a second video segment; that is, a first video segment containing a transition is screened out using a transition identification model.
S300, predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
the transition types include no transition and specific gradual transition types (fade-in fade-out, checkerboard, zoom, wipe, saw-tooth, etc.); that is, the first video segment including transition obtained by screening in step S200 is further predicted, and whether it includes transition is confirmed, and a specific gradual transition type is predicted.
S400, determining a transition interval in the video to be detected based on the transition type of each second video segment, and determining the transition type of the transition interval.
Existing transition identification methods usually judge whether a shot switch has occurred from image gray-level or color-difference data, so their recognition of gradual transitions, whose color/gray values change frame by frame, is poor. Such methods also depend on preset judgment thresholds; gradual transitions are commonly handled by setting dual thresholds, a scheme that relies heavily on the threshold values: if a threshold is too small, the large gray-value differences between consecutive frames or key frames that arise under heavy lens shake, virtual focus and similar conditions easily cause misjudgment, while if it is too large, the recall rate is low.
In this embodiment, a deep learning technique is used: a transition recognition model is preset to recognize whether the input video contains a gradual transition, and a type prediction model is preset to further predict the corresponding transition type. Not only can gradual transitions be recognized effectively, their specific type can also be determined, which meets the gradual transition identification requirement and also serves the needs that post-processing personnel derive from gradual transitions, such as special-effect design or application analysis.
In the step S100, the specific steps of obtaining the video to be detected, traversing the video to be detected by using a preset sliding window, and obtaining the first video segment are as follows:
decoding the video to be detected to obtain a video sequence with the total length L;
and configuring the length N and the step length S of the sliding window according to actual needs, traversing the video sequence by utilizing the constructed sliding window, and taking out a fragment C consisting of N video frames from the video sequence each time, namely a first video fragment.
A person skilled in the art can configure the length N and the step size S of the sliding window according to actual needs.
In the embodiment, the video to be detected is segmented in advance, and then the transition identification is carried out on each segment of the first video segment, so that on one hand, the accuracy of identification is improved, and on the other hand, the positioning of the transition area in the video to be detected is facilitated.
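The traversal of step S100 can be sketched as follows. This is an illustrative sketch only: plain Python lists stand in for decoded frames, and the names `sliding_windows`, `n` and `s` are ours, not the patent's.

```python
def sliding_windows(frames, n, s):
    """Yield (start_index, segment) pairs: length-n windows taken with step s."""
    for start in range(0, len(frames) - n + 1, s):
        yield start, frames[start:start + n]

# Toy "video sequence" of L = 10 frames, window length N = 4, step S = 2:
# ((L - N) / S) + 1 = 4 first video segments, starting at frames 0, 2, 4 and 6.
segments = list(sliding_windows(list(range(10)), n=4, s=2))
```

Note that with this windowing the number of segments matches the ((L-N)/S)+1 count used later for the prediction result array.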
Further, the method for obtaining the transition identification model in step S200 is as follows:
a1, collecting sample video fragments, judging whether the sample video fragments contain transition, taking the sample video fragments containing the transition as training positive samples, and taking the sample video fragments not containing the transition as training negative samples;
a2, training by using the training positive sample and the training negative sample to obtain a transition identification model, wherein the transition identification model is used for identifying whether the input first video segment is in transition.
The transition recognition model is a classification model, and the output result is transition or non-transition.
In this embodiment, an initial transition recognition model may be built on any video classification network and trained with the training positive and negative samples to obtain a number of intermediate models (the models output once the loss curve stabilizes); the intermediate model with the highest accuracy is selected and output as the transition recognition model. The person skilled in the art can freely choose, according to actual needs, any video classification model capable of classifying video clips; for example, an existing published two-stream network or a 3D CNN network can be used.
Further, the method for collecting the sample video clips in the step A1 is as follows:
and collecting the original video containing the transition (such as downloading the video from the Internet), and marking the start frame, the end frame and the corresponding transition type of the transition in the original video to obtain a gradual transition sample.
N consecutive frames are randomly sampled from a training video to obtain a segment C_train; the resulting segment C_train is a sample video segment;
and calculating the coincidence ratio of the sample video segment and the corresponding gradual transition sample, judging that the sample video segment is a training positive sample when the coincidence ratio reaches a preset coincidence ratio threshold value, and judging that the sample video segment is a training negative sample otherwise.
The coincidence ratio is calculated as follows: the sample video segment consists of a first non-overlapping part L1 and an overlapping part L2, and the corresponding gradual transition sample consists of a second non-overlapping part L3 and the same overlapping part L2; the proportion of the overlapping part L2 in the total length (L1+L2+L3) spanned by the sample video segment and the gradual transition sample is taken as the coincidence ratio, i.e. the coincidence ratio is L2/(L1+L2+L3).
A person skilled in the art can set the coincidence-ratio threshold according to actual needs; in this embodiment the threshold is 50%, i.e. a sample video segment whose calculated coincidence ratio reaches 50% is judged to be a training positive sample.
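The labeling rule above can be sketched as follows, with segments represented as (start, end) frame intervals (end exclusive). The function names and the interval representation are our assumptions; the formula is the patent's L2/(L1+L2+L3).

```python
def coincidence_ratio(sample, transition):
    """L2 / (L1 + L2 + L3): overlap length over the total span covered by both."""
    overlap = max(0, min(sample[1], transition[1]) - max(sample[0], transition[0]))
    total = (sample[1] - sample[0]) + (transition[1] - transition[0]) - overlap
    return overlap / total if total else 0.0

def label_sample(sample, transition, threshold=0.5):
    """Positive when the coincidence ratio reaches the preset threshold (50% here)."""
    return "positive" if coincidence_ratio(sample, transition) >= threshold else "negative"
```

For example, a sample spanning frames 0-10 that fully contains a transition spanning frames 2-8 has coincidence ratio 6/10 = 0.6 and is labeled positive.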
Further, original videos can also be generated according to preset composition rules; the specific method is as follows:
acquiring a first shot segment and a second shot segment, taking the last frame of the first shot segment as a transition starting frame, and taking the first frame of the second shot segment as a transition ending frame;
and acquiring a transition generation rule, wherein the transition generation rule comprises a gradual transition length and a gradual transition type, generating a filling frame based on the transition generation rule, and adding the filling frame between the starting frame and the ending frame to obtain an original video. The transition types include, but are not limited to, fade-in and fade-out, checkerboard, zoom, wipe, and saw-tooth.
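As one concrete instance of such a generation rule, a fade-in/fade-out can be synthesized by linearly cross-blending the transition start frame into the transition end frame. This sketch assumes 8-bit grayscale frames held in NumPy arrays; the function name is ours, and other types such as checkerboard or wipe would instead apply spatial masks.

```python
import numpy as np

def make_fade_fill(start_frame, end_frame, length):
    """Generate `length` filler frames cross-fading start_frame -> end_frame."""
    fill = []
    for i in range(1, length + 1):
        alpha = i / (length + 1)          # blend weight grows linearly with i
        frame = (1 - alpha) * start_frame + alpha * end_frame
        fill.append(frame.astype(np.uint8))
    return fill

# Last frame of the first shot (black) and first frame of the second shot (white):
a = np.zeros((8, 8), dtype=np.uint8)
b = np.full((8, 8), 255, dtype=np.uint8)
fill = make_fade_fill(a, b, length=3)     # three filler frames, middle one mid-gray
```

Inserting `fill` between the two shots yields a synthetic original video with a known transition start frame, end frame and type, which is exactly the annotation the training samples need.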
Further, a sample video segment that does not contain a transition may also be directly taken as a training negative sample.
To enhance the stability and generalization capability of the transition recognition model, the training negative samples should be sufficiently diverse, covering complex scenes such as regular shooting, rapid camera movement, virtual focus, dim light and strong light, so as to effectively reduce false alarms on these difficult scenes.
Further, the method for obtaining the type prediction model used in step S300 is as follows:
A3, labeling the training positive samples based on transition type to generate prediction training data;
the transition type of the gradual transition sample corresponding to a training positive sample is used as the transition type of that positive sample; that is, training positive samples are labeled only with gradual transition types, such as fade-in/fade-out, checkerboard, zoom, wipe and saw-tooth.
A4, training by utilizing the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output of the type prediction model is the transition type of the second video segment.
The type prediction model is a multi-classification model, and the output result is a gradual transition type.
In this embodiment, an initial type prediction model may be built on any video classification network and trained with the prediction training data to obtain a number of intermediate models (the models output once the loss curve stabilizes); the intermediate model with the highest accuracy is selected and output as the type prediction model. The person skilled in the art can freely choose, according to actual needs, any video classification network capable of multi-class classification of video clips; for example, an existing two-stream network or a 3D CNN network can be used.
When training the transition recognition model and the type prediction model, classification accuracy can be improved by methods such as cascaded grouping and increasing the error weight of hard samples. Cascaded grouping divides the input into several major classes with clear inter-class boundaries and then subdivides each major class into several minor classes; increasing the weight of hard samples gives a larger weight during training to misclassified samples with small loss values. Both cascaded grouping and hard-sample reweighting are conventional techniques and are therefore not described in detail in this specification.
According to the embodiment, through training the transition identification model and the type prediction model, gradual transition identification is realized based on a machine learning technology, so that error identification of gradual transition under adverse conditions such as lens shaking and virtual focus can be effectively avoided, and the accuracy rate and recall rate of gradual transition identification are effectively improved.
Further, in the step S400, a transition interval in the video to be detected is determined based on the transition type of each second video segment, and the specific steps of determining the transition type of the transition interval are as follows:
s401, generating a prediction result array according to the obtained transition types, wherein the prediction result array represents the transition types of all the first video clips; that is, for a video sequence of length L, the resulting predictor array RES contains ((L-N)/S) +1 transition types.
In this embodiment, the transition type corresponding to the training positive sample is marked by means of coding (for example, one-hot coding may be adopted), and at this time, the type prediction model outputs the coding of the corresponding predicted transition type.
For example, no transition is coded as 0, and the gradual transition types are coded in sequence (fade-in/fade-out as 1, checkerboard as 2, zoom as 3, and so on);
and the codes of the transition types corresponding to the first video segments are acquired in sequence to generate the prediction result array RES, where each entry of the array represents the transition type of the corresponding first video segment.
Note that, since the second video segment is a video segment including transition in the first video segment, the prediction result array RES can be generated only according to the transition type of the second video segment.
S402, eliminating data with transition types of no transition from the prediction result array, obtaining at least one continuous interval, taking the continuous interval as a transition interval, and extracting the transition type with the largest number in the transition interval as the transition type of the transition interval.
That is, the prediction result array RES is traversed from front to back. When the first non-zero position i is reached, searching continues backward until the first zero position j is reached (a zero indicates that the corresponding first video segment contains no transition). The sequence numbers i to j-1 then form a transition interval, and the corresponding gradual transition segment can be obtained by combining the first video segments with sequence numbers i to j-1.
The values of RES[i] to RES[j-1] are counted and the most frequent transition type is taken as the transition type of the transition interval; for example, if i=5, j=9 and RES[i] to RES[j-1] are [3, 3, 2, 3], the transition type of the interval is the type coded 3 (zoom).
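The interval extraction and majority vote of steps S401-S402 can be sketched as follows; `locate_transitions` is our illustrative name, and code 0 stands for no transition as in the coding above.

```python
from collections import Counter

def locate_transitions(res):
    """Return (i, j-1, type) for each maximal run of non-zero codes in RES,
    where type is the most frequent code within the run (majority vote)."""
    intervals = []
    i = 0
    while i < len(res):
        if res[i] == 0:
            i += 1
            continue
        j = i
        while j < len(res) and res[j] != 0:
            j += 1                         # j is the first zero position after i
        code = Counter(res[i:j]).most_common(1)[0][0]
        intervals.append((i, j - 1, code))
        i = j
    return intervals

# The example from the text: RES[5]..RES[8] = [3, 3, 2, 3] gives one transition
# interval spanning positions 5..8 whose voted type is code 3 (zoom).
res = [0, 0, 0, 0, 0, 3, 3, 2, 3, 0]
```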
From the above, the present embodiment can locate the start and end positions of the gradual transition in the video to be detected, and can also identify the specific type of the gradual transition.
Embodiment 2, the output of the type prediction model in embodiment 1 is changed from "transition type of the second video segment" to "feature of the second video segment", and the rest is the same as embodiment 1.
That is, the type prediction model in this embodiment is changed from a multi-classification model into a feature extractor. A multi-classification model already contains a feature extractor: it extracts the features of the input second video segment and computes the probability that the segment belongs to each transition type in order to output the corresponding type. Therefore, this embodiment only needs to remove the softmax layer of the type prediction model of embodiment 1.
In this embodiment, in the step S300, the predicting the transition type of each second video segment based on the preset type prediction model, and the specific steps of obtaining the transition type of each second video segment are as follows:
taking the feature output by the type prediction model as the first feature (FEA);
performing similarity matching on the first Feature (FEA) and a second feature in a preset transition feature library (FEA-LIB), obtaining a matching result, determining a transition type of the first feature according to the matching result, and taking the transition type as a transition type corresponding to a corresponding second video segment;
In this embodiment, the transition feature library (FEA-LIB) is obtained as follows: the gradual transition samples of Embodiment 1 are input into the type prediction model to obtain the features of each gradual transition sample, the obtained features are used as second features, and the transition feature library (FEA-LIB) is constructed from these second features.
Further, gradual transition samples can be collected according to actual needs to perform category expansion and data expansion on the transition feature library (FEA-LIB).
Category expansion: gradual transition samples of the category to be added are collected and input into the type prediction model to obtain the corresponding features, which are added to the transition feature library (FEA-LIB) as second features, thereby expanding the set of recognizable gradual transition categories.
Data expansion: gradual transition samples are collected according to actual requirements, and their features are extracted and used as second features.
The method may further include: when the transition type of a first feature belongs to a gradual transition type, adding that first feature to the transition feature library (FEA-LIB) as a second feature, thereby expanding the transition feature library (FEA-LIB).
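The library construction and its expansion can be sketched as follows; `featurize` is a hypothetical stand-in for the type prediction model with its softmax removed, and the sample vectors and labels are invented for illustration:

```python
import numpy as np

def featurize(segment):
    # stand-in for the type prediction model with the softmax removed;
    # here it simply L2-normalizes a toy segment vector
    v = np.asarray(segment, dtype=float)
    return v / np.linalg.norm(v)

def build_feature_library(samples):
    """FEA-LIB: a list of (second_feature, transition_type) pairs built
    from labelled gradual transition samples."""
    return [(featurize(seg), label) for seg, label in samples]

def expand_library(library, new_samples):
    """Category/data expansion: append features of newly collected samples,
    so new transition types become matchable without retraining the model."""
    library.extend((featurize(seg), label) for seg, label in new_samples)
    return library

lib = build_feature_library([([1.0, 0.0], "dissolve"), ([0.0, 1.0], "wipe")])
expand_library(lib, [([1.0, 1.0], "scaling")])  # category expansion
print(len(lib))  # 3
```

A real library would of course store model features rather than normalized toy vectors; only the structure of the library is being illustrated.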
Further, the specific steps of performing similarity matching between the first feature and the second features in the preset transition feature library, obtaining a matching result, determining the transition type of the first feature according to the matching result, and taking it as the transition type of the corresponding second video segment are as follows:
calculating the cosine distance between the first feature and each second feature, and generating a feature distance value from each cosine distance;
obtaining the optimal feature distance value and judging whether it satisfies a preset verification condition; when it does, taking the transition type of the second feature corresponding to the optimal feature distance value as the transition type of the first feature; otherwise, judging the transition type of the first feature to be no transition.
The feature distance value may be the cosine distance value itself, or (1 - cosine distance value).
When the feature distance value is the cosine distance value, the optimal feature distance value is the maximum feature distance value, and the verification condition is that the cosine distance value is greater than a preset distance threshold (e.g., 0.5).
When the feature distance value is (1 - cosine distance value), the optimal feature distance value is the minimum feature distance value, and the verification condition is that the feature distance value is smaller than the preset distance threshold (e.g., 0.5).
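The matching step under the first convention (feature distance value = cosine distance value, larger is better) can be sketched as follows; the library contents and the `match_transition_type` helper are illustrative assumptions, not the patented implementation:

```python
import numpy as np

DIST_THRESHOLD = 0.5  # the preset distance threshold mentioned in the text

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_transition_type(first_feature, library, threshold=DIST_THRESHOLD):
    """Compare the first feature with every second feature in the library;
    the best (largest cosine) match wins only if it passes the verification
    condition, otherwise the result is 'no transition'."""
    best_value, best_type = -1.0, None
    for second_feature, transition_type in library:
        value = cosine(first_feature, second_feature)
        if value > best_value:
            best_value, best_type = value, transition_type
    return best_type if best_value > threshold else "no transition"

lib = [([1.0, 0.0], "dissolve"), ([0.0, 1.0], "wipe")]
print(match_transition_type([0.9, 0.1], lib))    # dissolve
print(match_transition_type([-1.0, -1.0], lib))  # no transition
```

Under the second convention one would instead minimize (1 - cosine) and accept matches below the threshold; the two are equivalent.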
In this embodiment, the features of the second video segment are extracted and its transition type is judged from those features. The transition feature library can be expanded according to actual needs, so that in practice the set of recognizable transition types can be extended on demand; the scheme therefore has strong extensibility and a wide range of application.
Embodiment 3. A gradual transition recognition system, as shown in Fig. 2, includes a preprocessing module 100, a transition recognition module 200, a type prediction module 300, a positioning recognition module 400, and a model training module 500;
the preprocessing module 100 is configured to obtain a video to be detected and traverse it with a preset sliding window to obtain first video segments;
the transition recognition module 200 is configured to perform transition recognition on the first video segments based on a preset transition recognition model, and to extract the first video segments in which a transition is recognized, obtaining second video segments;
the type prediction module 300 is configured to predict the transition type of each second video segment based on a preset type prediction model, obtaining the transition type of each second video segment;
the positioning recognition module 400 is configured to determine the transition intervals in the video to be detected based on the transition type of each second video segment, and to determine the transition type of each transition interval.
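The module decomposition above can be sketched as a minimal Python pipeline; the window size, stride, and the two stub models passed in are invented placeholders for the trained transition recognition and type prediction models:

```python
def sliding_window(video_frames, window=8, stride=4):
    """Preprocessing module 100: traverse the video with a preset sliding
    window, yielding (start_index, first_video_segment) pairs."""
    for start in range(0, max(len(video_frames) - window + 1, 1), stride):
        yield start, video_frames[start:start + window]

def run_pipeline(video_frames, is_transition, predict_type):
    """Modules 200-400 end to end: recognize which segments contain a
    transition, predict a type for those, and emit the prediction result
    array (0 = no transition) over the first video segments."""
    res = []
    for _, segment in sliding_window(video_frames):
        if is_transition(segment):             # transition recognition module 200
            res.append(predict_type(segment))  # type prediction module 300
        else:
            res.append(0)
    return res

# toy run with stub models: "frames" are just integers here
frames = list(range(20))
print(run_pipeline(frames, lambda s: s[0] >= 8, lambda s: 3))  # [0, 0, 3, 3]
```

The resulting array is exactly what the positioning recognition module 400 then scans for non-zero runs.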
As shown in Fig. 3, the model training module 500 includes a first data processing unit 510, a second data processing unit 520, a recognition model training unit 530, and a prediction model training unit 540;
the first data processing unit 510 is configured to collect sample video segments, judge whether each sample video segment contains a transition, take the sample video segments containing a transition as training positive samples, and take the sample video segments not containing a transition as training negative samples;
the recognition model training unit 530 is configured to train a transition recognition model using the training positive samples and training negative samples, the transition recognition model being used to recognize whether an input first video segment contains a transition;
the second data processing unit 520 is configured to label the training positive samples by transition type, generating prediction training data;
the prediction model training unit 540 is configured to train a type prediction model using the prediction training data, the input of the type prediction model being a second video segment and its output being the transition type of the second video segment.
Further, the positioning recognition module 400 is configured to:
generate a prediction result array from the obtained transition types, the prediction result array representing the transition type of each first video segment;
remove the entries whose transition type is no transition from the prediction result array to obtain at least one continuous interval, take each continuous interval as a transition interval, and take the most frequent transition type within the transition interval as the transition type of that interval.
This embodiment is the apparatus embodiment corresponding to Embodiment 1. Since it is substantially similar to the method embodiment (Embodiment 1), its description is relatively brief; for the relevant details, refer to the corresponding parts of the description of Embodiment 1.
Embodiment 4. The prediction model training unit 540 of Embodiment 3 is modified to "train a type prediction model using the prediction training data, the input of the type prediction model being a second video segment and its output being the features of the second video segment"; the rest is the same as Embodiment 3.
As shown in Fig. 4, the type prediction module 300 in this embodiment includes a feature extraction unit 310 and a similarity matching unit 320:
the feature extraction unit 310 is configured to perform feature extraction on the input second video segment using the type prediction model, obtaining a first feature;
the similarity matching unit 320 is configured to perform similarity matching between the first feature and the second features in a preset transition feature library to obtain a matching result, determine the transition type of the first feature according to the matching result, and take it as the transition type of the corresponding second video segment;
further, the similarity matching unit 320 is configured to:
calculate the cosine distance between the first feature and each second feature, and generate a feature distance value from each cosine distance;
obtain the optimal feature distance value and judge whether it satisfies a preset verification condition; when it does, take the transition type of the second feature corresponding to the optimal feature distance value as the transition type of the first feature; otherwise, judge the transition type of the first feature to be no transition.
This embodiment is the apparatus embodiment corresponding to Embodiment 2. Since it is substantially similar to the method embodiment (Embodiment 2), its description is relatively brief; for the relevant details, refer to the corresponding parts of the description of Embodiment 2.
Embodiment 5. A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method described in Embodiment 1 or Embodiment 2.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical and similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
In addition, the specific embodiments described in the present specification may differ in terms of parts, shapes of components, names, and the like. All equivalent or simple changes of the structure, characteristics and principle according to the inventive concept are included in the protection scope of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions in a similar manner without departing from the scope of the invention as defined in the accompanying claims.

Claims (8)

1. A gradual transition identification method, characterized by comprising the following steps:
acquiring a video to be detected, traversing the video to be detected by using a preset sliding window, and acquiring a first video segment;
performing transition recognition on the first video segment based on a preset transition recognition model, and extracting the first video segment in which a transition is recognized to obtain a second video segment;
predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
determining a transition interval in the video to be detected based on the transition type of each second video segment, and determining the transition type of the transition interval, the specific steps being:
generating a prediction result array according to the obtained transition types, wherein the prediction result array represents the transition types of the first video clips;
and removing the data whose transition type is no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as a transition interval, and extracting the most frequent transition type within the transition interval as the transition type of the transition interval.
2. The gradual transition identification method according to claim 1, wherein the transition recognition model is obtained as follows:
collecting a sample video fragment, judging whether the sample video fragment contains transition, taking the sample video fragment containing the transition as a training positive sample, and taking the sample video fragment not containing the transition as a training negative sample;
and training by utilizing the training positive sample and the training negative sample to obtain a transition identification model, wherein the transition identification model is used for identifying whether the input first video segment is in transition.
3. The gradual transition identification method according to claim 2, wherein the type prediction model is obtained by the following steps:
labeling the training positive samples based on transition types to generate predicted training data;
and training by utilizing the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output is the transition type of the second video segment or the characteristics of the second video segment.
4. The gradual transition identification method according to claim 3, wherein, when the output of the type prediction model is features, the specific steps of predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment are as follows:
taking the characteristics output by the type prediction model as first characteristics;
and performing similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, determining a transition type of the first feature according to the matching result, and taking the transition type as a transition type corresponding to a corresponding second video segment.
5. The gradual transition identification method according to claim 4, wherein the specific steps of performing similarity matching between the first feature and a second feature in a preset transition feature library to obtain a matching result, determining a transition type of the first feature according to the matching result, and taking the transition type as a transition type corresponding to a corresponding second video segment are as follows:
respectively calculating cosine distances between the first feature and each second feature, and generating a feature distance value according to the cosine distances;
and acquiring an optimal characteristic distance value, judging whether the optimal characteristic distance value meets a preset verification condition, taking a transition type of a second characteristic corresponding to the optimal characteristic distance value as a transition type of the first characteristic when the optimal characteristic distance value meets the preset verification condition, and otherwise, judging that the transition type of the first characteristic is no transition.
6. A gradual transition identification system, comprising:
the preprocessing module is used for acquiring a video to be detected and traversing the video to be detected by utilizing a preset sliding window to acquire a first video segment;
the transition identification module is used for carrying out transition identification on the first video segment based on a preset transition identification model, and extracting the first video segment with the transition identified, so as to obtain a second video segment;
the type prediction module is used for predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
the positioning identification module is used for determining a transition interval in the video to be detected based on the transition type of each second video segment and determining the transition type of the transition interval;
the location identification module is configured to:
generating a prediction result array according to the obtained transition types, wherein the prediction result array represents the transition types of the first video clips;
and removing the data whose transition type is no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as a transition interval, and extracting the most frequent transition type within the transition interval as the transition type of the transition interval.
7. The gradual transition identification system according to claim 6, further comprising a model building module, wherein the model building module includes a first data processing unit, a second data processing unit, a recognition model training unit, and a prediction model training unit;
the first data processing unit is used for collecting sample video fragments, judging whether the sample video fragments contain transition, taking the sample video fragments containing the transition as training positive samples, and taking the sample video fragments not containing the transition as training negative samples;
the recognition model training unit is used for obtaining a transition recognition model through training by utilizing the training positive sample and the training negative sample, and the transition recognition model is used for recognizing whether the input first video segment is in transition;
the second data processing unit is used for marking the training positive sample based on the transition type and generating predicted training data;
the prediction model training unit is used for obtaining a type prediction model through training by using the prediction training data, wherein the input of the type prediction model is a second video segment, and the output of the type prediction model is the transition type of the second video segment or the characteristics of the second video segment.
8. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
CN202010165457.3A 2020-03-11 2020-03-11 Gradual transition identification method and system Active CN111428589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010165457.3A CN111428589B (en) 2020-03-11 2020-03-11 Gradual transition identification method and system

Publications (2)

Publication Number Publication Date
CN111428589A CN111428589A (en) 2020-07-17
CN111428589B true CN111428589B (en) 2023-05-30

Family

ID=71547708

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439482B (en) * 2022-11-09 2023-04-07 荣耀终端有限公司 Transition detection method and related device
CN117132925B (en) * 2023-10-26 2024-02-06 成都索贝数码科技股份有限公司 Intelligent stadium method and device for sports event

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN110263729A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of method of shot boundary detector, model training method and relevant apparatus
CN110830734A (en) * 2019-10-30 2020-02-21 新华智云科技有限公司 Abrupt change and gradual change lens switching identification method
CN110856042A (en) * 2019-11-18 2020-02-28 腾讯科技(深圳)有限公司 Video playing method and device, computer readable storage medium and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dai Xiaowen; Wei Zhiqiang; Gou Xiantai. Research on a video shot transition detection algorithm based on improved BEMD. Journal of Optoelectronics · Laser, 2010, (No. 02), full text. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221215

Address after: Room 430, cultural center, 460 Wenyi West Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant after: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd.

Applicant after: Xinhua fusion media technology development (Beijing) Co.,Ltd.

Address before: Room 430, cultural center, 460 Wenyi West Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant before: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd.

GR01 Patent grant