CN111428589A - Identification method and system for transition - Google Patents

Identification method and system for transition

Info

Publication number
CN111428589A
CN111428589A
Authority
CN
China
Prior art keywords
transition
type
video
training
video clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010165457.3A
Other languages
Chinese (zh)
Other versions
CN111428589B (en)
Inventor
王灿进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhua Fusion Media Technology Development Beijing Co ltd
Xinhua Zhiyun Technology Co ltd
Original Assignee
Xinhua Zhiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhua Zhiyun Technology Co ltd filed Critical Xinhua Zhiyun Technology Co ltd
Priority to CN202010165457.3A priority Critical patent/CN111428589B/en
Publication of CN111428589A publication Critical patent/CN111428589A/en
Application granted granted Critical
Publication of CN111428589B publication Critical patent/CN111428589B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Abstract

The invention discloses a method and a system for identifying transitions, wherein the identification method comprises the following steps: acquiring a video to be detected, and traversing it with a preset sliding window to obtain first video clips; performing transition recognition on the first video clips based on a preset transition recognition model, and extracting the first video clips in which a transition is recognized to obtain second video clips; predicting the transition type of each second video clip based on a preset type prediction model; and determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval. Compared with existing transition recognition methods based on color analysis, the invention recognizes gradual transitions in the video to be detected with deep learning, so it can overcome mistaken recognition under adverse conditions such as lens shake and defocus, and improves accuracy.

Description

Identification method and system for transition
Technical Field
The invention relates to the field of image recognition, in particular to a method and a system for recognizing a gradual transition.
Background
The method and apparatus for identifying shot cuts with application number CN201610687298.7 proposes: extracting key frames of a video to be detected at equal intervals, dividing the key frames into a plurality of sub-regions, and judging whether a shot cut exists by calculating the weighted distance between the color or brightness histograms of the sub-regions of different key frames;
the video shot cut detection method and device based on frame-difference clustering with application number CN201410831291.9 proposes: calculating the gray-value difference between every two images of three continuous frames to generate a three-dimensional vector, mapping the three-dimensional vector to a point in a spatial coordinate system with a clustering device, and setting a radius parameter to generate a ball, points falling inside the ball being identified as shot cuts;
as can be seen from the above, existing transition detection is realized through color analysis, but such detection is poorly suited to identifying and locating gradual transitions that change frame by frame, and is susceptible to shooting quality, for example mistaking lens shake for a transition.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method and a system for identifying a transition.
In order to solve the above technical problem, the invention adopts the following technical scheme:
a method for identifying a gradual transition comprises the following steps:
acquiring a video to be detected, and traversing the video to be detected by using a preset sliding window to obtain a first video clip;
carrying out transition recognition on the first video clip based on a preset transition recognition model, and extracting the first video clip with the transition recognized to obtain a second video clip;
predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
and determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval.
In one implementation, the transition recognition model is obtained as follows:
collecting sample video clips, judging whether each sample video clip contains a transition, taking the sample video clips containing a transition as training positive samples, and taking those not containing a transition as training negative samples;
and training with the training positive samples and the training negative samples to obtain the transition recognition model, which is used for recognizing whether an input first video clip contains a transition.
In one implementation, the type prediction model is obtained as follows:
labeling the training positive samples based on the transition type to generate prediction training data;
and training with the prediction training data to obtain the type prediction model, whose input is a second video clip and whose output is the transition type of the second video clip or a feature of the second video clip.
In one implementation, when the output of the type prediction model is a feature, predicting the transition type of each second video clip based on the preset type prediction model specifically comprises the following steps:
taking the feature output by the type prediction model as a first feature;
and performing similarity matching between the first feature and the second features in a preset transition feature library to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking that transition type as the transition type of the corresponding second video clip.
as an implementable manner, the specific steps of performing similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, determining a transition type of the first feature according to the matching result, and using the transition type as a transition type corresponding to a corresponding second video clip are as follows:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquiring an optimal characteristic distance value, judging whether the optimal characteristic distance value meets a preset check condition, taking a transition type of a second characteristic corresponding to the optimal characteristic distance value as a transition type of the first characteristic when the optimal characteristic distance value meets the preset check condition, and judging whether the transition type of the first characteristic is transition-free if not.
In one implementation, the specific steps of determining the transition interval in the video to be detected based on the transition type of each second video clip and determining the transition type of the transition interval are:
generating a prediction result array according to the transition types, wherein the prediction result array represents the transition type of each first video clip;
and eliminating the entries whose transition type is no transition from the prediction result array to obtain at least one continuous interval, taking each continuous interval as a transition interval, and extracting the most frequent transition type within the interval as the transition type of that transition interval.
The invention also provides a system for identifying transitions, which comprises:
a preprocessing module, used for acquiring a video to be detected and traversing it with a preset sliding window to obtain first video clips;
a transition recognition module, used for performing transition recognition on the first video clips based on a preset transition recognition model and extracting the first video clips in which a transition is recognized to obtain second video clips;
a type prediction module, used for predicting the transition type of each second video clip based on a preset type prediction model;
and a positioning identification module, used for determining the transition intervals in the video to be detected based on the transition type of each second video clip and determining the transition type of each transition interval.
In one implementation, the system further comprises a model training module, which comprises a first data processing unit, a second data processing unit, a recognition model training unit and a prediction model training unit;
the first data processing unit is used for collecting sample video clips, judging whether each sample video clip contains a transition, taking those containing a transition as training positive samples and those not containing a transition as training negative samples;
the recognition model training unit is used for training with the training positive samples and training negative samples to obtain the transition recognition model, which is used for recognizing whether an input first video clip contains a transition;
the second data processing unit is used for labeling the training positive samples based on transition type to generate prediction training data;
and the prediction model training unit is used for training with the prediction training data to obtain the type prediction model, whose input is a second video clip and whose output is the transition type of the second video clip or a feature of the second video clip.
In one implementation, the positioning identification module is configured to:
generate a prediction result array according to the transition types, wherein the prediction result array represents the transition type of each first video clip;
and eliminate the entries whose transition type is no transition from the prediction result array to obtain at least one continuous interval, take each continuous interval as a transition interval, and extract the most frequent transition type within the interval as the transition type of that transition interval.
The invention also proposes a computer-readable storage medium in which a computer program is stored which, when executed by a processor, carries out the steps of any of the methods described above.
Thanks to the above technical scheme, the invention has the following notable technical effects:
1. The invention uses deep learning to preset a transition recognition model for recognizing whether an input video contains a transition, together with a type prediction model for further predicting the corresponding transition type; transitions can therefore not only be recognized effectively, but their specific type can also be determined, which meets the needs of post-processing personnel who generate or otherwise work with transitions.
2. The invention extracts the features of each second video clip and computes the similarity between the obtained first feature and the second features in a preset transition feature library, thereby determining the transition type of the second video clip; this identifies the transition type effectively, and the transition feature library can be conveniently expanded according to actual needs to extend the set of recognizable transition types.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic workflow diagram of the method for identifying a gradual transition according to the present invention;
FIG. 2 is a schematic block diagram showing the connection of modules of the system for identifying a transition in embodiment 3;
FIG. 3 is a block diagram of the module connection of the model training module 500 of FIG. 2;
FIG. 4 is a block diagram showing the connection of the type prediction module 300 according to embodiment 4.
In the figure:
100 is the preprocessing module, 200 the transition recognition module, 300 the type prediction module, 310 the feature extraction unit, 320 the similarity matching unit, 400 the positioning identification module, 500 the model training module, 510 the first data processing unit, 520 the second data processing unit, 530 the recognition model training unit, and 540 the prediction model training unit.
Detailed Description
The present invention will be described in further detail below with reference to embodiments, which illustrate the invention and are not to be construed as limiting it.
Embodiment 1, a method for identifying a gradual transition, as shown in fig. 1, includes the following steps:
s100, acquiring a video to be detected, and traversing the video to be detected by using a preset sliding window to obtain a first video clip;
s200, transition recognition is carried out on the first video clip based on a preset transition recognition model, and the first video clip with the identified transition is extracted to obtain a second video clip; that is, the first video segment containing the transition is screened out using the transition recognition model.
S300, predicting the transition type of each second video clip based on a preset type prediction model to obtain the transition type of each second video clip;
the transition types include no transition and specific transition types (fade in and fade out, checkerboard, zoom, wipe, jaggy, etc.); that is, the first video segment including the transition obtained by filtering in step S200 is further predicted, whether it includes the transition is determined, and a specific transition type is predicted.
S400, determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval.
Existing transition recognition methods usually judge whether a shot cut occurs from image grayscale data or color difference data, so their recognition of gradual transitions, whose color/grayscale changes only gradually from frame to frame, is poor; moreover, such methods depend on preset judgment thresholds, and a dual-threshold scheme is usually required for gradual transitions.
In this embodiment, deep learning is used to preset a transition recognition model for recognizing whether an input video contains a gradual transition, together with a type prediction model for further predicting the corresponding transition type. Not only can gradual transitions thus be recognized effectively, but their specific type can also be determined, which, beyond the recognition requirement itself, further supports downstream work of post-processing personnel, such as special-effect design or application analysis of gradual transitions.
In the step S100, the specific steps of obtaining a video to be detected, traversing the video to be detected by using a preset sliding window, and obtaining a first video clip include:
decoding a video to be detected to obtain a video sequence with the total length of L;
the length N and the step length S of the sliding window are configured according to actual needs, the constructed sliding window is utilized to traverse the video sequence, and a segment C consisting of N video frames, namely a first video segment, is taken out from the video sequence each time.
Those skilled in the relevant art may configure the length N and the step size S of the sliding window according to their actual needs.
In this embodiment, the video to be detected is segmented in advance and transition recognition is then performed on each first video clip; on the one hand this improves recognition accuracy, and on the other hand it makes it convenient to locate the transition intervals in the video to be detected.
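As an illustrative aid (not part of the original disclosure), the segmentation of step S100 can be sketched in Python as follows; the in-memory frame representation and the example values N = 16, S = 8 are assumptions, since the patent leaves both parameters to be configured according to actual needs:

```python
# Sketch of step S100: sliding-window traversal of a decoded video.
# N (window length) and S (step) are illustrative values only.
from typing import List, Sequence

def sliding_window_clips(frames: Sequence, N: int = 16, S: int = 8) -> List[Sequence]:
    """Return the first video clips: windows of N consecutive frames,
    taken every S frames from a video sequence of total length L."""
    L = len(frames)
    return [frames[i:i + N] for i in range(0, L - N + 1, S)]
```

With this definition the number of clips is ((L - N)//S) + 1, which matches the size of the prediction result array RES described in step S401 below.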
Further, the method for acquiring the transition recognition model in step S200 includes:
a1, collecting sample video clips, judging whether the sample video clips contain transitions, taking the sample video clips containing the transitions as training positive samples, and taking the sample video clips not containing the transitions as training negative samples;
and A2, training by using the training positive samples and the training negative samples to obtain a transition recognition model, wherein the transition recognition model is used for recognizing whether the input first video clip is a transition.
The transition recognition model is a binary classification model; its output indicates whether a transition is present.
In this embodiment, an initial transition recognition model may be built from any video classification network and trained with the training positive and negative samples to obtain a number of intermediate models (the models output when the loss curve levels off); the intermediate model with the highest accuracy is selected as the transition recognition model. Those skilled in the art may freely choose any video classification network capable of binary classification of video clips, for example a publicly known two-stream network or 3D CNN network; a minimal sketch of such a classifier follows.
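The sketch below assumes PyTorch and a toy 3D-CNN backbone; the actual network, layer sizes and training procedure are not fixed by the patent:

```python
# Hypothetical 3D-CNN binary classifier for transition recognition;
# the patent permits any video classification network (two-stream, 3D CNN, ...).
import torch
import torch.nn as nn

class TransitionRecognizer(nn.Module):
    def __init__(self, num_classes: int = 2):  # transition / no transition
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, N, H, W) -- N frames of an RGB first video clip
        x = self.features(clip).flatten(1)
        return self.classifier(x)
```

The same architecture with num_classes set to the number of gradual transition types can stand in for the multi-class type prediction model of step S300.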
Further, the sample video clips in step A1 are collected as follows:
collecting an original video (e.g., a video downloaded from the internet) containing transitions, labeling a start frame, an end frame and a corresponding transition type of the transition in the original video, and obtaining a gradual transition sample.
Randomly sampling N consecutive frames from the training video to obtain a clip C_train, namely the sample video clip;
and calculating the overlap ratio between the sample video clip and the corresponding transition sample; when the overlap ratio reaches a preset overlap-ratio threshold, the sample video clip is judged a training positive sample, otherwise a training negative sample.
The overlap ratio is calculated as follows: the sample video clip consists of a first non-overlapping segment L1 plus the overlapping segment L2, and the corresponding transition sample consists of a second non-overlapping segment L3 plus the same overlapping segment L2; the overlap ratio is the ratio of the overlapping segment L2 to the total length (L1 + L2 + L3) jointly covered by the sample video clip and the transition sample, that is, overlap ratio = L2/(L1 + L2 + L3).
Those skilled in the relevant art may set the overlap-ratio threshold according to actual needs; in this embodiment the threshold is 50%, i.e., a sample video clip whose calculated overlap ratio reaches 50% is judged a training positive sample.
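Representing a sampled clip and a labeled transition as frame-index intervals, the positive/negative decision can be sketched as follows (the interval representation and half-open convention are assumptions for illustration):

```python
# Overlap-ratio check for labeling a sampled clip as a training positive sample.
# Intervals are (start, end) frame indices, end exclusive (an assumed convention).
def is_positive_sample(clip, transition, threshold: float = 0.5) -> bool:
    c0, c1 = clip
    t0, t1 = transition
    overlap = max(0, min(c1, t1) - max(c0, t0))      # L2
    total = (c1 - c0) + (t1 - t0) - overlap          # L1 + L2 + L3
    return overlap / total >= threshold              # L2 / (L1 + L2 + L3)
```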
Further, an original video can also be generated according to a preset composition rule; the specific method is:
acquiring a first shot segment and a second shot segment, taking the last frame of the first shot segment as the start frame of the transition, and taking the first frame of the second shot segment as the end frame of the transition;
and acquiring a transition generation rule comprising a transition length and a transition type, generating fill frames based on the rule, and inserting the fill frames between the start frame and the end frame to obtain the original video. The gradual transition types include, but are not limited to, fade-in/fade-out, checkerboard, zoom, wipe and sawtooth.
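As a concrete illustration of fill-frame generation, the fade-in/fade-out type can be produced by linear alpha blending; the blending formula is a common choice assumed here, not one prescribed by the patent:

```python
# Hypothetical fill-frame generation for a fade (cross-dissolve) transition
# between the last frame of shot 1 and the first frame of shot 2.
import numpy as np

def make_fade_frames(start_frame: np.ndarray, end_frame: np.ndarray,
                     transition_length: int) -> list:
    frames = []
    for k in range(1, transition_length + 1):
        alpha = k / (transition_length + 1)
        blended = (1 - alpha) * start_frame + alpha * end_frame
        frames.append(blended.astype(start_frame.dtype))
    return frames
```

Other rule-generated types (checkerboard, wipe, etc.) would replace the per-frame blend with the corresponding spatial mask.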
Further, sample video clips containing no transition can be used directly as training negative samples.
To enhance the stability and generalization capability of the transition recognition model, the training negative samples should be sufficiently diverse, covering complex scenes such as conventional shooting, fast camera movement, defocus, dim light and strong light, so as to effectively reduce false alarms on such difficult scenes.
Further, the method for obtaining the type prediction model in step S300 is:
a3, labeling the training positive sample based on the transition type to generate prediction training data;
That is, the transition type of the gradual transition sample corresponding to a training positive sample is used as the transition type of that positive sample; in other words, training positive samples are labeled only with the gradual transition type, such as fade-in/fade-out, checkerboard, zoom, wipe or sawtooth.
And A4, training by using the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output is the transition type of the second video segment.
The type prediction model is a multi-classification model, and the output result is a gradual transition type.
In this embodiment, an initial type prediction model may be built from any video classification network and trained with the prediction training data to obtain a number of intermediate models (the models output when the loss curve levels off); the intermediate model with the highest accuracy is selected as the type prediction model. Those skilled in the art may freely choose any video classification network capable of multi-class classification of video clips, for example a publicly available two-stream network or 3D CNN network.
Note that, when training the transition recognition model and the type prediction model, classification accuracy can be improved by methods such as cascaded grouping and increasing the error weight of hard samples. Cascaded grouping first divides the input into several coarse classes with obvious class boundaries and then subdivides each coarse class into several fine classes; increasing the weight of hard samples assigns a larger error weight during training to samples that are misclassified. Both are existing conventional techniques and are therefore not described in detail in this specification.
By training the transition recognition model and the type prediction model, the invention realizes transition recognition with machine-learning techniques, which effectively avoids mistaken recognition of transitions under adverse conditions such as lens shake and defocus, and effectively improves the accuracy and recall of transition recognition.
Further, the step S400 of determining the transition interval in the video to be detected based on the transition type of each second video segment includes the specific steps of:
s401, generating a prediction result array according to the transition types, wherein the prediction result array represents the transition types of the first video segments, namely, for the video sequence with the length of L, the prediction result array RES comprises ((L-N)/S) +1 transition types.
In this embodiment, the transition type corresponding to the training positive sample is labeled in an encoding manner (for example, one-hot encoding may be adopted), and at this time, the type prediction model outputs the encoding corresponding to the predicted transition type.
For example, the non-transition is recorded as 0, and each transition type is coded sequentially (fade-in and fade-out is recorded as 1, checkerboard is recorded as 2, and zoom is recorded as 3);
and sequentially acquiring the codes of the transition types corresponding to the first video segments, and generating a prediction result array RES, wherein each data in the prediction result array represents the transition type of the first video segment corresponding to the data one by one.
Note that, since the second video clips are exactly those first video clips that contain a transition, the prediction result array RES can be generated from the transition types of the second video clips alone.
S402, eliminating the data with the transition type of no transition from the prediction result array, obtaining at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
That is, traverse the prediction result array RES from front to back until the first non-zero position i, then continue searching until the first subsequent zero position j, a zero indicating that the transition type of the corresponding first video clip is no transition; positions i to j-1 form a transition interval, and merging the first video clips corresponding to positions i to j-1 yields the corresponding gradual transition segment.
Counting the values RES[i] to RES[j-1], the most frequent transition type is taken as the transition type of the transition interval; for example, if i = 5 and j = 9 with RES[i] to RES[j-1] equal to [3, 3, 2, 3], the transition type of the interval is the type coded 3 (zoom).
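Steps S401-S402 amount to scanning RES for maximal runs of non-zero codes and taking a majority vote within each run; a sketch under that reading:

```python
# Sketch of steps S401-S402: locate transition intervals in the prediction
# result array RES (0 = no transition) and take the most frequent non-zero
# code in each interval as the interval's transition type.
from collections import Counter
from typing import List, Tuple

def locate_transitions(res: List[int]) -> List[Tuple[int, int, int]]:
    intervals, i, n = [], 0, len(res)
    while i < n:
        if res[i] == 0:
            i += 1
            continue
        j = i
        while j < n and res[j] != 0:      # scan forward to the first zero position j
            j += 1
        transition_type = Counter(res[i:j]).most_common(1)[0][0]
        intervals.append((i, j - 1, transition_type))
        i = j
    return intervals

# Example from the text: RES = [0,0,0,0,0,3,3,2,3,0] -> [(5, 8, 3)]  (zoom)
```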
As can be seen from the above, the present embodiment can locate the start and end positions of the gradual transition in the video to be detected, and can also identify the specific type of the gradual transition.
Embodiment 2: the output of the type prediction model of embodiment 1 is changed from "the transition type of the second video segment" to "a feature of the second video segment"; the rest is the same as embodiment 1.
That is, in this embodiment the type prediction model is changed from a multi-class classifier into a feature extractor. Since the multi-class model consists of a feature extractor followed by a layer that computes, from the extracted features, the probability that the input second video clip belongs to each transition type, it suffices to remove the softmax layer from the type prediction model of embodiment 1.
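In code, stripping the head can look like the following, reusing the illustrative backbone sketched earlier for the recognition model (an assumption; any network exposing its penultimate features would do):

```python
# Sketch: reuse a trained type prediction model as a feature extractor by
# keeping only its backbone and dropping the classification (softmax) head.
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, trained_model: nn.Module):
        super().__init__()
        self.backbone = trained_model.features  # everything before the head

    def forward(self, clip):
        # Output: the first feature FEA of the input second video clip.
        return self.backbone(clip).flatten(1)
```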
In this embodiment, the step S300 of predicting the transition type of each second video segment based on a preset type prediction model includes the specific steps of:
taking the feature output by the type prediction model as the first feature (FEA);
and performing similarity matching between the first feature (FEA) and the second features in a preset transition feature library (FEA-LIB) to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking that transition type as the transition type of the corresponding second video clip.
In this embodiment, the transition feature library (FEA-LIB) is obtained by inputting the gradual transition samples of embodiment 1 into the type prediction model, taking the feature obtained for each gradual transition sample as a second feature, and building the transition feature library (FEA-LIB) from these second features.
Furthermore, the transition feature library (FEA-LIB) can undergo category expansion and data expansion by collecting transition samples according to actual needs.
The method of category expansion is: collecting transition samples of the categories to be added, inputting them into the type prediction model to obtain the corresponding features, and adding those features to the transition feature library (FEA-LIB) as second features, thereby extending the set of transition categories.
The method of data expansion is: collecting additional gradual transition samples according to actual needs and extracting their features as second features.
The transition feature library (FEA-LIB) can also be extended by adding a first feature to it as a second feature whenever the transition type of that first feature is a gradual transition type.
Further, the specific steps of performing similarity matching between the first feature and the second features in the preset transition feature library to obtain a matching result, determining the transition type of the first feature from the matching result, and taking that transition type as the transition type of the corresponding second video clip are as follows:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquiring the optimal feature distance value and judging whether it meets a preset check condition; if so, the transition type of the second feature corresponding to the optimal feature distance value is taken as the transition type of the first feature; otherwise, the transition type of the first feature is determined as no transition.
The feature distance value may be the cosine distance value itself or (1 - cosine distance value).
When the feature distance value is the cosine distance value, the optimal feature distance value is the maximum feature distance value, and the check condition is that this value is greater than a preset distance threshold (e.g., 0.5).
When the feature distance value is (1 - cosine distance value), the optimal feature distance value is the minimum feature distance value, and the check condition is that this value is smaller than a preset distance threshold (e.g., 0.5).
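A sketch of this matching step, assuming the feature distance value is the cosine distance value itself (the "greater than 0.5" case) and that FEA-LIB is held as a list of (second feature, transition type) pairs:

```python
# Sketch of matching a first feature FEA against the transition feature
# library FEA-LIB; returns the matched transition type code, or 0 for
# no transition when the check condition fails.
import numpy as np

def match_transition(fea: np.ndarray, library, threshold: float = 0.5) -> int:
    best_value, best_type = -1.0, 0
    for second_fea, t_type in library:
        cos = float(np.dot(fea, second_fea) /
                    (np.linalg.norm(fea) * np.linalg.norm(second_fea)))
        if cos > best_value:                 # keep the optimal distance value
            best_value, best_type = cos, t_type
    return best_type if best_value > threshold else 0
```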
In this embodiment, the features of the second video clips are extracted and the transition type of each second video clip is determined from the obtained features; since the transition feature library can be expanded according to actual needs, the set of recognizable transition types can be extended as needed in actual use, giving strong extensibility and a wide application range.
Embodiment 3: a system for identifying a gradual transition, as shown in FIG. 2, comprises a preprocessing module 100, a transition recognition module 200, a type prediction module 300, a positioning identification module 400 and a model training module 500;
the preprocessing module 100 is configured to acquire a video to be detected, and traverse the video to be detected by using a preset sliding window to acquire a first video clip;
the transition recognition module 200 is configured to perform transition recognition on the first video segment based on a preset transition recognition model, and further configured to extract the first video segment with a transition recognized, so as to obtain a second video segment;
the type prediction module 300 is configured to predict a transition type of each second video segment based on a preset type prediction model, and obtain the transition type of each second video segment;
the positioning and identifying module 400 is configured to determine a transition interval in the video to be detected based on the transition type of each second video segment, and determine the transition type of the transition interval.
As shown in FIG. 3, the model training module 500 includes a first data processing unit 510, a second data processing unit 520, a recognition model training unit 530, and a prediction model training unit 540;
the first data processing unit 510 is configured to collect a sample video clip, determine whether the sample video clip contains a transition, use the sample video clip containing the transition as a training positive sample, and use the sample video clip not containing the transition as a training negative sample;
the identification model training unit 530 is configured to train with the training positive sample and the training negative sample to obtain a transition identification model, where the transition identification model is configured to identify whether the input first video segment is a transition;
the second data processing unit 520 is configured to label the training positive sample based on a transition type to generate predicted training data;
the prediction model training unit 540 is configured to train with the prediction training data to obtain a type prediction model, where an input of the type prediction model is a second video segment, and an output of the type prediction model is a transition type of the second video segment.
Further, the positioning identification module 400 is configured to:
generating a prediction result array according to the transition type, wherein the prediction result array represents the transition type of each first video segment;
and eliminating the data with the transition type of no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
This embodiment is an embodiment of the apparatus corresponding to embodiment 1, and since it is basically similar to the embodiment of the method (embodiment 1), the description is relatively simple, and for the relevant points, refer to the partial description of the embodiment of the method (embodiment 1).
Embodiment 4: the prediction model training unit 540 of embodiment 3 is modified to "train with the prediction training data to obtain the type prediction model, whose input is a second video segment and whose output is a feature of the second video segment"; the rest is the same as embodiment 3.
As shown in fig. 4, the type prediction module 300 in the present embodiment includes a feature extraction unit 310 and a similarity matching unit 320:
the feature extraction unit 310 is configured to perform feature extraction on the input second video segment by using a type prediction model to obtain a first feature;
the similarity matching unit 320 is configured to perform similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, and is further configured to determine a transition type of the first feature according to the matching result, and use the transition type as a transition type corresponding to a corresponding second video clip;
further, the similarity matching unit 320 is configured to:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquire the optimal feature distance value and judge whether it meets a preset check condition; if so, take the transition type of the second feature corresponding to the optimal feature distance value as the transition type of the first feature; otherwise, determine the transition type of the first feature as no transition.
This embodiment is an embodiment of an apparatus corresponding to embodiment 2, and since it is basically similar to the method embodiment (embodiment 2), the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment (embodiment 2).
Embodiment 5 is a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of embodiment 1 or embodiment 2.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
In addition, it should be noted that the specific embodiments described in the present specification may differ in the shape of the components, the names of the components, and the like. All equivalent or simple changes of the structure, the characteristics and the principle of the invention which are described in the patent conception of the invention are included in the protection scope of the patent of the invention. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.

Claims (10)

1. A method for identifying a gradual transition is characterized by comprising the following steps:
acquiring a video to be detected, and traversing the video to be detected by using a preset sliding window to obtain a first video clip;
carrying out transition recognition on the first video clip based on a preset transition recognition model, and extracting the first video clip with the transition recognized to obtain a second video clip;
predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
and determining a transition interval in the video to be detected based on the transition type of each second video clip, and determining the transition type of the transition interval.
2. The identification method of gradual transition according to claim 1, wherein the transition identification model is obtained by:
collecting a sample video clip, judging whether the sample video clip contains transition, taking the sample video clip containing the transition as a training positive sample, and taking the sample video clip not containing the transition as a training negative sample;
and training by using the training positive sample and the training negative sample to obtain a transition recognition model, wherein the transition recognition model is used for recognizing whether the input first video clip is a transition or not.
3. The method for identifying gradual transitions according to claim 2, wherein the type prediction model is obtained by:
labeling the training positive sample based on the transition type to generate prediction training data;
and training by utilizing the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output is the transition type of the second video segment or the characteristic of the second video segment.
4. The method for identifying gradual transition as claimed in claim 3, wherein when the output of the type prediction model is a feature, the step of predicting the transition type of each second video segment based on the preset type prediction model to obtain the transition type of each second video segment comprises:
taking the feature output by the type prediction model as a first feature;
and performing similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, determining the transition type of the first feature according to the matching result, and taking the transition type as the transition type corresponding to the corresponding second video clip.
5. The gradual transition recognition method according to claim 4, wherein the specific steps of performing similarity matching on the first feature and a second feature in a preset transition feature library to obtain a matching result, determining a transition type of the first feature according to the matching result, and using the transition type as a transition type corresponding to a corresponding second video segment are as follows:
respectively calculating cosine distances between the first features and the second features, and generating feature distance values according to the cosine distances;
and acquiring an optimal feature distance value and judging whether the optimal feature distance value meets a preset check condition; when it does, taking the transition type of the second feature corresponding to the optimal feature distance value as the transition type of the first feature; otherwise, determining the transition type of the first feature as no transition.
6. The method for identifying gradual transition according to any one of claims 1 to 5, wherein the specific steps of determining the transition interval in the video to be detected based on the transition type of each second video segment and determining the transition type of the transition interval are as follows:
generating a prediction result array according to the transition type, wherein the prediction result array represents the transition type of each first video segment;
and eliminating the data with the transition type of no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
7. A system for identifying a gradual transition, comprising:
a preprocessing module, used for acquiring a video to be detected and traversing the video to be detected by utilizing a preset sliding window to acquire a first video clip;
the transition recognition module is used for carrying out transition recognition on the first video clip based on a preset transition recognition model and extracting the first video clip with the transition recognized to obtain a second video clip;
the type prediction module is used for predicting the transition type of each second video segment based on a preset type prediction model to obtain the transition type of each second video segment;
and the positioning identification module is used for determining a transition interval in the video to be detected based on the transition type of each second video clip and determining the transition type of the transition interval.
8. The system for identifying gradual transitions of claim 7, further comprising a model building module comprising a first data processing unit, a second data processing unit, an identification model training unit, and a prediction model training unit;
the first data processing unit is used for collecting sample video clips, judging whether the sample video clips contain transitions or not, taking the sample video clips containing the transitions as training positive samples, and taking the sample video clips not containing the transitions as training negative samples;
the identification model training unit is used for training by using the training positive sample and the training negative sample to obtain a transition identification model, and the transition identification model is used for identifying whether the input first video clip is a transition or not;
the second data processing unit is used for labeling the training positive sample based on the transition type to generate prediction training data;
and the prediction model training unit is used for training by utilizing the prediction training data to obtain a type prediction model, wherein the input of the type prediction model is a second video segment, and the output of the type prediction model is the transition type of the second video segment or the characteristic of the second video segment.
9. The system for identifying gradual transitions as claimed in claim 7 or 8, wherein the location identification module is configured to:
generating a prediction result array according to the transition type, wherein the prediction result array represents the transition type of each first video segment;
and eliminating the data with the transition type of no transition from the prediction result array to obtain at least one continuous interval, taking the continuous interval as the transition interval, and extracting the transition type with the maximum number in the transition interval as the transition type of the transition interval.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202010165457.3A 2020-03-11 2020-03-11 Gradual transition identification method and system Active CN111428589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010165457.3A CN111428589B (en) 2020-03-11 2020-03-11 Gradual transition identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010165457.3A CN111428589B (en) 2020-03-11 2020-03-11 Gradual transition identification method and system

Publications (2)

Publication Number Publication Date
CN111428589A true CN111428589A (en) 2020-07-17
CN111428589B CN111428589B (en) 2023-05-30

Family

ID=71547708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010165457.3A Active CN111428589B (en) 2020-03-11 2020-03-11 Gradual transition identification method and system

Country Status (1)

Country Link
CN (1) CN111428589B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439482A (en) * 2022-11-09 2022-12-06 荣耀终端有限公司 Transition detection method and related equipment thereof
CN117132925A (en) * 2023-10-26 2023-11-28 成都索贝数码科技股份有限公司 Intelligent stadium method and device for sports event

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN110263729A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of method of shot boundary detector, model training method and relevant apparatus
CN110830734A (en) * 2019-10-30 2020-02-21 新华智云科技有限公司 Abrupt change and gradual change lens switching identification method
CN110856042A (en) * 2019-11-18 2020-02-28 腾讯科技(深圳)有限公司 Video playing method and device, computer readable storage medium and computer equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753975A (en) * 2019-02-02 2019-05-14 杭州睿琪软件有限公司 Training sample obtaining method and device, electronic equipment and storage medium
CN110263729A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of method of shot boundary detector, model training method and relevant apparatus
CN110830734A (en) * 2019-10-30 2020-02-21 新华智云科技有限公司 Abrupt change and gradual change lens switching identification method
CN110856042A (en) * 2019-11-18 2020-02-28 腾讯科技(深圳)有限公司 Video playing method and device, computer readable storage medium and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dai Xiaowen; Wei Zhiqiang; Gou Xiantai: "Research on video shot transition detection algorithm based on improved BEMD" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439482A (en) * 2022-11-09 2022-12-06 荣耀终端有限公司 Transition detection method and related equipment thereof
CN117132925A (en) * 2023-10-26 2023-11-28 成都索贝数码科技股份有限公司 Intelligent stadium method and device for sports event
CN117132925B (en) * 2023-10-26 2024-02-06 成都索贝数码科技股份有限公司 Intelligent stadium method and device for sports event

Also Published As

Publication number Publication date
CN111428589B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN111611847B (en) Video motion detection method based on scale attention hole convolution network
CN113065474B (en) Behavior recognition method and device and computer equipment
CN109063611B (en) Face recognition result processing method and device based on video semantics
CN110795595A (en) Video structured storage method, device, equipment and medium based on edge calculation
CN107358141B (en) Data identification method and device
CN110298297A (en) Flame identification method and device
CN110414367B (en) Time sequence behavior detection method based on GAN and SSN
CN112329656B (en) Feature extraction method for human action key frame in video stream
CN107341508B (en) Fast food picture identification method and system
CN112733660B (en) Method and device for splitting video strip
CN110991397B (en) Travel direction determining method and related equipment
WO2014193220A2 (en) System and method for multiple license plates identification
CN110853074A (en) Video target detection network system for enhancing target by utilizing optical flow
CN111428589B (en) Gradual transition identification method and system
CN115062186B (en) Video content retrieval method, device, equipment and storage medium
CN116030396B (en) Accurate segmentation method for video structured extraction
CN110309720A (en) Video detecting method, device, electronic equipment and computer-readable medium
CN114041165A (en) Video similarity detection method, device and equipment
CN111027555A (en) License plate recognition method and device and electronic equipment
CN113553952A (en) Abnormal behavior recognition method and device, equipment, storage medium and program product
CN116095363A (en) Mobile terminal short video highlight moment editing method based on key behavior recognition
CN115713731A (en) Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN111832351A (en) Event detection method and device and computer equipment
CN115424253A (en) License plate recognition method and device, electronic equipment and storage medium
CN114937248A (en) Vehicle tracking method and device for cross-camera, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221215

Address after: Room 430, cultural center, 460 Wenyi West Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant after: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd.

Applicant after: Xinhua fusion media technology development (Beijing) Co.,Ltd.

Address before: Room 430, cultural center, 460 Wenyi West Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Applicant before: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd.

GR01 Patent grant