CN116258995A

CN116258995A - Video transition identification method, device, computing equipment and computer storage medium

Info

Publication number: CN116258995A
Application number: CN202310323901.3A
Authority: CN
Inventors: 李诗琪; 汤然; 成超; 屈振宇
Original assignee: Shanghai Bilibili Technology Co Ltd
Current assignee: Shanghai Bilibili Technology Co Ltd
Priority date: 2023-03-29
Filing date: 2023-03-29
Publication date: 2023-06-13

Abstract

The embodiment of the invention discloses a video transition identification method, a video transition identification device, a computing device and a computer storage medium. The method comprises the following steps: determining a first previous frame and a second previous frame of the target frame from the first N video frames of the target frame; wherein the first previous frame of the target frame is a previous video frame of the target frame, and the second previous frame of the target frame is a video frame except the first previous frame in the previous N video frames; calculating a first correlation between the target frame and the first previous frame, and calculating a second correlation between the target frame and each second previous frame; and determining whether the target frame is a transition frame according to the difference between the first correlation and each second correlation. By adopting the scheme, whether soft transition occurs can be accurately identified, and the video transition identification precision is improved; in addition, the implementation process of the scheme is simple and feasible, and the execution efficiency is high.

Description

Video transition identification method, device, computing equipment and computer storage medium

Technical Field

The embodiment of the invention relates to the technical field of video processing, in particular to a video transition identification method, a device, computing equipment and a computer storage medium.

Background

Videos typically contain scene changes, and the process of switching from one video scene to another is referred to as a video transition. Identifying video transitions provides a basis for processing such as video segmentation and recombination, and is therefore of great significance in the field of video processing.

Prior-art methods for identifying video transitions are based on the similarity of adjacent frames: the similarity of two adjacent frames is calculated, and a video transition is determined to occur between the two frames when the similarity is lower than a set threshold.

However, the inventors found in practice that the prior art has the following drawbacks. Video transitions typically come in two modes, hard and soft. A hard transition is a video processing mode in which videos of two different scenes are joined directly without any processing; a soft transition is a video processing mode in which videos of two different scenes are joined through a transitional link (such as overlapping, or fade-in and fade-out). In a soft transition, the similarity of two adjacent frames is high, so the prior-art method cannot accurately identify the soft transition, and the video transition identification precision is low. Meanwhile, within a single video scene segment, factors such as camera movement may make the similarity of two adjacent frames low, so the prior art may mistakenly identify those two frames as a transition.

Disclosure of Invention

In view of the technical problem of low video transition recognition accuracy in the prior art, embodiments of the present invention provide a video transition identification method, apparatus, computing device and computer storage medium that overcome or at least partially solve the above problem.

According to a first aspect of an embodiment of the present invention, there is provided a video transition identification method, including:

determining a first previous frame and a second previous frame of the target frame from the first N video frames of the target frame; wherein the first previous frame of the target frame is a previous video frame of the target frame, and the second previous frame of the target frame is a video frame other than the first previous frame among the previous N video frames;

calculating a first correlation between the target frame and the first previous frame, and calculating a second correlation between the target frame and each second previous frame;

and determining whether the target frame is a transition frame according to the difference between the first correlation and each second correlation.

In an optional embodiment, the determining whether the target frame is a transition frame according to the difference between the first correlation and each second correlation further includes:

calculating the difference between the first correlation and each second correlation;

determining a maximum difference value, and determining whether the target frame is a transition frame based on the maximum difference value.

In an optional embodiment, the determining whether the target frame is a transition frame based on the maximum difference value further includes:

and if the maximum difference value is larger than a first preset threshold value, determining that the target frame is a transition frame.

In an optional embodiment, the determining whether the target frame is a transition frame based on the maximum difference value further includes:

determining a target second previous frame corresponding to the maximum difference value;

and if the image content correlation between the target frame and the target second previous frame is smaller than a second preset threshold value, and the gray histogram correlation between the target frame and the target second previous frame is smaller than a third preset threshold value, determining that the target frame is a transition frame.

In an optional embodiment, the determining whether the target frame is a transition frame based on the maximum difference value further includes:

determining a target second previous frame corresponding to the maximum difference value;

and if the gray histogram correlation between the target frame and the target second previous frame is smaller than a fourth preset threshold value, determining that the target frame is a transition frame.

In an optional embodiment, the determining whether the target frame is a transition frame according to the difference between the first correlation and each second correlation further includes:

determining whether the target frame is a transition frame of the soft transition mode according to the difference between the first correlation and each second correlation.

In an alternative embodiment, the method further comprises:

calculating a third correlation degree between a previous video frame of the first previous frame and the first previous frame;

calculating a difference between the third correlation and the first correlation;

if the difference between the third correlation and the first correlation is greater than a fifth preset threshold, determining that the target frame is a transition frame.

In an optional embodiment, the determining that the target frame is a transition frame if the difference between the third correlation and the first correlation is greater than the fifth preset threshold further includes:

if the difference between the third correlation and the first correlation is greater than the fifth preset threshold, determining that the target frame is a transition frame of the hard transition mode.
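The hard-transition check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `is_hard_transition`, its parameter names, and the default threshold of 0.5 are hypothetical (the text gives no value for the fifth preset threshold), and the difference is taken as the third correlation minus the first correlation, since the text does not say whether an absolute value is used here.

```python
def is_hard_transition(first_corr: float, third_corr: float,
                       fifth_threshold: float = 0.5) -> bool:
    """Return True if the target frame looks like a hard-transition frame.

    first_corr:      correlation between the target frame T and frame T-1.
    third_corr:      correlation between frame T-1 and frame T-2.
    fifth_threshold: hypothetical value; the source text does not give one.
    """
    # At a hard cut, T-2 and T-1 belong to the same scene (high third_corr)
    # while T and T-1 belong to different scenes (low first_corr).
    return (third_corr - first_corr) > fifth_threshold


if __name__ == "__main__":
    print(is_hard_transition(first_corr=0.10, third_corr=0.95))  # abrupt change
    print(is_hard_transition(first_corr=0.93, third_corr=0.95))  # steady scene
```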

In an alternative embodiment, after determining that the target frame is a transition frame, the method further comprises:

calculating a first frame number difference between the target frame and the nearest real transition frame;

if the first frame number difference is larger than a first preset frame number threshold value, determining that the target frame is a real transition frame;

and if the first frame number difference is smaller than or equal to a first preset frame number threshold value, determining that the target frame is a buffer transition frame.

In an alternative embodiment, the method further comprises:

calculating a second frame number difference between the target frame and the nearest buffered transition frame;

if the second frame number difference is larger than a second preset frame number threshold value, determining that the target frame is a real transition frame;

and if the second frame number difference is smaller than or equal to a second preset frame number threshold value, determining that the target frame is a buffer transition frame.

In an alternative embodiment, the buffered transition frame is a transition frame in the fast camera-movement mode.
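The two frame-number rules above share one shape, which the following sketch captures. It is only an illustration: the function name `classify_by_gap`, the labels `"real"` and `"buffered"`, and the threshold values in the usage lines are assumptions, and the text does not state how the two rules are combined, so they are shown as independent calls.

```python
def classify_by_gap(target_idx: int, reference_idx: int,
                    frame_threshold: int) -> str:
    """Classify a detected transition frame by its distance to a reference.

    target_idx:      frame number of the detected transition frame.
    reference_idx:   frame number of the nearest real transition frame
                     (first rule) or nearest buffered transition frame
                     (second rule).
    frame_threshold: first or second preset frame number threshold.
    """
    gap = target_idx - reference_idx
    # A gap strictly larger than the threshold gives a real transition
    # frame; a gap smaller than or equal to it gives a buffered one.
    return "real" if gap > frame_threshold else "buffered"


if __name__ == "__main__":
    # First rule: distance to the nearest real transition frame.
    print(classify_by_gap(target_idx=120, reference_idx=100, frame_threshold=10))
    # Second rule: distance to the nearest buffered transition frame.
    print(classify_by_gap(target_idx=105, reference_idx=100, frame_threshold=10))
```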

In an alternative embodiment, before the determining the first previous frame and the second previous frame of the target frame from the first N video frames of the target frame, the method further includes: buffering N consecutive video frames in a buffer area; and taking the video frame following the buffered video frame with the largest frame number as the target frame;

the determining the first previous frame and the second previous frame of the target frame from the first N video frames of the target frame further comprises: determining the first previous frame and the second previous frame of the target frame from the N video frames buffered in the buffer area.

According to a second aspect of an embodiment of the present invention, there is provided a video transition identifying apparatus, including:

The first determining module is used for determining a first previous frame and a second previous frame of the target frame from the first N video frames of the target frame; wherein the first previous frame of the target frame is a previous video frame of the target frame, and the second previous frame of the target frame is a video frame other than the first previous frame among the previous N video frames;

a calculating module, configured to calculate a first correlation between the target frame and a first previous frame, and calculate a second correlation between the target frame and each second previous frame;

and the second determining module is used for determining whether the target frame is a transition frame according to the difference between the first correlation degree and each second correlation degree.

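In an alternative embodiment, the second determining module is configured to: calculate the difference between the first correlation and each second correlation, determine a maximum difference value, and determine whether the target frame is a transition frame based on the maximum difference value.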

In an alternative embodiment, the second determining module is configured to: and if the maximum difference value is larger than a first preset threshold value, determining that the target frame is a transition frame.

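In an alternative embodiment, the second determining module is configured to: determine a target second previous frame corresponding to the maximum difference value; and determine that the target frame is a transition frame if the image content correlation between the target frame and the target second previous frame is smaller than a second preset threshold value and the gray histogram correlation between them is smaller than a third preset threshold value, or if the gray histogram correlation between the target frame and the target second previous frame is smaller than a fourth preset threshold value.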

In an alternative embodiment, the second determining module is configured to: and determining whether the target frame is a transition frame of the soft transition mode according to the difference between the first correlation and each second correlation.

In an alternative embodiment, the computing module is configured to: calculate a third correlation between the previous video frame of the first previous frame and the first previous frame; and calculate a difference between the third correlation and the first correlation;

the second determining module is configured to: if the difference between the third correlation and the first correlation is greater than a fifth preset threshold, determine that the target frame is a transition frame.

In an alternative embodiment, the second determining module is configured to: if the difference between the third correlation and the first correlation is greater than the fifth preset threshold, determine that the target frame is a transition frame of the hard transition mode.

In an alternative embodiment, the computing module is configured to: calculating a first frame number difference between the target frame and the nearest real transition frame;

The second determining module is used for: if the first frame number difference is larger than a first preset frame number threshold value, determining that the target frame is a real transition frame; and if the first frame number difference is smaller than or equal to a first preset frame number threshold value, determining that the target frame is a buffer transition frame.

In an alternative embodiment, the computing module is configured to: calculating a second frame number difference between the target frame and the nearest buffered transition frame;

the second determining module is used for: if the second frame number difference is larger than a second preset frame number threshold value, determining that the target frame is a real transition frame; and if the second frame number difference is smaller than or equal to a second preset frame number threshold value, determining that the target frame is a buffer transition frame.

In an alternative embodiment, the first determining module is configured to: buffer N consecutive video frames in a buffer area; take the video frame following the buffered video frame with the largest frame number as the target frame;

and determining a first previous frame and a second previous frame of the target frame from the N video frames cached in the cache area.

According to a third aspect of embodiments of the present invention, there is provided a computing device comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;

The memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the video transition identification method.

According to a fourth aspect of the embodiments of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the video transition identification method described above.

In the embodiment of the invention, the first N video frames of the target frame are divided, according to their proximity to the target frame, into a first previous frame and second previous frames; a first correlation between the target frame and the first previous frame and a second correlation between the target frame and each second previous frame are calculated; and whether the target frame is a transition frame is determined according to the difference between the first correlation and each second correlation. With this scheme, whether a soft transition occurs can be accurately identified, and the video transition identification precision is improved; moreover, the scheme is simple to implement and efficient to execute.

According to the embodiment of the invention, the difference value between the first correlation degree and each second correlation degree is calculated, and whether the target frame is the transition frame is determined based on the maximum difference value, so that the transition frame in soft transition can be accurately identified.

According to the embodiment of the invention, whether the target frame is a transition frame is determined by comparing the maximum difference value with the first preset threshold value, so that the target frame can be determined to be a transition frame when the maximum difference value is extremely large, which improves the recognition efficiency of the transition frame.

The embodiment of the invention determines the target second previous frame corresponding to the maximum difference value, and determines whether the target frame is a transition frame according to the image content correlation degree and the gray histogram correlation degree of the target frame and the target second previous frame, thereby improving the recognition precision of the transition frame.

According to the embodiment of the invention, the target second previous frame corresponding to the maximum difference value is determined, and whether the target frame is a transition frame or not is determined according to the gray histogram correlation degree of the target frame and the target second previous frame, so that the recognition efficiency of the transition frame is improved.

According to the embodiment of the invention, the third correlation between the previous video frame of the first previous frame and the first previous frame is calculated, and whether the target frame is a transition frame is determined according to the difference between the third correlation and the first correlation, so that a transition frame of the hard transition mode can be accurately identified, and the video transition identification precision is improved.

According to the embodiment of the invention, whether the target frame is a transition frame of the soft transition mode is determined according to the difference between the first correlation and each second correlation, and whether the target frame is a transition frame of the hard transition mode is determined according to the difference between the third correlation and the first correlation, so that the type of the transition frame can be determined once the transition frame has been identified, and transition frames of the soft transition mode and the hard transition mode are distinguished.

According to the embodiment of the invention, after the target frame is determined to be a transition frame, the first frame number difference between the target frame and the nearest real transition frame is calculated, and the target frame is determined to be a real transition frame or a buffer transition frame by comparing the first frame number difference with the first preset frame number threshold value, so that transition frames of the soft transition mode, the hard transition mode and the fast camera-movement mode can be distinguished.

According to the embodiment of the invention, after the target frame is determined to be a transition frame, the second frame number difference between the target frame and the nearest buffer transition frame is calculated, and the target frame is determined to be a real transition frame or a buffer transition frame by comparing the second frame number difference with the second preset frame number threshold value, so that transition frames of the soft transition mode, the hard transition mode and the fast camera-movement mode can be distinguished.

The embodiment of the invention is provided with the buffer area, N continuous video frames are buffered in the buffer area, and the next video frame of the video frame with the largest frame sequence number in the buffer area is taken as the target frame, so that when the current frame is identified, the comparison is directly carried out on the basis of the video frame in the buffer area, and the identification efficiency is improved. And through the arrangement of the buffer area, the video with high real-time performance such as video stream and the like can be rapidly segmented, and a foundation is provided for real-time segmentation of the video.

The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented in accordance with the specification, and so that the embodiments are more readily apparent, specific implementations of the embodiments of the present invention are set out below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is a schematic flow chart of a video transition identification method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of previous frames according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of another video transition identification method according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart of still another video transition identification method according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a video transition recognition device according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a computing device according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that embodiments of the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the embodiments to those skilled in the art.

Fig. 1 shows a flow chart of a video transition identification method according to an embodiment of the present invention. In this embodiment, the video transition identifying method may be executed by a preset computing device or the like.

Specifically, as shown in fig. 1, the method includes the steps of:

Step S110, determining a first previous frame and a second previous frame of the target frame from the first N video frames of the target frame; wherein the first previous frame of the target frame is the previous video frame of the target frame, and the second previous frame of the target frame is a video frame other than the first previous frame among the first N video frames.

When transitions in a video are identified, the identification can be performed frame by frame to determine whether each video frame is a transition frame in the video; the video frame to be identified is the target frame. In general, no video transition occurs within the first M video frames of a video, so the target frame in the embodiment of the present invention may be any video frame after the M-th video frame of the video.

Unlike the prior art, which compares only two adjacent frames, the embodiment of the invention determines the first N video frames of the target frame, where N is greater than or equal to 2. The first N video frames are the N video frames that belong to the same video as the target frame, have frame numbers smaller than that of the target frame, and form a continuous frame sequence with the target frame; in other words, they are the N frames immediately preceding the target frame, and may also be referred to as the N previous frames. For example, if the target frame is the T-th frame of video X, the first N video frames of the target frame are the (T-1)-th frame, the (T-2)-th frame, ..., and the (T-N)-th frame of video X.

Further, the N video frames are divided into a first previous frame and second previous frames according to their proximity to the target frame. The first previous frame of the target frame is the video frame immediately preceding the target frame, that is, the previous frame nearest to the target frame, and there is usually exactly one first previous frame. The second previous frames of the target frame are the video frames other than the first previous frame among the first N video frames, and there may be one or more of them. For example, if the target frame is the T-th frame of video X, the (T-1)-th frame of video X is the first previous frame of the target frame, and the (T-2)-th frame through the (T-N)-th frame of video X are the second previous frames of the target frame.

In an alternative embodiment, in order to determine the first previous frame and the second previous frame more efficiently and thus improve video transition recognition efficiency, a buffer area capable of holding a plurality of video frames is provided. Specifically, N consecutive video frames are buffered in the buffer area; the video frame following the buffered video frame with the largest frame number is taken as the target frame, and the first previous frame and the second previous frame of the target frame are determined from the N video frames in the buffer area. For example, if the target frame is the T-th frame of video X, the (T-1)-th frame, the (T-2)-th frame, ..., and the (T-N)-th frame of video X are held in the buffer area, so that when the current frame (namely the T-th frame) is identified, the comparison is made directly against the video frames in the buffer area, which improves the identification efficiency. Moreover, with the buffer area, highly real-time video such as a video stream can be segmented quickly, which provides a basis for real-time video segmentation.

Further optionally, after the analysis of the target frame is finished, the target frame is stored in the buffer area and the video frame with the smallest frame number in the buffer area is deleted, so that the number of video frames in the buffer area remains N. For example, once the T-th frame has been determined to be or not to be a transition frame, the T-th frame is stored in the buffer area, the (T-N)-th frame is deleted from the buffer area, and the (T+1)-th frame is identified and analyzed as the new target frame.
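The buffer behaviour described above maps naturally onto a fixed-length queue. The sketch below is one possible reading, not the embodiment's implementation: `scan_with_buffer` and the `is_transition` callback are hypothetical names, frames are treated as opaque objects, and the first N frames are simply used to fill the buffer without being analyzed.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def scan_with_buffer(
    frames: Iterable[Frame],
    n: int,
    is_transition: Callable[[Frame, Frame, List[Frame]], bool],
) -> Iterator[Tuple[int, bool]]:
    """Yield (frame index, decision) for every frame that has N predecessors.

    is_transition(target, first_previous, second_previous) is a caller-supplied
    decision function (hypothetical); second_previous is ordered T-N .. T-2.
    """
    if n < 2:
        raise ValueError("N must be at least 2")
    buffer: deque = deque(maxlen=n)  # holds frames T-N .. T-1
    for index, frame in enumerate(frames):
        if len(buffer) == n:
            previous = list(buffer)
            first_previous = previous[-1]    # frame T-1
            second_previous = previous[:-1]  # frames T-N .. T-2
            yield index, is_transition(frame, first_previous, second_previous)
        # Appending to a full deque discards the oldest frame (T-N), so the
        # buffer always holds the N frames preceding the next target frame.
        buffer.append(frame)
```

Because the generator consumes frames one at a time, the same loop can be fed from a live stream as well as from a file.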

Step S120, calculating a first correlation between the target frame and the first previous frame, and calculating a second correlation between the target frame and each second previous frame.

The correlation between the target frame and each of the first N previous frames is calculated; it may be determined from the image content similarity, the gray histogram similarity, and/or the luminance similarity between the target frame and the previous frame. For example, the image content similarity between the target frame and a previous frame may be obtained with a CORR (correlation coefficient) algorithm; the similarity between the gray histograms of the target frame and the previous frame (i.e., the gray histogram similarity) may be calculated using the Manhattan distance, the Euclidean distance, the Hausdorff distance, the central moment method, the χ² (chi-square) statistical distance, or the like; and the similarity between the luminance maps of the target frame and the previous frame (i.e., the luminance similarity) may be calculated with a corresponding algorithm. The embodiment of the invention does not limit the specific correlation calculation algorithm. As shown in FIG. 2, with the T-th frame as the target frame, the following are calculated: the correlation $R_{T,T-N}$ between the T-th frame and the (T-N)-th frame, the correlation $R_{T,T-N+1}$ between the T-th frame and the (T-N+1)-th frame, ..., the correlation $R_{T,T-2}$ between the T-th frame and the (T-2)-th frame, and the correlation $R_{T,T-1}$ between the T-th frame and the (T-1)-th frame.

The correlation between the target frame and the first previous frame is the first correlation, and the correlation between the target frame and a second previous frame is a second correlation. Since there is usually one first previous frame, one first correlation is calculated in this step; the target frame has a corresponding second correlation with each second previous frame, so the number of second correlations calculated in this step equals the number of second previous frames. As shown in FIG. 2, the correlation $R_{T,T-1}$ between the T-th frame and the (T-1)-th frame is the first correlation, and $R_{T,T-N}$, $R_{T,T-N+1}$, ..., $R_{T,T-2}$ are all second correlations.
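As a rough illustration of step S120, the sketch below computes a first correlation and a list of second correlations in plain Python. Several choices here are assumptions rather than anything fixed by the text, which leaves the algorithm open: frames are represented as flat sequences of gray levels, image content correlation is taken as the Pearson correlation coefficient, and gray histogram similarity is derived from the Manhattan distance between normalized histograms. All function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Constant image: the coefficient is undefined, so fall back to
        # 1.0 for identical inputs and 0.0 otherwise (an assumption).
        return 1.0 if list(a) == list(b) else 0.0
    return cov / sqrt(var_a * var_b)


def gray_histogram(pixels: Sequence[int], bins: int = 256) -> List[float]:
    """Normalized histogram of integer gray levels in the range [0, bins)."""
    counts = [0] * bins
    for p in pixels:
        counts[p] += 1
    return [c / len(pixels) for c in counts]


def histogram_similarity(a: Sequence[int], b: Sequence[int],
                         bins: int = 256) -> float:
    """One minus half the Manhattan distance between normalized histograms."""
    ha, hb = gray_histogram(a, bins), gray_histogram(b, bins)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))


def correlations(target: Sequence[float],
                 previous: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Return (first correlation, second correlations).

    `previous` is ordered from frame T-N to frame T-1; the second
    correlations are returned in the same order, for frames T-N .. T-2.
    """
    first = pearson_corr(target, previous[-1])
    seconds = [pearson_corr(target, frame) for frame in previous[:-1]]
    return first, seconds
```

In practice an image library would normally supply these measures; the pure-Python versions are only meant to make the quantities concrete.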

Step S130, determining whether the target frame is a transition frame according to the difference between the first correlation degree and each second correlation degree.

Specifically, the difference between the first correlation and each second correlation is calculated, and whether the target frame is a transition frame is determined from the calculated differences. Each difference is the absolute value of the result of subtracting the second correlation from the first correlation. If the target frame is determined to be a transition frame, this indicates that the next video scene begins at the target frame.

In an alternative embodiment, this step specifically determines, according to the difference between the first correlation and each second correlation, whether the target frame is a transition frame of the soft transition mode. That is, when the target frame is determined to be a transition frame from the differences between the first correlation and the respective second correlations, that transition frame is typically a transition frame of the soft transition mode.

In yet another alternative embodiment, it is noted that in the soft transition mode the video frames of the previous video scene and the video frames of the next video scene are typically linked by processing such as superposition or fade-in and fade-out, and there may be one or several linking frames. As a result, the similarity between the transition frame and the linking frames before it is high, while the similarity between the transition frame and the video frames of the previous video scene before the linking frames is low. In view of this, in the present embodiment, after the differences between the first correlation and the respective second correlations are calculated, the maximum among these differences, namely the maximum difference, is determined, and whether the target frame is a transition frame is then determined based on the maximum difference.
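The maximum-difference computation can be written in a few lines. In this sketch the function name `max_difference` and the convention that the second correlations are ordered from frame T-N to frame T-2 are assumptions; the absolute difference follows the definition given for step S130.

```python
from typing import Sequence, Tuple


def max_difference(first_corr: float,
                   second_corrs: Sequence[float]) -> Tuple[float, int]:
    """Return (maximum difference, position of the target second previous frame).

    second_corrs[k] is the correlation between the target frame and the
    k-th second previous frame (assumed order: T-N first, T-2 last).
    """
    if not second_corrs:
        raise ValueError("at least one second correlation is required")
    # Each difference is |first correlation - second correlation|.
    diffs = [abs(first_corr - s) for s in second_corrs]
    position = max(range(len(diffs)), key=diffs.__getitem__)
    return diffs[position], position


if __name__ == "__main__":
    # Target frame resembles T-1 strongly and the oldest buffered frame weakly.
    diff, pos = max_difference(0.95, [0.10, 0.40, 0.80])
    print(round(diff, 2), pos)
```

The returned position identifies the second previous frame that produced the maximum difference, which later steps refer to as the target second previous frame.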

Further optionally, whether the target frame is a transition frame may be determined based on the maximum difference value by one or a combination of the following modes:

Mode one: the maximum difference value is compared with a first preset threshold value, and if the maximum difference value is larger than the first preset threshold value, the target frame is determined to be a transition frame. The first preset threshold may be, for example, 0.8. Specifically, a maximum difference exceeding the threshold indicates that the target frame is strongly correlated with the first previous frame but only weakly correlated with the second previous frame corresponding to the maximum difference, from which it is determined that a soft transition has occurred. In this mode, the target frame is determined to be a transition frame when the maximum difference value is extremely large, which improves the recognition efficiency of the transition frame.

Mode two: a target second previous frame corresponding to the maximum difference value is determined, and if the image content correlation between the target frame and the target second previous frame is smaller than a second preset threshold value and the gray histogram correlation between the target frame and the target second previous frame is smaller than a third preset threshold value, the target frame is determined to be a transition frame. In this mode, the second previous frame corresponding to the maximum difference is the target second previous frame. The image content correlation between the target frame and the target second previous frame is calculated (for example, with a CORR correlation algorithm), and the correlation between their gray histograms (i.e., the gray histogram correlation) is calculated. When both correlations are smaller than their respective thresholds, the target frame is judged to differ greatly from the target second previous frame and is therefore determined to be a transition frame. The second preset threshold may be 0.3, and the third preset threshold may be 0.8. This mode uses the image content correlation and the gray histogram correlation together, which improves the identification accuracy of the transition frame.

Mode three: a target second previous frame corresponding to the maximum difference value is determined, and if the gray histogram correlation between the target frame and the target second previous frame is smaller than a fourth preset threshold, the target frame is determined to be a transition frame. The fourth preset threshold is smaller than the third preset threshold; for example, the fourth preset threshold may be 0.2. In this mode, the target frame is determined to be a transition frame when the gray histogram correlation between the target frame and the target second previous frame is extremely low, which improves the recognition efficiency of transition frames.
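The three modes above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function and constant names are hypothetical, the threshold values are the example values quoted above, and each difference is assumed to be the first correlation minus the corresponding second correlation.

```python
from typing import List, Tuple

# Example threshold values quoted in the text above (names are hypothetical).
FIRST_THRESHOLD = 0.8   # mode one: bound on the maximum difference value
SECOND_THRESHOLD = 0.3  # mode two: bound on the image content correlation
THIRD_THRESHOLD = 0.8   # mode two: bound on the gray histogram correlation
FOURTH_THRESHOLD = 0.2  # mode three: stricter bound on the gray histogram correlation


def max_difference(first_corr: float, second_corrs: List[float]) -> Tuple[float, int]:
    """Return the maximum difference value and the index of the
    second previous frame that produced it (the target second previous frame).

    Assumption: each difference is first_corr - second_corr.
    """
    if not second_corrs:
        raise ValueError("at least one second correlation is required")
    diffs = [first_corr - s for s in second_corrs]
    idx = max(range(len(diffs)), key=lambda i: diffs[i])
    return diffs[idx], idx


def mode_one(max_diff: float) -> bool:
    """Transition frame if the maximum difference value exceeds the first threshold."""
    return max_diff > FIRST_THRESHOLD


def mode_two(content_corr: float, hist_corr: float) -> bool:
    """Transition frame if both correlations with the target second
    previous frame are below their respective thresholds."""
    return content_corr < SECOND_THRESHOLD and hist_corr < THIRD_THRESHOLD


def mode_three(hist_corr: float) -> bool:
    """Transition frame if the gray histogram correlation alone is extremely low."""
    return hist_corr < FOURTH_THRESHOLD
```

For example, with a first correlation of 0.95 and second correlations of 0.9, 0.1 and 0.5, the second entry yields the maximum difference value, and mode one would report a transition frame.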

In addition, in still another alternative embodiment, after the target frame is determined to be a transition frame of the soft transition mode, the target second previous frame corresponding to the maximum difference value may be determined, and the video frames between the target second previous frame and the target frame are taken as soft transition linking frames (also called soft transition intermediate frames). In subsequent video segmentation, a segmentation point can be set between the transition frame and the frame immediately preceding it, so that the transition frame serves as the starting point of the video segment corresponding to a video scene. Alternatively, in addition to the segmentation point between the transition frame and the frame immediately preceding it, a further segmentation point can be set between the target second previous frame and the frame immediately following it, so that the target second previous frame serves as the end point of the video segment corresponding to the previous video scene, and a video segment formed by the linking frames between the target second previous frame and the transition frame is additionally obtained.
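The two segmentation options described above can be illustrated with a short Python sketch. The function names are hypothetical, frames are identified by integer frame numbers, and a segmentation point is represented by the frame number at which a new segment starts; these representation choices are assumptions made for illustration only.

```python
from typing import List


def linking_frames(target_second_prev: int, transition: int) -> List[int]:
    """Frame numbers strictly between the target second previous frame
    and the transition frame (the soft transition linking frames)."""
    return list(range(target_second_prev + 1, transition))


def soft_transition_cuts(target_second_prev: int, transition: int,
                         separate_linking_segment: bool = False) -> List[int]:
    """Return segmentation points as the frame numbers that start a new segment.

    The cut before the transition frame is always set. When
    separate_linking_segment is True, a further cut is set right after the
    target second previous frame, so the linking frames form their own segment.
    """
    if target_second_prev >= transition:
        raise ValueError("the target second previous frame must precede the transition frame")
    cuts = [transition]
    if separate_linking_segment and target_second_prev + 1 < transition:
        cuts.insert(0, target_second_prev + 1)
    return cuts
```

With a target second previous frame numbered 10 and a transition frame numbered 14, the linking frames are 11 to 13; the first option yields a single cut at frame 14, and the second option yields cuts at frames 11 and 14.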

Therefore, the embodiment of the invention divides the first N video frames into the first previous frame and the second previous frame of the target frame according to the degree of proximity to the target frame based on the first N video frames of the target frame, calculates the first correlation degree between the target frame and the first previous frame, and calculates the second correlation degree between the target frame and each second previous frame; and determining whether the target frame is a transition frame according to the difference between the first correlation and each second correlation. By adopting the scheme, whether soft transition occurs can be accurately identified, and the video transition identification precision is improved; moreover, the implementation process of the scheme is simple and feasible, and the execution efficiency is high.

Fig. 3 is a schematic flow chart of another video transition identification method according to an embodiment of the present invention. In this embodiment, the video transition identification method may be executed by a preset computing device or the like. This embodiment focuses on identifying hard transitions.

Specifically, as shown in fig. 3, the method includes the steps of:

In step S310, a first previous frame and a second previous frame of the target frame are determined from the first N video frames of the target frame.

For this step, reference may be made to the description of the embodiment of fig. 1; details are not repeated here.

In step S320, a first correlation between the target frame and the first previous frame is calculated, and a third correlation between the previous video frame of the first previous frame and the first previous frame is calculated.

The first correlation is the correlation between the target frame and the first previous frame, and the third correlation is the correlation between the previous video frame of the first previous frame and the first previous frame. The third correlation may be the correlation between the second previous frame with the largest frame number and the first previous frame, or the correlation between the first previous frame and the previous frame adjacent to it. For the specific method of calculating the correlation, reference may be made to the description of the embodiment of fig. 1; details are not repeated here.

In step S330, a difference between the third correlation and the first correlation is calculated.

Specifically, the first correlation is subtracted from the third correlation, and the result is taken as the difference between the third correlation and the first correlation.

In step S340, if the difference between the third correlation and the first correlation is greater than the fifth preset threshold, the target frame is determined to be a transition frame.

If the difference between the third correlation and the first correlation is greater than the fifth preset threshold, the first previous frame is closely correlated with its previous video frame but not with the target frame, so it is determined that a transition has occurred between the first previous frame and the target frame. Further, in the hard transition mode, no transitional processing is applied to the video frames of the two video scenes, that is, there are no linking frames or intermediate frames. Therefore, in this embodiment, when the difference between the third correlation and the first correlation is greater than the fifth preset threshold, the target frame is determined to be a transition frame of the hard transition mode.

Therefore, in the embodiment of the invention, the correlation degree between two adjacent frames is calculated, and the hard transition is determined to occur under the condition that the difference value of the adjacent correlation degrees is large, so that the transition frame of the hard transition mode can be accurately identified, and the video transition identification precision is improved.
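The hard transition check of steps S330 and S340 can be sketched as follows. The function name is hypothetical, and the default value of the fifth preset threshold (0.5) is an assumption chosen purely for illustration, since no example value is given above.

```python
def is_hard_transition(first_corr: float, third_corr: float,
                       fifth_threshold: float = 0.5) -> bool:
    """Return True if the target frame is a hard transition frame.

    first_corr:  correlation between the target frame and the first previous frame.
    third_corr:  correlation between the first previous frame and its previous frame.
    fifth_threshold: hypothetical default; the text gives no example value.
    """
    # Step S330: difference between the third correlation and the first correlation.
    difference = third_corr - first_corr
    # Step S340: a large drop in adjacent-frame correlation indicates a hard cut.
    return difference > fifth_threshold
```

For instance, a third correlation of 0.95 followed by a first correlation of 0.1 gives a difference of about 0.85 and would be reported as a hard transition under this assumed threshold, whereas 0.95 followed by 0.9 would not.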

Fig. 4 is a schematic flow chart of a video transition identification method according to an embodiment of the present invention. In this embodiment, the video transition identification method may be executed by a preset computing device or the like. This embodiment distinguishes fast camera movement (referred to herein as the fast mirror mode) from real scene switching. Fast camera movement is an important video shooting technique in which the shooting equipment is moved quickly so as to capture a wide range of scenery.

Specifically, as shown in fig. 4, the method includes the steps of:

In step S410, the target frame is determined to be a transition frame according to the difference between the first correlation and each second correlation, or the target frame is determined to be a transition frame according to the difference between the third correlation and the first correlation.

The specific implementation process of this step may refer to descriptions in other method embodiments, which are not described herein.

In step S420, a first frame number difference between the target frame and the nearest real transition frame is calculated.

In this embodiment, after identifying that the target frame is a transition frame by the method in the embodiment of fig. 1 and/or fig. 3, the target frame is further checked to determine whether the current soft/hard transition mode corresponds to the fast mirror mode.

Specifically, after the target frame is determined to be a transition frame in step S410, the frame number difference between the target frame and the nearest real transition frame is calculated. This frame number difference is the first frame number difference, which indicates the number of video frames between the target frame and the nearest real transition frame. The nearest real transition frame is the video frame with the largest frame number among the video frames already determined to be real transition frames. A real transition frame indicates that scene switching has actually taken place, that is, a soft transition or a hard transition has occurred.

Step S430, judging whether the first frame number difference is larger than a first preset frame number threshold; if yes, go to step S460; if not, step S470 is performed.

Analysis of a large number of videos shows that, in the soft transition mode and the hard transition mode, the number of frames of each video scene is usually greater than a certain threshold in order to give the user a good visual experience; that is, the video switches to the next video scene only after a corresponding number of frames. In view of this, in this embodiment the first frame number difference is compared with a first preset frame number threshold, which is determined according to the typical number of frames of each video scene in the soft transition and hard transition modes. For example, if the average number of video frames used for each video scene in the soft transition and hard transition modes is 30, the first preset frame number threshold may be slightly lower than or equal to 30.

In step S440, a second frame number difference between the target frame and the nearest buffered transition frame is calculated.

After the target frame is determined to be a transition frame in step S410, the frame number difference between the target frame and the nearest buffered transition frame is calculated. This frame number difference is the second frame number difference, which indicates the number of video frames between the target frame and the nearest buffered transition frame. The nearest buffered transition frame is the video frame with the largest frame number among the video frames already determined to be buffered transition frames. A buffered transition frame indicates that fast camera movement, rather than real scene switching, is actually taking place.

Step S450, judging whether the second frame number difference is larger than a second preset frame number threshold value; if yes, go to step S460; if not, step S470 is performed.

Analysis of a large number of videos shows that, in the fast mirror mode, video scenes change quickly, so that the video moves to the next video scene within a corresponding number of frames. In view of this, in this embodiment the second frame number difference is compared with a second preset frame number threshold, which is determined according to the typical number of frames of each video scene in the fast mirror mode. For example, if the average number of video frames used for each video scene in the fast mirror mode is 15, the second preset frame number threshold may be slightly lower than or equal to 15. The second preset frame number threshold can be matched to the number of video frames buffered in the buffer, which facilitates the comparison of video frames.

Step S460, determining the target frame as a real transition frame.

If the first frame number difference is larger than a first preset frame number threshold value, the target frame is far from a transition frame of a soft transition mode or a hard transition mode, and the target frame is determined to be a real transition frame. The real transition frame is a transition frame in either a soft transition mode or a hard transition mode.

If the second frame number difference is greater than the second preset frame number threshold, the target frame is far from the transition frame in the last fast mirror mode, and the target frame is not in the fast mirror mode any more, so that the target frame is determined to be a real transition frame.

In step S470, the target frame is determined to be the buffered transition frame.

If the first frame number difference is smaller than or equal to the first preset frame number threshold, the target frame is close to the previous transition frame of the soft transition mode or the hard transition mode, which indicates that the fast mirror mode has been entered, so the target frame is determined to be a buffered transition frame. A buffered transition frame is a transition frame in the fast mirror mode.

If the second frame number difference is smaller than or equal to the second preset frame number threshold, the target frame is close to the previous transition frame of the fast mirror mode, which indicates that the video is still in the fast mirror mode, so the target frame is determined to be a buffered transition frame.
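The decisions of steps S430 to S470 can be sketched as two independent checks. The names below are hypothetical, the default thresholds are the example values 30 and 15 mentioned above, and the frame number difference is assumed to be the target frame number minus the earlier frame number. How the two checks are combined in a single pass is not specified here, so the sketch keeps them separate.

```python
FIRST_FRAME_THRESHOLD = 30   # example value for soft/hard transition scenes
SECOND_FRAME_THRESHOLD = 15  # example value for fast mirror scenes

REAL = "real"          # real transition frame (soft or hard transition)
BUFFERED = "buffered"  # buffered transition frame (fast mirror mode)


def classify_against_real(target_no: int, nearest_real_no: int,
                          threshold: int = FIRST_FRAME_THRESHOLD) -> str:
    """Steps S420/S430: compare the first frame number difference with the
    first preset frame number threshold."""
    first_diff = target_no - nearest_real_no
    return REAL if first_diff > threshold else BUFFERED


def classify_against_buffered(target_no: int, nearest_buffered_no: int,
                              threshold: int = SECOND_FRAME_THRESHOLD) -> str:
    """Steps S440/S450: compare the second frame number difference with the
    second preset frame number threshold."""
    second_diff = target_no - nearest_buffered_no
    return REAL if second_diff > threshold else BUFFERED
```

Under these assumed thresholds, a transition frame detected 50 frames after the last real transition frame is treated as real, while one detected only 10 frames after it is treated as a buffered transition frame.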

In an alternative embodiment, in the subsequent video segmentation, since the real transition frame is a transition frame in the soft transition mode or the hard transition mode, a video segmentation point can be determined between the real transition frame and the previous frame of the real transition frame, so that segmentation of two different scenes is realized; whereas, since the buffered transition frames are transition frames in the fast-mirror mode, the video frames in the fast-mirror mode generally correspond to one and the same theme or semantic, and thus the buffered transition frames are not used as video slicing points in the video slicing.

Further optionally, to facilitate subsequent processing of the video, after the video is segmented, a corresponding identifier may be assigned to each video segment according to the type of transition frame it contains. For example, if the video segment contains a buffered transition frame, an identifier of the fast mirror mode is assigned to the video segment; if the video segment contains a transition frame of the soft transition mode, an identifier of the soft transition mode is assigned to the video segment; and if the video segment contains a transition frame of the hard transition mode, an identifier of the hard transition mode is assigned to the video segment.
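The segmentation and identifier assignment described above can be sketched as follows. The function name, the label strings and the dictionary-based input are all hypothetical; a segment is represented as a half-open range of frame numbers, and a segment may receive more than one identifier if it contains several kinds of transition frame.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical identifier strings for each kind of transition frame.
LABELS = {"soft": "soft_transition", "hard": "hard_transition", "buffered": "fast_mirror"}


def split_and_label(total_frames: int,
                    transitions: Dict[int, str]) -> List[Tuple[int, int, Set[str]]]:
    """Split frames [0, total_frames) into segments and label each one.

    transitions maps a frame number to "soft", "hard" or "buffered".
    Only real transition frames (soft or hard) start a new segment;
    buffered transition frames never act as segmentation points.
    """
    cut_starts = sorted(f for f, kind in transitions.items()
                        if kind in ("soft", "hard") and 0 < f < total_frames)
    starts = [0] + cut_starts
    ends = cut_starts + [total_frames]
    segments = []
    for start, end in zip(starts, ends):
        # Collect the identifiers of every transition frame inside this segment.
        labels = {LABELS[kind] for f, kind in transitions.items() if start <= f < end}
        segments.append((start, end, labels))
    return segments
```

For a 100-frame video with a hard transition frame at 30, a buffered transition frame at 40 and a soft transition frame at 70, the sketch produces three segments, and the middle segment carries both the hard transition and fast mirror identifiers.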

It can be seen that, in the embodiment of the present invention, after the target frame is determined to be a transition frame according to the difference between the first correlation and each second correlation, or according to the difference between the third correlation and the first correlation, the target frame is further checked to determine whether it corresponds to the soft/hard transition mode or to the fast mirror mode. Whether the target frame is a real transition frame or a buffered transition frame is determined from the first frame number difference between the target frame and the nearest real transition frame and the second frame number difference between the target frame and the nearest buffered transition frame, which further improves the recognition efficiency and provides a basis for accurate segmentation of the video in subsequent processing.

Fig. 5 shows a schematic structural diagram of a video transition recognition device according to an embodiment of the present invention. As shown in fig. 5, the video transition identifying apparatus 500 includes: a first determination module 510, a calculation module 520, and a second determination module 530.

A first determining module 510, configured to determine a first previous frame and a second previous frame of the target frame from the first N video frames of the target frame; wherein the first previous frame of the target frame is a previous video frame of the target frame, and the second previous frame of the target frame is a video frame other than the first previous frame among the previous N video frames;

a calculating module 520, configured to calculate a first correlation between the target frame and the first previous frame, and calculate a second correlation between the target frame and each of the second previous frames;

a second determining module 530, configured to determine whether the target frame is a transition frame according to the difference between the first correlation and each second correlation.

In an optional embodiment, the first determining module 510 is further configured to determine the first previous frame and the second previous frame of the target frame from the N video frames buffered in the buffer.

FIG. 6 illustrates a schematic diagram of a computing device provided by an embodiment of the present invention. The specific embodiments of the present invention are not limited to a particular implementation of a computing device.

As shown in fig. 6, the computing device may include: a processor 602, a communication interface (Communications Interface) 604, a memory 606, and a communication bus 608.

Wherein: the processor 602, the communication interface 604, and the memory 606 communicate with each other via the communication bus 608. The communication interface 604 is used to communicate with network elements of other devices, such as clients or other computing devices. The processor 602 is configured to execute a program 610, and may specifically perform the relevant steps of the video transition identification method embodiments described above.

In particular, program 610 may include program code including computer-operating instructions.

The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 606 is used for storing the program 610. The memory 606 may comprise high-speed RAM and may further comprise non-volatile memory, such as at least one disk memory. The program 610 may be specifically configured to cause the processor 602 to perform the method of any of the method embodiments described above.

Embodiments of the present invention provide a non-volatile computer storage medium storing at least one executable instruction that may perform the video transition identification method of any of the above-described method embodiments.

The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It will be appreciated that the teachings of embodiments of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the embodiments of the present invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., an embodiment of the invention that is claimed, requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of embodiments of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). Embodiments of the present invention may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the embodiments of the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims

1. A method for video transition identification, comprising:

2. The method of claim 1, wherein determining whether the target frame is a transition frame based on the difference between the first correlation and the respective second correlation further comprises:

3. The method of claim 2, wherein the determining whether a target frame is a transition frame based on the maximum difference value further comprises:

4. The method of claim 2, wherein the determining whether a target frame is a transition frame based on the maximum difference value further comprises:

5. The method of claim 2, wherein the determining whether a target frame is a transition frame based on the maximum difference value further comprises:

6. The method of any of claims 1-5, wherein determining whether the target frame is a transition frame based on the difference between the first correlation and the respective second correlations further comprises:

7. The method according to any one of claims 1-6, further comprising:

8. The method of claim 7, wherein determining that the target frame is a transition frame if the difference between the third correlation and the first correlation is greater than a fifth preset threshold further comprises:

9. The method according to any one of claims 1-8, wherein after determining that the target frame is a transition frame, the method further comprises:

10. The method according to any one of claims 1-8, further comprising:

11. The method according to claim 9 or 10, wherein the buffered transition frame is a transition frame in fast mirror mode.

12. The method of any of claims 1-11, wherein prior to the determining a first previous frame and a second previous frame of a target frame from a first N video frames of the target frame, the method further comprises: caching N continuous video frames in a cache region; taking the next video frame of the video frame with the largest frame sequence number in the buffer area as a target frame;

13. A video transition recognition apparatus, comprising:

14. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus;

the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the video transition identification method according to any one of claims 1 to 12.

15. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the video transition identification method of any one of claims 1-12.