CN108010044A - A kind of method of video boundaries detection - Google Patents

A method of video boundary detection

Info

Publication number: CN108010044A
Authority: CN (China)
Prior art keywords: similarity, segment, video, frame, frames
Legal status: Granted, currently Active
Application number: CN201610962372.1A
Other languages: Chinese (zh)
Other versions: CN108010044B (en)
Inventors: 孙伟芳, 朱立松, 黄建杰
Assignee (original and current): CCTV INTERNATIONAL NETWORKS WUXI Co Ltd
Application filed by CCTV INTERNATIONAL NETWORKS WUXI Co Ltd; priority to CN201610962372.1A; granted and published as CN108010044B

Classifications

    • G06V20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06F18/22: Matching criteria, e.g. proximity measures (pattern recognition)
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06T2207/10016: Video; image sequence (indexing scheme for image analysis)


Abstract

The invention discloses a video boundary detection method in the technical field of computer video information processing. The method first deletes monochrome frames by a pixel-difference calculation, segments the video, computes the histogram similarity between the head and tail frames of each segment, and deletes redundant segments according to this information. The remaining segments are divided into isolated and non-isolated segments, which are used to detect shot cuts and gradual transitions respectively. No computation is repeated, and both the position information and the colour information of the image are taken into account. Unlike other shot boundary detection methods, the method not only detects gradual shots but also locates their position and length. The algorithm has a certain robustness, and the idea is simple and effective.

Description

Video boundary detection method
Technical Field
The invention relates to the technical field of computer video information processing, in particular to a video boundary detection method.
Background
Shot boundary detection rests on the observation that visual content within a shot is similar, while content differs greatly between shots, so feature differences across a boundary are obvious. Shot changes fall into two types, cuts and gradual transitions: in a cut, the first frame of the next shot immediately follows the last frame of the previous shot with no transition; in a gradual transition, the change completes over a period of time through visual effects such as dissolve, fade-in/fade-out and wipe, typically lasting from several frames to more than ten frames.
Much research exists on cut and gradual-transition detection. Part of it targets only cuts or only gradual transitions, and methods that detect both at once tend to be complex. The main factor limiting the processing speed of shot boundary detection algorithms is repeated feature extraction and comparison over the large number of video frames to be processed.
Existing detection methods do not consider the influence of factors such as video subtitles (fixed scrolling bars at the bottom of the picture), station logos and black borders, and their colour-similarity measures between frames ignore position information. Existing gradual-transition detection either does not report the transition's length and position, or is too complex, too slow and insufficiently accurate, and does not remove the influence of monochrome frames. We therefore invented a video boundary detection method to solve the above problems.
Disclosure of Invention
The present invention aims to provide a video boundary detection method that solves the problems of the conventional detection methods noted in the background: they ignore the influence of factors such as video subtitles (fixed scrolling bars at the bottom of the picture), station logos and black borders; they ignore position information when computing colour similarity between frames; their gradual-transition detection either fails to report the transition's length and position or is too complex, too slow and insufficiently accurate; and they do not remove the influence of monochrome frames.
To achieve this purpose, the invention provides the following technical scheme: a video boundary detection method comprising the following specific steps:
S1: compute the standard deviation of the pixels of each frame; if it is close to 0, the frame is monochrome, and all monochrome frames are removed. The video to be detected is then divided, at intervals of 20 frames, into a series of video segments 21 frames long, and the histogram similarity between the head and tail frames of each segment is computed;
S2: dynamically select a threshold and compute the culling threshold for every video segment of step S1;
S3: among the remaining segments, a segment whose neighbours have both been culled is called an isolated segment. Suppose d20(n+1) and d20(n-1) have been culled but d20(n) has not; compute the head-to-tail histogram similarities between the isolated segment and its nearest surviving neighbours, denoted d20(n-2, n) and d20(n, n+2). If 1 < d20(n)/d20(n-2, n) < T1 or 1 < d20(n)/d20(n, n+2) < T1, there is no large feature difference among these segment frames and the segment is culled;
S4: for every non-isolated segment, compute the histogram similarities between the intermediate frame and the head and tail frames; if both show that the segment contains no large feature difference, the segment is culled;
S5: merge consecutive non-isolated segments into suspected gradual-transition segments. For each isolated segment, compute the histogram similarity between every pair of adjacent frames in the segment in turn, and record, besides d20(n), the maximum of all adjacent-frame similarities and the average of the others, denoted dmax and davg;
S6: within the suspected gradual-transition video, select the segment m with the largest head-to-tail frame similarity from the S1 segmentation, representing the part where the gradual change is strongest; compute the histogram similarity between every pair of adjacent frames in m, and compute the similarities d10 and d20 between the two frames of the extreme pair and the head and tail of the segment respectively;
S7: let the two frames selected in S6 be x1 and x2; compute the inter-frame similarities on both sides in turn, then compute the standard deviations of the similarities from x1 to x1+10 and from x2 to x2+10, denoted δx1 and δx2, and add frames one by one to compute the subsequent standard deviations;
S8: compute the standard deviation within the first 10 frames of the segment on the side with the larger similarity, then add frames one by one in turn to compute the subsequent standard deviations;
S9: let the length of the remaining segment be L. When L ≤ 25, no new gradual transition can occur within 25 frames (the human eye could not recognise it), so perform the cut re-detection of step S5; when L > 25, iterate according to step S6.
Preferably, in step S1, the histogram similarity is computed by removing the top 10% and bottom 15% of the picture, keeping the middle 75%, dividing it into 4 blocks, computing the HSV histogram of each block of each frame, computing the inter-frame similarity of the corresponding components, and then weighting, summing and normalising, as follows:
In the formula, d(i, i+1) is the normalised histogram similarity between frames i and i+1, H(j) is the value of the histogram of frame i at level j, and M is the total number of histogram levels. Histogram levels to be emphasised receive a weight W(j); b is the number of rectangular blocks and Ck the weight of the k-th block. d(20n, 20(n+1)) denotes the head-to-tail feature difference of the n-th segment, abbreviated d20(n).
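The formula itself appears as an image in the original publication and did not survive text extraction. Using only the variables the text defines (per-block histograms H, level weights W(j), block weights Ck), one plausible reconstruction of a weighted, blocked, normalised histogram difference is the following sketch; the exact form in the granted patent may differ:

```latex
d(i,\,i+1) \;=\; \sum_{k=1}^{b} C_k \,
  \frac{\sum_{j=1}^{M} W(j)\,\bigl|\,H_i^{k}(j) - H_{i+1}^{k}(j)\,\bigr|}
       {\sum_{j=1}^{M} W(j)\,\bigl(\,H_i^{k}(j) + H_{i+1}^{k}(j)\,\bigr)},
\qquad \sum_{k=1}^{b} C_k = 1
```

Here H_i^k(j) is the value of level j of the k-th block's histogram in frame i; the ratio is 0 for identical histograms and 1 for disjoint ones, matching the text's use of d as a head-to-tail feature difference.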
Preferably, in step S2, the dynamic threshold is computed as follows:
In the formula, μG is the mean of the head-to-tail features d20(n) of all segments in the video, μL is the mean of the head-to-tail feature differences of the 10 segments in a threshold unit, and δL is the standard deviation of the feature distances of the segments in the unit; a is a parameter determined during training. Segments whose feature difference is at most the threshold TL and that do not satisfy (d20(n) > 3·d20(n-1) ∪ d20(n) > 3·d20(n+1)) ∩ d20(n) > 0.8·μG are culled.
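The threshold formula is likewise an image lost in extraction. A reconstruction consistent with the quantities the text defines (μL, δL and the trained parameter a) is the following sketch; the granted formula may differ:

```latex
T_L \;=\; \mu_L + a\,\delta_L
```

with the culling condition

```latex
d_{20}(n) \le T_L \;\wedge\;
\neg\Bigl[\bigl(d_{20}(n) > 3\,d_{20}(n-1) \;\vee\; d_{20}(n) > 3\,d_{20}(n+1)\bigr)
\wedge d_{20}(n) > 0.8\,\mu_G\Bigr]
```

The second clause protects spike segments (likely cuts) from being culled even when they fall under the local threshold.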
Preferably, in step S3, T1 is a multiple of the histogram similarity close to 1.
Preferably, in step S5, the adjacent-frame similarities are judged as follows:
S51: if the maximum adjacent-frame similarity dmax is abnormally large, the middle of the segment contains an anomaly such as a flash or a corrupted picture that would distort the detection result, and the segment is culled;
S52: if the head-to-tail similarity is almost the same as dmax, and dmax differs little from the average davg of the others (T2 being a larger multiple of the histogram similarity), or the feature trend of the whole segment is not obvious, the scene in the segment has not changed and the segment is culled;
S53: if the head-to-tail similarity is almost the same as dmax and dmax is far larger than davg, the segment contains a shot cut, and the position of dmax is the cut position;
S54: otherwise, the similarities between all frames in the segment are low; since the segment is shorter than 25 frames, no human-recognisable gradual transition can occur, so the segment represents camera shake or movement and is culled.
Preferably, in step S6, 1 < d10/d20 < T1 or 1 < d20/d10 < T1 indicates that the feature changes on the two sides of the segment maximum are similar; assume d10 is the larger, so 1 < d10/d20 < T1. If the similarity between the frames on the two sides of the maximum is very large, the segment is a cut segment with the cut at the position of the maximum, and the remaining video on both sides of segment m is iterated again according to step S6. If the frames on both sides change little, the whole suspected gradual segment contains no shot change and is culled. When d10/d20 > T1 or d20/d10 > T1, the feature changes on the two sides of the maximum are inconsistent and the maximum lies at one end of the gradual transition; assuming d10 is the larger, the side corresponding to d10 goes to step S8 and the side corresponding to d20 goes to step S6.
Preferably, in step S7, when δx1/δx1+1 > T3 or δx2/δx2+1 > T3, where T3 is a larger multiple of the ratio of the two standard deviations of the histogram similarities, the corresponding position on the left or right side is a gradual-transition end point and the length of the video segment between the end points is the transition length; if video segments remain on either side, go to step S9, otherwise continue iterating until the segment ends.
Preferably, in step S8, let the standard deviations at the K-th and (K+1)-th frames be δK and δK+1 respectively; frames are added until δK/δK+1 > T3, where T3 is a larger multiple of the ratio of the two standard deviations. K is then the position where the gradual transition ends, and the span up to frame K gives the transition length. Once the K-th frame is reached, the remaining video segments go to the next step; otherwise iteration continues until the segment ends.
Compared with the prior art, the invention has the beneficial effects that: the method deletes monochrome frames by a pixel-difference calculation, segments the video, computes the histogram similarity between the head and tail frames of each segment, deletes redundant segments accordingly, and divides the remaining segments into isolated and non-isolated segments to detect shot cuts and gradual transitions respectively.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram illustrating dynamically selecting threshold segments according to the present invention;
FIG. 3 is a schematic diagram of calculating inter-frame similarity between the beginning and the end of a segment according to the present invention;
FIG. 4 is a schematic diagram of a suspected gradual transition segment according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to figs. 1-4, the present invention provides the following technical solution: a video boundary detection method comprising the following specific steps:
S1: compute the standard deviation of the pixels of each frame; if it is close to 0, the frame is monochrome, and all monochrome frames are removed. The video to be detected is then divided, at intervals of 20 frames, into a series of video segments 21 frames long, and the histogram similarity between the head and tail frames of each segment is computed. The histogram similarity is computed by removing the top 10% and bottom 15% of the picture, keeping the middle 75%, dividing it into 4 blocks, computing the HSV histogram of each block of each frame, computing the inter-frame similarity of the corresponding components, and then weighting, summing and normalising, as follows:
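As a concrete illustration of S1, the following is a minimal sketch in plain Python, assuming frames are given as flat lists of grayscale pixel values; the function names, the tolerance `eps` and the use of grayscale instead of full colour are illustrative assumptions, not part of the patent:

```python
def is_monochrome(frame, eps=1.0):
    """S1: a frame whose pixel standard deviation is close to 0 is a
    monochrome frame.  `frame` is a flat list of grayscale pixel values;
    `eps` is an assumed tolerance for "close to 0"."""
    n = len(frame)
    mean = sum(frame) / n
    std = (sum((p - mean) ** 2 for p in frame) / n) ** 0.5
    return std < eps

def segment_video(frames, step=20):
    """After monochrome frames are removed, split the video at 20-frame
    intervals into segments 21 frames long; adjacent segments share one
    frame, so segment n spans frames 20n .. 20(n+1)."""
    segments, i = [], 0
    while i + step < len(frames):
        segments.append(list(range(i, i + step + 1)))  # frame indices
        i += step
    return segments
```

Because adjacent segments share a frame, the head frame of segment n is the tail frame of segment n-1, which is what lets later steps reuse head-to-tail similarities without recomputation.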
In the formula, d(i, i+1) is the normalised histogram similarity between frames i and i+1, H(j) is the value of the histogram of frame i at level j, and M is the total number of histogram levels. Histogram levels to be emphasised receive a weight W(j); b is the number of rectangular blocks and Ck the weight of the k-th block. d(20n, 20(n+1)) denotes the head-to-tail feature difference of the n-th segment, abbreviated d20(n);
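The similarity formula itself is an image in the original publication, so the sketch below assumes a blocked, normalised L1 histogram difference over the middle 75% of the picture, with grayscale histograms standing in for the patent's HSV histograms and equal block weights Ck = 1/4; all names and parameter values are assumptions:

```python
def block_histograms(frame, levels=8):
    """Drop the top 10% and bottom 15% of rows, keep the middle 75%, and
    histogram 4 horizontal blocks of it (grayscale stands in for HSV)."""
    h = len(frame)
    rows = frame[int(0.10 * h): h - int(0.15 * h)]
    q = max(1, len(rows) // 4)
    hists = []
    for k in range(4):
        hist, n = [0] * levels, 0
        for row in rows[k * q:(k + 1) * q]:
            for p in row:
                hist[min(p * levels // 256, levels - 1)] += 1
                n += 1
        hists.append([v / n for v in hist] if n else hist)
    return hists

def frame_similarity(f1, f2, levels=8, block_weights=(0.25, 0.25, 0.25, 0.25)):
    """d(i, i+1): weighted, normalised L1 difference of the block
    histograms; 0 means identical histograms, 1 means disjoint ones."""
    d = 0.0
    for c, h1, h2 in zip(block_weights, block_histograms(f1, levels),
                         block_histograms(f2, levels)):
        d += c * 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))
    return d
```

Using blocks rather than one global histogram is what gives the measure the position sensitivity the background section says existing algorithms lack.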
S2: dynamically select a threshold and compute the culling threshold for all video segments of step S1. For example, for d20(n) a threshold unit is selected as in fig. 2; when n < 6, the threshold unit consists of the 1st to 10th video segments. The dynamic threshold is computed as follows:
In the formula, μG is the mean of the head-to-tail features d20(n) of all segments in the video, μL is the mean of the head-to-tail feature differences of the 10 segments in a threshold unit, and δL is the standard deviation of the feature distances of the segments in the unit; a is a parameter determined during training. Segments whose feature difference is at most the threshold TL and that do not satisfy (d20(n) > 3·d20(n-1) ∪ d20(n) > 3·d20(n+1)) ∩ d20(n) > 0.8·μG are culled;
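The threshold formula is not reproduced in the text; this sketch assumes TL = μL + a·δL over a 10-segment threshold unit, which is consistent with the quantities the text defines but may differ from the granted formula:

```python
def cull_segment(d, n, a=2.0):
    """S2 sketch: cull segment n when its head-to-tail difference d[n] is
    at most the dynamic threshold T_L over its 10-segment threshold unit
    and it is not a protected spike.  T_L = mu_L + a*delta_L is an assumed
    reconstruction of the patent's (unreproduced) formula."""
    unit = d[max(0, n - 5): max(0, n - 5) + 10]   # segments 1..10 when n < 6
    mu_l = sum(unit) / len(unit)
    delta_l = (sum((x - mu_l) ** 2 for x in unit) / len(unit)) ** 0.5
    t_l = mu_l + a * delta_l
    mu_g = sum(d) / len(d)                        # global mean of all d20(n)
    spike = (((d[n] > 3 * d[n - 1]) if n > 0 else False) or
             ((d[n] > 3 * d[n + 1]) if n + 1 < len(d) else False)) and d[n] > 0.8 * mu_g
    return d[n] <= t_l and not spike
```

The spike clause keeps segments whose difference jumps to triple a neighbour's and exceeds 0.8·μG, i.e. the likely shot-change candidates.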
S3: among the remaining segments, a segment whose neighbours have both been culled is called an isolated segment. Suppose d20(n+1) and d20(n-1) have been culled but d20(n) has not; compute the head-to-tail histogram similarities between the isolated segment and its nearest surviving neighbours, denoted d20(n-2, n) and d20(n, n+2). If 1 < d20(n)/d20(n-2, n) < T1 or 1 < d20(n)/d20(n, n+2) < T1, there is no large feature difference among these segment frames and the segment is culled; T1 is a multiple of the histogram similarity close to 1;
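The S3 ratio test can be sketched as follows; T1 = 1.3 is an assumed value for the "multiple close to 1":

```python
def keep_isolated(d_n, d_prev2, d_next2, t1=1.3):
    """S3 sketch: d_n is the head-to-tail difference d20(n) of an isolated
    segment; d_prev2 and d_next2 are its differences to the nearest
    surviving neighbours, d20(n-2, n) and d20(n, n+2).  A ratio barely
    above 1 (below the assumed T1 = 1.3) means no real feature change
    spans the neighbourhood, so the segment is culled (return False)."""
    for r in (d_n / d_prev2, d_n / d_next2):
        if 1 < r < t1:
            return False
    return True
```

A surviving isolated segment then proceeds to the adjacent-frame analysis of step S5.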
S4: for every non-isolated segment, compute the histogram similarities between the intermediate frame and the head and tail frames; if both show that the segment contains no large feature difference, the segment is culled;
S5: merge consecutive non-isolated segments into suspected gradual-transition segments. For each isolated segment, compute the histogram similarity between every pair of adjacent frames in the segment in turn, and record, besides d20(n), the maximum of all adjacent-frame similarities and the average of the others, denoted dmax and davg. The adjacent-frame similarities are judged as follows:
S51: if the maximum adjacent-frame similarity dmax is abnormally large, the middle of the segment contains an anomaly such as a flash or a corrupted picture that would distort the detection result, and the segment is culled;
S52: if the head-to-tail similarity is almost the same as dmax, and dmax differs little from the average davg of the others (T2 being a larger multiple of the histogram similarity), or the feature trend of the whole segment is not obvious, the scene in the segment has not changed and the segment is culled;
S53: if the head-to-tail similarity is almost the same as dmax and dmax is far larger than davg, the segment contains a shot cut, and the position of dmax is the cut position;
S54: otherwise, the similarities between all frames in the segment are low; since the segment is shorter than 25 frames, no human-recognisable gradual transition can occur, so the segment represents camera shake or movement and is culled;
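The judgment S51-S54 can be sketched as below. The original threshold conditions are formula images that did not survive extraction, so the tests here are hedged reconstructions from their verbal glosses, with T1 and T2 values assumed:

```python
def classify_isolated(d_head_tail, diffs, t1=1.3, t2=3.0):
    """S51-S54 sketch for one isolated segment.  `diffs` holds the 20
    adjacent-frame differences inside the segment; `d_head_tail` is d20(n).
    Returns ('abnormal'|'no-change'|'cut'|'motion', cut_position)."""
    d_max = max(diffs)
    rest = [x for x in diffs if x != d_max] or [d_max]
    d_avg = sum(rest) / len(rest)
    if d_max > t2 * d_head_tail:          # S51: flash / corrupted-frame spike
        return 'abnormal', None
    if 1 <= d_head_tail / d_max < t1:     # head-to-tail ~ max adjacent diff
        if d_max / d_avg > t2:            # S53: one dominant jump -> a cut
            return 'cut', diffs.index(d_max)
        return 'no-change', None          # S52: no obvious feature trend
    return 'motion', None                 # S54: camera shake or movement
```

Only the 'cut' outcome reports a boundary; the other three outcomes cull the segment, matching S51, S52 and S54.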
S6: within the suspected gradual-transition video, select the segment m with the largest head-to-tail frame similarity from the S1 segmentation, representing the part where the gradual change is strongest. Compute the histogram similarity between every pair of adjacent frames in m and select the extreme pair; compute the similarities d10 and d20 between these two frames and the head and tail of the segment respectively. As shown in fig. 3, 1 < d10/d20 < T1 or 1 < d20/d10 < T1 indicates that the feature changes on the two sides of the segment maximum are similar; assume d10 is the larger, so 1 < d10/d20 < T1. If the similarity between the frames on the two sides of the maximum is very large, the segment is a cut segment with the cut at the position of the maximum, and the remaining video on both sides of segment m is iterated again according to step S6; if the frames on both sides change little, the whole suspected gradual segment contains no shot change and is culled. When d10/d20 > T1 or d20/d10 > T1, the feature changes on the two sides of the maximum are inconsistent and the maximum lies at one end of the gradual transition; assuming d10 is the larger, the side corresponding to d10 goes to step S8 and the side corresponding to d20 goes to step S6;
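The S6 decision can be sketched as follows; since the original condition images are missing, the cut/no-change split is keyed here on an assumed peak-to-average ratio against T2:

```python
def classify_strongest(d10, d20, peak_over_avg, t1=1.3, t2=3.0):
    """S6 sketch for the strongest segment m of a suspected gradual run.
    d10 and d20 are the differences from the peak adjacent-frame change to
    the segment head and tail; `peak_over_avg` is d_max/d_avg inside m.
    T1, T2 and the peak-ratio criterion are assumptions."""
    lo, hi = sorted((d10, d20))
    if hi / lo < t1:                  # both sides changed alike
        if peak_over_avg > t2:
            return 'cut'              # abrupt change at the peak position
        return 'no-change'            # whole run shows no shot change
    return 'gradual-end'              # peak sits at one end of a gradual
```

On 'gradual-end', the side with the larger difference proceeds to the S8 window scan while the other side is re-examined by S6, as the step describes.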
S7: let the two frames selected in S6 be x1 and x2. Compute the inter-frame similarities on both sides in turn, then compute the standard deviations of the similarities from x1 to x1+10 and from x2 to x2+10, denoted δx1 and δx2, and add frames one by one to compute the subsequent standard deviations. As shown in fig. 4, for example, the standard deviations over 11 frames on the two sides are δx1+1 and δx2+1; when δx1/δx1+1 > T3 or δx2/δx2+1 > T3, where T3 is a larger multiple of the ratio of the two standard deviations of the histogram similarities, the corresponding position on the left or right side is a gradual-transition end point and the length of the video between the two end points is the transition length. If video segments remain on either side, go to step S9; otherwise continue iterating until the segment ends;
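Steps S7/S8 grow a window of adjacent-frame differences and watch its standard deviation. The extraction leaves the exact form and direction of the δ-ratio test ambiguous, so this sketch flags the end of the transition whenever consecutive window standard deviations differ by more than a factor T3 (value assumed):

```python
def std(xs):
    """Population standard deviation of a list of numbers."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def gradual_end(diffs, start, t3=2.0):
    """S7/S8 sketch: start from a 10-value window of adjacent-frame
    differences at `start` and grow it one frame at a time; when the
    window standard deviation jumps or collapses by more than a factor
    T3, the frame where growth stopped is taken as the transition end.
    Returns len(diffs) if no such point is found."""
    w = 10
    prev = std(diffs[start:start + w])
    k = start + w
    while k + 1 <= len(diffs):
        cur = std(diffs[start:k + 1])
        hi, lo = max(prev, cur), min(prev, cur)
        if (lo == 0 and hi > 0) or (lo > 0 and hi / lo > t3):
            return k
        prev, k = cur, k + 1
    return len(diffs)
```

The intuition: during a gradual transition the adjacent-frame differences are steady, so the window std stays flat; once the window crosses the transition's end, the differences drop and the std changes sharply.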
S8: compute the standard deviation within the first 10 frames of the segment on the side with the larger similarity, then add frames one by one to compute the subsequent standard deviations. Let the standard deviations at the K-th and (K+1)-th frames be δK and δK+1 respectively; frames are added until δK/δK+1 > T3, where T3 is a larger multiple of the ratio of the two standard deviations; K is then the position where the gradual transition ends, and the span up to frame K gives the transition length. Once the K-th frame is reached, the remaining video segments go to the next step; otherwise iteration continues until the segment ends;
S9: let the length of the remaining segment be L. When L ≤ 25, no new gradual transition can occur within 25 frames (the human eye could not recognise it), so perform the cut re-detection of step S5; when L > 25, iterate the segment according to step S6.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A video boundary detection method, characterized in that the method comprises the following specific steps:
S1: compute the standard deviation of the pixels of each frame; if it is close to 0, the frame is monochrome, and all monochrome frames are removed. The video to be detected is then divided, at intervals of 20 frames, into a series of video segments 21 frames long, and the histogram similarity between the head and tail frames of each segment is computed;
S2: dynamically select a threshold and compute the culling threshold for every video segment of step S1;
S3: among the remaining segments, a segment whose neighbours have both been culled is called an isolated segment. Suppose d20(n+1) and d20(n-1) have been culled but d20(n) has not; compute the head-to-tail histogram similarities between the isolated segment and its nearest surviving neighbours, denoted d20(n-2, n) and d20(n, n+2). If 1 < d20(n)/d20(n-2, n) < T1 or 1 < d20(n)/d20(n, n+2) < T1, there is no large feature difference among these segment frames and the segment is culled;
S4: for every non-isolated segment, compute the histogram similarities between the intermediate frame and the head and tail frames; if both show that the segment contains no large feature difference, the segment is culled;
S5: merge consecutive non-isolated segments into suspected gradual-transition segments. For each isolated segment, compute the histogram similarity between every pair of adjacent frames in the segment in turn, and record, besides d20(n), the maximum of all adjacent-frame similarities and the average of the others, denoted dmax and davg;
S6: within the suspected gradual-transition video, select the segment m with the largest head-to-tail frame similarity from the S1 segmentation, representing the part where the gradual change is strongest; compute the histogram similarity between every pair of adjacent frames in m, and compute the similarities d10 and d20 between the two frames of the extreme pair and the head and tail of the segment respectively;
S7: let the two frames selected in S6 be x1 and x2; compute the inter-frame similarities on both sides in turn, then compute the standard deviations of the similarities from x1 to x1+10 and from x2 to x2+10, denoted δx1 and δx2, and add frames one by one to compute the subsequent standard deviations;
S8: compute the standard deviation within the first 10 frames of the segment on the side with the larger similarity, then add frames one by one in turn to compute the subsequent standard deviations;
S9: let the length of the remaining segment be L. When L ≤ 25, no new gradual transition can occur within 25 frames (the human eye could not recognise it), so perform the cut re-detection of step S5; when L > 25, iterate according to step S6.
2. The video boundary detection method of claim 1, characterized in that: in step S1, the histogram similarity is computed by removing the top 10% and bottom 15% of the picture, keeping the middle 75%, dividing it into 4 blocks, computing the HSV histogram of each block of each frame, computing the inter-frame similarity of the corresponding components, and weighting, summing and normalising, as follows:
In the formula, d(i, i+1) is the normalised histogram similarity between frames i and i+1, H(j) is the value of the histogram of frame i at level j, and M is the total number of histogram levels; histogram levels to be emphasised receive a weight W(j); b is the number of rectangular blocks and Ck the weight of the k-th block; d(20n, 20(n+1)) denotes the head-to-tail feature difference of the n-th segment, abbreviated d20(n).
3. The video boundary detection method of claim 1, characterized in that: in step S2, the dynamic threshold is computed as follows:
In the formula, μG is the mean of the head-to-tail features d20(n) of all segments in the video, μL is the mean of the head-to-tail feature differences of the 10 segments in a threshold unit, and δL is the standard deviation of the feature distances of the segments in the unit; a is a parameter determined during training; segments whose feature difference is at most the threshold TL and that do not satisfy (d20(n) > 3·d20(n-1) ∪ d20(n) > 3·d20(n+1)) ∩ d20(n) > 0.8·μG are culled.
4. The video boundary detection method of claim 1, characterized in that: in step S3, T1 is a multiple of the histogram similarity close to 1.
5. The video boundary detection method of claim 1, characterized in that: in step S5, the adjacent-frame similarities are judged as follows:
S51: if the maximum adjacent-frame similarity dmax is abnormally large, the middle of the segment contains an anomaly such as a flash or a corrupted picture that would distort the detection result, and the segment is culled;
S52: if the head-to-tail similarity is almost the same as dmax, and dmax differs little from the average davg of the others (T2 being a larger multiple of the histogram similarity), or the feature trend of the whole segment is not obvious, the scene in the segment has not changed and the segment is culled;
S53: if the head-to-tail similarity is almost the same as dmax and dmax is far larger than davg, the segment contains a shot cut, and the position of dmax is the cut position;
S54: otherwise, the similarities between all frames in the segment are low; since the segment is shorter than 25 frames, no human-recognisable gradual transition can occur, so the segment represents camera shake or movement and is culled.
6. The method of claim 1, wherein in step S6, 1 < d10/d20 < T_1 or 1 < d20/d10 < T_1 indicates similar feature variation on the two sides of the segment maximum. Assuming d10 is the larger, when 1 < d10/d20 < T_1 and …, the inter-frame similarity on both sides of … is very large, the segment is an abrupt-change segment and … is the abrupt-change position, and the remaining video segments on the two sides of the m segments are each iterated according to step S6; when …, the change between frames on the two sides of … is small, the whole suspected gradual-change segment contains no shot change, and it is removed; when d10/d20 > T_1 or d20/d10 > T_1, the feature variation on the two sides of the segment maximum is inconsistent, indicating that … lies at one of the two ends of the gradual change; assuming d10 is the larger, the segment on the d10 side proceeds to step S8 and the segment on the d20 side proceeds to step S6.
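The ratio test of claim 6 reduces to comparing the larger of d10 and d20 against the smaller; a minimal sketch of that three-way decision, where the value of T_1 and the return labels are illustrative assumptions:

```python
def classify_peak_sides(d10, d20, T1=1.5):
    """Classify a suspected-transition segment from the feature
    differences d10 and d20 measured on the two sides of the segment
    maximum.  `T1` is the trained multiple from the claim (the value
    here is a guess).  Returns "symmetric" when both sides change
    similarly (claim 6 then decides abrupt change vs. no change by a
    further inter-frame test), or names the side whose half must be
    re-examined for a fade endpoint in step S8."""
    lo, hi = sorted((d10, d20))
    if lo == 0 or hi / lo < T1:
        return "symmetric"
    # inconsistent variation: the larger side contains a fade endpoint
    return "d10-side" if d10 > d20 else "d20-side"
```

The asymmetric branch matches the claim's routing: the larger side goes to step S8, the other side back to step S6.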
7. The method of claim 1, wherein in step S7, when δ_X1/δ_X1+1 > T_3 or δ_X2/δ_X2+1 > T_3, where T_3 is a larger multiple of the standard deviation of the histogram similarity, the corresponding position on the left or right side is a gradual-change end point and the length of the intermediate video segment is the gradual-change length; if video segments remain on either side, go to step S9, otherwise continue iterating until the segment ends.
8. The method of claim 1, wherein in step S8 the standard deviations of the K-th frame and the (K+1)-th frame are δ_K and δ_K+1 respectively; when δ_K/δ_K+1 > T_3, K is the position where the gradual change ends and the gradual-change length is …; when the K-th frame is reached, the remaining video segments proceed to the next step, otherwise iteration continues until the segment ends.
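Claim 8's stopping rule can be sketched as a forward scan comparing the standard deviations of consecutive frames; interpreting the garbled condition as the ratio δ_K/δ_K+1 > T_3 is an assumption:

```python
import numpy as np

def find_fade_end(frames, start, T3=2.0):
    """Scan forward from `start` until the ratio of consecutive frame
    standard deviations delta_K / delta_{K+1} exceeds T3 (described in
    the claim as a larger multiple); that K is taken as the position
    where the gradual change ends.  Returns K, or None if the segment
    ends first.  `frames` is a list of grayscale image arrays."""
    for k in range(start, len(frames) - 1):
        delta_k = np.std(frames[k].astype(float))
        delta_k1 = np.std(frames[k + 1].astype(float))
        if delta_k1 > 0 and delta_k / delta_k1 > T3:
            return k
    return None
```

A sharp drop in per-frame standard deviation (for example at the end of a fade to a flat picture) triggers the stop; a segment with stable statistics scans to the end and returns None.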
CN201610962372.1A 2016-10-28 2016-10-28 Video boundary detection method Active CN108010044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610962372.1A CN108010044B (en) 2016-10-28 2016-10-28 Video boundary detection method


Publications (2)

Publication Number Publication Date
CN108010044A true CN108010044A (en) 2018-05-08
CN108010044B CN108010044B (en) 2021-06-15

Family

ID=62048458



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685398A (en) * 2011-09-06 2012-09-19 天脉聚源(北京)传媒科技有限公司 News video scene generating method
CN104952073A (en) * 2015-06-15 2015-09-30 上海交通大学 Shot boundary detecting method based on deep learning


Non-Patent Citations (2)

Title
LI XIUHUAN: "Research on Shot Boundary Detection and Key Frame Extraction", China Master's Theses Full-text Database *
GE BAO: "Research on Video Shot Boundary Detection", China Doctoral and Master's Dissertations Full-text Database (Master's) *

Cited By (8)

Publication number Priority date Publication date Assignee Title
WO2020029883A1 (en) * 2018-08-09 2020-02-13 阿里巴巴集团控股有限公司 Method and device for generating video fingerprint
CN110826365A (en) * 2018-08-09 2020-02-21 阿里巴巴集团控股有限公司 Video fingerprint generation method and device
CN110826365B (en) * 2018-08-09 2023-06-23 阿里巴巴集团控股有限公司 Video fingerprint generation method and device
US11961299B2 (en) 2018-08-09 2024-04-16 Alibaba Group Holding Limited Method and apparatus for generating video fingerprint
CN109151499A (en) * 2018-09-26 2019-01-04 央视国际网络无锡有限公司 Video reviewing method and device
WO2020124633A1 (en) * 2018-12-19 2020-06-25 惠科股份有限公司 Processing method and system for color film substrate
CN111860185A (en) * 2020-06-23 2020-10-30 北京无限创意信息技术有限公司 Shot boundary detection method and system
CN112511719A (en) * 2020-11-10 2021-03-16 陕西师范大学 Method for judging screen content video motion type


Similar Documents

Publication Publication Date Title
CN108010044B (en) Video boundary detection method
CN109859171B (en) Automatic floor defect detection method based on computer vision and deep learning
CN106937114B (en) Method and device for detecting video scene switching
Shivakumara et al. Efficient video text detection using edge features
US20050123052A1 (en) Apparatus and method for detection of scene changes in motion video
CN107679469B (en) Non-maximum suppression method based on deep learning
CN110992288B (en) Video image blind denoising method used in mine shaft environment
CN110766711A (en) Video shot segmentation method, system, device and storage medium
CN109850518B (en) Real-time mining adhesive tape early warning tearing detection method based on infrared image
CN108830204B (en) Method for detecting abnormality in target-oriented surveillance video
EP2393290B1 (en) Video identifier creation device
CN109978858B (en) Double-frame thumbnail image quality evaluation method based on foreground detection
CN106327513B (en) Shot boundary detection method based on convolutional neural network
KR101667011B1 (en) Apparatus and Method for detecting scene change of stereo-scopic image
CN106446832B (en) Video-based pedestrian real-time detection method
CN108573217B (en) Compression tracking method combined with local structured information
CN108154521B (en) Moving target detection method based on target block fusion
CN106951831B (en) Pedestrian detection tracking method based on depth camera
CN111951254B (en) Edge-guided weighted-average-based source camera identification method and system
CN112188309B (en) Shot boundary detection method and device based on accumulated difference degree and singular value decomposition
CN112085683B (en) Depth map credibility detection method in saliency detection
CN111382736B (en) License plate image acquisition method and device
CN111199166B (en) Video riblet detection and recovery method based on frequency domain and spatial domain characteristics
CN108573223B (en) Motor train unit operation environment sensing method based on pantograph-catenary video
CN109359513B (en) Anomaly detection method based on edge detection and color matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant