WO2016037422A1 - Method for detecting change of video scene

Info

Publication number: WO2016037422A1
Application number: PCT/CN2014/092640
Authority: WO
Inventors: 刘鹏
Original assignee: 刘鹏
Priority date: 2014-09-11
Filing date: 2014-12-01
Publication date: 2016-03-17
Also published as: CN104184925A

Abstract

Disclosed is a method for detecting a change of video scene. The method comprises the steps of: converting image frames of a video file from RGB space to HSV space; splitting the video file into a plurality of video sequences, and calculating an average hue histogram of each video sequence; normalizing each average hue histogram; performing a matching calculation on the average hue histograms of every two adjacent video sequences to obtain a matching coefficient; and, if the matching coefficient between the average hue histograms of two adjacent video sequences is greater than a preset matching threshold, regarding the two adjacent video sequences as video sequences of different scenes, and otherwise regarding them as video sequences of the same scene. Quick video scene detection is thereby realized on the basis of video sequences according to the main hue differences between adjacent video sequences. The present invention can be further applied to other image detection fields, and has high application value.

Description

Method for detecting video scene change

Cross-reference to related applications

The present application claims the benefit of priority of an earlier application, the entire disclosure of which is incorporated herein by reference.

Technical field

The present invention relates to video image analysis technology, and in particular to a method for detecting video scene changes.

Background art

The content type of a video varies during playback, and a change of content type usually coincides with a change of video scene. To keep the visual effect of a video consistent, fusion processing must be performed across different video scenes, which presupposes that video scene changes are detected effectively.

Existing video scene detection methods mainly include:

1. A method based on differences between video frames. For example, Chinese patent application CN201310332133.4 proposes a dynamic video scene change detection method comprising the steps of: acquiring a current frame of a dynamic video image in real time; calculating a scene transformation feature parameter ti(n) of the current frame; calculating a dynamic threshold threshold(n) for the current frame from the scene transformation feature parameters of the previous one or several frames of the dynamic video image; and determining whether the scene transformation feature parameter ti(n) of the current frame is less than or equal to its corresponding dynamic threshold. If so, the current frame is determined not to be a scene change frame; otherwise, it is determined to be a scene change frame.

2. A scene detection method based on an undirected weighted graph. The method treats all video sequences as the vertices of a graph, uses the similarity of the video sequences in the spatial and temporal domains as the distance associated with each edge, and loops through the vertices of the graph in a tree-stripping manner, each time determining the most likely scene boundary, until all vertices of the graph have been stripped.

Although the existing methods can detect video scene changes, they suffer from complicated processing and low detection efficiency.

Summary of the invention

To overcome the shortcomings of the prior art, the present invention provides a video scene change detection method that is simple to implement and fast to detect.

The invention adopts the following technical solutions: a method for detecting a video scene change, which comprises the steps of:

A. Convert the image frame of the video file from the RGB space to the HSV space;

B. Divide the video file into a plurality of video sequences, and calculate an average hue histogram of each video sequence;

C. Normalize the average hue histogram;

D. Perform a matching calculation on the average hue histograms of two adjacent video sequences to obtain a matching coefficient;

E. If the matching coefficient between the average hue histograms of the two adjacent video sequences is greater than a preset matching threshold, the two adjacent video sequences are considered to be video sequences of different scenes; otherwise, they are considered to be video sequences of the same scene.

The method for detecting a video scene change further includes the step of performing pixel preprocessing on the image frame of the video file before the step B.

The step of preprocessing the pixel specifically includes:

When the saturation S of a certain pixel point is less than the preset first threshold T1 and the brightness V of the pixel is less than the preset second threshold T2, the pixel is discarded;

When a saturation S of a pixel is greater than a preset third threshold T3 and the brightness V of the pixel is less than a preset fourth threshold T4, the pixel is discarded;

The remaining pixels in the image frame of the video sequence are preserved.

In one setting, the first threshold T1 = 0.2, the second threshold T2 = 0.8, the third threshold T3 = 0.8, and the fourth threshold T4 = 0.2 are preset.

Alternatively, the first threshold T1 = 0.14, the second threshold T2 = 0.92, the third threshold T3 = 0.94, and the fourth threshold T4 = 0.13 are preset.

The step of calculating the average hue histogram of each video sequence specifically includes:

Calculating a hue H component histogram of each image frame in each video sequence;

Superimposing the hue H component histograms of the image frames in each video sequence and taking their mean, thereby obtaining the average hue histogram of each video sequence.

The step C specifically includes:

After the average hue histogram of a video sequence is obtained, the total number of pixels counted by the average hue histogram is calculated;

The number of pixels at each hue level of the average hue histogram is divided by the total number of pixels to obtain the normalized average hue histogram.

Wherein, the step D calculates the matching coefficient ξ between the average hue histogram H₁(K) and the average hue histogram H₂(K) using the following formula:

where K represents the hue level of the pixel, K = 1, 2, 3, ..., Q, and Q is the maximum hue level.

Compared with the prior art, the present invention has the following beneficial effects:

The invention provides a scene detection method based on the average hue histogram. The main hue of the background color of each video sequence is first determined from the cumulative histogram of the hue component, which expresses the color category, and fast video scene detection is then achieved according to the main hue difference between adjacent video sequences. The invention can also be applied to other fields of image detection and has high application value.

Brief description of the drawings

Figure 1 is a flow chart showing an embodiment of the present invention.

Detailed description

Video within one scene tends to share the same environmental background, so the colors of its pictures are relatively consistent, whereas different scene environments differ considerably and so do their background colors. The present invention therefore determines the main hue of the background color of each video sequence from the cumulative histogram of the hue component, which expresses the color category, and implements fast video scene detection on the basis of video sequences according to the main hue difference between adjacent video sequences.

As shown in FIG. 1, a preferred embodiment of the present invention includes the following implementation steps:

Step S1: Convert the image frame of the video file from the RGB space to the HSV space.

Computer image processing usually uses the RGB color model, which is built on the three primary colors. Although it has a clear physical meaning, it does not match the characteristics of human vision well.

The HSV color model matches human vision more closely. It specifies a color with three parameters: hue H (Hue), saturation S (Saturation), and brightness V (Value). Hue H indicates the color category and directly reflects the wavelength of the color in the spectrum, for example red, orange, yellow, green, blue, or purple. Saturation S indicates the vividness of the color and can be understood as the proportion of white mixed into the color: the larger S is, the less white the color contains and the more vivid it is. Brightness V indicates how light or dark the color is, and has no direct relationship with light intensity.

Taking the pixel value of 8 bits as an example, the calculation formula for converting each pixel in the image frame from RGB space to HSV space is as follows:
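The conversion formula itself is not reproduced in this text. As a rough sketch only, assuming the standard hexcone RGB-to-HSV conversion (the one implemented by Python's `colorsys.rgb_to_hsv`), an 8-bit pixel could be converted as shown below. The function name `rgb8_to_hsv` and the scaling of H to radians are illustrative choices, not taken from the patent.

```python
import colorsys
import math

def rgb8_to_hsv(r, g, b):
    """Convert one 8-bit RGB pixel to (H, S, V).

    H is returned in radians in [0, 2*pi); S and V lie in [0, 1].
    Assumption: the standard hexcone model used by colorsys, which may
    differ in detail from the patent's own formula.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 2.0 * math.pi, s, v

# Pure green has a hue of one third of the full circle.
h, s, v = rgb8_to_hsv(0, 255, 0)
```

Grey pixels (equal R, G, and B) come out with S = 0 and an arbitrary hue of 0, which is one reason low-saturation pixels are filtered before any hue statistics are taken.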

Step S2: performing pixel preprocessing on the image frame of the video file.

In the image frames of a video file, the color of some pixels is not noticeable to the human eye. These pixels not only increase the computational cost of scene change detection but also reduce the accuracy of the detection result. Each image frame of the video sequence therefore needs to be preprocessed so that only pixels whose colors the human eye can recognize are retained.

The pixel preprocessing determines whether a pixel can be recognized by applying thresholds to the saturation S and the brightness V: when the saturation S of a pixel is less than a preset first threshold T1 and the brightness V of the pixel is less than a preset second threshold T2, the pixel is discarded; when the saturation S of a pixel is greater than a preset third threshold T3 and the brightness V of the pixel is less than a preset fourth threshold T4, the pixel is discarded; the remaining pixels in the image frames of the video sequence are retained.

Moreover, when the saturation of a pixel S ∈ (0.8, 1] and its brightness V ∈ [0, 0.2), the pixel is considered to be a black pixel; when the saturation of a pixel S ∈ [0, 0.2) and its brightness V ∈ (0.8, 1], the pixel is considered to be a white pixel. Accordingly, the first threshold T1 = 0.2, the second threshold T2 = 0.8, the third threshold T3 = 0.8, and the fourth threshold T4 = 0.2 may be set in advance.

In a preferred embodiment, the first threshold T1=0.14, the second threshold T2=0.92, the third threshold T3=0.94, and the fourth threshold T4=0.13 are preset.
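The filtering rule of step S2 might be sketched as follows. The helper names `keep_pixel` and `filter_pixels` are hypothetical. Note one assumption in particular: the sketch tests V > T2 for white pixels, matching the interval V ∈ (0.8, 1] given for white pixels, whereas the step text words that comparison as "less than"; the direction of this comparison should be checked against the original filing.

```python
def keep_pixel(s, v, t1=0.2, t2=0.8, t3=0.8, t4=0.2):
    """Return True if an HSV pixel should be counted in the hue histogram.

    s and v are saturation and brightness in [0, 1].
    Assumption: a pixel is "white" when S < T1 and V > T2, and "black"
    when S > T3 and V < T4, following the interval definitions.
    """
    is_white = s < t1 and v > t2   # low saturation, high brightness
    is_black = s > t3 and v < t4   # high saturation, low brightness
    return not (is_white or is_black)

def filter_pixels(hsv_pixels, **thresholds):
    """Keep only the (h, s, v) tuples that pass keep_pixel."""
    return [(h, s, v) for (h, s, v) in hsv_pixels
            if keep_pixel(s, v, **thresholds)]

pixels = [(1.0, 0.1, 0.9), (2.0, 0.9, 0.1), (3.0, 0.5, 0.5)]
kept = filter_pixels(pixels)   # only the mid-saturation pixel survives
```

The preferred thresholds can be passed by keyword, for example `filter_pixels(pixels, t1=0.14, t2=0.92, t3=0.94, t4=0.13)`.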

Step S3: Divide the video file into a plurality of video sequences, and calculate an average hue histogram of each video sequence.

The hue H component histogram is denoted H(K), where K represents the hue level of a pixel, K = 1, 2, 3, ..., Q, and Q is the total number of hue levels (the maximum hue level); the range of the hue H is [0, 2π]. Because the human eye has a limited ability to distinguish colors, the hue H component can be non-uniformly quantized into Q levels according to that ability, each level representing one of Q different colors that the human eye can recognize. For example, with Q = 8 the quantized hue takes eight values, which may be indexed 0 to 7.
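A minimal sketch of the quantization and per-frame histogram follows. The text does not give the non-uniform level boundaries, so the `uniform_edges` used here are placeholders only; `quantize_hue` and `hue_histogram` are hypothetical names, and levels are indexed from 0.

```python
import bisect
import math

def quantize_hue(h, edges):
    """Map a hue h (radians, in [0, 2*pi]) to a zero-based level index.

    `edges` lists the interior level boundaries in ascending order,
    so the number of levels is Q = len(edges) + 1.
    """
    return bisect.bisect_right(edges, h)

def hue_histogram(hues, edges):
    """Count how many hue values fall into each of the Q levels."""
    hist = [0] * (len(edges) + 1)
    for h in hues:
        hist[quantize_hue(h, edges)] += 1
    return hist

Q = 8
# Placeholder boundaries: equal-width levels. The patent describes a
# non-uniform split tuned to human color perception, not given here.
uniform_edges = [2 * math.pi * k / Q for k in range(1, Q)]
hist = hue_histogram([0.1, 0.2, 3.0, 6.0], uniform_edges)
```

Swapping in perceptually motivated boundaries only requires a different `edges` list; the counting logic is unchanged.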

The average hue histogram is the cumulative average histogram of the H component of a video sequence. It counts the pixels at each hue level over multiple image frames within a given range, and can also be regarded as the H component histogram obtained over all pixels of a video segment.

The entire video (or video file) is divided into a plurality of video sequences of a predetermined length, each containing N image frames. A larger N yields fewer sequences to compare and therefore faster detection, while a relatively small N gives a more accurate detection result.
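The division into fixed-length sequences can be illustrated with a short sketch. The function name `split_into_sequences` is hypothetical, and keeping a shorter final sequence when the frame count is not a multiple of N is an assumption; the text does not say how a remainder is handled.

```python
def split_into_sequences(frames, n):
    """Split a list of frames into consecutive sequences of n frames.

    Assumption: a final sequence shorter than n frames is kept as is.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [frames[i:i + n] for i in range(0, len(frames), n)]

# Ten frames with N = 4 give sequences of 4, 4, and 2 frames.
sequences = split_into_sequences(list(range(10)), 4)
```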

Assuming that the m-th video sequence contains N image frames, the hue H component histogram of the n-th image frame is calculated in turn as H_n(K), where n = 1, 2, 3, ..., N. The average hue histogram L_m(K) of the m-th video sequence can then be expressed as the following formula (4):

$$L_m(K) = \frac{1}{N} \sum_{n=1}^{N} H_n(K) \qquad (4)$$

That is, for a video sequence containing N image frames, the hue H component histogram H_n(K) of each image frame is calculated, the histograms are superimposed, and their mean is taken to obtain the average hue histogram L_m(K) of the video sequence.
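The superimpose-then-average step can be sketched as below, assuming each frame histogram is a list of Q counts. The name `average_hue_histogram` is illustrative.

```python
def average_hue_histogram(frame_histograms):
    """Average N per-frame hue histograms level by level.

    frame_histograms: list of N lists, each holding Q pixel counts.
    Returns a list of Q mean counts.
    """
    n = len(frame_histograms)
    if n == 0:
        raise ValueError("at least one frame histogram is required")
    q = len(frame_histograms[0])
    return [sum(h[k] for h in frame_histograms) / n for k in range(q)]

# Two frames with three hue levels each.
avg = average_hue_histogram([[2, 0, 4], [4, 2, 0]])
```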

Step S4: Normalize the average hue histogram.

After the image frames are preprocessed in step S2, the number of pixels remaining in each image frame differs, so the total number of pixels counted by the average hue histogram differs from one video sequence to another. The average hue histogram of each video sequence therefore needs to be normalized so that the histograms of different video sequences can be compared.

The present invention uses normalization by the total number of pixels. After the average hue histogram of a video sequence is obtained, the total number of pixels counted by the histogram is calculated, and the number of pixels H(K) at each hue level is then divided by that total to give the normalized average hue histogram.
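Normalization by the total pixel count might look like the following sketch. Returning an all-zero histogram when every pixel was discarded is an assumption made here to avoid division by zero; the text does not cover that case.

```python
def normalize_histogram(hist):
    """Divide each level's count by the total count.

    Assumption: if the total is zero (all pixels were filtered out),
    an all-zero histogram of the same length is returned.
    """
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [count / total for count in hist]

normalized = normalize_histogram([3.0, 1.0, 4.0])
```

After this step each non-empty histogram sums to 1, so sequences with different numbers of surviving pixels become directly comparable.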

Step S5: Perform matching calculation on the average hue histogram of the adjacent two video sequences to obtain a matching coefficient ξ.

For example, to calculate the matching coefficient ξ between the average hue histogram H₁(K) and the average hue histogram H₂(K), the following formula (5) is used:

The matching coefficient ξ represents the degree to which the distribution of the average hue histogram H₁ deviates from that of the average hue histogram H₂; a smaller ξ indicates a lower degree of deviation and a closer match between the two histograms H₁ and H₂.
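Formula (5) is not reproduced in this text, so the exact definition of ξ is unknown here. Purely as an illustration of a deviation measure with the stated property (smaller means a closer match), the sketch below assumes a symmetric chi-square distance. This is a stand-in chosen for the example, not the patent's formula, and `matching_coefficient` is a hypothetical name.

```python
def matching_coefficient(h1, h2):
    """Deviation between two normalized hue histograms.

    Assumption: symmetric chi-square distance,
        sum over K of (H1(K) - H2(K))**2 / (H1(K) + H2(K)),
    skipping levels where both histograms are empty.
    Identical histograms give 0; larger values mean a worse match.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of levels")
    xi = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            xi += (a - b) ** 2 / (a + b)
    return xi

same = matching_coefficient([0.5, 0.5], [0.5, 0.5])
disjoint = matching_coefficient([1.0, 0.0], [0.0, 1.0])
```

With normalized inputs this particular measure ranges from 0 (identical) to 2 (no overlap), which makes a fixed matching threshold easier to choose.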

Step S6: Determine in turn whether the matching coefficient ξ between the average hue histograms of each two adjacent video sequences is greater than a preset matching threshold. If so, the two adjacent video sequences are considered to be video sequences of different scenes; otherwise, they are considered to be video sequences of the same scene.

The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
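The adjacent-pair comparison of step S6 can be sketched as a simple loop. The matching function is passed in as a parameter; the `l1_distance` used in the example is only a placeholder deviation measure, and the threshold value 0.5 is arbitrary, since the text gives no numeric matching threshold.

```python
def detect_scene_changes(histograms, match, threshold):
    """Return the indices m at which sequence m starts a new scene.

    histograms: normalized average hue histograms, one per sequence.
    match: function returning a deviation score for two histograms.
    A boundary is reported when the score exceeds the threshold.
    """
    boundaries = []
    for m in range(1, len(histograms)):
        if match(histograms[m - 1], histograms[m]) > threshold:
            boundaries.append(m)
    return boundaries

def l1_distance(h1, h2):
    """Placeholder deviation measure: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

seqs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
changes = detect_scene_changes(seqs, l1_distance, threshold=0.5)
```

Only adjacent sequences are compared, so the cost grows linearly with the number of sequences.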

Claims

1. A method for detecting a video scene change, comprising the steps of:

A. Convert the image frame of the video file from the RGB space to the HSV space;

B. Divide the video file into a plurality of video sequences, and calculate an average hue histogram of each video sequence;

C. Normalize the average hue histogram;

D. Perform a matching calculation on the average hue histograms of two adjacent video sequences to obtain a matching coefficient;

E. If the matching coefficient between the average hue histograms of the two adjacent video sequences is greater than a preset matching threshold, the two adjacent video sequences are considered to be video sequences of different scenes; otherwise, they are considered to be video sequences of the same scene.

2. The method for detecting a video scene change according to claim 1, further comprising, before the step B, the step of performing pixel preprocessing on the image frame of the video file.

3. The method for detecting a video scene change according to claim 2, wherein the step of pixel preprocessing comprises:

When the saturation S of a certain pixel point is less than the preset first threshold T1 and the brightness V of the pixel is less than the preset second threshold T2, the pixel is discarded;

When the saturation S of a pixel is greater than a preset third threshold T3 and the brightness V of the pixel is less than a preset fourth threshold T4, the pixel is discarded;

The remaining pixels in the image frame of the video sequence are preserved.

4. The method for detecting a video scene change according to claim 3, wherein the first threshold T1 = 0.2, the second threshold T2 = 0.8, the third threshold T3 = 0.8, and the fourth threshold T4 = 0.2 are set in advance.

5. The method for detecting a video scene change according to claim 3, wherein the first threshold T1 = 0.14, the second threshold T2 = 0.92, the third threshold T3 = 0.94, and the fourth threshold T4 = 0.13 are preset.

6. The method for detecting a video scene change according to claim 1, wherein the step of calculating the average hue histogram of each video sequence comprises:

Calculating a hue H component histogram of each image frame in each video sequence;

Superimposing the hue H component histograms of the image frames in each video sequence and taking their mean, thereby obtaining the average hue histogram of each video sequence.

7. The method for detecting a video scene change according to claim 1, wherein the step C specifically includes:

After the average hue histogram of a video sequence is obtained, the total number of pixels counted by the average hue histogram is calculated;

The number of pixels at each hue level of the average hue histogram is divided by the total number of pixels to obtain the normalized average hue histogram.

8. The method for detecting a video scene change according to claim 1, wherein the step D calculates the matching coefficient ξ between the average hue histogram H₁(K) and the average hue histogram H₂(K) using the following formula:

where K represents the hue level of the pixel, K = 1, 2, 3, ..., Q, and Q is the maximum hue level.