CN107657228B - Video scene similarity analysis method and system, and video encoding and decoding method and system - Google Patents

Video scene similarity analysis method and system, and video encoding and decoding method and system

Info

Publication number
CN107657228B
CN107657228B
Authority
CN
China
Prior art keywords
key frame
matrix
degree
frame
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710873784.2A
Other languages
Chinese (zh)
Other versions
CN107657228A (en)
Inventor
叶龙
彭剑民
林秀桃
钟微
张勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN201710873784.2A priority Critical patent/CN107657228B/en
Publication of CN107657228A publication Critical patent/CN107657228A/en
Application granted granted Critical
Publication of CN107657228B publication Critical patent/CN107657228B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a video scene similarity analysis method and system and a video encoding and decoding method and system, wherein the analysis method comprises the following steps: selecting one frame of image from each shot of the video as a key frame; extracting the feature vector of each key frame and constructing a similarity matrix; and, according to the similarity matrix, clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame. The video encoding and decoding method comprises the video scene similarity analysis method; the key frames clustered into one class are compressed as one GOP; after compression, the key frames are reconstructed at the encoding end and placed in a frame buffer, and the B frames and P frames of the remaining GOPs find the corresponding key frame in the frame buffer through their respective key frame indexes for inter-frame prediction encoding; when decoding, all key frames are decoded first, and the B frames and P frames are then decoded according to the clustering labels. The method and system can mine the correlation between non-consecutive frames and non-consecutive GOPs.

Description

Video scene similarity analysis method and system, and video encoding and decoding method and system
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video scene similarity analysis method, a video encoding and decoding method, a video scene similarity analysis system, and a video encoding and decoding system.
Background
With the rapid development of the internet industry in recent years, carriers of multimedia content such as digital images, video and audio have grown explosively, so bandwidth and storage come under great pressure unless the redundancy in image and video data is compressed away. Conventional redundancy compression methods use intra-frame prediction coding for key frames and inter-frame prediction coding for consecutive frames, and do not consider the redundancy between non-consecutive frames and non-consecutive GOPs (Groups of Pictures).
In image and video compression coding research, the current mainstream approach extends and improves on the 'prediction-transform-entropy coding' framework. Although its applications have been successful, its compression efficiency has reached a bottleneck. Researchers have therefore begun to analyze the content of images and video and to combine that analysis with video compression technology in order to break through the bottleneck of traditional image and video compression.
In 2007, Liu et al. proposed an image coding method based on image inpainting. The idea is to divide the image into a structure region and a texture region using analysis tools such as image edge extraction and texture detection, and to divide each region into three types of content: necessary content, partially necessary content, and redundant content. Essential structure and texture information is necessary content; the parts whose restoration must refer to the former two, judged by whether the gradient change is large, are partially necessary content. Different coding methods are adopted for the different content regions, and the redundant content need not be coded, so coding efficiency can be greatly improved. Building on this work, Liu et al. combined image content analysis with traditional compression coding and proposed an edge-based intra-frame prediction method that makes the prediction direction more adaptive; the local image continuity described by image edge structure information and the Laplace equation is compatible with the existing intra-frame coding standard.
Conventional video coding systems generally use the correlation between consecutive frames to remove temporal redundancy, and even with the multi-reference-frame technique they only search for reference frames within a GOP; the number of I frames (key frames) therefore remains large, and the correlation between I frames is not considered.
Disclosure of Invention
In view of the foregoing problems, it is an object of the present invention to provide a video scene similarity analysis method, a video encoding and decoding method, a video scene similarity analysis system, and a video encoding and decoding system for mining the correlation between discontinuous frames and discontinuous GOPs.
According to an aspect of the present invention, there is provided a video scene similarity analysis method, including: selecting one frame of image from each shot of a video as the key frame of that shot; extracting the feature vector of each key frame, measuring the similarity between key frames according to the feature vectors, and constructing a similarity matrix; and, according to the similarity matrix, clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame.
According to another aspect of the present invention, there is provided a video encoding and decoding method, including: the video scene similarity analysis method described above; compressing the key frames clustered into one class as one GOP; after compressing the key frames, reconstructing them at the encoding end and placing them in a frame buffer, the B frames (bidirectional difference frames) and P frames (difference frames relative to a previous frame) of the remaining GOPs finding the corresponding key frame in the frame buffer through their respective key frame indexes for inter-frame prediction encoding; and, when decoding, decoding all key frames first and then decoding the B frames and P frames according to the clustering labels.
According to a third aspect of the present invention, there is provided a video scene similarity analysis system, comprising: a key frame extraction module for selecting one frame of image from each shot of the video as the key frame of that shot and sending each key frame to the similarity matrix construction module; the similarity matrix construction module, for extracting the feature vector of each key frame, measuring the similarity between key frames according to the feature vectors, constructing a similarity matrix, and sending the similarity matrix to the clustering module; and the clustering module, for clustering the key frames according to the similarity matrix by taking the sum of the attraction degree and the attribution degree as the clustering criterion and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame.
According to a fourth aspect of the present invention, there is provided a video coding and decoding system, comprising:
the video scene similarity analysis system; a compression unit for compressing the key frames grouped into one group as a GOP; the encoding part is used for reconstructing the compressed key frames at an encoding end and placing the compressed key frames in a frame buffer area, and finding the key frames corresponding to the frame buffer area by the B frames and the P frames of the rest GOPs through respective key frame indexes to perform inter-frame prediction encoding; and the decoding part decodes all the key frames and then decodes the B frames and the P frames according to the clustering labels.
The video scene similarity analysis method and system of the invention cluster the I frames (key frames) and compress the I frames clustered into one class as one GOP, so the number of I frames that are independently intra-frame predicted is reduced and the data volume of the I frames is further compressed.
The video encoding and decoding method and system of the invention use the similarity of video scenes to cluster the I frames, which occupy the most bit rate in each GOP, thereby mining the correlation between non-consecutive frames and non-consecutive GOPs; combined with video compression technology, this greatly improves video compression efficiency. With PSNR essentially maintained, applying the scene similarity analysis technique to the compression of I frames improves compression efficiency by 3%-4% over traditional coding.
Drawings
Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following description and appended claims, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a flow chart of a video scene similarity analysis method according to the present invention;
FIG. 2 is a flow chart of constructing a similarity matrix according to the present invention;
FIG. 3 is a flow chart of clustering key frames according to the present invention using the sum of the attraction degree and the attribution degree as the clustering criterion;
FIG. 4 is a schematic diagram of the clustering compression of key frames according to the present invention;
FIG. 5 is a block diagram of a video scene similarity analysis system according to the present invention;
fig. 6 is a block diagram of the video codec system according to the present invention.
The same reference numbers in all figures indicate similar or corresponding features or functions.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a video scene similarity analysis method according to the present invention, and as shown in fig. 1, the video scene similarity analysis method includes:
step 1, respectively selecting a frame of image from each shot of a video as an I frame of each shot;
step 2, extracting the feature vector of each I frame, measuring the similarity between the I frames according to the feature vector, and constructing a similarity matrix;
step 3, according to the similarity matrix, clustering the I frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting the clustering label of each I frame, wherein the clustering label is represented by the clustering center corresponding to the I frame, the attraction degree represents information passed from the I frame to its clustering center, and the attribution degree represents information passed from the clustering center to the I frame.
For videos whose content recurs frequently, the correlation between non-consecutive frames and non-consecutive GOPs is still very large. The video scene similarity analysis method clusters the I frames, which occupy the most bit rate in each GOP, mines the correlation between non-consecutive frames and non-consecutive GOPs, reduces the number of I frames that undergo independent intra-frame prediction, and further compresses the data volume of the I frames.
In a preferred embodiment of the present invention, step 1 includes: detecting shot boundaries, and determining the segment between two adjacent boundaries as one shot; and selecting the frame at a set position in each shot as its I frame. For example, every set number of frames (e.g. 15 frames) may be treated as one shot and the first frame of each such group used as the shot's key frame, or the first frame, the last frame, or an internal frame of the shot may be selected directly as its I frame, as required.
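As an illustrative aid (not part of the patent text), the fixed-length variant of this selection can be sketched in Python; the function name and the 15-frame default are taken only from the example above:

def select_key_frames(frames, shot_len=15):
    """Treat every shot_len consecutive frames as one shot and take
    the first frame of each shot as that shot's key frame (I frame)."""
    return [frames[i] for i in range(0, len(frames), shot_len)]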
In a preferred embodiment of the present invention, as shown in fig. 2, the step 2 includes:
step 21, partitioning the image of the key frame into sub-blocks of the same size, obtaining the color histogram of each sub-block, and accumulating to obtain the cumulative histogram of the key frame. A histogram represents the distribution of the chromaticity-space values of all pixels in an image and reflects the distribution of the image's color space and its basic tones, but it carries no location information, so different images may yield the same histogram feature. To address this problem, the invention adopts a cumulative-histogram representation: instead of computing the image histogram directly, the image is first divided into blocks of the same size, the color histogram of each sub-block of the original image is computed, and the cumulative histogram is then calculated; compared with a plain color histogram, the cumulative histogram is more robust;
step 22, extracting the color feature vector of the cumulative histogram of each I frame using a color space. For example, using the HSV color space, the HSV values are quantized non-uniformly to reduce the amount of computation: the hue H is generally divided into eight quantization regions, and the saturation S and brightness V into three quantization regions each, according to their different value ranges. The specific quantization is as follows:

[Quantization table (1): the piecewise mapping of H to 8 bins and of S and V to 3 bins each was rendered as an image in the original publication.]

Quantizing the hue H into 8 parts, the saturation S into 3 parts, and the brightness V into 3 parts in this way yields a 72-dimensional feature vector (8 × 3 × 3 = 72).
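A minimal Python sketch of steps 21-22 follows, assuming OpenCV's HSV value ranges (H in [0, 180), S and V in [0, 256)); since quantization table (1) survives only as an image, the bin boundaries below are illustrative assumptions, not the patented values:

import numpy as np
import cv2

def hsv_color_feature(image_bgr, blocks=4):
    """72-dimensional color feature of one key frame (steps 21-22):
    H quantized to 8 bins, S and V to 3 bins each (8*3*3 = 72),
    histograms accumulated over equally sized sub-blocks, then the
    cumulative histogram is taken."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    hq = np.digitize(h, [11, 21, 38, 78, 95, 135, 148])  # 8 hue bins (assumed boundaries)
    sq = np.digitize(s, [52, 179])                       # 3 saturation bins (assumed)
    vq = np.digitize(v, [52, 179])                       # 3 brightness bins (assumed)
    code = 9 * hq + 3 * sq + vq                          # joint bin index 0..71
    rows, cols = code.shape
    bh, bw = rows // blocks, cols // blocks
    hist = np.zeros(72)
    for i in range(blocks):                              # step 21: block, then accumulate
        for j in range(blocks):
            sub = code[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist += np.bincount(sub.ravel(), minlength=72)
    cum = np.cumsum(hist)                                # cumulative histogram
    return cum / cum[-1]                                 # normalized 72-d color feature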
step 23, extracting the texture feature vector of each I frame by using a gray level co-occurrence matrix, specifically: constructing a gray level co-occurrence matrix of each key frame; constructing a texture feature vector by adopting one or more indexes of energy, inertia, correlation, entropy, local uniformity and maximum probability; normalizing the texture feature vector by adopting Gaussian normalization;
step 24, constructing a similarity matrix from the color feature vector and the texture feature vector of each I frame according to the following formula (1),

s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two I frames, s_{m,n} represents the similarity of I frame m and I frame n, (x_1, x_2, ..., x_d) is the feature vector of I frame m, and (y_1, y_2, ..., y_d) is the feature vector of I frame n.
Preferably, in step 23, co-occurrence matrices in 4 directions (0, π/4, π/2, 3π/4) are constructed, Gaussian normalization is applied, and the mean and standard deviation of the 4 parameters of inertia, energy, correlation and entropy are taken as components to describe the texture feature vector, so as to obtain an 8-dimensional texture feature vector.
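Step 23 can be sketched with scikit-image's graycomatrix/graycoprops (these names hold for scikit-image 0.19 and later); entropy is computed directly from the normalized co-occurrence matrix rather than via graycoprops, and the final Gaussian (z-score) normalization across all key frames is left to the caller:

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_feature(gray, levels=16):
    """8-dimensional texture feature of one key frame (step 23): mean and
    standard deviation, over the 4 directions 0, pi/4, pi/2, 3pi/4, of
    inertia (contrast), energy, correlation and entropy of the GLCM."""
    img = (gray.astype(np.float64) / 256 * levels).astype(np.uint8)  # requantize gray levels
    glcm = graycomatrix(img, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    inertia = graycoprops(glcm, 'contrast')[0]        # one value per direction
    energy = graycoprops(glcm, 'energy')[0]
    correlation = graycoprops(glcm, 'correlation')[0]
    p = glcm[:, :, 0, :]                              # normalized matrices, shape (levels, levels, 4)
    entropy = -np.sum(p * np.log2(p + 1e-12), axis=(0, 1))
    feats = np.stack([inertia, energy, correlation, entropy])        # shape (4, 4)
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])   # 8-d texture vector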
In addition, a similarity matrix may preferably be constructed according to the above formula (1) from a combined color-texture feature vector: for example, if in the feature vector (x_1, x_2, ..., x_d) of I frame m the components (x_1, x_2, ..., x_j) are the color feature vector and (x_{j+1}, x_{j+2}, ..., x_d) are the texture feature vector, the combined color-texture feature vector is (αx_1, αx_2, ..., αx_j, βx_{j+1}, ..., βx_d), where α and β are weighting coefficients with α + β = 1.
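With both feature vectors available, formula (1) can be sketched as follows; the negative squared Euclidean distance is an assumption (the original formula survives only as an image), chosen because it is the standard similarity for the message-passing clustering of step 3:

import numpy as np

def similarity_matrix(color_feats, texture_feats, alpha=0.5):
    """Formula (1), sketched: s[m, n] = -sum_i (x_i - y_i)^2 over the
    alpha/beta-weighted concatenation of color and texture vectors
    (alpha + beta = 1)."""
    beta = 1.0 - alpha
    f = np.hstack([alpha * np.asarray(color_feats, dtype=float),
                   beta * np.asarray(texture_feats, dtype=float)])  # one row per key frame
    diff = f[:, None, :] - f[None, :, :]
    return -(diff ** 2).sum(axis=-1)  # similarity = negative squared distance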
In a preferred embodiment of the present invention, as shown in fig. 3, the step 3 includes:
step 31, initializing the similarity matrix: the average value of the similarities in the similarity matrix (the reference P) is used as the self-similarity of each I frame, and the initial attribution degrees and initial attraction degrees between the I frames in the attribution degree matrix and the attraction degree matrix are all set to 0, i.e.

s_{1,1} = s_{2,2} = … = s_{m,m} = P, A = 0, R = 0,

where S is the similarity matrix and the average value P of all similarities in S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m}; A is the attribution degree matrix, with a_{m,n} representing the attribution degree of I frame m to I frame n; R is the attraction degree matrix, with r_{m,n} representing the attraction degree of I frame n to I frame m;
step 32, updating the attraction degree matrix from the similarity matrix by the following formula (2),

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0;
step 33, updating the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4),

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
step 34, setting a damping factor λ (for example, 0.5) and a number of iterations T, and iteratively updating the attraction degree matrix and the attribution degree matrix by the following formulas (5) and (6),

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
step 35, after the iterative updating, filtering out the I frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0, i.e. judging whether each key frame satisfies the following formula (7) and filtering out the key frames that do not,

a^{(T)}_{m,m} + r^{(T)}_{m,m} > 0   (7)
step 36, for each I frame satisfying the condition that its autocorrelation attribution degree plus autocorrelation attraction degree is greater than 0, taking the other I frame corresponding to the maximum value of the sum of attribution degree and attraction degree,

c(m) = argmax_n { a^{(T)}_{m,n} + r^{(T)}_{m,n} },

as the clustering center of that I frame; the clustering center of each qualifying I frame is thereby obtained, and the I frames belonging to the same clustering center are clustered into one class.
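Steps 31-36 are the message-passing iteration of affinity propagation. The following NumPy sketch implements the same update rules in vectorized form (damping λ = 0.5 by default); it is an illustration under those assumptions, not the patented implementation, and all identifiers are illustrative:

import numpy as np

def cluster_key_frames(S, damping=0.5, iterations=100):
    """Steps 31-36: cluster key frames on similarity matrix S and return,
    for each key frame, the index of its clustering center (its label)."""
    S = S.astype(np.float64).copy()
    M = S.shape[0]
    # Step 31: reference P = mean similarity, placed on the diagonal; A = R = 0.
    np.fill_diagonal(S, S[~np.eye(M, dtype=bool)].mean())
    R = np.zeros((M, M))  # attraction (responsibility) r[m, n]
    A = np.zeros((M, M))  # attribution (availability)  a[m, n]
    for _ in range(iterations):
        # Formula (2): r'[m,n] = s[m,n] - max_{n' != n} (a[m,n'] + s[m,n'])
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[np.arange(M), top]
        AS[np.arange(M), top] = -np.inf
        second = AS.max(axis=1)
        Rn = S - first[:, None]
        Rn[np.arange(M), top] = S[np.arange(M), top] - second
        # Formulas (3)/(4): availabilities from positive responsibilities.
        Rp = np.maximum(Rn, 0)
        np.fill_diagonal(Rp, Rn.diagonal())
        colsum = Rp.sum(axis=0)
        An = np.minimum(0.0, colsum[None, :] - Rp)
        np.fill_diagonal(An, colsum - Rp.diagonal())
        # Formulas (5)/(6): damped iterative update.
        R = damping * R + (1 - damping) * Rn
        A = damping * A + (1 - damping) * An
    E = A + R
    centers = np.flatnonzero(E.diagonal() > 0)       # step 35: formula (7)
    labels = centers[E[:, centers].argmax(axis=1)]   # step 36: best center per frame
    labels[centers] = centers                        # each center labels itself
    return labels

In practice, sklearn.cluster.AffinityPropagation implements the same responsibility/availability updates.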
This way of clustering the key frames according to the similarity matrix, with the sum of the attraction degree and the attribution degree as the clustering criterion, overcomes the sensitivity of traditional clustering algorithms to the choice of initial values: all data points are treated as candidate clustering centers, the messages passed between them are computed and iteratively updated, the points that compete successfully become clustering centers while the others are assigned to them, and a number of high-quality clustering centers are obtained when the iteration ends.
In addition, the "message passing mechanism" of this clustering method passes two kinds of information between data points and candidate clustering centers: the attraction degree (responsibility) r(i,k), which represents the information passed from data point i to candidate clustering center k, and the attribution degree (availability) a(i,k), which represents the information passed from candidate clustering center k to data point i. The attraction degree indicates how appropriate it is for point k to become the clustering center of data point i, and the attribution degree indicates how appropriate it is for point i to take point k as its clustering center; the larger r(i,k) and a(i,k) are, the higher the probability that data point i takes k as its clustering center and that k becomes a clustering center.
Furthermore, a reference P is associated with the attraction degree and the attribution degree; it is not information to be passed but a criterion: the value s_{k,k} on the diagonal of the similarity matrix S is used to judge whether point k can become a clustering center, and the larger s_{k,k} is, the higher the probability that point k becomes a clustering center. During initialization, no data point i belongs to any class and no point k is a clustering center, so the mean value of the matrix S is assigned to the reference P of all data points.
The video scene similarity analysis method can be applied to a video coding and decoding method, which includes the video analysis method, and further includes:
compressing the I frames clustered into one class as one GOP: as shown in FIG. 4, each of the I frames F01-F20 has a clustering center, which can be denoted by a code such as S1-S4, and the I frames clustered into one class are compressed as one GOP, e.g. GOP1-GOP4;
compressing the I frames, reconstructing them at the encoding end, and placing them in a frame buffer, the B frames and P frames of the remaining GOPs finding the corresponding I frame in the frame buffer through their respective I frame indexes for inter-frame prediction encoding;
when decoding, decoding all I frames first and then decoding the B frames and P frames according to the clustering labels.
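How the clustering labels drive the codec can be sketched as follows; codec and its methods (intra_encode, reconstruct, inter_encode, bp_frames, key_index) are hypothetical placeholders for whatever encoder is actually used, and only the grouping and the key-frames-first ordering come from the description above:

from collections import defaultdict

def group_key_frames(labels):
    """One GOP per clustering center: key frames F01..F20 whose centers
    are, say, S1..S4 are regrouped into GOP1..GOP4 before compression."""
    gops = defaultdict(list)
    for frame_index, center in enumerate(labels):
        gops[center].append(frame_index)
    return dict(gops)

def encode(video, labels, codec):
    """Coding order from the description: compress each clustered GOP of
    key frames, reconstruct them into a frame buffer at the encoding end,
    then inter-predict every B/P frame from the buffered key frame found
    through its key frame index. `codec` is a hypothetical interface."""
    frame_buffer = {}
    for center, frames in group_key_frames(labels).items():
        for f in frames:  # intra-encode, then reconstruct as the decoder would
            frame_buffer[f] = codec.reconstruct(codec.intra_encode(video[f]))
    for bp in codec.bp_frames(video):  # remaining B and P frames
        codec.inter_encode(bp, reference=frame_buffer[codec.key_index(bp)])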
Fig. 5 is a block diagram of a video scene similarity analysis system according to the present invention, and as shown in fig. 5, the video scene similarity analysis system 100 includes:
the key frame extraction module 110 is configured to select a frame of image from each shot of the video as an I frame of each shot, and send each I frame to the similarity matrix construction module 120;
the similarity matrix construction module 120 extracts the feature vector of each I frame, measures the similarity between I frames according to the feature vectors, constructs a similarity matrix, and sends the similarity matrix to the clustering module 130;
the clustering module 130 is configured to cluster the I frames according to the similarity matrix by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and to set the clustering label of each I frame, wherein the clustering label is represented by the clustering center corresponding to the I frame, the attraction degree represents information passed from the I frame to its clustering center, and the attribution degree represents information passed from the clustering center to the I frame.
In a preferred embodiment of the present invention, the key frame extracting module 110 includes:
a shot determining unit 111 that detects a boundary of a shot and determines a space between two adjacent boundaries as a shot;
the key frame extracting unit 112 selects a frame at a predetermined position in each shot as an I frame.
In a preferred embodiment of the present invention, the similarity matrix construction module 120 includes:
the cumulative histogram construction unit 121 divides the image of the I frame into blocks to form sub-blocks with the same size, obtains a histogram of colors of the sub-blocks, obtains the cumulative histogram of the I frame through accumulation, and sends the cumulative histogram to the color feature vector extraction unit 122;
the color feature vector extraction unit 122 extracts the color feature vector of the cumulative histogram of each I frame by using a color space, and sends the color feature vector of each I frame to the similarity matrix construction unit 124;
the texture feature vector extraction unit 123 extracts the texture feature vector of each I frame by using the gray level co-occurrence matrix, and sends the texture feature vector of each I frame to the similarity matrix construction unit 124;
a similarity matrix construction unit 124 for constructing a similarity matrix from the color feature vector and the texture feature vector of each I frame according to the following formula (1),

s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two I frames, s_{m,n} represents the similarity of I frame m and I frame n, (x_1, x_2, …, x_d) is the feature vector of I frame m, and (y_1, y_2, …, y_d) is the feature vector of I frame n.
In a preferred embodiment of the present invention, the clustering module 130 includes:
the initializing unit 131 initializes the similarity matrix, uses the average value of the similarities in the similarity matrix as the self-similarity of each I frame, and sends the initialized similarity matrix to the first updating unit 133, wherein S is the similarity matrix and the average value P of all similarities in S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m};
the setting unit 132 sets the initial attribution degrees and initial attraction degrees between the I frames in the attribution degree matrix and the attraction degree matrix to 0 (A = 0, R = 0) and sends them to the first updating unit 133, and sets the damping factor λ and the number of iterations T and sends them to the second updating unit 134, wherein A is the attribution degree matrix, with a_{m,n} representing the attribution degree of I frame m to I frame n, and R is the attraction degree matrix, with r_{m,n} representing the attraction degree of I frame n to I frame m;
the first updating unit 133 updates the attraction degree matrix from the similarity matrix by the following formula (2), updates the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4), and sends the updated attraction degree matrix and attribution degree matrix to the second updating unit 134,

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0, a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
the second updating unit 134 iteratively updates the attraction degree matrix and attribution degree matrix updated by the first updating unit by the following formulas (5) and (6), and sends the attraction degree matrix and attribution degree matrix after T iterations to the screening unit 135,

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
the screening unit 135 filters out the I frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0, and sends the attraction degree matrix and attribution degree matrix, with the vectors corresponding to the non-qualifying I frames removed, to the clustering unit 136;
the clustering unit 136 takes, for each remaining I frame, the other I frame corresponding to the maximum value of the sum of the attribution degree and the attraction degree in the matrices filtered by the screening unit as the clustering center of that I frame, thereby obtaining the clustering center of each qualifying I frame, and clusters the I frames belonging to the same clustering center into one class.
The video scene similarity analysis system can be applied to a video coding and decoding system, as shown in fig. 6, the video coding and decoding system 1000 includes, in addition to the video scene similarity analysis system 100, the following:
a compression unit 200 for compressing I frames grouped into one group as one GOP;
an encoding part 300, which reconstructs the I frame after compressing the I frame at an encoding end and places the I frame in a frame buffer area, and B frames and P frames of other GOPs find the I frame corresponding to the frame buffer area through respective I frame indexes to perform inter-frame prediction encoding;
the decoding unit 400 decodes all I frames first, and then decodes B frames and P frames based on the cluster labels.
In summary, the video scene similarity analysis method, the video encoding and decoding method, the video scene similarity analysis system and the video encoding and decoding system proposed by the present invention are described by way of example with reference to the accompanying drawings. However, it will be appreciated by those skilled in the art that various modifications could be made to the system and method of the present invention described above without departing from the spirit of the invention. Therefore, the scope of the present invention should be determined by the contents of the appended claims.

Claims (6)

1. A video scene similarity analysis method is characterized by comprising the following steps:
respectively selecting a frame of image from each shot of a video as a key frame of each shot;
extracting the feature vector of each key frame, measuring the similarity between the key frames according to the feature vector, and constructing a similarity matrix;
the method for constructing the similarity matrix comprises the following steps:
the method comprises the steps of partitioning an image of a key frame into sub-blocks with the same size to obtain a histogram of colors of the sub-blocks, and accumulating to obtain an accumulated histogram of the key frame;
extracting a color feature vector of the cumulative histogram of each key frame by adopting a color space;
extracting a texture feature vector of each key frame by adopting a gray level co-occurrence matrix;
constructing a similarity matrix according to the following formula (1) by using the color feature vector and the texture feature vector of each key frame,
s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two key frames, s_{m,n} represents the similarity of key frame m and key frame n, (x_1, x_2, ..., x_d) is the feature vector of key frame m, and (y_1, y_2, ..., y_d) is the feature vector of key frame n;
according to the similarity matrix, clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame;
wherein the method for clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion according to the similarity matrix and setting the clustering label of each key frame comprises:
initializing the similarity matrix, taking the average value of the similarities in the similarity matrix as the self-similarity of each key frame, wherein S is the similarity matrix and the average value P of all similarities in S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m};
setting the initial attribution degrees and initial attraction degrees between the key frames in the attribution degree matrix and the attraction degree matrix to 0 (A = 0, R = 0), wherein A is the attribution degree matrix, with a_{m,n} representing the attribution degree of key frame m to key frame n, and R is the attraction degree matrix, with r_{m,n} representing the attraction degree of key frame n to key frame m;
updating the attraction degree matrix from the similarity matrix by the following formula (2),

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0;
updating the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4),

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
setting a damping factor λ and a number of iterations T, and iteratively updating the attraction degree matrix and the attribution degree matrix by the following formulas (5) and (6),

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
filtering out, after the iterative updating, the key frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0;
and taking, for each key frame satisfying the condition that its autocorrelation attribution degree plus autocorrelation attraction degree is greater than 0, the other key frame corresponding to the maximum value of the sum of the attribution degree and the attraction degree as the clustering center of that key frame, thereby obtaining the clustering center of each qualifying key frame, and clustering the key frames belonging to the same clustering center into one class.
2. The method of claim 1, wherein the selecting a frame of image from each shot of the video as the key frame of each shot comprises:
detecting the boundaries of the shots, and determining the interval between two adjacent boundaries as one shot;
and selecting a frame at a set position in each shot as the key frame.
3. A video encoding and decoding method, comprising:
the video scene similarity analysis method of any one of claims 1-2;
compressing the key frames grouped into one class as a GOP;
compressing the key frame, reconstructing the key frame at an encoding end, and placing the key frame in a frame buffer area, wherein the B frames and the P frames of the rest GOPs find the key frame corresponding to the frame buffer area through respective key frame indexes to perform inter-frame prediction encoding;
when decoding, all key frames are decoded first, and then B frames and P frames are decoded according to the clustering labels.
4. A video scene similarity analysis system, comprising:
the key frame extraction module is used for respectively selecting a frame of image from each shot of the video as a key frame of each shot and sending each key frame to the similarity matrix construction module;
the similarity matrix construction module is used for extracting the feature vector of each key frame, measuring the similarity between the key frames according to the feature vector, constructing a similarity matrix and sending the similarity matrix to the clustering module;
the similarity matrix construction module comprises:
the accumulated histogram construction unit is used for blocking the image of the key frame to form sub-blocks with the same size to obtain a color histogram of the sub-blocks, accumulating to obtain the accumulated histogram of the key frame, and sending the accumulated histogram to the color feature vector extraction unit;
the color feature vector extraction unit is used for extracting the color feature vector of the cumulative histogram of each key frame by adopting a color space and sending the color feature vector of each key frame to the similarity matrix construction unit;
the texture feature vector extraction unit is used for extracting the texture feature vector of each key frame by adopting a gray level co-occurrence matrix and sending the texture feature vector of each key frame to the similarity matrix construction unit;
a similarity matrix construction unit for constructing a similarity matrix from the color feature vector and the texture feature vector of each key frame according to the following formula (1),

s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two key frames, s_{m,n} represents the similarity of key frame m and key frame n, (x_1, x_2, …, x_d) is the feature vector of key frame m, and (y_1, y_2, …, y_d) is the feature vector of key frame n;
the clustering module is used for clustering the key frames according to the similarity matrix by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame;
the clustering module comprises:
an initialization unit for initializing the similarity matrix, taking the average value of the similarities in the similarity matrix as the self-similarity of each key frame, and sending the initialized similarity matrix to a first updating unit, wherein the average value P of all similarities in the similarity matrix S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m};
a setting unit for setting the initial attribution degrees and initial attraction degrees between the key frames in the attribution degree matrix and the attraction degree matrix to 0 (A = 0, R = 0) and sending them to the first updating unit, and for setting the damping factor λ and the number of iterations T and sending them to a second updating unit, wherein A is the attribution degree matrix, with a_{m,n} representing the attribution degree of key frame m to key frame n, and R is the attraction degree matrix, with r_{m,n} representing the attraction degree of key frame n to key frame m;
a first updating unit for updating the attraction degree matrix from the similarity matrix by the following formula (2), updating the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4), and sending the updated attraction degree matrix and attribution degree matrix to the second updating unit,

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0, a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
a second updating unit which iteratively updates the attraction degree matrix and attribution degree matrix updated by the first updating unit by the following formulas (5) and (6), and sends the attraction degree matrix and attribution degree matrix after T iterations to the screening unit,

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
the screening unit filters out the key frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0, and sends the attraction degree matrix and attribution degree matrix, with the vectors corresponding to the non-qualifying key frames removed, to the clustering unit;
and the clustering unit takes, for each remaining key frame, the other key frame corresponding to the maximum value of the sum of the attribution degree and the attraction degree in the attraction degree matrix and attribution degree matrix filtered by the screening unit as the clustering center of that key frame, thereby obtaining the clustering center of each qualifying key frame, and clusters the key frames belonging to the same clustering center into one class.
5. The video scene similarity analysis system according to claim 4, wherein the key frame extraction module comprises:
the shot determining unit detects the boundaries of the shots and determines the interval between two adjacent boundaries as a shot;
the key frame extraction unit selects a frame at a set position in each lens as a key frame.
6. A video coding/decoding system, comprising:
the video scene similarity analysis system of any one of claims 4-5;
a compression unit for compressing the key frames grouped into one group as a GOP;
the encoding part is used for reconstructing the compressed key frames at an encoding end and placing the compressed key frames in a frame buffer area, and finding the key frames corresponding to the frame buffer area by the B frames and the P frames of the rest GOPs through respective key frame indexes to perform inter-frame prediction encoding;
and the decoding part decodes all the key frames and then decodes the B frames and the P frames according to the clustering labels.
CN201710873784.2A 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system Active CN107657228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710873784.2A CN107657228B (en) 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710873784.2A CN107657228B (en) 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system

Publications (2)

Publication Number Publication Date
CN107657228A CN107657228A (en) 2018-02-02
CN107657228B (en) 2020-08-04

Family

ID=61130165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710873784.2A Active CN107657228B (en) 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system

Country Status (1)

Country Link
CN (1) CN107657228B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509917B (en) * 2018-03-30 2020-03-03 北京影谱科技股份有限公司 Video scene segmentation method and device based on lens class correlation analysis
CN110879952B (en) * 2018-09-06 2023-06-16 阿里巴巴集团控股有限公司 Video frame sequence processing method and device
CN109688429A (en) * 2018-12-18 2019-04-26 广州励丰文化科技股份有限公司 A kind of method for previewing and service equipment based on non-key video frame
CN110062163B (en) * 2019-04-22 2020-10-20 珠海格力电器股份有限公司 Multimedia data processing method and device
CN110188625B (en) * 2019-05-13 2021-07-02 浙江大学 Video fine structuring method based on multi-feature fusion
CN110377587B (en) * 2019-07-15 2023-02-10 腾讯科技(深圳)有限公司 Migration data determination method, device, equipment and medium based on machine learning
CN111126126B (en) * 2019-10-21 2022-02-01 武汉大学 Intelligent video strip splitting method based on graph convolution neural network
CN110796088B (en) * 2019-10-30 2023-07-04 行吟信息科技(上海)有限公司 Video similarity judging method and device
CN110853033B (en) * 2019-11-22 2022-02-22 腾讯科技(深圳)有限公司 Video detection method and device based on inter-frame similarity
CN118044204A (en) * 2021-09-30 2024-05-14 浙江大学 Encoding and decoding method, decoder, encoder, and computer-readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404030A (en) * 2008-11-05 2009-04-08 中国科学院计算技术研究所 Method and system for periodic structure fragment detection in video
CN102572435A (en) * 2012-01-16 2012-07-11 中南民族大学 Compressive sampling-based (CS-based) video coding/decoding system and method thereof
CN107194312A (en) * 2017-04-12 2017-09-22 广东银禧科技股份有限公司 A kind of model recommendation method based on 3D printing model content

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130114703A1 (en) * 2005-03-31 2013-05-09 Euclid Discoveries, Llc Context Based Video Encoding and Decoding
CN105469383A (en) * 2014-12-30 2016-04-06 北京大学深圳研究生院 Wireless capsule endoscopy redundant image screening method based on multi-feature fusion
CN104732545B (en) * 2015-04-02 2017-06-13 西安电子科技大学 The texture image segmenting method with quick spectral clustering is propagated with reference to sparse neighbour
CN106202256B (en) * 2016-06-29 2019-12-17 西安电子科技大学 Web image retrieval method based on semantic propagation and mixed multi-instance learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404030A (en) * 2008-11-05 2009-04-08 中国科学院计算技术研究所 Method and system for periodic structure fragment detection in video
CN102572435A (en) * 2012-01-16 2012-07-11 中南民族大学 Compressive sampling-based (CS-based) video coding/decoding system and method thereof
CN107194312A (en) * 2017-04-12 2017-09-22 广东银禧科技股份有限公司 A kind of model recommendation method based on 3D printing model content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of the Affinity Propagation Algorithm in Unsupervised Image Clustering; Qian Lili, Shi Pengfei; Microcomputer Applications (《微型电脑应用》); 2011-02-20; Vol. 27, No. 2; page 35, column 2, lines 1-7 *

Also Published As

Publication number Publication date
CN107657228A (en) 2018-02-02

Similar Documents

Publication Publication Date Title
CN107657228B (en) Video scene similarity analysis method and system, and video encoding and decoding method and system
CN108632625B (en) Video coding method, video decoding method and related equipment
US6618507B1 (en) Methods of feature extraction of video sequences
US6625319B1 (en) Image compression using content-based image similarity
CN108347612B (en) Monitoring video compression and reconstruction method based on visual attention mechanism
JP2000217117A (en) Processing method for digital image expression video data in compression form
US9979976B2 (en) Light-weight video coding system and decoder for light-weight video coding system
US7068720B2 (en) Coding of digital video with high motion content
Wang et al. Scalable facial image compression with deep feature reconstruction
US11263261B2 (en) Method and system for characteristic-based video processing
KR101149522B1 (en) Apparatus and method for detecting scene change
CN103020138A (en) Method and device for video retrieval
Yu et al. Detection of fake high definition for HEVC videos based on prediction mode feature
CN108366295A (en) Visual classification feature extracting method, transcoding weight contracting detection method and storage medium
US20150249829A1 (en) Method, Apparatus and Computer Program Product for Video Compression
CN109618227B (en) Video data storage method and system
KR101163774B1 (en) Device and process for video compression
CN112770116B (en) Method for extracting video key frame by using video compression coding information
US20060109902A1 (en) Compressed domain temporal segmentation of video sequences
Dai et al. HEVC video steganalysis based on PU maps and multi-scale convolutional residual network
Baroffio et al. Hybrid coding of visual content and local image features
CN111277835A (en) Monitoring video compression and decompression method combining yolo3 and flownet2 network
CN104410863B (en) Image processor and image processing method
KR102072576B1 (en) Apparatus and method for encoding and decoding of data
Rayatifard et al. A fast and robust shot detection method in hevc/h. 265 compressed video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant