CN107657228B - Video scene similarity analysis method and system, and video encoding and decoding method and system - Google Patents

Video scene similarity analysis method and system, and video encoding and decoding method and system

Info

Publication number
CN107657228B
CN107657228B
Authority
CN
China
Prior art keywords
key frame
matrix
degree
frame
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710873784.2A
Other languages
Chinese (zh)
Other versions
CN107657228A (en)
Inventor
叶龙
彭剑民
林秀桃
钟微
张勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN201710873784.2A priority Critical patent/CN107657228B/en
Publication of CN107657228A publication Critical patent/CN107657228A/en
Application granted granted Critical
Publication of CN107657228B publication Critical patent/CN107657228B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a video scene similarity analysis method and system and a video encoding and decoding method and system, wherein the analysis method comprises the following steps: selecting one frame of image from each shot of the video as a key frame; extracting the feature vector of each key frame and constructing a similarity matrix; and, according to the similarity matrix, clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame. The video encoding and decoding method comprises the video scene similarity analysis method; the key frames clustered into one class are compressed as one GOP; after compression, the key frames are reconstructed at the encoding end and placed in a frame buffer, and the B frames and P frames of the remaining GOPs find the corresponding key frame in the frame buffer through their respective key frame indexes for inter-frame prediction encoding; when decoding, all key frames are decoded first, and the B frames and P frames are then decoded according to the clustering labels. The method and system can mine the correlation between non-consecutive frames and non-consecutive GOPs.

Description

Video scene similarity analysis method and system, and video encoding and decoding method and system
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video scene similarity analysis method, a video encoding and decoding method, a video scene similarity analysis system, and a video encoding and decoding system.
Background
With the rapid development of the internet industry in recent years, carriers of multimedia content such as digital images, video and audio have grown explosively, so bandwidth and storage come under great pressure unless the redundancy in image and video data is compressed away. Conventional redundancy compression methods use intra-frame prediction coding for key frames and inter-frame prediction coding for consecutive frames, and do not consider the redundancy between non-consecutive frames and non-consecutive GOPs (Groups of Pictures).
In image and video compression coding research, the current mainstream approach extends and improves on the 'prediction-transform-entropy coding' framework. Although its applications have been successful, its compression efficiency has reached a bottleneck. Researchers have therefore begun to analyze the content of images and video and to combine that analysis with video compression technology in order to break through the bottleneck of traditional image and video compression.
In 2007, Liu et al. proposed an image coding method based on image inpainting. The idea is to divide the image into a structure region and a texture region using analysis tools such as image edge extraction and texture detection, and to divide each region into three types of content: necessary content, partially necessary content, and redundant content. Essential structure and texture information is necessary content; the parts whose restoration must refer to the former two, judged by whether the gradient change is large, are partially necessary content. Different coding methods are adopted for the different content regions, and the redundant content need not be coded, so coding efficiency can be greatly improved. Building on this work, Liu et al. combined image content analysis with traditional compression coding and proposed an edge-based intra-frame prediction method that makes the prediction direction more adaptive; the local image continuity described by image edge structure information and the Laplace equation is compatible with the existing intra-frame coding standard.
Conventional video coding systems generally use the correlation between consecutive frames to remove temporal redundancy, and even with the multi-reference-frame technique they only search for reference frames within a GOP; the number of I frames (key frames) therefore remains large, and the correlation between I frames is not considered.
Disclosure of Invention
In view of the foregoing problems, it is an object of the present invention to provide a video scene similarity analysis method, a video encoding and decoding method, a video scene similarity analysis system, and a video encoding and decoding system for mining the correlation between discontinuous frames and discontinuous GOPs.
According to an aspect of the present invention, there is provided a video scene similarity analysis method, including: selecting one frame of image from each shot of a video as the key frame of that shot; extracting the feature vector of each key frame, measuring the similarity between key frames according to the feature vectors, and constructing a similarity matrix; and, according to the similarity matrix, clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame.
According to another aspect of the present invention, there is provided a video encoding and decoding method, including: the video scene similarity analysis method described above; compressing the key frames clustered into one class as one GOP; after compressing the key frames, reconstructing them at the encoding end and placing them in a frame buffer, the B frames (bidirectional difference frames) and P frames (difference frames relative to a previous frame) of the remaining GOPs finding the corresponding key frame in the frame buffer through their respective key frame indexes for inter-frame prediction encoding; and, when decoding, decoding all key frames first and then decoding the B frames and P frames according to the clustering labels.
According to a third aspect of the present invention, there is provided a video scene similarity analysis system, comprising: a key frame extraction module for selecting one frame of image from each shot of the video as the key frame of that shot and sending each key frame to the similarity matrix construction module; the similarity matrix construction module, for extracting the feature vector of each key frame, measuring the similarity between key frames according to the feature vectors, constructing a similarity matrix, and sending the similarity matrix to the clustering module; and the clustering module, for clustering the key frames according to the similarity matrix by taking the sum of the attraction degree and the attribution degree as the clustering criterion and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame.
According to a fourth aspect of the present invention, there is provided a video coding and decoding system, comprising:
the video scene similarity analysis system; a compression unit for compressing the key frames grouped into one group as a GOP; the encoding part is used for reconstructing the compressed key frames at an encoding end and placing the compressed key frames in a frame buffer area, and finding the key frames corresponding to the frame buffer area by the B frames and the P frames of the rest GOPs through respective key frame indexes to perform inter-frame prediction encoding; and the decoding part decodes all the key frames and then decodes the B frames and the P frames according to the clustering labels.
The video scene similarity analysis method and system of the invention cluster the I frames (key frames) and compress the I frames clustered into one class as one GOP, so the number of I frames that are independently intra-frame predicted is reduced and the data volume of the I frames is further compressed.
The video encoding and decoding method and system of the invention use the similarity of video scenes to cluster the I frames, which occupy the most bit rate in each GOP, thereby mining the correlation between non-consecutive frames and non-consecutive GOPs; combined with video compression technology, this greatly improves video compression efficiency. With PSNR essentially maintained, applying the scene similarity analysis technique to the compression of I frames improves compression efficiency by 3%-4% over traditional coding.
Drawings
Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following description and appended claims, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a flow chart of a video scene similarity analysis method according to the present invention;
FIG. 2 is a flow chart of constructing a similarity matrix according to the present invention;
FIG. 3 is a flow chart of clustering key frames according to the present invention using the sum of the attraction degree and the attribution degree as the clustering criterion;
FIG. 4 is a schematic diagram of the clustering compression of key frames according to the present invention;
FIG. 5 is a block diagram of a video scene similarity analysis system according to the present invention;
fig. 6 is a block diagram of the video codec system according to the present invention.
The same reference numbers in all figures indicate similar or corresponding features or functions.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a video scene similarity analysis method according to the present invention, and as shown in fig. 1, the video scene similarity analysis method includes:
step 1, respectively selecting a frame of image from each shot of a video as an I frame of each shot;
step 2, extracting the feature vector of each I frame, measuring the similarity between the I frames according to the feature vector, and constructing a similarity matrix;
step 3, according to the similarity matrix, clustering the I frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting the clustering label of each I frame, wherein the clustering label is represented by the clustering center corresponding to the I frame, the attraction degree represents information passed from the I frame to its clustering center, and the attribution degree represents information passed from the clustering center to the I frame.
For videos whose content recurs frequently, the correlation between non-consecutive frames and non-consecutive GOPs is still very large. The video scene similarity analysis method clusters the I frames, which occupy the most bit rate in each GOP, mines the correlation between non-consecutive frames and non-consecutive GOPs, reduces the number of I frames that undergo independent intra-frame prediction, and further compresses the data volume of the I frames.
In a preferred embodiment of the present invention, step 1 includes: detecting shot boundaries, and determining the segment between two adjacent boundaries as one shot; and selecting the frame at a set position in each shot as its I frame. For example, every set number of frames (e.g. 15 frames) may be treated as one shot and the first frame of each such group used as the shot's key frame, or the first frame, the last frame, or an internal frame of the shot may be selected directly as its I frame, as required.
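As an illustrative aid (not part of the patent text), the fixed-length variant of this selection can be sketched in Python; the function name and the 15-frame default are taken only from the example above:

def select_key_frames(frames, shot_len=15):
    """Treat every shot_len consecutive frames as one shot and take
    the first frame of each shot as that shot's key frame (I frame)."""
    return [frames[i] for i in range(0, len(frames), shot_len)]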
In a preferred embodiment of the present invention, as shown in fig. 2, the step 2 includes:
step 21, partitioning the image of the key frame into sub-blocks of the same size, obtaining the color histogram of each sub-block, and accumulating to obtain the cumulative histogram of the key frame. A histogram represents the distribution of the chromaticity-space values of all pixels in an image and reflects the distribution of the image's color space and its basic tones, but it carries no location information, so different images may yield the same histogram feature. To address this problem, the invention adopts a cumulative-histogram representation: instead of computing the image histogram directly, the image is first divided into blocks of the same size, the color histogram of each sub-block of the original image is computed, and the cumulative histogram is then calculated; compared with a plain color histogram, the cumulative histogram is more robust;
step 22, extracting the color feature vector of the cumulative histogram of each I frame using a color space. For example, using the HSV color space, the HSV values are quantized non-uniformly to reduce the amount of computation: the hue H is generally divided into eight quantization regions, and the saturation S and brightness V into three quantization regions each, according to their different value ranges. The specific quantization is as follows:

[Quantization table (1): the piecewise mapping of H to 8 bins and of S and V to 3 bins each was rendered as an image in the original publication.]

Quantizing the hue H into 8 parts, the saturation S into 3 parts, and the brightness V into 3 parts in this way yields a 72-dimensional feature vector (8 × 3 × 3 = 72).
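A minimal Python sketch of steps 21-22 follows, assuming OpenCV's HSV value ranges (H in [0, 180), S and V in [0, 256)); since quantization table (1) survives only as an image, the bin boundaries below are illustrative assumptions, not the patented values:

import numpy as np
import cv2

def hsv_color_feature(image_bgr, blocks=4):
    """72-dimensional color feature of one key frame (steps 21-22):
    H quantized to 8 bins, S and V to 3 bins each (8*3*3 = 72),
    histograms accumulated over equally sized sub-blocks, then the
    cumulative histogram is taken."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    hq = np.digitize(h, [11, 21, 38, 78, 95, 135, 148])  # 8 hue bins (assumed boundaries)
    sq = np.digitize(s, [52, 179])                       # 3 saturation bins (assumed)
    vq = np.digitize(v, [52, 179])                       # 3 brightness bins (assumed)
    code = 9 * hq + 3 * sq + vq                          # joint bin index 0..71
    rows, cols = code.shape
    bh, bw = rows // blocks, cols // blocks
    hist = np.zeros(72)
    for i in range(blocks):                              # step 21: block, then accumulate
        for j in range(blocks):
            sub = code[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist += np.bincount(sub.ravel(), minlength=72)
    cum = np.cumsum(hist)                                # cumulative histogram
    return cum / cum[-1]                                 # normalized 72-d color feature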
step 23, extracting the texture feature vector of each I frame by using a gray level co-occurrence matrix, specifically: constructing a gray level co-occurrence matrix of each key frame; constructing a texture feature vector by adopting one or more indexes of energy, inertia, correlation, entropy, local uniformity and maximum probability; normalizing the texture feature vector by adopting Gaussian normalization;
step 24, constructing a similarity matrix from the color feature vector and the texture feature vector of each I frame according to the following formula (1),

s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two I frames, s_{m,n} represents the similarity of I frame m and I frame n, (x_1, x_2, ..., x_d) is the feature vector of I frame m, and (y_1, y_2, ..., y_d) is the feature vector of I frame n.
Preferably, in step 23, co-occurrence matrices in 4 directions (0, π/4, π/2, 3π/4) are constructed, Gaussian normalization is applied, and the mean and standard deviation of the 4 parameters of inertia, energy, correlation and entropy are taken as components to describe the texture feature vector, so as to obtain an 8-dimensional texture feature vector.
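Step 23 can be sketched with scikit-image's graycomatrix/graycoprops (these names hold for scikit-image 0.19 and later); entropy is computed directly from the normalized co-occurrence matrix rather than via graycoprops, and the final Gaussian (z-score) normalization across all key frames is left to the caller:

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_feature(gray, levels=16):
    """8-dimensional texture feature of one key frame (step 23): mean and
    standard deviation, over the 4 directions 0, pi/4, pi/2, 3pi/4, of
    inertia (contrast), energy, correlation and entropy of the GLCM."""
    img = (gray.astype(np.float64) / 256 * levels).astype(np.uint8)  # requantize gray levels
    glcm = graycomatrix(img, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    inertia = graycoprops(glcm, 'contrast')[0]        # one value per direction
    energy = graycoprops(glcm, 'energy')[0]
    correlation = graycoprops(glcm, 'correlation')[0]
    p = glcm[:, :, 0, :]                              # normalized matrices, shape (levels, levels, 4)
    entropy = -np.sum(p * np.log2(p + 1e-12), axis=(0, 1))
    feats = np.stack([inertia, energy, correlation, entropy])        # shape (4, 4)
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])   # 8-d texture vector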
In addition, a similarity matrix may preferably be constructed according to the above formula (1) from a combined color-texture feature vector: for example, if in the feature vector (x_1, x_2, ..., x_d) of I frame m the components (x_1, x_2, ..., x_j) are the color feature vector and (x_{j+1}, x_{j+2}, ..., x_d) are the texture feature vector, the combined color-texture feature vector is (αx_1, αx_2, ..., αx_j, βx_{j+1}, ..., βx_d), where α and β are weighting coefficients with α + β = 1.
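With both feature vectors available, formula (1) can be sketched as follows; the negative squared Euclidean distance is an assumption (the original formula survives only as an image), chosen because it is the standard similarity for the message-passing clustering of step 3:

import numpy as np

def similarity_matrix(color_feats, texture_feats, alpha=0.5):
    """Formula (1), sketched: s[m, n] = -sum_i (x_i - y_i)^2 over the
    alpha/beta-weighted concatenation of color and texture vectors
    (alpha + beta = 1)."""
    beta = 1.0 - alpha
    f = np.hstack([alpha * np.asarray(color_feats, dtype=float),
                   beta * np.asarray(texture_feats, dtype=float)])  # one row per key frame
    diff = f[:, None, :] - f[None, :, :]
    return -(diff ** 2).sum(axis=-1)  # similarity = negative squared distance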
In a preferred embodiment of the present invention, as shown in fig. 3, the step 3 includes:
step 31, initializing the similarity matrix: the average value of the similarities in the similarity matrix (the reference P) is used as the self-similarity of each I frame, and the initial attribution degrees and initial attraction degrees between the I frames in the attribution degree matrix and the attraction degree matrix are all set to 0, i.e.

s_{1,1} = s_{2,2} = … = s_{m,m} = P, A = 0, R = 0,

where S is the similarity matrix and the average value P of all similarities in S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m}; A is the attribution degree matrix, with a_{m,n} representing the attribution degree of I frame m to I frame n; R is the attraction degree matrix, with r_{m,n} representing the attraction degree of I frame n to I frame m;
step 32, updating the attraction degree matrix from the similarity matrix by the following formula (2),

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0;
step 33, updating the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4),

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
step 34, setting a damping factor λ (for example, 0.5) and a number of iterations T, and iteratively updating the attraction degree matrix and the attribution degree matrix by the following formulas (5) and (6),

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
step 35, after the iterative updating, filtering out the I frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0, i.e. judging whether each key frame satisfies the following formula (7) and filtering out the key frames that do not,

a^{(T)}_{m,m} + r^{(T)}_{m,m} > 0   (7)
step 36, for each I frame satisfying the condition that its autocorrelation attribution degree plus autocorrelation attraction degree is greater than 0, taking the other I frame corresponding to the maximum value of the sum of attribution degree and attraction degree,

c(m) = argmax_n { a^{(T)}_{m,n} + r^{(T)}_{m,n} },

as the clustering center of that I frame; the clustering center of each qualifying I frame is thereby obtained, and the I frames belonging to the same clustering center are clustered into one class.
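Steps 31-36 are the message-passing iteration of affinity propagation. The following NumPy sketch implements the same update rules in vectorized form (damping λ = 0.5 by default); it is an illustration under those assumptions, not the patented implementation, and all identifiers are illustrative:

import numpy as np

def cluster_key_frames(S, damping=0.5, iterations=100):
    """Steps 31-36: cluster key frames on similarity matrix S and return,
    for each key frame, the index of its clustering center (its label)."""
    S = S.astype(np.float64).copy()
    M = S.shape[0]
    # Step 31: reference P = mean similarity, placed on the diagonal; A = R = 0.
    np.fill_diagonal(S, S[~np.eye(M, dtype=bool)].mean())
    R = np.zeros((M, M))  # attraction (responsibility) r[m, n]
    A = np.zeros((M, M))  # attribution (availability)  a[m, n]
    for _ in range(iterations):
        # Formula (2): r'[m,n] = s[m,n] - max_{n' != n} (a[m,n'] + s[m,n'])
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[np.arange(M), top]
        AS[np.arange(M), top] = -np.inf
        second = AS.max(axis=1)
        Rn = S - first[:, None]
        Rn[np.arange(M), top] = S[np.arange(M), top] - second
        # Formulas (3)/(4): availabilities from positive responsibilities.
        Rp = np.maximum(Rn, 0)
        np.fill_diagonal(Rp, Rn.diagonal())
        colsum = Rp.sum(axis=0)
        An = np.minimum(0.0, colsum[None, :] - Rp)
        np.fill_diagonal(An, colsum - Rp.diagonal())
        # Formulas (5)/(6): damped iterative update.
        R = damping * R + (1 - damping) * Rn
        A = damping * A + (1 - damping) * An
    E = A + R
    centers = np.flatnonzero(E.diagonal() > 0)       # step 35: formula (7)
    labels = centers[E[:, centers].argmax(axis=1)]   # step 36: best center per frame
    labels[centers] = centers                        # each center labels itself
    return labels

In practice, sklearn.cluster.AffinityPropagation implements the same responsibility/availability updates.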
This way of clustering the key frames according to the similarity matrix, with the sum of the attraction degree and the attribution degree as the clustering criterion, overcomes the sensitivity of traditional clustering algorithms to the choice of initial values: all data points are treated as candidate clustering centers, the messages passed between them are computed and iteratively updated, the points that compete successfully become clustering centers while the others are assigned to them, and a number of high-quality clustering centers are obtained when the iteration ends.
In addition, the "message passing mechanism" of this clustering method passes two kinds of information between data points and candidate clustering centers: the attraction degree (responsibility) r(i,k), which represents the information passed from data point i to candidate clustering center k, and the attribution degree (availability) a(i,k), which represents the information passed from candidate clustering center k to data point i. The attraction degree indicates how appropriate it is for point k to become the clustering center of data point i, and the attribution degree indicates how appropriate it is for point i to take point k as its clustering center; the larger r(i,k) and a(i,k) are, the higher the probability that data point i takes k as its clustering center and that k becomes a clustering center.
Furthermore, a reference P is associated with the attraction degree and the attribution degree; it is not information to be passed but a criterion: the value s_{k,k} on the diagonal of the similarity matrix S is used to judge whether point k can become a clustering center, and the larger s_{k,k} is, the higher the probability that point k becomes a clustering center. During initialization, no data point i belongs to any class and no point k is a clustering center, so the mean value of the matrix S is assigned to the reference P of all data points.
The video scene similarity analysis method can be applied to a video coding and decoding method, which includes the video analysis method, and further includes:
compressing the I frames clustered into one class as one GOP: as shown in FIG. 4, each of the I frames F01-F20 has a clustering center, which can be denoted by a code such as S1-S4, and the I frames clustered into one class are compressed as one GOP, e.g. GOP1-GOP4;
compressing the I frames, reconstructing them at the encoding end, and placing them in a frame buffer, the B frames and P frames of the remaining GOPs finding the corresponding I frame in the frame buffer through their respective I frame indexes for inter-frame prediction encoding;
when decoding, decoding all I frames first and then decoding the B frames and P frames according to the clustering labels.
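How the clustering labels drive the codec can be sketched as follows; codec and its methods (intra_encode, reconstruct, inter_encode, bp_frames, key_index) are hypothetical placeholders for whatever encoder is actually used, and only the grouping and the key-frames-first ordering come from the description above:

from collections import defaultdict

def group_key_frames(labels):
    """One GOP per clustering center: key frames F01..F20 whose centers
    are, say, S1..S4 are regrouped into GOP1..GOP4 before compression."""
    gops = defaultdict(list)
    for frame_index, center in enumerate(labels):
        gops[center].append(frame_index)
    return dict(gops)

def encode(video, labels, codec):
    """Coding order from the description: compress each clustered GOP of
    key frames, reconstruct them into a frame buffer at the encoding end,
    then inter-predict every B/P frame from the buffered key frame found
    through its key frame index. `codec` is a hypothetical interface."""
    frame_buffer = {}
    for center, frames in group_key_frames(labels).items():
        for f in frames:  # intra-encode, then reconstruct as the decoder would
            frame_buffer[f] = codec.reconstruct(codec.intra_encode(video[f]))
    for bp in codec.bp_frames(video):  # remaining B and P frames
        codec.inter_encode(bp, reference=frame_buffer[codec.key_index(bp)])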
Fig. 5 is a block diagram of a video scene similarity analysis system according to the present invention, and as shown in fig. 5, the video scene similarity analysis system 100 includes:
the key frame extraction module 110 is configured to select a frame of image from each shot of the video as an I frame of each shot, and send each I frame to the similarity matrix construction module 120;
the similarity matrix construction module 120 extracts the feature vector of each I frame, measures the similarity between I frames according to the feature vectors, constructs a similarity matrix, and sends the similarity matrix to the clustering module 130;
the clustering module 130 is configured to cluster the I frames according to the similarity matrix by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and to set the clustering label of each I frame, wherein the clustering label is represented by the clustering center corresponding to the I frame, the attraction degree represents information passed from the I frame to its clustering center, and the attribution degree represents information passed from the clustering center to the I frame.
In a preferred embodiment of the present invention, the key frame extracting module 110 includes:
a shot determining unit 111 that detects a boundary of a shot and determines a space between two adjacent boundaries as a shot;
the key frame extracting unit 112 selects a frame at a predetermined position in each shot as an I frame.
In a preferred embodiment of the present invention, the similarity matrix construction module 120 includes:
the cumulative histogram construction unit 121 divides the image of the I frame into blocks to form sub-blocks with the same size, obtains a histogram of colors of the sub-blocks, obtains the cumulative histogram of the I frame through accumulation, and sends the cumulative histogram to the color feature vector extraction unit 122;
the color feature vector extraction unit 122 extracts the color feature vector of the cumulative histogram of each I frame by using a color space, and sends the color feature vector of each I frame to the similarity matrix construction unit 124;
the texture feature vector extraction unit 123 extracts the texture feature vector of each I frame by using the gray level co-occurrence matrix, and sends the texture feature vector of each I frame to the similarity matrix construction unit 124;
a similarity matrix construction unit 124 for constructing a similarity matrix from the color feature vector and the texture feature vector of each I frame according to the following formula (1),

s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two I frames, s_{m,n} represents the similarity of I frame m and I frame n, (x_1, x_2, …, x_d) is the feature vector of I frame m, and (y_1, y_2, …, y_d) is the feature vector of I frame n.
In a preferred embodiment of the present invention, the clustering module 130 includes:
the initializing unit 131 initializes the similarity matrix, uses the average value of the similarities in the similarity matrix as the self-similarity of each I frame, and sends the initialized similarity matrix to the first updating unit 133, wherein S is the similarity matrix and the average value P of all similarities in S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m};
the setting unit 132 sets the initial attribution degrees and initial attraction degrees between the I frames in the attribution degree matrix and the attraction degree matrix to 0 (A = 0, R = 0) and sends them to the first updating unit 133, and sets the damping factor λ and the number of iterations T and sends them to the second updating unit 134, wherein A is the attribution degree matrix, with a_{m,n} representing the attribution degree of I frame m to I frame n, and R is the attraction degree matrix, with r_{m,n} representing the attraction degree of I frame n to I frame m;
the first updating unit 133 updates the attraction degree matrix from the similarity matrix by the following formula (2), updates the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4), and sends the updated attraction degree matrix and attribution degree matrix to the second updating unit 134,

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0, a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
the second updating unit 134 iteratively updates the attraction degree matrix and attribution degree matrix updated by the first updating unit by the following formulas (5) and (6), and sends the attraction degree matrix and attribution degree matrix after T iterations to the screening unit 135,

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
the screening unit 135 filters out the I frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0, and sends the attraction degree matrix and attribution degree matrix, with the vectors corresponding to the non-qualifying I frames removed, to the clustering unit 136;
the clustering unit 136 takes, for each remaining I frame, the other I frame corresponding to the maximum value of the sum of the attribution degree and the attraction degree in the matrices filtered by the screening unit as the clustering center of that I frame, thereby obtaining the clustering center of each qualifying I frame, and clusters the I frames belonging to the same clustering center into one class.
The video scene similarity analysis system can be applied to a video coding and decoding system, as shown in fig. 6, the video coding and decoding system 1000 includes, in addition to the video scene similarity analysis system 100, the following:
a compression unit 200 for compressing I frames grouped into one group as one GOP;
an encoding part 300, which reconstructs the I frame after compressing the I frame at an encoding end and places the I frame in a frame buffer area, and B frames and P frames of other GOPs find the I frame corresponding to the frame buffer area through respective I frame indexes to perform inter-frame prediction encoding;
the decoding unit 400 decodes all I frames first, and then decodes B frames and P frames based on the cluster labels.
In summary, the video scene similarity analysis method, the video encoding and decoding method, the video scene similarity analysis system and the video encoding and decoding system proposed by the present invention are described by way of example with reference to the accompanying drawings. However, it will be appreciated by those skilled in the art that various modifications could be made to the system and method of the present invention described above without departing from the spirit of the invention. Therefore, the scope of the present invention should be determined by the contents of the appended claims.

Claims (6)

1. A video scene similarity analysis method is characterized by comprising the following steps:
respectively selecting a frame of image from each shot of a video as a key frame of each shot;
extracting the feature vector of each key frame, measuring the similarity between the key frames according to the feature vector, and constructing a similarity matrix;
the method for constructing the similarity matrix comprises the following steps:
the method comprises the steps of partitioning an image of a key frame into sub-blocks with the same size to obtain a histogram of colors of the sub-blocks, and accumulating to obtain an accumulated histogram of the key frame;
extracting a color feature vector of the cumulative histogram of each key frame by adopting a color space;
extracting a texture feature vector of each key frame by adopting a gray level co-occurrence matrix;
constructing a similarity matrix according to the following formula (1) by using the color feature vector and the texture feature vector of each key frame,
s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two key frames, s_{m,n} represents the similarity of key frame m and key frame n, (x_1, x_2, ..., x_d) is the feature vector of key frame m, and (y_1, y_2, ..., y_d) is the feature vector of key frame n;
according to the similarity matrix, clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame;
wherein the method for clustering the key frames by taking the sum of the attraction degree and the attribution degree as the clustering criterion according to the similarity matrix and setting the clustering label of each key frame comprises:
initializing the similarity matrix, taking the average value of the similarities in the similarity matrix as the self-similarity of each key frame, wherein S is the similarity matrix and the average value P of all similarities in S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m};
setting the initial attribution degrees and initial attraction degrees between the key frames in the attribution degree matrix and the attraction degree matrix to 0 (A = 0, R = 0), wherein A is the attribution degree matrix, with a_{m,n} representing the attribution degree of key frame m to key frame n, and R is the attraction degree matrix, with r_{m,n} representing the attraction degree of key frame n to key frame m;
updating the attraction degree matrix from the similarity matrix by the following formula (2),

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0;
updating the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4),

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
setting a damping factor λ and a number of iterations T, and iteratively updating the attraction degree matrix and the attribution degree matrix by the following formulas (5) and (6),

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
filtering out, after the iterative updating, the key frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0;
and taking, for each key frame satisfying the condition that its autocorrelation attribution degree plus autocorrelation attraction degree is greater than 0, the other key frame corresponding to the maximum value of the sum of the attribution degree and the attraction degree as the clustering center of that key frame, thereby obtaining the clustering center of each qualifying key frame, and clustering the key frames belonging to the same clustering center into one class.
2. The method of claim 1, wherein the selecting a frame of image from each shot of the video as the key frame of each shot comprises:
detecting the boundaries of the shots, and determining the interval between two adjacent boundaries as one shot;
and selecting a frame at a set position in each shot as the key frame.
3. A video encoding and decoding method, comprising:
the video scene similarity analysis method of any one of claims 1-2;
compressing the key frames grouped into one class as a GOP;
compressing the key frame, reconstructing the key frame at an encoding end, and placing the key frame in a frame buffer area, wherein the B frames and the P frames of the rest GOPs find the key frame corresponding to the frame buffer area through respective key frame indexes to perform inter-frame prediction encoding;
when decoding, all key frames are decoded first, and then B frames and P frames are decoded according to the clustering labels.
4. A video scene similarity analysis system, comprising:
the key frame extraction module is used for respectively selecting a frame of image from each shot of the video as a key frame of each shot and sending each key frame to the similarity matrix construction module;
the similarity matrix construction module is used for extracting the feature vector of each key frame, measuring the similarity between the key frames according to the feature vector, constructing a similarity matrix and sending the similarity matrix to the clustering module;
the similarity matrix construction module comprises:
the accumulated histogram construction unit is used for blocking the image of the key frame to form sub-blocks with the same size to obtain a color histogram of the sub-blocks, accumulating to obtain the accumulated histogram of the key frame, and sending the accumulated histogram to the color feature vector extraction unit;
the color feature vector extraction unit is used for extracting the color feature vector of the cumulative histogram of each key frame by adopting a color space and sending the color feature vector of each key frame to the similarity matrix construction unit;
the texture feature vector extraction unit is used for extracting the texture feature vector of each key frame by adopting a gray level co-occurrence matrix and sending the texture feature vector of each key frame to the similarity matrix construction unit;
a similarity matrix construction unit for constructing a similarity matrix from the color feature vector and the texture feature vector of each key frame according to the following formula (1),

s_{m,n} = -Σ_{i=1}^{d} (x_i - y_i)²   (1)

where m and n are two key frames, s_{m,n} represents the similarity of key frame m and key frame n, (x_1, x_2, …, x_d) is the feature vector of key frame m, and (y_1, y_2, …, y_d) is the feature vector of key frame n;
the clustering module is used for clustering the key frames according to the similarity matrix by taking the sum of the attraction degree and the attribution degree as the clustering criterion, and setting a clustering label for each key frame, wherein the clustering label is represented by the clustering center corresponding to the key frame, the attraction degree represents information passed from the key frame to its clustering center, and the attribution degree represents information passed from the clustering center to the key frame;
the clustering module comprises:
an initialization unit for initializing the similarity matrix, taking the average value of the similarities in the similarity matrix as the self-similarity of each key frame, and sending the initialized similarity matrix to a first updating unit, wherein the average value P of all similarities in the similarity matrix S is used as the value of the diagonal entries s_{1,1}, s_{2,2}, …, s_{m,m};
a setting unit for setting the initial attribution degrees and initial attraction degrees between the key frames in the attribution degree matrix and the attraction degree matrix to 0 (A = 0, R = 0) and sending them to the first updating unit, and for setting the damping factor λ and the number of iterations T and sending them to a second updating unit, wherein A is the attribution degree matrix, with a_{m,n} representing the attribution degree of key frame m to key frame n, and R is the attraction degree matrix, with r_{m,n} representing the attraction degree of key frame n to key frame m;
a first updating unit for updating the attraction degree matrix from the similarity matrix by the following formula (2), updating the attribution degree matrix from the updated attraction degree matrix by the following formulas (3) and (4), and sending the updated attraction degree matrix and attribution degree matrix to the second updating unit,

r'_{m,n} = s_{m,n} - max_{n'≠n} { a_{m,n'} + s_{m,n'} }   (2)

a'_{m,n} = min{ 0, r'_{n,n} + Σ_{m'∉{m,n}} max(0, r'_{m',n}) }, m ≠ n   (3)

a'_{m,m} = Σ_{m'≠m} max(0, r'_{m',m})   (4)

where r'_{m,n} is the attraction degree of key frame n to key frame m updated from the similarity matrix, with the initial attribution degrees and initial attraction degrees all 0, a'_{m,n} is the attribution degree of key frame m to key frame n updated from the updated attraction degree matrix, and a'_{m,m} is the autocorrelation attribution degree of key frame m;
a second updating unit which iteratively updates the attraction degree matrix and attribution degree matrix updated by the first updating unit by the following formulas (5) and (6), and sends the attraction degree matrix and attribution degree matrix after T iterations to the screening unit,

r^{(t)}_{m,n} = λ · r^{(t-1)}_{m,n} + (1 - λ) · r'_{m,n}   (5)

a^{(t)}_{m,n} = λ · a^{(t-1)}_{m,n} + (1 - λ) · a'_{m,n}   (6)

where r^{(T)}_{m,n} is the attraction degree of key frame n to key frame m after T iterative updates, and a^{(T)}_{m,n} is the attribution degree of key frame n to key frame m after T iterative updates;
the screening unit filters out the key frames whose sum of autocorrelation attribution degree and autocorrelation attraction degree is not greater than 0, and sends the attraction degree matrix and attribution degree matrix, with the vectors corresponding to the non-qualifying key frames removed, to the clustering unit;
and the clustering unit takes, for each remaining key frame, the other key frame corresponding to the maximum value of the sum of the attribution degree and the attraction degree in the attraction degree matrix and attribution degree matrix filtered by the screening unit as the clustering center of that key frame, thereby obtaining the clustering center of each qualifying key frame, and clusters the key frames belonging to the same clustering center into one class.
5. The video scene similarity analysis system according to claim 4, wherein the key frame extraction module comprises:
the shot determining unit detects the boundaries of the shots and determines the interval between two adjacent boundaries as a shot;
the key frame extraction unit selects a frame at a set position in each lens as a key frame.
6. A video coding/decoding system, comprising:
the video scene similarity analysis system of any one of claims 4-5;
a compression unit for compressing the key frames grouped into one group as a GOP;
the encoding part is used for reconstructing the compressed key frames at an encoding end and placing the compressed key frames in a frame buffer area, and finding the key frames corresponding to the frame buffer area by the B frames and the P frames of the rest GOPs through respective key frame indexes to perform inter-frame prediction encoding;
and the decoding part decodes all the key frames and then decodes the B frames and the P frames according to the clustering labels.
CN201710873784.2A 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system Active CN107657228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710873784.2A CN107657228B (en) 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710873784.2A CN107657228B (en) 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system

Publications (2)

Publication Number Publication Date
CN107657228A CN107657228A (en) 2018-02-02
CN107657228B (en) 2020-08-04

Family

ID=61130165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710873784.2A Active CN107657228B (en) 2017-09-25 2017-09-25 Video scene similarity analysis method and system, and video encoding and decoding method and system

Country Status (1)

Country Link
CN (1) CN107657228B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509917B (en) * 2018-03-30 2020-03-03 北京影谱科技股份有限公司 Video scene segmentation method and device based on lens class correlation analysis
CN110879952B (en) * 2018-09-06 2023-06-16 阿里巴巴集团控股有限公司 Video frame sequence processing method and device
CN109688429A (en) * 2018-12-18 2019-04-26 广州励丰文化科技股份有限公司 A kind of method for previewing and service equipment based on non-key video frame
CN110062163B (en) * 2019-04-22 2020-10-20 珠海格力电器股份有限公司 Multimedia data processing method and device
CN110188625B (en) * 2019-05-13 2021-07-02 浙江大学 Video fine structuring method based on multi-feature fusion
CN110377587B (en) * 2019-07-15 2023-02-10 腾讯科技(深圳)有限公司 Migration data determination method, device, equipment and medium based on machine learning
CN111126126B (en) * 2019-10-21 2022-02-01 武汉大学 Intelligent video strip splitting method based on graph convolution neural network
CN110796088B (en) * 2019-10-30 2023-07-04 行吟信息科技(上海)有限公司 Video similarity judging method and device
CN110853033B (en) * 2019-11-22 2022-02-22 腾讯科技(深圳)有限公司 Video detection method and device based on inter-frame similarity
CN118044204A (en) * 2021-09-30 2024-05-14 浙江大学 Encoding and decoding method, decoder, encoder, and computer-readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404030A (en) * 2008-11-05 2009-04-08 中国科学院计算技术研究所 Method and system for periodic structure fragment detection in video
CN102572435A (en) * 2012-01-16 2012-07-11 中南民族大学 Compressive sampling-based (CS-based) video coding/decoding system and method thereof
CN107194312A (en) * 2017-04-12 2017-09-22 广东银禧科技股份有限公司 A kind of model recommendation method based on 3D printing model content

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130114703A1 (en) * 2005-03-31 2013-05-09 Euclid Discoveries, Llc Context Based Video Encoding and Decoding
CN105469383A (en) * 2014-12-30 2016-04-06 北京大学深圳研究生院 Wireless capsule endoscopy redundant image screening method based on multi-feature fusion
CN104732545B (en) * 2015-04-02 2017-06-13 西安电子科技大学 The texture image segmenting method with quick spectral clustering is propagated with reference to sparse neighbour
CN106202256B (en) * 2016-06-29 2019-12-17 西安电子科技大学 Web image retrieval method based on semantic propagation and mixed multi-instance learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404030A (en) * 2008-11-05 2009-04-08 中国科学院计算技术研究所 Method and system for periodic structure fragment detection in video
CN102572435A (en) * 2012-01-16 2012-07-11 中南民族大学 Compressive sampling-based (CS-based) video coding/decoding system and method thereof
CN107194312A (en) * 2017-04-12 2017-09-22 广东银禧科技股份有限公司 A kind of model recommendation method based on 3D printing model content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of the Affinity Propagation Algorithm in Unsupervised Image Clustering; Qian Lili, Shi Pengfei; Microcomputer Applications (《微型电脑应用》); 2011-02-20; Vol. 27, No. 2; page 35, column 2, lines 1-7 *

Also Published As

Publication number Publication date
CN107657228A (en) 2018-02-02

Similar Documents

Publication Publication Date Title
CN107657228B (en) Video scene similarity analysis method and system, and video encoding and decoding method and system
CN108632625B (en) Video coding method, video decoding method and related equipment
US6618507B1 (en) Methods of feature extraction of video sequences
US6625319B1 (en) Image compression using content-based image similarity
CN108347612B (en) Monitoring video compression and reconstruction method based on visual attention mechanism
JP2000217117A (en) Processing method for digital image expression video data in compression form
US9979976B2 (en) Light-weight video coding system and decoder for light-weight video coding system
US7068720B2 (en) Coding of digital video with high motion content
Wang et al. Scalable facial image compression with deep feature reconstruction
US11263261B2 (en) Method and system for characteristic-based video processing
KR101149522B1 (en) Apparatus and method for detecting scene change
CN103020138A (en) Method and device for video retrieval
Yu et al. Detection of fake high definition for HEVC videos based on prediction mode feature
CN108366295A (en) Visual classification feature extracting method, transcoding weight contracting detection method and storage medium
US20150249829A1 (en) Method, Apparatus and Computer Program Product for Video Compression
CN109618227B (en) Video data storage method and system
KR101163774B1 (en) Device and process for video compression
CN112770116B (en) Method for extracting video key frame by using video compression coding information
US20060109902A1 (en) Compressed domain temporal segmentation of video sequences
Dai et al. HEVC video steganalysis based on PU maps and multi-scale convolutional residual network
Baroffio et al. Hybrid coding of visual content and local image features
CN111277835A (en) Monitoring video compression and decompression method combining yolo3 and flownet2 network
CN104410863B (en) Image processor and image processing method
KR102072576B1 (en) Apparatus and method for encoding and decoding of data
Rayatifard et al. A fast and robust shot detection method in hevc/h. 265 compressed video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant