CN105959685B

CN105959685B - Compression bit-rate prediction method based on video content and cluster analysis

Info

Publication number: CN105959685B
Application number: CN201610378960.0A
Authority: CN
Inventors: 宋利; 朱雨桐; 解蓉; 张文军
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2016-05-31
Filing date: 2016-05-31
Publication date: 2018-01-19
Anticipated expiration: 2036-05-31
Also published as: CN105959685A

Abstract

The invention discloses a compression bit-rate prediction method based on video content and cluster analysis. The method first applies Sobel filtering to each video frame to obtain spatial complexity information; it then takes the difference of the luminance of adjacent frames to obtain temporal complexity information; next, it performs cluster analysis on the spatial and temporal information using the k-means method; finally, within each cluster, it performs coefficient regression to obtain a prediction model and uses that model to predict the compression bit rate. Because k-means cluster analysis is carried out first and regression prediction is then done within each cluster, the prediction accuracy of the model is markedly improved. Predicting with this "cluster first, then regress" approach gives better results.

Description

Compression bit-rate prediction method based on video content and cluster analysis

Technical field

The invention relates to a method in the field of video quality assessment, specifically a compression bit-rate prediction method that, based on the spatial and temporal information of video, performs cluster analysis on the video source sequences and then applies a no-reference video quality assessment model within each cluster of sequences with similar characteristics.

Background

The rapid development of multimedia has given viewers a wide choice of terminals for watching video, including large-screen televisions, small smartphones, and tablets whose size falls in between. Viewers' demands on the quantity and quality of video keep rising, and the requirements for larger device storage and higher compression bit rates grow with them. How to find the smallest possible compression bit rate that still achieves a given video quality is therefore the research focus of this patent, which proposes a compression bit-rate prediction method based on video content and cluster analysis.

Video quality assessment methods fall broadly into two categories: subjective and objective. Compared with subjective methods, objective quality assessment is more flexible, faster, and easier to put into practice. Objective quality assessment is further divided into full-reference, reduced-reference, and no-reference methods. No-reference methods analyze the video directly and then assess its quality. One large class of existing no-reference methods is based on parameters of the video itself; because these methods do not need to compress the video source sequence, their complexity is low and they are easy to apply in practice, so they can be used in real-time systems and have significant practical value.

Existing research shows that subjective video quality is mainly affected by five factors: coding scheme, video content, compression bit rate, video frame rate, and video resolution. Most of the no-reference video quality assessment methods based on video parameter models proposed to date rely on one or more of these five factors. For example, in "Optimized spatial and temporal resolution based on subjective quality estimation without encoding", published in the 2014 IEEE International Conference on Visual Communications and Image Processing, pp. 33-36, Motohiro Takagi et al. predict video quality from the compression bit rate and the video frame rate.

However, most existing no-reference video quality assessment methods predict video quality directly after extracting motion information or coding information from the video, and seldom analyze the category of the video content. The few existing methods that do classify videos before prediction mostly classify by visual inspection of the content, for example into "news" or "cartoon" categories, and their accuracy remains unsatisfactory.

The invention therefore proposes a compression bit-rate prediction method that is based on the video content's own information and uses cluster analysis, in order to improve the accuracy and practicality of model prediction.

Summary of the invention

Building on existing no-reference objective video quality assessment methods, the invention provides a compression bit-rate prediction method based on video content and cluster analysis, which classifies videos by their own information and thereby improves prediction accuracy.

To achieve the above object, the invention adopts the following technical solution:

S1: Apply Sobel filtering to each frame of the video to obtain the spatial information SI; take the difference of the luminance of adjacent frames to obtain the temporal information TI;

S2: Perform cluster analysis on the spatial information SI and temporal information TI obtained in S1 using the k-means method, obtaining multiple clusters;

S3: Within each cluster from S2, perform coefficient regression to obtain a compression bit-rate prediction model, and use the model to predict the compression bit rate. Regressing separately within each cluster improves prediction accuracy.

Preferably, in S1, the n-th frame of the original video sequence is processed with the following two formulas to obtain the spatial information SI (Spatial Information) and the temporal information TI (Temporal Information):

$$SI = \max_{time}\{\mathrm{std}_{space}[\mathrm{Sobel}(F_n)]\}$$

$$TI = \max_{time}\{\mathrm{std}_{space}[F_n(i,j) - F_{n-1}(i,j)]\}$$

where F_n is the luminance of the current (n-th) frame; Sobel denotes the Sobel operator of classical image processing; std_space denotes the standard deviation taken over the pixels of a frame (of the Sobel-filtered frame for SI, of the frame difference for TI); and max_time denotes the maximum of these per-frame standard deviations over all frames.
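The two formulas can be sketched in Python as follows. This is a minimal illustration, not the inventors' implementation: it assumes each frame is a 2-D NumPy array of luminance values, and it assumes that "Sobel filtering" means the gradient magnitude of the horizontal and vertical Sobel responses, computed on the interior pixels only. The function names are hypothetical.

```python
import numpy as np

def sobel_magnitude(frame):
    """Sobel gradient magnitude of a 2-D luminance frame (interior pixels only).

    Assumption: the magnitude of the horizontal and vertical responses is used;
    the source text only says "Sobel filtering".
    """
    f = np.asarray(frame, dtype=np.float64)
    # Horizontal response: right column minus left column, weighted 1-2-1.
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]
          - f[:-2, :-2] - 2 * f[1:-1, :-2] - f[2:, :-2])
    # Vertical response: bottom row minus top row, weighted 1-2-1.
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]
          - f[:-2, :-2] - 2 * f[:-2, 1:-1] - f[:-2, 2:])
    return np.hypot(gx, gy)

def spatial_temporal_information(frames):
    """Return (SI, TI) for a sequence of equally sized luminance frames."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    if len(frames) < 2:
        raise ValueError("TI needs at least two frames")
    # SI: maximum over time of the spatial std of the Sobel-filtered frame.
    si = max(sobel_magnitude(f).std() for f in frames)
    # TI: maximum over time of the spatial std of the frame difference.
    ti = max((cur - prev).std() for prev, cur in zip(frames, frames[1:]))
    return float(si), float(ti)
```

Each video then contributes one (SI, TI) pair, which is the feature vector used for clustering in S2.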

Preferably, in S2, the spatial information SI and temporal information TI obtained in S1 are fed into the k-means algorithm for cluster analysis, with the squared Euclidean distance as the distance measure. The silhouette value of the k-means cluster analysis is used as the index for evaluating the clustering result, and the final number of clusters is determined by analyzing this value. Videos with similar SI and TI are thereby grouped into the same cluster.

Preferably, in S3, after the cluster analysis of S2 is complete, within each cluster the spatial information SI and temporal information TI computed in S1 are substituted into the following compression bit-rate prediction model. For each video sequence, the desired subjective video quality score (MOS) is substituted in to obtain the predicted compression bit rate, that is, the bit rate needed to compress the video at a specified quality:
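The clustering step can be sketched as below. This is a plain NumPy illustration under stated assumptions, not the inventors' code: it uses ordinary Lloyd iterations with random initial centroids, computes silhouette values with the same squared Euclidean distance, and assigns the value 1 to a point that is alone in its cluster (some libraries use 0 instead). All function names and the sample (SI, TI) pairs are hypothetical.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(axis=2)

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with squared Euclidean distance; returns (labels, centroids)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = sq_dists(points, centroids).argmin(axis=1)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = sq_dists(points, centroids).argmin(axis=1)
    return labels, centroids

def silhouette_values(points, labels):
    """Per-point silhouette values in [-1, 1]; requires at least two clusters."""
    points = np.asarray(points, dtype=float)
    d = sq_dists(points, points)
    clusters = np.unique(labels)
    s = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            s[i] = 1.0  # assumption: a singleton cluster gets silhouette 1
            continue
        a = d[i, same].sum() / (n_same - 1)   # mean distance within own cluster
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s

# Hypothetical (SI, TI) pairs for six videos forming two obvious groups.
features = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
labels, _ = kmeans(features, k=2)
silh = silhouette_values(features, labels)
summary = {"min": silh.min(), "max": silh.max(),
           "mean": silh.mean(), "dev": silh.std()}
```

Running the two functions for several candidate values of k and comparing the minimum, maximum, mean, and standard deviation of the silhouette values gives the kind of summary used to choose the number of clusters.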

v_c=TISI (2)

α(v_c)=c₁+c₂·log(v_c) (3)

γ(v_c)=c₄+c₅·log(v_c) (5)

where c₁ to c₆ are model parameters and α, β, γ are intermediate parameters. MOS (Mean Opinion Score) is the subjective video test score, whose value range depends on the test method; the invention uses the DSI Variant II method of ITU-R BT.500 with a 5-point scale: 1 means very poor quality, 2 poor, 3 fair, 4 good, and 5 very good. TI and SI denote the temporal information and the spatial information, respectively. v_c denotes the video content and is determined by TI and SI. BR_p denotes the predicted compression bit rate.

Further, the model parameters c₁, c₂, c₃, c₄, c₅, c₆ are determined as follows: provided that the encoder type, video resolution, and frame rate in the actual application are the same as those of the subjective video quality test material, least-squares regression of the proposed mathematical model against the subjective quality assessment results yields the model parameters for that specific application.

The invention takes into account the influence of video content on video quality. It uses spatial and temporal information as video content features, performs cluster analysis on these features, and groups videos with similar features into the same cluster. After the video-parameter-based model is inverted, the compression bit rate can be predicted within each cluster from the video content and the desired video quality. The method would typically be used before encoding, to estimate the compression bit rate required to reach the required video quality.
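As an illustration of the least-squares step, the sketch below fits the two coefficients of a relation of the form used in equation (3), target = c₁ + c₂·log(v_c). It is only a partial, hedged example: the complete model is not reproduced in this text, the patent fits all six parameters against subjective scores, and the function name and sample values here are hypothetical.

```python
import numpy as np

def fit_log_linear(v_c, target):
    """Least-squares estimate of (c_a, c_b) in target = c_a + c_b * log(v_c)."""
    v_c = np.asarray(v_c, dtype=float)
    target = np.asarray(target, dtype=float)
    # Design matrix: a column of ones for the intercept and a column of log(v_c).
    design = np.column_stack([np.ones_like(v_c), np.log(v_c)])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs

# Hypothetical content values and intermediate-parameter observations.
v_c_samples = np.array([1.0, np.e, np.e ** 2, np.e ** 3])
alpha_samples = 2.0 + 3.0 * np.log(v_c_samples)
c_a, c_b = fit_log_linear(v_c_samples, alpha_samples)
```

Because the relation is linear in its coefficients once log(v_c) is taken as the regressor, an ordinary linear least-squares solve is sufficient for this part.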

Compared with the prior art, the invention has the following beneficial effects:

The proposed method first performs k-means cluster analysis and then carries out regression prediction within each cluster, which markedly improves the prediction accuracy of the model. Predicting with this "cluster first, then regress" approach gives better results.

Brief description of the drawings

Other features, objects, and advantages of the invention, and the method as a whole, will become more apparent from the following drawings:

Fig. 1 is a flow block diagram of the compression bit-rate prediction method based on video content and cluster analysis.

Fig. 2 shows the spatial information and temporal information of the video source sequences used to regress the model parameters in one embodiment of the invention.

Fig. 3 shows the prediction results obtained with the method of the invention.

Embodiment

The invention is described in detail below with reference to a specific embodiment. The following embodiment will help those skilled in the art to further understand the invention, but does not limit it in any way. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, all of which fall within the scope of protection of the invention.

The embodiment below describes an application of the method to no-reference objective video quality assessment: cluster analysis is performed on TI and SI as proposed by the invention, and regression prediction is then carried out within each cluster. The flow block diagram is shown in Fig. 1. Here the invention is applied to HEVC-encoded 4K ultra-high-definition video sequences with a frame rate of 30 fps. Note that the reported results (such as the Pearson correlation coefficient, PCC) apply only to HEVC-encoded 30 fps 4K video; other application scenarios may give different results. The overall method is nevertheless general, and this does not affect the essence of the invention.

The following first introduces the extraction of the temporal complexity of the video, then the extraction of its spatial complexity. On this basis the k-means clustering method and the analysis of the number of clusters are described in detail, and finally the no-reference video quality assessment model that is built.

1) Compute the spatial and temporal information of the video.

$$SI = \max_{time}\{\mathrm{std}_{space}[\mathrm{Sobel}(F_n)]\}$$

$$TI = \max_{time}\{\mathrm{std}_{space}[F_n(i,j) - F_{n-1}(i,j)]\}$$

2) Perform k-means cluster analysis on the SI and TI of the videos.

The invention uses the k-means method for cluster analysis. Because k-means is an unsupervised learning method, only the number of clusters needs to be specified. The silhouette value is therefore chosen as the index for evaluating the clustering result under different numbers of clusters. The index takes values in [-1, 1]; in general, the larger the value, the farther the video sequence is from the other clusters and the better it fits within its own cluster.

When analyzing the silhouette results, the invention considers the following four statistics: the minimum Silh_min, the maximum Silh_max, the mean Silh_mean, and the standard deviation Silh_dev. Table 1 is analyzed below as an example, where K_ca denotes the number of clusters.

Table 1. Silhouette values of the cluster analysis for different numbers of clusters

Statistic	K_ca=2	K_ca=3	K_ca=4	K_ca=5
Silh_min	0.3905	0.1383	0.5069	0.5069
Silh_max	0.9381	0.9793	0.9677	1
Silh_mean	0.839	0.7643	0.7410	0.7717
Silh_dev	0.1726	0.2305	0.1620	0.1911

When K_ca = 2, the mean is the highest and the standard deviation is the second smallest, yet the subsequent per-cluster regression prediction turns out to have low accuracy and poor results. The underlying reason is that with only 2 clusters the number of clusters is too small, and the result differs very little from not clustering at all. In other words, clustering into 2 classes satisfies the requirements numerically but has no practical meaning.

When K_ca = 3, the minimum is as low as 0.1383, which indicates a very poor clustering result with no clear separation between clusters. More clusters are therefore needed to meet the requirement.

When K_ca = 5, the maximum is 1, which numerically suggests an extremely good clustering. In the actual result, however, the cluster concerned contains only a single video sequence; that is, the number of clusters is too large and should be reduced.

In summary, K_ca = 4 gives the best clustering result.
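The screening described above can be sketched with the silhouette statistics reported for K_ca = 2 to 5. The sketch is only illustrative: the floor on the minimum silhouette value (0.3) is a hypothetical threshold that the source does not state, and the rejection of K_ca = 2 in the text rests on regression accuracy, which this data-only check cannot reproduce.

```python
# Silhouette statistics for each candidate number of clusters, as reported above.
silhouette_stats = {
    2: {"min": 0.3905, "max": 0.9381, "mean": 0.839, "dev": 0.1726},
    3: {"min": 0.1383, "max": 0.9793, "mean": 0.7643, "dev": 0.2305},
    4: {"min": 0.5069, "max": 0.9677, "mean": 0.7410, "dev": 0.1620},
    5: {"min": 0.5069, "max": 1.0, "mean": 0.7717, "dev": 0.1911},
}

MIN_FLOOR = 0.3  # hypothetical threshold, not taken from the source

def screen_cluster_counts(stats, min_floor=MIN_FLOOR):
    """Return the cluster counts that pass two data-only checks."""
    kept = []
    for k, s in sorted(stats.items()):
        if s["min"] < min_floor:
            continue  # some sequence is poorly separated from the other clusters
        if s["max"] >= 1.0:
            continue  # a silhouette of 1 hints at a single-sequence cluster
        kept.append(k)
    return kept

candidates = screen_cluster_counts(silhouette_stats)  # [2, 4]
```

Of the two remaining candidates, the description discards K_ca = 2 because its per-cluster regression accuracy is low, which leaves K_ca = 4.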

Once the number of clusters has been determined, the cluster analysis is carried out with the k-means algorithm, grouping videos with similar spatial information SI and temporal information TI into the same cluster.

3) Based on the clustering result, regress over the videos within each cluster to improve prediction accuracy.

After the cluster analysis, within each cluster the model parameters c₁ to c₆ are obtained by least-squares regression, and the compression bit rate is then predicted with the no-reference video quality assessment model.
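The "cluster first, then regress" idea can be illustrated with a toy example. The sketch below is not the patent's model: it substitutes a simple straight-line model y = a + b·x for the bit-rate model, and the data, labels, and function names are all hypothetical. It only shows why fitting one model per cluster can reduce the error compared with a single pooled fit.

```python
import numpy as np

def fit_per_cluster(x, y, labels):
    """Fit y = a + b*x separately in each cluster; returns {label: (a, b)}."""
    models = {}
    for c in np.unique(labels):
        mask = labels == c
        b, a = np.polyfit(x[mask], y[mask], deg=1)  # highest degree first
        models[int(c)] = (a, b)
    return models

def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical data: two clusters that follow different linear trends.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.where(labels == 0, 1.0 + 2.0 * x, 10.0 - 1.0 * x)

# Per-cluster regression: one (a, b) pair for each cluster.
models = fit_per_cluster(x, y, labels)
pred_clustered = np.array([models[int(c)][0] + models[int(c)][1] * xi
                           for xi, c in zip(x, labels)])

# Pooled regression: a single line fitted to all sequences at once.
b_all, a_all = np.polyfit(x, y, deg=1)
pred_pooled = a_all + b_all * x

rmse_clustered = rmse(y, pred_clustered)
rmse_pooled = rmse(y, pred_pooled)
```

In this constructed case the per-cluster fit is essentially exact while the pooled line averages two incompatible trends, which mirrors the comparison between the clustered and unclustered results in the embodiment.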

Take as an example the 4K video database published by the Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University (http://medialab.sjtu.edu.cn/resources/resources.html). The database is built on 10 reference videos, each compressed at 6 bit-rate points, with the corresponding subjective DMOS values provided. The Spearman coefficient (SROCC, listed as SCC in Table 2) and the Pearson coefficient (LCC, listed as PCC in Table 2) are used as measures of prediction accuracy.

Table 2 lists the prediction results for each cluster after cluster analysis, together with the prediction result obtained without cluster analysis. It can be seen that with prior cluster analysis the PCC improves by up to 28.76% and the RMSE decreases by up to 68.98%. The invention thus does achieve better results.

Table 2. Prediction results

Class	PCC	SCC	RMSE	MOS
Class A	0.972	0.986	0.102	3.945
Class B	0.953	0.951	0.087	3.818
Class C	0.901	0.865	0.274	4.124
Class D	0.961	0.969	0.177	4.041
All sequences (no clustering)	0.672	0.753	1.174	4.002
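The accuracy measures named in this embodiment (Pearson correlation, Spearman rank correlation, and RMSE) can be computed as sketched below. This is a generic NumPy illustration with hypothetical function names; it does not reproduce the reported figures, which depend on the database and on the fitted model.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    r = np.empty(len(values))
    r[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tie = values == v
        r[tie] = r[tie].mean()
    return r

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(x, y):
    """Root-mean-square error between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

The Spearman coefficient depends only on the ordering of the predictions, so a monotonic but nonlinear relation between predicted and subjective values still scores 1, whereas the Pearson coefficient also penalizes departures from linearity.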

The above is only a preferred embodiment of the invention. The scope of protection of the invention is not limited to the above embodiment; all technical solutions within the concept of the invention fall within its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and refinements made without departing from the principle of the invention shall also be regarded as within the scope of protection of the invention.

Claims

1. A compression bit-rate prediction method based on video content and cluster analysis, characterized in that it comprises the following steps:

S1: applying Sobel filtering to each frame of the video to obtain the spatial information SI; taking the difference of the luminance of adjacent frames to obtain the temporal information TI;

S2: performing cluster analysis on the spatial information SI and temporal information TI obtained in S1 using the k-means method, obtaining multiple clusters;

S3: within each cluster from S2, performing coefficient regression to obtain a compression bit-rate prediction model, and using the model to predict the compression bit rate;

wherein, in S3: after the cluster analysis of S2 is complete, within each cluster the spatial information SI and temporal information TI computed in S1 are substituted into the following compression bit-rate prediction model; for each video sequence, the desired subjective video quality score (MOS) is substituted in to obtain the predicted compression bit rate, thereby predicting the bit rate needed to compress the video at a specified quality:

v_c=TISI (2)

α(v_c)=c₁+c₂·log(v_c) (3)

γ(v_c)=c₄+c₅·log(v_c) (5)

wherein c₁ to c₆ are model parameters and α, β, γ are intermediate parameters; MOS is the subjective video test score, obtained with the DSI Variant II method of ITU-R BT.500 on a 5-point scale: 1 means very poor quality, 2 poor, 3 fair, 4 good, and 5 very good; TI and SI denote the temporal information and the spatial information, respectively; v_c denotes the video content and is determined by SI and TI; and BR_p denotes the predicted compression bit rate.

2. The compression bit-rate prediction method based on video content and cluster analysis according to claim 1, characterized in that, in S1, the n-th frame of the original video sequence is processed with the following two formulas to obtain the spatial information SI and the temporal information TI:

$$SI = \max_{time}\{\mathrm{std}_{space}[\mathrm{Sobel}(F_n)]\}$$

$$TI = \max_{time}\{\mathrm{std}_{space}[F_n(i,j) - F_{n-1}(i,j)]\}$$

wherein F_n is the luminance of the current frame, Sobel denotes the Sobel operator of classical image processing, std_space denotes the standard deviation of the Sobel-filtered result over the frame, and max_time denotes the maximum, over all frames, of the results obtained from the standard deviation.

3. The compression bit-rate prediction method based on video content and cluster analysis according to claim 1, characterized in that, in S2, the spatial information SI and temporal information TI obtained in S1 are fed into the k-means algorithm for cluster analysis, with the squared Euclidean distance as the distance measure for clustering; the silhouette value of the k-means cluster analysis is used as the index for evaluating the clustering result, and the final number of clusters is determined by analyzing the silhouette value; finally, videos with similar spatial information SI and temporal information TI are grouped into the same cluster.

4. The compression bit-rate prediction method based on video content and cluster analysis according to any one of claims 1-3, characterized in that the model parameters c₁, c₂, c₃, c₄, c₅, c₆ are determined as follows: provided that the encoder type, video resolution, and frame rate in the actual application are the same as those of the subjective video quality test material, least-squares regression of the proposed mathematical model against the subjective quality assessment results yields the model parameters for the specific application.