CN107809631A

CN107809631A - The wavelet field method for evaluating video quality eliminated based on background

Info

Publication number: CN107809631A
Application number: CN201710926882.8A
Authority: CN
Inventors: 张淑芳; 黄小琴
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2017-10-08
Filing date: 2017-10-08
Publication date: 2018-03-16
Anticipated expiration: 2037-10-08
Also published as: CN107809631B

Abstract

The invention discloses a kind of wavelet field method for evaluating video quality eliminated based on background, the first step：Video sequence global qualitySecond step：The local quality of video is calculated using background null method3rd step：The oeverall quality of video is calculated, i.e., according to the local quality and global quality of resulting video sequence, video quality evaluation modelIt is contemplated that improve the uniformity of the subjective quality assessment of video objective quality evaluation and human eye；Can there is preferable video evaluation performance for different type of distortion, different scenes, and the algorithm has relatively low complexity, can realize real-time quality assessment.

Description

The wavelet field method for evaluating video quality eliminated based on background

Technical field

The present invention relates to video quality evaluation field, more particularly to a kind of wavelet field video quality evaluation model.

Background technology

Video quality evaluation algorithm can weigh different degrees of distortion, turn into the popular research side of video field in recent years To.Method for evaluating video quality can be divided into subjective assessment and objective evaluation, and subjective quality assessment is mainly imitated by human eye vision Fruit evaluates video quality, it is considered to be most reliable quality evaluating method.Therefore generally use subjective evaluation result with it is objective Performance Evaluating Indexes of the uniformity of evaluation result as objective evaluation algorithm.

The perceived quality of distortion video is global quality and the coefficient result of local quality.Global quality is observer For the rough impression of video quality, obtained after being averaged by the quality of video sequence whole frame；Local quality is mainly by human eye The characteristic such as mass change determines between visual attention location, sequence.By calculating the overall situation, the local quality of video sequence respectively, depending on The oeverall quality of frequency.

A variety of objective video quality evaluation methods, wherein average poor (MSE), Y-PSNR in the world be present at present (PSNR), structural similarity (SSIM) and Multi-scale model similitude (MS-SSIM) due to model it is simple, in image and video matter Extensive utilization in amount evaluation, but the uniformity of these methods and human eye visual perception is poor.Kalpana et al. is improved to this, Multichannel Decomposition is carried out to image using Gabor filter, estimation is carried out to each passage, it is proposed that based drive space-time Domain video quality evaluation algorithm (MOVIE).Phong et al. combines image quality evaluation model M AD, time-space domain correlation and base In the model of human visual system, the spatial distortion to video sequence, time-space domain distortion are weighed respectively, so as to propose base In spatial domain and the video quality evaluation algorithm (VIS3) of time-space domain slice gradient similarity.Regarded according to human visual system in understanding Relied primarily on during frequency in image various marginal texture information the characteristics of, PengYan et al. proposes to be based on Spatial-temporal slice gradient The method for evaluating video quality of similarity.These methods can obtain higher accuracy, but complexity is high, limits model Practicality.

Different from image quality evaluation, video quality evaluation is not only related to spatial domain distortion, also related to the change of time domain. Distortion information of the change comprising movable information and time domain of time domain, and observer is when video observe, notice understands quilt Emergent object, the object of motion intense and time domain distortion information such as ghost image etc. are influenceed, that is to say, that foreground information is One of focal point of human eye.Frame difference method is a kind of simple Moving object extraction method, has that context update is fast, it is adaptive should be able to The features such as power is strong.Frame difference method is widely used in extraction video sequence due to its simple and quick characteristic in video quality evaluation Movable information.Loh et al. subtracts each other forward direction two frame of the present frame respectively with reference video of reference video, distortion video, proposes A kind of time-domain video quality evaluating method based on SSIM, this method speed is fast, but with human eye visual perception uniformity, general Property it is all poor because frame difference method is for larger, solid colour moving target, it is possible to produced in target internal empty Hole, so as to can not completely extract moving target.But direct application can influence the accuracy of model in video quality evaluation, Because in the case where continuous multiple frames have identical distortion information, frame difference method can filter out these distortion informations, and identical distortion The cumulative influence to video quality of type is bigger.

Due to the complexity of human visual system, the video quality evaluation algorithm that there is currently is in ageing and accuracy Good equilibrium is not reached.

The content of the invention

Consider high attention rate of the human visual system for marginal texture information, moving object and distortion information, the present invention A kind of wavelet field method for evaluating video quality eliminated based on background is proposed, considers that human visual system believes for marginal texture The high attention rate of breath, moving object and distortion information, the movable information of video sequence is extracted with improved frame difference method, and structure includes The time-space domain video block of movable information and distortion information, it is real with reference to advantage of the Haar wavelet transformations in extraction marginal information Wavelet field method for evaluating video quality (BSWQ) is showed.

A kind of wavelet field method for evaluating video quality eliminated based on background of the present invention, this method are comprised the following steps：

The first step：Global quality is calculated, using 4 grades of haar wavelet transforms (DWT) respectively to reference video R_NAnd mistake True video D_NEach frame decomposed, its decomposition coefficient represents as follows respectively：

C_R(λ, θ, t, i)=DWT (R_t) (1)

C_D(λ, θ, t, i)=DWT (D_t) (2)

Wherein, C_R(λ,θ,t,i),C_D(λ, θ, t, i) represents the wavelet decomposition of t frames in reference video and distortion video respectively Coefficient, t ∈ [1, N], R_t,D_tT frames in reference, distortion video sequence are represented respectively；{ λ, θ } represents to be used for thumbnail respectively The coefficient subband of different scale and different directions, θ=2,3,4 represent horizontal, clinodiagonal and vertical subband respectively, the table of θ=1 Show approximation subband；I represents t frames, λ yardsticks, the position for the wavelet coefficient that direction is θ；

By the wavelet field different scale of each frame of video and the detail subbands coefficient of different directions, reference is obtained respectively and is regarded Fringing coefficient E under frequency and distortion video different scale_R(λ,t,i),E_D(λ,t,i)：

Calculate the fringing coefficient similitude that t frames yardstick is λ respectively using above formula：

In formula, T is normal number, and ESIM (λ, t, i) represents t frames in reference, distortion video, λ yardsticks, the office at the i of position Portion's similitude, works as E_R,E_DWhen just the same, value 1；

By calculating the standard deviation of local similarity, the quality ESIMD of the different scale of video sequence single-frame images is obtained (λ,t)：

Wherein, N_cFor the coefficient sum in the coefficient matrix of t frames, λ yardsticks in video.

The single frames quality of video sequence is：

Wherein, l represents haar wavelet transformation series, and value herein is 4.

Then, video sequence global quality Q_globalIt is expressed as：

Second step：The local quality of video is calculated using background null method, i.e., first by reference video and distortion video with Continuous 3 frame of video sequence is one group, is configured to every group of nonoverlapping video block mutually；Secondly, with reference to average background method, will join Background of the frame of video class mean as the frame group is examined, is substituted in frame difference method using former frame as background, with the frame of centre one of frame group As the embodiment of spatial domain distortion position, remaining two frame subtracts background and obtains prospect frame to embody the feature of time-space domain, so as to by Between frame and two prospect frames collectively form time-space domain video block；

Reference video R_NWith distortion video D_NT frames in corresponding time-space domain video blockCalculation formula is as follows：

Wherein, B_gThe background image of g group frame of video groups is represented, g ∈ { 1,2 ..., floor (N/3) }, N represent video The totalframes of sequence, m represent the position of the relatively whole video sequence of intermediate frame of frame group, and m=3g-1, t represent that present frame is relative The position of whole video sequence；

The method that video sequence single frames quality is calculated according to formula (8), to each frame quality in the video block of video time-space domain Weighed, so as to obtain the quality Q of frame of video group_g：

The acquired multigroup time-space domain video block quality of video sequence is ranked up, and extracts second-rate part Final local quality Qs of the H% as video sequence_local：

Wherein, H represents the quality set { Q of video sequence frame group₁,Q₂,...,Q_g,...,Q_floor(N/3)Middle by sequence The H% of worst part set afterwards, N_HRepresent the element sum in set；

3rd step：Calculate the oeverall quality of video, i.e., according to the local quality and global quality of resulting video sequence, Video quality evaluation Model B SWQ is as follows：

The video quality evaluation algorithm of the wavelet field proposed by the present invention eliminated based on background is intended to improve video objective matter Amount evaluation and the uniformity of the subjective quality assessment of human eye；The algorithm can for different type of distortion, different scenes There is preferable video evaluation performance, and the algorithm has relatively low complexity, can realize real-time quality assessment.

Brief description of the drawings

Fig. 1 is wavelet sub-band coefficient index figure；

Fig. 2 is the wavelet field method for evaluating video quality overall flow figure eliminated based on background of the present invention；

Fig. 3 is BSWQ objective scorings and DMOS matched curve figure.

Embodiment

Embodiments of the present invention are described in further detail below in conjunction with accompanying drawing.

Concrete implementation step is as follows：

The first step：Calculate global quality：From human visual system curve, human eye it is most sensitive be midband, It is exactly the principal outline of image.Suitable image outline extracting method has important for video, image quality evaluation model Effect.Wavelet transformation is a kind of method extracted to frequency information, can be the subband under different scale by picture breakdown Image, so as to which the marginal information of image to be expressed as to the wavelet coefficient under different scale.Haar wavelet transformations are low with its complexity With effect it is good the advantages of image, video quality evaluation and compression in be widely used, based on this, the present invention use 4 grades Haar wavelet transforms (DWT) are respectively to reference video R_NWith distortion video D_NEach frame decomposed, its decomposition coefficient point Biao Shi not be as follows：

C_R(λ, θ, t, i)=DWT (R_t) (1)

C_D(λ, θ, t, i)=DWT (D_t) (2)

Wherein, C_R(λ,θ,t,i),C_D(λ, θ, t, i) represents the wavelet decomposition of t frames in reference video and distortion video respectively Coefficient, t ∈ [1, N], R_t,D_tT frames in reference, distortion video sequence are represented respectively；{ λ, θ } represents to be used for thumbnail respectively The coefficient subband of different scale and different directions, θ=2,3,4 represent horizontal, clinodiagonal and vertical subband respectively, the table of θ=1 Show approximation subband；By taking 2 grades of discrete wavelet transformations as an example, it indexes as shown in Figure 1.I represents that t frames, λ yardsticks, direction are the small of θ The position of wave system number.

In formula, T is normal number, mainly serves for ensuring ESIM stability, and ESIM (λ, t, i) represents reference, distortion video In t frames, λ yardsticks, the local similarity at the i of position, work as E_R,E_DWhen just the same, value 1.

The single frames quality of video sequence is：

Then, video sequence global quality Q_globalIt is expressed as：

Second step：Calculate the local quality of video：Used is background null method, and the specific practice is：It will refer to first Video and distortion video using continuous 3 frame of video sequence as one group, be configured to every group mutually nonoverlapping video block (due to video The image more than frame of encoding and decoding standard (such as H.264, HEVC) generally use 2 carries out estimation as reference frame and motion is mended Repay, i.e. present frame and the correlation of preceding a later frame is most strong)；Secondly, with reference to average background method, using reference video frame class mean as The background of the frame group, substitute in frame difference method using former frame as the method for background, so as to avoid the distortion information of successive frame While filtering out, the foreground information of video sequence is effectively extracted.From masking effect, the signal under background, in different skies Between position, the visibility of signal has obvious otherness.When distortion information is located in the higher background of complexity, to video The influence of quality degradation is simultaneously little, thus the locus of distortion information has material impact for video quality evaluation.According to Video quality evaluation is the result of space-time domain information interaction, to consider background complexity, movable information and distortion letter simultaneously The influence for video quality is ceased, background is subtracted using the frame of centre one of frame group as the embodiment of spatial domain distortion position, remaining two frame Prospect frame is obtained to embody the feature of time-space domain, so as to which intermediate frame and two prospect frames are collectively formed into time-space domain video block.Ginseng Examine video R_NWith distortion video D_NT frames in corresponding time-space domain video blockCalculation is as follows：

Wherein, B_gThe background image of g group frame of video groups is represented, g ∈ { 1,2 ..., floor (N/3) }, N represent video The totalframes of sequence, m represent the position of the relatively whole video sequence of intermediate frame of frame group, and m=3g-1, t represent that present frame is relative The position of whole video sequence.

The value of quality is bigger, and the quality of the frame of video group is poorer.

By the time-domain perceptual quality of distortion video is generally determined by partial frame second-rate in video sequence, therefore Herein using worst quality convergence method, the acquired multigroup time-space domain video block quality of video sequence is ranked up first, and Extract final local quality Qs of the H% as video sequence of second-rate part_local：

3rd step：Calculate the oeverall quality of video：According to the local quality and global quality of resulting video sequence, depending on Frequency Environmental Evaluation Model BSWQ is as follows：

.The predicted value of model is bigger, and video quality is higher.

Specific embodiment is illustrated below：

1) intend choosing T=1700, H=15；

2) global quality of video sequence and then according to formula (1)-(9) is calculated；

3) video sequence is divided into one group of 3 frame, using formula (10) (11) (12) to handling respectively frame group, built Time-space domain video block, so calculated with formula (13) calculate respectively each frame group quality；

4) after each frame group quality is obtained, using formula (14) calculate video sequence local quality；

5) global quality and local quality of combining video sequences, using formula (15) calculate video sequence total constitution Amount.

6) performance test：

The quality evaluating method proposed is tested from LIVE video databases, wherein including 10 kinds of different scenes Reference video and 150 distortion video sequences.Every kind of video source includes the type of distortion (Wireless of 4 kinds of varying levels Distortion, H.264 IP distortions, compression and MPEG-2 compressions), IP distortions have 3 kinds of varying levels, and its excess-three kind type of distortion is each Different degrees of distortion in own 4, i.e., the reference video under every kind of scene all contain 15 distortion videos.This algorithm uses video Two kinds commonly used in 4 kinds of evaluation indexes that Quality Expert's group (VQEG) proposes are used as evaluation index：Spearman rank order phase Relation number (SROCC), Pearson's linearly dependent coefficient (PLCC).Larger SROCC values and PLCC values represent video quality evaluation Algorithm has preferable accuracy and uniformity.

Table 1 is evaluation performance of the inventive method to different type of distortion videos, it can be seen that using background elimination algorithm To all there is good performance to various type of distortion videos, there is preferable robustness.

Table 2 is evaluation performance of the inventive method for 150 distortion videos, it can be seen that using background elimination algorithm With good versatility.

Table 3 be the inventive method for 250 frame video pa2_25fps.yuv run time, indicate the algorithm Real-Time Evaluation available for video.

Table 1

Table 2

Evaluation index	SROCC	PLCC
			BSWQ	0.8265	0.8437

Table 3

Quality evaluating method	Time (s)
		The algorithm proposed	28.58

Fig. 3 by proposition Environmental Evaluation Model BSWQ predicted value and LIVE video libraries in difference mean subjective fraction (DMOS) scatter diagram.In figure solid line be Logistic function pair video sequences objective evaluation result and subjective data it is non-thread Property matched curve, includes Wireless distortions, H.264 IP distortions, distortion, MPEG distortions.If discrete point is uniformly distributed in plan Close around curve, then it represents that the predicted value of model and the correlation of subjective data are stronger.

Claims

1. a kind of wavelet field method for evaluating video quality eliminated based on background, it is characterised in that this method comprises the following steps：

The first step：Global quality is calculated, using 4 grades of haar wavelet transforms respectively to reference video R_NWith distortion video D_N's Each frame is decomposed, and its decomposition coefficient represents as follows respectively：

C_R(λ, θ, t, i)=DWT (R_t) (1)

C_D(λ, θ, t, i)=DWT (D_t) (2)

Wherein, C_R(λ,θ,t,i),C_D(λ, θ, t, i) represents the coefficient of wavelet decomposition of t frames in reference video and distortion video respectively, t∈[1,N]、R_t,D_tT frames in reference, distortion video sequence are represented respectively；{ λ, θ } represents to be used for thumbnail difference chi respectively The coefficient subband of degree and different directions, θ=2,3,4 represent horizontal, clinodiagonal and vertical subband respectively, and θ=1 represents approximate Subband；I represents t frames, λ yardsticks, the position for the wavelet coefficient that direction is θ；

By the wavelet field different scale of each frame of video and the detail subbands coefficient of different directions, obtain respectively reference video and Fringing coefficient E under distortion video different scale_R(λ,t,i),E_D(λ,t,i)：

<mrow> <msub> <mi>E</mi> <mi>R</mi> </msub> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>=</mo> <msqrt> <mrow> <munderover> <mo>&Sigma;</mo> <mrow> <mi>&theta;</mi> <mo>=</mo> <mn>2</mn> </mrow> <mn>4</mn> </munderover> <msubsup> <mi>C</mi> <mi>R</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>&theta;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msqrt> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>3</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <msub> <mi>E</mi> <mi>D</mi> </msub> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>=</mo> <msqrt> <mrow> <munderover> <mo>&Sigma;</mo> <mrow> <mi>&theta;</mi> <mo>=</mo> <mn>2</mn> </mrow> <mn>4</mn> </munderover> <msubsup> <mi>C</mi> <mi>D</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>&theta;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msqrt> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <mi>E</mi> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <mn>2</mn> <mo>&CenterDot;</mo> <msub> <mi>E</mi> <mi>R</mi> </msub> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>&CenterDot;</mo> <msub> <mi>E</mi> <mi>D</mi> </msub> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>+</mo> <mi>T</mi> </mrow> <mrow> <msubsup> <mi>E</mi> <mi>R</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>+</mo> <msubsup> <mi>E</mi> <mi>D</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>+</mo> <mi>T</mi> </mrow> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>5</mn> <mo>)</mo> </mrow> </mrow>

In formula, T is normal number, and ESIM (λ, t, i) represents t frames in reference, distortion video, λ yardsticks, the Local Phase at the i of position Like property, work as E_R,E_DWhen just the same, value 1；

By calculating the standard deviation of local similarity, obtain the different scale of video sequence single-frame images quality ESIMD (λ, t)：

<mrow> <mi>E</mi> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mi>D</mi> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>)</mo> </mrow> <mo>=</mo> <msqrt> <mrow> <mfrac> <mn>1</mn> <msub> <mi>N</mi> <mi>c</mi> </msub> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <msub> <mi>N</mi> <mi>c</mi> </msub> </munderover> <msup> <mrow> <mo>(</mo> <mi>E</mi> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mo>(</mo> <mrow> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> </mrow> <mo>)</mo> <mo>-</mo> <mi>E</mi> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mi>M</mi> <mo>(</mo> <mrow> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> </mrow> <mo>)</mo> <mo>)</mo> </mrow> <mn>2</mn> </msup> </mrow> </msqrt> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <mi>E</mi> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mi>M</mi> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <msub> <mi>N</mi> <mi>c</mi> </msub> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <msub> <mi>N</mi> <mi>c</mi> </msub> </munderover> <mi>E</mi> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>7</mn> <mo>)</mo> </mrow> </mrow>

The single frames quality of video sequence is：

<mrow> <msub> <mi>Q</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>R</mi> <mi>t</mi> </msub> <mo>,</mo> <msub> <mi>D</mi> <mi>t</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mi>l</mi> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>&lambda;</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>l</mi> </munderover> <mi>E</mi> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mi>D</mi> <mrow> <mo>(</mo> <mi>&lambda;</mi> <mo>,</mo> <mi>t</mi> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>8</mn> <mo>)</mo> </mrow> </mrow>

Then, video sequence global quality Q_globalIt is expressed as：

<mrow> <msub> <mi>Q</mi> <mrow> <mi>g</mi> <mi>l</mi> <mi>o</mi> <mi>b</mi> <mi>a</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mfrac> <mn>1</mn> <mi>N</mi> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>t</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>N</mi> </munderover> <msub> <mi>Q</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>R</mi> <mi>t</mi> </msub> <mo>,</mo> <msub> <mi>D</mi> <mi>t</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>9</mn> <mo>)</mo> </mrow> </mrow>

Second step：The local quality of video is calculated using background null method, i.e., first by reference video and distortion video with video 3 frames of Sequentially continuous are one group, are configured to every group of nonoverlapping video block mutually；Secondly, with reference to average background method, it will refer to and regard Background of the frequency frame class mean as the frame group, substitute frame difference method in using former frame as background, using the frame of centre one of frame group as The embodiment of spatial domain distortion position, remaining two frame subtract background and obtain prospect frame to embody the feature of time-space domain, so as to by intermediate frame Time-space domain video block is collectively formed with two prospect frames；

Reference video R_NWith distortion video D_NT frames F in corresponding time-space domain video block_t ^R,F_t ^DCalculation formula is as follows：

<mrow> <msubsup> <mi>F</mi> <mi>t</mi> <mi>R</mi> </msubsup> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <msub> <mi>R</mi> <mi>t</mi> </msub> <mo>-</mo> <msub> <mi>B</mi> <mi>g</mi> </msub> <mo>,</mo> <mi>t</mi> <mo>&Element;</mo> <mo>&lsqb;</mo> <mi>m</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>m</mi> <mo>)</mo> <mo>&cup;</mo> <mo>(</mo> <mi>m</mi> <mo>,</mo> <mi>m</mi> <mo>+</mo> <mn>1</mn> <mo>&rsqb;</mo> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <msub> <mi>R</mi> <mi>t</mi> </msub> <mo>,</mo> <mi>t</mi> <mo>=</mo> <mi>m</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>10</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <msubsup> <mi>F</mi> <mi>t</mi> <mi>D</mi> </msubsup> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <msub> <mi>D</mi> <mi>t</mi> </msub> <mo>-</mo> <msub> <mi>B</mi> <mi>g</mi> </msub> <mo>,</mo> <mi>t</mi> <mo>&Element;</mo> <mo>&lsqb;</mo> <mi>m</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>m</mi> <mo>)</mo> <mo>&cup;</mo> <mo>(</mo> <mi>m</mi> <mo>,</mo> <mi>m</mi> <mo>+</mo> <mn>1</mn> <mo>&rsqb;</mo> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <msub> <mi>D</mi> <mi>t</mi> </msub> <mo>,</mo> <mi>t</mi> <mo>=</mo> <mi>m</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>11</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <msub> <mi>B</mi> <mi>g</mi> </msub> <mo>=</mo> <mfrac> <mn>1</mn> <mn>3</mn> </mfrac> <mo>&CenterDot;</mo> <mrow> <mo>(</mo> <munderover> <mo>&Sigma;</mo> <mrow> <mi>t</mi> <mo>=</mo> <mi>m</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>m</mi> <mo>+</mo> <mn>1</mn> </mrow> </munderover> <msub> <mi>R</mi> <mi>t</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>12</mn> <mo>)</mo> </mrow> </mrow>

Wherein, B_gThe background image of g group frame of video groups is represented, g ∈ { 1,2 ..., floor (N/3) }, N represent video sequence Totalframes, m represent the position of the relatively whole video sequence of intermediate frame of frame group, and m=3g-1, t represent that present frame entirely regards relatively The position of frequency sequence；

The method that video sequence single frames quality is calculated according to formula (8), each frame quality in the video block of video time-space domain is carried out Weigh, so as to obtain the quality Q of frame of video group_g：

<mrow> <msub> <mi>Q</mi> <mi>g</mi> </msub> <mo>=</mo> <mfrac> <mn>1</mn> <mn>3</mn> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>t</mi> <mo>=</mo> <mi>m</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>m</mi> <mo>+</mo> <mn>1</mn> </mrow> </munderover> <msub> <mi>Q</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msubsup> <mi>F</mi> <mi>t</mi> <mi>R</mi> </msubsup> <mo>,</mo> <msubsup> <mi>F</mi> <mi>t</mi> <mi>D</mi> </msubsup> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>13</mn> <mo>)</mo> </mrow> </mrow>

The acquired multigroup time-space domain video block quality of video sequence is ranked up, and the H% for extracting second-rate part makees For the final local quality Q of video sequence_local：

<mrow> <msub> <mi>Q</mi> <mrow> <mi>l</mi> <mi>o</mi> <mi>c</mi> <mi>a</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mfrac> <mn>1</mn> <msub> <mi>N</mi> <mi>H</mi> </msub> </mfrac> <munder> <mo>&Sigma;</mo> <mrow> <mi>k</mi> <mo>&Element;</mo> <mi>H</mi> </mrow> </munder> <msub> <mi>Q</mi> <mi>k</mi> </msub> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>14</mn> <mo>)</mo> </mrow> </mrow>

Wherein, H represents the quality set { Q of video sequence frame group₁,Q₂,...,Q_g,...,Q_floor(N/3)In it is worst after sequence Partial H% set, N_HRepresent the element sum in set；

3rd step：The oeverall quality of video is calculated, i.e., according to the local quality and global quality of resulting video sequence, video Environmental Evaluation Model BSWQ is as follows：