CN100559880C - High-definition video image quality evaluation method and device based on adaptive ST regions - Google Patents
- Publication number: CN100559880C (application CN200710140426A)
- Authority
- CN
- China
- Prior art keywords
- zone
- self
- video sequence
- definition
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The present invention is a high-definition video image quality evaluation method and device based on adaptive ST regions (spatial-temporal sub-regions). The input of the adaptive ST region definition module is connected to the original video sequence and the impaired sequence under test, and the module defines the adaptive ST regions according to the content of the original video. The adaptive ST region division module divides both the original sequence and the impaired sequence into adaptive ST regions according to the defined division criterion, as the basis for extracting characteristic parameters. The characteristic parameter extraction module extracts, based on the divided ST regions, characteristic parameters of the spatial information, temporal information and mixed spatial-temporal information of the original and impaired sequences, yielding the corresponding characteristic parameter groups of the two sequences. The quality impairment computing module compares the characteristic parameter groups obtained by the extraction module and calculates the difference value of the input sequence pair, completing the measurement of the degree of quality impairment.
Description
Technical field
The present invention relates to a method and device for high-definition video image quality evaluation, and in particular to a method and device that use adaptive ST regions to realize high-definition video image quality evaluation.
Background technology
High-definition television (HDTV) is the highest-level service form of current digital television, with advantages such as high definition, large picture size, vivid colour and a strong sense of presence. With the continuous development of digital video compression technology, HD video has come into fairly wide use in daily life. In particular, with the appearance of HDTV programmes in recent years and the development of video codec technology, digital high-definition image quality evaluation has gradually become a hot research issue.
Traditional image/video quality evaluation methods can be divided into two broad classes: subjective evaluation and objective evaluation. Subjective evaluation is the direct evaluation of video quality by users and is an important and reliable evaluation method. Over many years of use it has formed a relatively stable series of standards, such as ITU-R BT.500-11 for the subjective assessment of standard-definition television and ITU-R BT.710-2 for the subjective assessment of HDTV. Subjective evaluation has intuitive results and high credibility; but it places high demands on the test environment, the measurement process is complex and time-consuming, and the results vary from person to person. In addition, subjective evaluation cannot realize on-line video quality measurement. Objective evaluation methods, by contrast, have objective and repeatable measurement indexes, are easy to automate for monitoring, and suit a variety of application occasions.
Objective evaluation methods for digital image quality can be divided into three kinds according to the degree of dependence on the undistorted video (also called the original sequence) during measurement: full-reference video quality measurement (Full-Reference Video Quality Metric, FR-VQM), reduced-reference measurement (Reduced-Reference, RR-VQM) and no-reference measurement (No-Reference, NR-VQM). NR-VQM needs no data at all from the original video image; it measures the impaired video only at the receiving end, estimating the degree of impairment of the video through the extraction and statistical analysis of certain features (technical indicators such as edge energy difference, spatial frequency difference and motion energy difference). Because the characteristics of the original data are not considered in the evaluation procedure, this method is less sensitive to some distortions, and the accuracy of its results is slightly worse than that of the other two kinds. For high-quality HDTV programmes, FR-VQM and RR-VQM are the more commonly used, and also more reliable, objective evaluation techniques. Therefore, studies of HD video quality evaluation mostly proceed from these two angles, full reference and reduced reference.
The idea of FR-VQM is shown in Figure 1. FR-VQM takes the pixel-by-pixel, frame-by-frame differences between the impaired video and the original video sequence as the basis of comparison; through statistical analysis of these differences, combined with visual perception characteristics, it obtains the final evaluation result. Different image processing methods and different difference-image analysis methods determine different FR-VQM methods. This approach preserves the differences between the two sequences to the greatest extent; when human visual perception characteristics are incorporated into an FR-VQM method, objective evaluation results more consistent with subjective results can be obtained. The peak signal-to-noise ratio (PSNR) of signal analysis is the simplest FR-VQM method: it computes a pixel-by-pixel, frame-by-frame difference map between the original image and the damaged image, then computes the root-mean-square value to obtain a quality evaluation value for the damaged image. This method does not incorporate any characteristic of the human visual mechanism, yet for quality measurement of general images/videos it can still obtain fairly accurate results; it is therefore often used as a reference algorithm against which other FR-VQM methods are weighed.
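As a concrete illustration, the PSNR computation described here can be sketched in a few lines. This is a generic NumPy sketch with an assumed 8-bit peak value of 255, not code from the patent:

```python
import numpy as np

def psnr(original: np.ndarray, impaired: np.ndarray, peak: float = 255.0) -> float:
    """PSNR of a sequence: pixel-by-pixel, frame-by-frame difference map,
    then the root-mean-square of that map, expressed in dB."""
    diff = original.astype(np.float64) - impaired.astype(np.float64)
    rms = np.sqrt(np.mean(diff ** 2))
    if rms == 0.0:
        return float("inf")  # identical sequences
    return 20.0 * np.log10(peak / rms)

# toy 2-frame, 4x4 "sequence" with a single impaired pixel
orig = np.full((2, 4, 4), 128, dtype=np.uint8)
damaged = orig.copy()
damaged[0, 0, 0] = 138
print(round(psnr(orig, damaged), 2))  # → 43.18
```

Note that, as the text observes, PSNR treats every pixel difference equally; no visual weighting is applied.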
The implementation framework of RR-VQM is shown in Figure 2. Methods of this class first extract certain characteristic parameters from the original video and the impaired video respectively; these features are then statistically analysed to obtain the corresponding characteristic statistics of the two videos; finally the two groups of statistical feature information are compared, analysing multiple kinds of picture quality impairment such as blur, blocking, picture jumps (lost or repeated frames), noise, chromatic distortion and ringing, so as to obtain data that can indicate the degree of image impairment or define the picture quality. Compared with FR-VQM, the major advantage of this method is that only the limited feature data extracted from the original sequence need be transmitted during evaluation; the required additional bandwidth is small, so the method can be used for real-time monitoring and testing. Because the evaluation result depends mainly on the characteristic parameters taking part in the comparison, the definition and extraction of the characteristic parameters directly affect the accuracy of the result.
These two classes of methods have different emphases in research. The FR-VQM algorithm can analyse pixel-level differences between images and weight these differences according to the perception characteristics of human vision, to obtain objective evaluation results that fit the subjective results very well; but the computation is complex, the data volume is large, and on-line measurement is difficult to realize. The RR-VQM method first defines characteristic parameters, then extracts them from the original video and the impaired video respectively, and obtains the quality evaluation of the damaged image by comparing these parameters. In the definition and comparison of the parameters, RR-VQM also considers visual perception characteristics in order to improve the accuracy of the evaluation result. Because only part of the data of the original image takes part in RR-VQM computation, the consistency between its evaluation result and the subjective result depends on the particular design; but the method is fast, and with the help of an auxiliary channel it can realize on-line measurement at either end of the transmission.
Video quality evaluation research in China is at present also in a stage of gradual development, comprising on the one hand analysis and study of the relatively mature foreign evaluation systems and evaluation algorithms, and on the other hand analysis of the human visual mechanism and modelling of certain visual perception characteristics. In addition, objective evaluation methods for image and video quality have been explored, and existing algorithms improved, to a certain degree.
With the proposal of various video measurement methods, an informal organization, VQEG (Video Quality Experts Group), was set up under the ITU in 1997, with video quality measurement methods as its research emphasis. Objective evaluation algorithms for SD video are at present comparatively mature: both FR-VQM and RR-VQM algorithms have fairly mature research results, and VQEG has also issued some preliminary standards. For high-definition sequences, however, there are as yet no comparatively mature research results.
The main problem in FR-VQM algorithms is the research and establishment of vision models. As research on the human visual system has deepened, the perception characteristics and perceptual limitations of the visual mechanism have constantly been incorporated into objective evaluation systems, but how to design and realize an effective vision model remains a difficult problem. In 1982 Lukas and Budrikis applied a vision model to image quality measurement for the first time, but the vision model at that time was just a single-channel model that did not distinguish the different qualities of information in visual perception, such as luminance and chrominance. Among multi-channel vision models, the most famous is the JND (Just Noticeable Difference) model proposed by the U.S. Sarnoff laboratory in 1997; through in-depth study of the human visual system, this model can accurately measure the positions in the difference image perceived by human vision. The algorithm has been incorporated by Tektronix into an actual measuring instrument (the PQA300) and successfully commercialized. The video communication laboratory of Japan's KDD has proposed a video quality evaluation algorithm based on three-layer noise weighting. In addition, the Swiss EPFL laboratory transforms the input video into a corresponding colour space, then establishes multiple measurement channels through spatio-temporal pyramid decomposition, and finally converts the measurement results into an error view. Furthermore, for DCT-based video coding mechanisms, the DVQ algorithm was proposed, which can improve the performance of video quality evaluation for certain classes of video. Although the algorithms proposed by several research institutions can all reach fairly good consistency with subjective results in video quality measurement, VQEG carried out a multi-item
evaluation comparing the above models and found that no single algorithm beats the others in every comparison element. Although PSNR is considered an FR-VQM algorithm proposed for analogue signals, and the algorithm is very simple, compared with other complicated evaluation algorithms based on visual modelling PSNR can also reach fairly good consistency with subjective assessment, and is even slightly superior under some test environments. The video quality evaluation systems proposed so far that consider the vision mechanism mostly target SD signals. For video sequences with a data volume as huge as that of high-definition digital video (five times the SD data volume), directly grafting present FR-VQM algorithms onto HD video quality evaluation is obviously ill-considered; the limitations of visual perception must be taken into account in order to reduce the amount of data to be processed significantly.
The image quality objective evaluation algorithms currently based on SD video do not make good use of the characteristics of HD video (large data volume, large picture size, rich colour, etc.), nor do most of them consider the visual-attention characteristics and perceptual limitations of people watching video. Considering the user's viewpoint and the advantages of the RR-VQM algorithm, we propose the present invention.
Summary of the invention
In order to overcome the deficiencies of the prior art, the object of the present invention is to provide a method and device that use adaptive ST regions to realize high-definition video image quality evaluation.
To accomplish the above object, a method that uses adaptive ST regions to realize high-definition video image quality evaluation comprises the following steps:
1) inputting video sequences;
2) defining the adaptive ST regions according to the division criterion and the original video sequence;
3) generating the adaptive ST regions for the original sequence and the impaired sequence respectively;
4) extracting the specified characteristic parameters based on the divided ST regions;
5) comparing the characteristic parameters of the original sequence and the impaired sequence, calculating the degree of quality impairment, and giving the final objective evaluation result.
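The five steps can be arranged as a minimal driver skeleton. The helper functions below are simplified stand-ins (uniform 8x8x5 blocks and a single toy feature per region), assumptions for illustration only, not the patented definitions:

```python
import numpy as np

# Simplified stand-ins; these are assumptions for illustration,
# not the patented region criterion or feature set.
def define_st_criterion(seq):
    return (5, 8, 8)                        # step 2: frames x rows x cols

def divide_st_regions(seq, shape):
    t, h, w = shape                         # step 3: tile the 3-D volume
    return [seq[i:i + t, r:r + h, c:c + w]
            for i in range(0, seq.shape[0] - t + 1, t)
            for r in range(0, seq.shape[1] - h + 1, h)
            for c in range(0, seq.shape[2] - w + 1, w)]

def extract_features(block):
    return float(block.std())               # step 4: one scalar per region

def evaluate_quality(original, impaired):
    shape = define_st_criterion(original)
    fo = [extract_features(b) for b in divide_st_regions(original, shape)]
    fi = [extract_features(b) for b in divide_st_regions(impaired, shape)]
    return float(np.mean([abs(a - b) for a, b in zip(fo, fi)]))  # step 5

orig = np.zeros((5, 16, 16))
noisy = orig + np.random.default_rng(0).normal(0.0, 2.0, orig.shape)
print(evaluate_quality(orig, orig))  # → 0.0
```

An undamaged pair yields a zero difference value; any impairment raises it.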
To accomplish the foregoing object, the present invention also provides a device that uses adaptive ST regions to realize high-definition video image quality evaluation, comprising an adaptive ST region definition module, an adaptive ST region division module, a characteristic parameter extraction module and a quality impairment computing module, characterized in that:
the adaptive ST region definition module defines the division method of the adaptive ST regions based on the content of the input original sequence;
the adaptive ST region division module divides the original sequence and the impaired sequence into adaptive ST regions according to the defined ST region division criterion, as the basis for extracting characteristic parameters;
the characteristic parameter extraction module extracts, based on the ST regions, the characteristic parameter groups of the two sequences according to the characteristic parameters of spatial information, temporal information and mixed spatial-temporal information required by the video quality evaluation;
the quality impairment computing module compares the extracted characteristic parameters and calculates the difference value of the input sequence pair, i.e. the measurement of the degree of quality impairment.
The present invention has obvious advantages and positive effects. On the one hand, the division into adaptive ST regions greatly reduces the amount of data taking part in the characteristic-parameter comparison; on the other hand, the division considers not only the video content but also the user's viewpoint, which ensures a high consistency between the objective and subjective evaluation results.
Description of drawings
Fig. 1 is a structural sketch of the full-reference video quality measurement (FR-VQM) model;
Fig. 2 is a structural sketch of the reduced-reference (RR-VQM) model;
Fig. 3 is a structural diagram of the ST region definition method according to the present invention;
Fig. 4 is a block diagram of the RR-VQM algorithm based on adaptive ST regions according to the present invention;
Fig. 5 is a schematic diagram of the three-level ST region division method according to the present invention;
Fig. 6 shows screenshots of the HD test video image sequences taking part in the experimental tests according to the present invention;
Fig. 7 is an analysis chart of the fit between objective and subjective evaluation results based on three-level ST regions versus a single ST region according to the present invention;
Fig. 8 is a system frame diagram of the device that uses adaptive ST regions to realize high-definition video image quality evaluation according to the present invention;
Fig. 9 is a flowchart of the method that uses adaptive ST regions to realize high-definition video image quality evaluation according to the present invention;
Fig. 10 is a schematic diagram of the horizontal and vertical edge-enhancement filters used in the filtering operation on the input video sequence according to the present invention;
Fig. 11 is a schematic diagram of the ST region division when the picture has no special region of interest according to the present invention.
Embodiment
The specific embodiments of the present invention are described below in conjunction with the accompanying drawings.
Fig. 3 is a structural diagram of the ST region definition method according to the present invention. Referring to Fig. 3, different ST regions, i.e. adaptive ST regions, are generated for different video sequences according to the video content. The full name of an ST region is Spatial-Temporal Sub-Region, i.e. a spatial-temporal sub-region. This is a technique used in reduced-reference objective evaluation methods. The whole video stream can be regarded as picture frames played at a certain speed; an image itself is a two-dimensional space, and adding the time dimension gives a three-dimensional space (the video sequence on the left of Fig. 3). Before processing this three-dimensional data, we divide it into adjacent three-dimensional subspaces: the original image space is divided into 8x8 sub-blocks (spatial sub-units), and on the time axis five frames are taken as one temporal sub-unit. In this way the video stream is divided into a number of 8x8x5 three-dimensional subspaces. The "ST region" on the right of Fig. 3 is an enlarged view of one spatial sub-block of the left figure, i.e. an 8x8 spatial block over 5 consecutive frames.
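To get a feel for the granularity of this division, a quick count for a hypothetical clip (the resolution, frame rate and duration below are illustrative assumptions, not figures from the patent):

```python
# How many 8x8x5 ST regions a hypothetical 10 s, 25 fps, 1920x1080 clip yields
width, height, fps, seconds = 1920, 1080, 25, 10
frames = fps * seconds                                  # 250 frames
blocks_per_plane = (width // 8) * (height // 8)         # 240 * 135 = 32400
st_regions = blocks_per_plane * (frames // 5)           # 50 temporal sub-units
print(st_regions)  # → 1620000
```

Each such region is later summarized by a handful of scalar features, which is the source of the data reduction relative to full-reference comparison.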
Fig. 4 is a block diagram of the RR-VQM algorithm based on adaptive ST regions according to the present invention. As shown in Fig. 4, first the adaptive ST regions are generated for the original video sequence and the impaired video sequence (which has been through compression encoding and decoding), i.e. the adaptive ST regions are defined according to the video content; next the specified characteristic parameters are extracted based on the divided ST regions; finally the characteristic parameters of the original and impaired sequences are compared, the degree of quality impairment is calculated, and the final objective evaluation result is given.
The video content here mainly includes the following parts: shot changes, face regions, user-specified regions of interest, etc.
In order to ensure a high correlation between the objective and subjective evaluation results, the playing length of the video sequences taking part in the objective evaluation is about ten seconds.
Fig. 5 is a schematic diagram of the three-level ST region division method according to the present invention. Referring to Fig. 5, for a given picture, the central region of the picture is defined as the more important area, i.e. region 1; in this region the ST region size is defined as 8x8x5. In the area of secondary interest, i.e. region 2, the ST region size is defined as 16x16x5. In the peripheral or non-interest area of the screen, i.e. region 3, the ST region size is defined as 32x32x5.
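A possible mapping from pixel position to ST block size is sketched below. The two ring thresholds (1/6 and 1/3 of the smaller screen dimension) are illustrative assumptions, since the exact boundaries of regions 1-3 are left to Fig. 5:

```python
def st_block_size(x, y, width, height, cx=None, cy=None):
    """Three-level ST block sizes around a centre of interest (default:
    the screen centre). The ring thresholds (1/6 and 1/3 of the smaller
    screen dimension) are illustrative assumptions."""
    cx = width // 2 if cx is None else cx
    cy = height // 2 if cy is None else cy
    r = max(abs(x - cx), abs(y - cy))       # Chebyshev distance from centre
    if r < min(width, height) // 6:
        return (8, 8, 5)                    # region 1: most important
    if r < min(width, height) // 3:
        return (16, 16, 5)                  # region 2: secondary interest
    return (32, 32, 5)                      # region 3: periphery

print(st_block_size(960, 540, 1920, 1080))  # → (8, 8, 5)
print(st_block_size(10, 10, 1920, 1080))    # → (32, 32, 5)
```

Passing a different (cx, cy) moves the centre of interest, e.g. to a detected face region or a user-specified point.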
Fig. 6 shows screenshots of the HD test video sequences taking part in the experimental tests. As shown in Fig. 6, the selection of the HD test video image sequences should satisfy the basic criteria for selecting the test picture material required by subjective evaluation.
Fig. 7 is an analysis chart of the fit between the objective evaluation results and the subjective evaluation results, for three-level ST regions versus a single ST region. (a) is the three-level ST region map based on the central region, and (b) is the ST region map in which the whole sequence is divided into regions of uniform size. As shown in Fig. 7, from the calculation of the fitting error against the subjective results, the fitting error of the multi-level ST regions is smaller than that of the uniform ST regions.
Fig. 8 is a system frame diagram of the device that uses adaptive ST regions to realize high-definition video image quality evaluation according to the present invention. As shown in Fig. 8, the device of the present invention comprises:
The adaptive ST region definition module, whose input is the original high-definition video sequence; according to the video content of the original sequence, and considering the user's viewing viewpoint, it defines the division method of the adaptive ST regions.
The adaptive ST region division module, whose input is connected to the adaptive ST region definition module; according to the defined ST region division criterion, it divides the original sequence and the impaired sequence into adaptive ST regions, as the basis for extracting characteristic parameters.
The characteristic parameter extraction module, connected to the output of the adaptive ST region division module; it extracts the characteristic parameter groups of the two sequences based on the ST regions, according to the characteristic parameters defined by the general model of the U.S. National Telecommunications and Information Administration (NTIA). The extracted temporal/spatial characteristic parameters are as follows:
A) The standard deviation f_SI13 of the edge amplitude within each ST region:
f_SI13 = std{R(i, j, t)}
where (i, j) are the spatial coordinates of a pixel within the ST region and t is the relative coordinate of the pixel on the time axis; R is the edge amplitude of the pixel after processing by the edge filter (a filter that enhances edges while suppressing noise); and std is the standard deviation taken over the pixels of the ST region.
B) Different edge directions cause different types of image impairment and influence the subjective result of the human eye differently, so direction plays a distinct role in the objective evaluation of the image. Taking directivity into account, the following parameter is defined:
f_HV13 = {mean[HV(i, j, t)]} / {mean[HVbar(i, j, t)]}
where HV collects the edge energy of the horizontal and vertical directions and HVbar that of the non-horizontal, non-vertical directions. This parameter is thus the ratio of horizontal/vertical edge energy to non-horizontal, non-vertical edge energy. It also excludes the influence of blurred edges: if the reconstructed image is blurred, the edge energy of both the horizontal/vertical and the non-horizontal, non-vertical directions decreases, so their ratio is not greatly affected.
C) For the analysis of the chrominance information of an ST region, the following formula is adopted:
f_COLOR_COHER = {mean(C_B(i, j, t)), W_R * mean(C_R(i, j, t))}
where i, j, t ∈ {ST region}; C_B(i, j, t) is the blue colour-difference value of each pixel in the ST region; C_R(i, j, t) is the red colour-difference value of each pixel in the ST region; W_R is a weight applied to the red colour-difference component; and mean denotes averaging.
D) For the analysis of luminance contrast, the following formula can be adopted:
f_CONTRAST = std[Y(i, j, t)]
where i, j, t ∈ {ST region}; Y(i, j, t) is the luminance value of each pixel in the ST region; and std denotes the standard deviation.
E) The information analysis in the time domain is as follows:
f_ATI = std|Y(i, j, t) - Y(i, j, t-1)|
where i, j, t ∈ {ST region}; Y(i, j, t) is the luminance value of each pixel in the current-frame part of the ST region; Y(i, j, t-1) is the luminance value of each pixel at the corresponding position of the ST region in the previous frame; and std denotes the standard deviation.
F) The cross analysis of contrast and absolute temporal information:
f_CONTRAST_ATI = f_CONTRAST × f_ATI
or, in logarithmic form,
f_CONTRAST_ATI = log10(f_CONTRAST) × log10(f_ATI)
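A sketch of computing features A), D), E) and F) for one ST block follows. It is hedged: std is taken as the standard deviation, and the direction-dependent HV13 and colour features are omitted since they need the edge-direction and colour-difference planes:

```python
import numpy as np

def st_features(Y, Y_prev, R):
    """Features over one ST block.
    Y, Y_prev: luminance of the block in the current / previous frames;
    R: edge amplitudes after the enhancement filter (all NumPy arrays)."""
    f_contrast = float(np.std(Y))                  # D) luminance contrast
    f_ati = float(np.std(np.abs(Y - Y_prev)))      # E) absolute temporal info
    f_si13 = float(np.std(R))                      # A) edge-amplitude spread
    return {"SI13": f_si13, "CONTRAST": f_contrast, "ATI": f_ati,
            "CONTRAST_ATI": f_contrast * f_ati}    # F) cross term

rng = np.random.default_rng(1)
Y = rng.uniform(0.0, 255.0, (5, 8, 8))
feats = st_features(Y, np.roll(Y, 1, axis=0), rng.uniform(0.0, 50.0, (5, 8, 8)))
print(sorted(feats))  # → ['ATI', 'CONTRAST', 'CONTRAST_ATI', 'SI13']
```

In the reduced-reference setting only these per-region scalars, not the pixels, would be transmitted and compared.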
The quality impairment computing module has its input connected to the characteristic parameter extraction module, and calculates the degree of quality impairment of the video from the ST-region-based characteristic parameter groups of the two sequences extracted by that module. By comparing the extracted characteristic parameters, the difference value of the input sequence pair, i.e. the measurement of the degree of quality impairment, is calculated. The method of calculating the degree of quality impairment of the impaired video is: for the ST regions divided from the original and impaired sequences, the parameters defined above are calculated respectively; then, according to the nature of the sequences taking part in the test (general video, video conferencing, standard-definition television, etc.), they are substituted into different video measurement models to obtain the final result. In its implementation this algorithm substitutes them into the general model of NTIA, i.e. the NTIA General VQM model.
Fig. 9 is a flowchart of the method that uses adaptive ST regions to realize high-definition video image quality evaluation according to the present invention. The method is described in detail below with reference to Fig. 9.
First, in step 901, the video sequences are input and a filtering operation is performed to enhance edges and remove noise. The video sequences comprise the original sequence and the impaired sequence.
In step 902, shot-change detection is performed on the input original video sequence. If an HD test video sequence contains shot transitions, they are mostly hard shot cuts, so a single-threshold detection method is used here: first the video sequence to be processed is read in; next the colour histogram of every image is computed; finally the mean and standard deviation of the frame-to-frame histogram differences over the whole sequence are obtained. If the histogram difference between two adjacent frames is greater than three times the standard deviation, a shot cut is judged to have occurred.
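A minimal sketch of this single-threshold detector follows; the histogram bin count and grey-level range are assumptions, and the threshold statistic is taken over the frame-to-frame histogram differences:

```python
import numpy as np

def detect_cuts(frames, bins=32, k=3.0):
    """Single-threshold shot-cut detection: compute each frame's histogram,
    then flag frame t when its histogram difference to frame t-1 exceeds
    k times the standard deviation of all the differences."""
    hists = [np.histogram(f, bins=bins, range=(0, 256))[0] for f in frames]
    diffs = np.array([np.abs(hists[t] - hists[t - 1]).sum()
                      for t in range(1, len(hists))], dtype=float)
    thresh = k * diffs.std()
    return [t + 1 for t, d in enumerate(diffs) if d > thresh]

# toy sequence: six dark frames, then a hard cut to six bright frames
frames = [np.full((32, 32), 40)] * 6 + [np.full((32, 32), 200)] * 6
print(detect_cuts(frames))  # → [6]
```

The detected cut positions are then used in step 904 to discard the frames immediately after each cut.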
In step 903, it is judged whether a shot change exists in the input video sequence; if no shot change exists, the flow goes to step 905.
In step 904, if a shot change exists in the video sequence, the 5 frames after the shot change are discarded and do not take part in the following ST region division.
In step 905, face-region detection is performed on the input video sequence. Only face shots that appear frontally in the test sequence and occupy a certain proportion of the screen are considered for skin-colour detection; otherwise they are not considered. This detection is carried out after shot segmentation, so only face-region detection within one shot is considered. The face-region detection method is as follows. First a rectangular face-region template is defined: its aspect ratio R (R = W/H) lies within the range [0.4, 0.85]; the filling degree of the face region, P = area/(W·H), also lies within the range [0.65, 0.8]; the colour values in the template are reference skin-tone values with a given threshold variation range; and the template size is defined as 200x140 (rows x columns). Next, for the current image, the template is slid over the image; when the difference between the pixels under the template and the template elements is less than a given threshold, the position is taken as a face region. Starting from this region as a seed, the complete face area is then determined by diffusion. Finally, if a picture contains more than one face region satisfying the template, the ST regions within the whole shot are defined taking each face region in turn as the centre of interest, dividing three-level ST regions; at any one moment, owing to the single viewpoint, only one region of interest can be defined.
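The sliding-template matching step can be sketched roughly as follows; the grey-level skin value, stride and difference threshold are all illustrative assumptions (the patent matches skin-tone colour values and grows the seed region afterwards, which is omitted here):

```python
import numpy as np

def find_face_seed(gray, template, max_diff=10.0, step=20):
    """Slide a (hypothetical) skin-tone template over the image; return the
    first top-left position whose mean absolute difference to the template
    is below max_diff, or None if no position matches."""
    th, tw = template.shape
    H, W = gray.shape
    for r in range(0, H - th + 1, step):
        for c in range(0, W - tw + 1, step):
            window = gray[r:r + th, c:c + tw]
            if np.abs(window - template).mean() < max_diff:
                return (r, c)
    return None

skin = 150.0
template = np.full((200, 140), skin)        # 200x140 template, as in the text
frame = np.zeros((400, 600))
frame[100:300, 200:340] = skin              # synthetic "face" patch
print(find_face_seed(frame, template))      # → (100, 200)
```

A real implementation would also check the aspect-ratio and filling-degree constraints and diffuse outward from the returned seed.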
In step 906, it is judged whether a face region exists in the video sequence; if no face region exists, the flow goes to step 908.
In step 907, if a face region exists in the video sequence, three-level adaptive ST regions are divided according to Fig. 5, centred on the position of the face region.
In step 908, it is judged whether another region of interest exists in the video sequence; if not, the flow goes to step 910.
In step 909, if another region of interest exists in the video sequence, three-level ST regions are divided centred on the user-defined position. The user gives the position of the region of interest, e.g. (800, 600), or with a starting frame on the time axis, (50: 800, 600), meaning that from the 50th frame onwards the user attends to the picture quality of the three-level ST regions centred on screen coordinate (800, 600).
In step 910, if all three judgment conditions hold, the position and number of levels of the ST regions can be adjusted according to the practical application. If none of the three judgment conditions is satisfied, the screen centre is taken as the division midpoint and three levels of ST regions of different resolutions are generated. Considering the mobility of the user's viewpoint within the same shot, in the actual division the viewpoint is moved as follows: as the video stream plays, the five-pointed-star positions shown in Fig. 11 are taken in turn as the centres of the ST region division, realizing the three-level ST region division.
Fig. 10 is a schematic diagram of the horizontal and vertical edge-enhancement filters used in the filtering operation on the input video sequence according to the present invention. As shown in Fig. 10, in order to reduce the influence of noise, a 13x13 spatial edge-enhancement filtering method is adopted: the video image is edge-enhanced in the vertical and horizontal directions by a horizontal and a vertical edge-enhancement filter respectively, and the two filters are transposes of each other.
The filter weights are generated from w(x) = k · (x/c) · exp(-(1/2) · (x/c)²), where the value of x runs from 0 to N (from 0 to 6 in the filter used in this design); c is a constant that sets the passband width (in the present tests c = 2); and k is a gain identical to the gain of the Sobel filter. The weight coefficients w_i calculated in this way are as follows:
w1 = 0.0696751, w2 = 0.0957739, w3 = 0.0768961
w4 = 0.0427401, w5 = 0.0173446, w6 = 0.0052652
Let f_x(x, y) and f_y(x, y) be the pixel values at (x, y) of the image after horizontal and vertical edge enhancement respectively. Further, the edge information of each pixel is calculated in polar coordinates as follows:
R(i, j, t) = sqrt(f_x(i, j, t)² + f_y(i, j, t)²)
θ(i, j, t) = arctan(f_y(i, j, t) / f_x(i, j, t))
where i, j, t ∈ {ST sub-region}; R(i, j, t) is the amplitude of the edge information of each pixel, and θ(i, j, t) is the direction of the edge information of each pixel.
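A sketch of the filter pair and of the polar-coordinate edge computation follows. The weight formula w(x) = k · (x/c) · exp(-(x/c)²/2) and the gain are assumptions consistent with the listed coefficients, and a plain loop stands in for a proper 2-D convolution:

```python
import numpy as np

def edge_filters(c=2.0, half=6):
    """13x13 horizontal edge-enhancement filter built from
    w(x) = k * (x/c) * exp(-0.5 * (x/c)**2); k is chosen here so that
    w(1) equals the listed w1 = 0.0696751 (an assumption).
    The vertical filter is the transpose of the horizontal one."""
    xs = np.arange(-half, half + 1, dtype=float)
    w = (xs / c) * np.exp(-0.5 * (xs / c) ** 2)
    w *= 0.0696751 / w[half + 1]            # rescale so w(1) matches w1
    h = np.tile(w, (2 * half + 1, 1))       # constant down each column
    return h, h.T

def edge_polar(img):
    """Per-pixel edge amplitude R and direction theta (polar coordinates)."""
    h, v = edge_filters()
    p = np.pad(img.astype(float), 6, mode="edge")
    H, W = img.shape
    fx = np.zeros((H, W))
    fy = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            win = p[r:r + 13, c:c + 13]
            fx[r, c] = (win * h).sum()      # horizontal edge enhancement
            fy[r, c] = (win * v).sum()      # vertical edge enhancement
    return np.hypot(fx, fy), np.arctan2(fy, fx)

img = np.zeros((20, 20))
img[:, 10:] = 100.0                          # vertical luminance step edge
R, theta = edge_polar(img)
print(R[10, 10] > R[10, 0])  # → True: strong response on the edge column
```

With c = 2 the generated coefficients reproduce w1 to w6 listed above; the amplitude R is the quantity fed into the f_SI13 feature.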
The above are only preferred embodiments of the present invention and do not limit it; for those skilled in the art, the present invention may be subject to various changes and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of the claims of the present invention.
Claims (8)
1. A high-definition video image quality evaluation method based on adaptive spatio-temporal (ST) regions, the method comprising the steps of:
1) inputting a video sequence;
2) defining the division criterion of the adaptive ST regions according to the original video sequence;
3) generating the adaptive ST regions for the original sequence and the impaired sequence respectively;
4) extracting the specified characteristic parameters based on the divided ST regions;
5) comparing the characteristic parameters of the original sequence and the impaired sequence, calculating the degree of quality impairment, and giving the final objective evaluation result, wherein
the step of generating the adaptive ST regions comprises:
3.1) reading in the original video sequence and the impaired sequence and performing a filtering operation on them;
3.2) judging whether the input original video sequence contains a shot cut;
3.3) if the video sequence contains a shot cut, discarding the 5 frames after the shot cut so that they do not take part in the ST region division;
3.4) judging whether the video sequence contains a face region;
3.5) if the video sequence contains a face region, dividing three-level adaptive ST regions centered on the position of the face region;
3.6) judging whether the video sequence contains another region of interest;
3.7) if the video sequence contains another region of interest, dividing three-level adaptive ST regions centered on the currently defined region of interest.
2. The method according to claim 1, characterized in that the playing length of the video sequence in said step 1) is about ten seconds.
3. The method according to claim 1, characterized in that the method of dividing the three-level adaptive ST regions is:
defining the more important area in the picture as region 1, with the ST region size defined as 8*8*5;
defining the temporal region of interest as region 2, with the ST region size defined as 16*16*5;
defining the peripheral area of the screen as region 3, with the ST region size defined as 32*32*5;
where 5 represents 5 consecutive frames.
4. The method according to claim 1, characterized in that said filtering operation adopts a 13 × 13 spatial-domain edge enhancement filtering method, edge-enhancing the video image in the vertical and horizontal directions respectively.
5. The method according to claim 4, characterized in that said edge enhancement in the vertical and horizontal directions employs a horizontal and a vertical edge enhancement filter, the two filters being transposes of each other.
6. The method according to claim 1, characterized in that detecting whether the input video sequence contains a shot cut comprises the steps of:
1) reading in the video sequence to be processed;
2) calculating the color histogram of every image;
3) solving for the mean and the mean-square deviation of the color histograms of the whole sequence;
4) if the histogram difference between two adjacent frames is greater than three times the mean-square deviation, judging that a shot cut has occurred.
7. A high-definition video image quality evaluation device based on adaptive spatio-temporal (ST) regions, comprising an adaptive ST region definition module, an adaptive ST region division module, a characteristic parameter extraction module, and a quality impairment degree calculation module, characterized in that:
said adaptive ST region definition module defines the division method of the adaptive ST regions based on the video content of the input original sequence and the user's viewpoint;
said adaptive ST region division module divides the original sequence and the impaired sequence into adaptive ST regions respectively, according to the defined ST region division criterion, as the basis for extracting characteristic parameters;
said characteristic parameter extraction module extracts, based on the ST regions, the characteristic parameter groups of the two sequences according to the characteristic parameters of spatial information, temporal information, and mixed spatio-temporal information required for video quality evaluation;
said quality impairment degree calculation module calculates the difference value of the input sequence pair by comparing the extracted characteristic parameters, that is, the measurement of the degree of quality impairment; wherein:
said adaptive ST region division module generates the adaptive ST regions as follows:
reading in the original video sequence and the impaired sequence and performing a filtering operation on them;
judging whether the input original video sequence contains a shot cut;
if the video sequence contains a shot cut, discarding the 5 frames after the shot cut so that they do not take part in the ST region division;
judging whether the video sequence contains a face region;
if the video sequence contains a face region, dividing three-level adaptive ST regions centered on the position of the face region;
judging whether the video sequence contains another region of interest;
if the video sequence contains another region of interest, dividing three-level adaptive ST regions centered on the currently defined region of interest.
8. The high-definition video image quality evaluation device based on adaptive ST regions according to claim 7, characterized in that the characteristic parameters extracted by said characteristic parameter extraction module comprise:
the mean-square deviation of the luminance amplitude of each ST region; the ratio of horizontal and vertical to non-horizontal, non-vertical edge energy; the chrominance information of the ST regions; the luminance contrast information of the ST regions; the time-domain information; and the intersection information of contrast and absolute time.
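The shot-cut detection of claim 6 can be sketched as follows. This is an illustrative sketch only, under stated assumptions not fixed by the claim: frames are grayscale NumPy arrays, the "histogram difference" is taken as the sum of absolute bin differences between adjacent frames, and the threshold is read as the mean of those differences plus three times their standard deviation (the mean-square deviation of the claim).

```python
import numpy as np

def detect_shot_cuts(frames, bins=64):
    """Return frame indices at which a shot cut is judged to occur."""
    # Step 2: color (here grayscale) histogram of every image.
    hists = [np.histogram(f, bins=bins, range=(0, 256))[0] for f in frames]
    # Frame-to-frame histogram difference (sum of absolute bin differences).
    diffs = np.array([np.abs(h2 - h1).sum()
                      for h1, h2 in zip(hists, hists[1:])], dtype=float)
    if diffs.size == 0:
        return []
    # Step 3: mean and standard deviation over the whole sequence;
    # step 4: a cut where the difference exceeds mean + 3 * std.
    threshold = diffs.mean() + 3 * diffs.std()
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# Toy sequence: 10 dark frames followed by 7 bright frames -> cut at frame 10.
frames = [np.zeros((8, 8))] * 10 + [np.full((8, 8), 200.0)] * 7
print(detect_shot_cuts(frames))   # [10]
```

Per claim 1, step 3.3, the 5 frames following each detected index would then be discarded before the ST region division.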
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200710140426 CN100559880C (en) | 2007-08-10 | 2007-08-10 | A kind of highly-clear video image quality evaluation method and device based on self-adapted ST area |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200710140426 CN100559880C (en) | 2007-08-10 | 2007-08-10 | A kind of highly-clear video image quality evaluation method and device based on self-adapted ST area |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101146226A CN101146226A (en) | 2008-03-19 |
CN100559880C true CN100559880C (en) | 2009-11-11 |
Family
ID=39208475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200710140426 Expired - Fee Related CN100559880C (en) | 2007-08-10 | 2007-08-10 | A kind of highly-clear video image quality evaluation method and device based on self-adapted ST area |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100559880C (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11363298B2 (en) | 2020-08-03 | 2022-06-14 | Wistron Corporation | Video processing apparatus and processing method of video stream |
US11880966B2 (en) | 2020-08-03 | 2024-01-23 | Wistron Corporation | Image quality assessment apparatus and image quality assessment method thereof |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010093745A1 (en) | 2009-02-12 | 2010-08-19 | Dolby Laboratories Licensing Corporation | Quality evaluation of sequences of images |
CN101742355B (en) * | 2009-12-24 | 2012-02-15 | 厦门大学 | Method for partial reference evaluation of wireless videos based on space-time domain feature extraction |
CN102223565B (en) * | 2010-04-15 | 2013-03-20 | 上海未来宽带技术股份有限公司 | Streaming media video quality estimation method based on video content features |
BR112013002361A2 (en) * | 2010-07-30 | 2016-05-24 | Thomson Licensing | method and apparatus for video quality measurement |
CN102036099A (en) * | 2010-12-30 | 2011-04-27 | 杭州华三通信技术有限公司 | Method and device for analyzing image quality |
CN102227127B (en) * | 2011-06-21 | 2012-11-14 | 天津理工大学 | Automatic multimedia material defect detection and quality analysis method |
CN102523477B (en) * | 2011-12-01 | 2014-02-12 | 上海大学 | Stereoscopic video quality evaluation method based on binocular minimum discernible distortion model |
CN102547293B (en) * | 2012-02-16 | 2015-01-28 | 西南交通大学 | Method for coding session video by combining time domain dependence of face region and global rate distortion optimization |
CN106709458A (en) * | 2016-12-27 | 2017-05-24 | 深圳市捷顺科技实业股份有限公司 | Human face living body detection method and device |
CN107483920B (en) * | 2017-08-11 | 2018-12-21 | 北京理工大学 | A kind of panoramic video appraisal procedure and system based on multi-layer quality factor |
CN108510494A (en) * | 2018-04-09 | 2018-09-07 | 中国石油大学(华东) | Color fusion image quality evaluating method based on subspace state space system identification |
CN110366001B (en) * | 2018-04-09 | 2022-05-27 | 腾讯科技(深圳)有限公司 | Method and device for determining video definition, storage medium and electronic device |
CN109559276B (en) * | 2018-11-14 | 2020-09-08 | 武汉大学 | Image super-resolution reconstruction method based on quality evaluation and feature statistics |
CN110430443B (en) * | 2019-07-11 | 2022-01-25 | 平安科技(深圳)有限公司 | Method and device for cutting video shot, computer equipment and storage medium |
Non-Patent Citations (7)
Title |
---|
OBJECTIVE PERCEPTUAL VIDEO QUALITY MEASUREMENT USING A FOVEATION BASED REDUCED REFERENCE ALGORITHM. Fang Meng, XiuHua Jiang, Hui Sun, Shuang Yang. ICME 2007. 2007 |
Video Quality Measurement Techniques. Stephen Wolf, Margaret Pinson, 50-55, U.S. DEPARTMENT OF COMMERCE, National Telecommunications and Information Administration. 2002 |
Shot segmentation techniques in content-based video retrieval. Zhu Aihong, Li Lian. Journal of Information, No. 3. 2004 |
Research on HD image quality evaluation based on reduced reference frames. Li Chen, Jiang Xiuhua, Meng Fang, Sun Hui, Yang Shuang. Video Engineering, No. 12. 2006 |
Also Published As
Publication number | Publication date |
---|---|
CN101146226A (en) | 2008-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100559880C (en) | A kind of highly-clear video image quality evaluation method and device based on self-adapted ST area | |
CN100559881C (en) | A kind of method for evaluating video quality based on artificial neural net | |
CN107483920B (en) | A kind of panoramic video appraisal procedure and system based on multi-layer quality factor | |
Winkler | Perceptual video quality metrics—A review | |
Thung et al. | A survey of image quality measures | |
CN101562675B (en) | No-reference image quality evaluation method based on Contourlet transform | |
CN101742355B (en) | Method for partial reference evaluation of wireless videos based on space-time domain feature extraction | |
CN104661021B (en) | A kind of method for evaluating quality of video flowing | |
Shan et al. | A no-reference image quality assessment metric by multiple characteristics of light field images | |
CN105049838B (en) | Objective evaluation method for compressing stereoscopic video quality | |
CN104243973B (en) | Video perceived quality non-reference objective evaluation method based on areas of interest | |
CN101482973B (en) | Partial reference image quality appraisement method based on early vision | |
Wang et al. | Novel spatio-temporal structural information based video quality metric | |
CN106127234B (en) | Non-reference picture quality appraisement method based on characteristics dictionary | |
CN112950596B (en) | Tone mapping omnidirectional image quality evaluation method based on multiple areas and multiple levels | |
CN109345502A (en) | A kind of stereo image quality evaluation method based on disparity map stereochemical structure information extraction | |
CN106934770B (en) | A kind of method and apparatus for evaluating haze image defog effect | |
CN108961227A (en) | A kind of image quality evaluating method based on airspace and transform domain multiple features fusion | |
CN101426148A (en) | Video objective quality evaluation method | |
CN114598864A (en) | Full-reference ultrahigh-definition video quality objective evaluation method based on deep learning | |
CN106375754A (en) | No-reference video quality evaluation method based on visual stimulation attenuation characteristic | |
CN114915777A (en) | Non-reference ultrahigh-definition video quality objective evaluation method based on deep reinforcement learning | |
Fang et al. | A spatial-temporal weighted method for asymmetrically distorted stereo video quality assessment | |
US20050105802A1 (en) | Method and an arrangement for objective assessment of video quality | |
Kang et al. | 3D image quality assessment based on texture information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20091111 Termination date: 20120810 |