CN104243970A - 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity - Google Patents
- Publication number
- CN104243970A CN104243970A CN201310565886.XA CN201310565886A CN104243970A CN 104243970 A CN104243970 A CN 104243970A CN 201310565886 A CN201310565886 A CN 201310565886A CN 104243970 A CN104243970 A CN 104243970A
- Authority
- CN
- China
- Prior art keywords
- image
- quality evaluation
- window
- saliency
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an objective quality evaluation method for 3D rendered images based on a stereoscopic visual attention mechanism and structural similarity, and relates to the technical field of image quality evaluation. The positional relationship between the original reference image and the 3D rendered image is obtained by matching with a SIFT (scale-invariant feature transform) model, and multiple concrete distortions, such as compression distortion, transmission distortion, and rendering distortion, are measured with an SSIM (structural similarity) model. The stereoscopic visual attention mechanism of the human visual system is reflected by computing a 3D saliency map of the rendered image: the mean saliency of each SIFT matching window is computed as a saliency factor, which serves as the weight of that window in a weighted sum of all SSIM values, yielding the final quality evaluation result. Results show that the objective scores produced by the method correlate strongly with subjective perception.
Description
Technical Field
The invention relates to the technical field of image quality evaluation.
Background Art
Image quality evaluation methods can be roughly classified into two types. The first category is subjective quality assessment, for which many standards have been established, such as document 1 (see International Telecommunication Union (ITU) Radiocommunication Sector: 'Methodology for the subjective assessment of the quality of television pictures', ITU-R BT.500-11, January 2002). These standards require ordinary observers without prior experience to view the image to be evaluated (and, in some cases, the original image as a reference) under prescribed conditions and then score its quality. However, subjective quality evaluation requires a strictly controlled test environment and a considerable number of observers, and is time- and labor-intensive, so it cannot be applied in real time and is mostly confined to laboratory research.
The second category is objective quality assessment, which uses algorithms instead of human observers as the quality criterion. The Video Quality Experts Group (VQEG), affiliated with the ITU, defines three types of video quality metrics: full-reference, reduced-reference, and no-reference. Full-reference methods have the entire original video available and obtain the quality of the distorted video by comparing it against the original. Reduced-reference methods use only partial information about the original sequence and judge quality by comparing extracted features. No-reference methods have no reference video at all and must judge quality by analyzing the data of the video under test; such methods are difficult to realize, and research results remain scarce.
Currently, most image quality assessment research focuses on full-reference and reduced-reference methods. For example, document 2 (see Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity", IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, Apr. 2004) proposes the Structural Similarity (SSIM) model, which simulates the human visual system's holistic extraction of structural information and can effectively evaluate the perceived quality of a 2D image by using a measure of structural distortion as an approximation of perceived quality. Document 3 (C. T. E. R. Hewage, S. T. Worrall, S. Dogan, et al., "Quality evaluation of color plus depth map-based stereoscopic video", IEEE Journal of Selected Topics in Signal Processing, 2009, 3(2): 304-318) evaluates, based on the SSIM model, the quality of the left and right viewpoint videos of a stereoscopic video represented as color video plus depth video. This quality evaluation model is designed on the premise that the rendered virtual viewpoint video has an original reference video. In practical applications, an original reference video for the virtual viewpoint often does not exist, and the model cannot be applied.
With the development of visual attention research in recent years, scholars have proposed various visual attention detection models, such as the ITTI model of document 4 (L. Itti, C. Koch, E. Niebur, et al., "A model of saliency-based visual attention for rapid scene analysis", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, 1998), the spectral residual method of document 5 (X. Hou, L. Zhang, "Saliency detection: A spectral residual approach", IEEE Conference on Computer Vision and Pattern Recognition, CVPR'07, 2007: 1-8), and GBVS (Graph-Based Visual Saliency) of document 6 (J. Harel, C. Koch, P. Perona, "Graph-based visual saliency", Advances in Neural Information Processing Systems, 2006: 545-552). Since visual attention is an important factor in image quality evaluation, some researchers have begun to incorporate a visual attention mechanism into the quality evaluation process. For example, document 7 ("Stereoscopic image quality evaluation based on visual attention" [J], Journal of Image and Graphics, 2012, 17(6): 722-725) detects regions of interest in the texture maps and depth maps of the two views of a stereoscopic image based on the ITTI model, performs region-weighted quality evaluation on the left and right views, and takes the average of the two views' scores as the final evaluation result for the stereoscopic image. This method likewise assumes that both views of the stereoscopic image have original reference images.
Since many existing 3D cameras capture only a color image from a single viewing angle together with its corresponding depth image, the second-view image required to synthesize a stereoscopic image is often generated by rendering with DIBR (Depth Image Based Rendering) technology. In this case, the quality of the 3D rendered image must be evaluated without an original image of the rendered viewpoint available for reference, while the original color image captured by the 3D camera, which is available for reference, is not from the same viewpoint as the 3D rendered image and exhibits a certain parallax. This situation renders conventional image quality evaluation methods ineffective. It is therefore necessary to design a new objective quality assessment scheme for 3D rendered images.
Disclosure of Invention
The invention provides a novel objective quality evaluation method for 3D rendered images that incorporates a stereoscopic visual attention mechanism. The positional relationship between the original reference image and the 3D rendered image is established by matching with a SIFT (scale-invariant feature transform) model, and multiple concrete distortions, such as compression distortion, transmission distortion, and rendering distortion, are measured with an SSIM (structural similarity) model. The stereoscopic visual attention mechanism of the human visual system is reflected by computing a 3D saliency map of the rendered image: the mean saliency of each SIFT matching window is computed as a saliency factor and used as the weight of that window in a weighted sum of all SSIM values, giving the final quality result.
Therefore, the implementation steps of the technical scheme provided by the invention are as follows:
A 3D rendered image objective quality evaluation method based on a stereoscopic visual attention mechanism adopts the following technical scheme, comprising the following steps:
step S1: and drawing the original color image and the original depth image into a reference virtual viewpoint image for reference.
Step S2: SIFT points are respectively extracted from the original color image and the reference virtual viewpoint image.
Step S3: and (5) establishing a matching relation between the SIFT points by using the characteristic values of the SIFT points obtained in the step (S2) to form a matching point pair. In order to reduce the error, an abnormal point removing operation is performed.
Step S4: and carrying out saliency detection on the original color image and the depth image by using the stereoscopic vision attention model to obtain a saliency map of the stereoscopic image.
Step S5: taking windows from the original color image and the 3D rendered image to be measured, respectively, with the pixel position of the SIFT matching point pair obtained in step S3 as the center. Only one pixel window is reserved with overlapping pixels, and repeated calculation is avoided.
Step S6: SSIM calculation is performed in the matching window obtained in step S5 to measure various distortions in the window.
Step S7: the weight value of the matching window pair obtained in step S5, that is, the weight of the SSIM value in this window pair, is defined as the average value of the saliency values of the window in the original color image obtained in step S4, divided by the average value of the ownership weight values.
Step S8: weighted averaging is performed on all SSIM values obtained in step S6.
The key technical points reflected by the technical scheme are as follows:
1. In practice there is generally a small displacement between the 3D rendered image under test and the original color image, which renders conventional objective quality evaluation methods ineffective. The invention proposes using a SIFT model to establish a matching relationship between the original color image and the 3D rendered image; each pair of matched SIFT points marks two pixels that correspond to the same position in the actual scene.
2. The invention takes the stereoscopic visual attention mechanism of the human visual system into account during objective quality evaluation. A 3D saliency map of the rendered image is computed, the mean saliency of each window is used as a saliency factor, and the saliency factors serve as the weights of the corresponding SSIM values. This effectively reflects the stereoscopic attention mechanism of the human visual system, so the final objective quality result better matches human visual perception.
The beneficial effects of the method are as follows: the SIFT model establishes the matching relationship between the original color image and the 3D rendered image, avoiding the situation in which conventional objective quality evaluation methods fail. The method also accounts for the stereoscopic visual attention mechanism, so the objective quality score better matches the human visual system. Results show that the proposed objective quality evaluation method correlates strongly with subjective perception.
Drawings
FIG. 1 is a block diagram of the objective quality assessment method of 3D rendered images based on the stereoscopic vision attention mechanism and structural similarity of the present invention.
FIG. 2 is the original color image "middle" of the example of the present invention.
FIG. 3 is the original depth image "middle" of the example of the present invention.
FIG. 4 is the 3D rendered image to be evaluated in the example of the present invention.
FIG. 5 is the stereoscopic saliency map result for the example of the present invention.
Detailed Description
The invention is further illustrated by the following specific examples in conjunction with the accompanying drawings:
The example provided by the invention adopts MATLAB R2010a as the simulation experiment platform and takes the 1396 × 1110 bmp grayscale image "middle" as the selected test image; each step is described in detail below in conjunction with the drawings:
step (1), the original color and original depth images of 1396 × 1110, bmp format are selected as input images, which are rendered as reference virtual viewpoint images for reference.
Step (2), SIFT points are extracted from the original color image and from the reference virtual viewpoint image, respectively.
And (3) matching relationships are established between the SIFT points using the feature values obtained in step (2), forming matching point pairs. To reduce error, an outlier-removal operation is performed: the i-th point pair is treated as an outlier when

|d_i − μ| > T·σ

where N is the number of SIFT matching point pairs before outlier removal, d_i is the physical distance of the i-th matching point pair, i.e. the distance between the positions of the two points in the original color image and the reference virtual image, μ and σ are the mean and standard deviation of all the distances, and T is a preset threshold.
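Under the rule above (a mean-plus-T-standard-deviations criterion; the threshold value itself is an assumption, since it is not stated in this copy), the outlier removal of step (3) can be sketched as:

```python
import numpy as np

def remove_outlier_pairs(p_ref, p_test, t=2.0):
    """Drop SIFT matches whose pair distance deviates from the mean
    by more than t standard deviations. t is an assumed threshold."""
    d = np.linalg.norm(p_ref - p_test, axis=1)   # physical distance of each pair
    mu, sigma = d.mean(), d.std()
    keep = np.abs(d - mu) <= t * sigma if sigma > 0 else np.ones(len(d), bool)
    return p_ref[keep], p_test[keep]

# demo: five consistent matches (uniform 3-px parallax) plus one gross mismatch
ref = np.array([[10, 10], [20, 10], [30, 10],
                [40, 10], [50, 10], [60, 10]], dtype=float)
test = ref + np.array([3.0, 0.0])
test[5] += np.array([40.0, 0.0])       # outlier pair
ref2, test2 = remove_outlier_pairs(ref, test, t=1.5)
```

The surviving pairs all share the small parallax expected between the original color image and the rendered view, which is exactly what the SSIM windows in step (5) rely on.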
And (4) saliency detection is performed on the original color image and the depth image with the stereoscopic visual attention model to obtain a saliency map of the stereoscopic image. Using a conventional 2D saliency-map method, the original color map and the corresponding depth map are jointly used to find the saliency map of the stereoscopic image.
In the example, saliency detection is performed on the image with the spectral residual method and the GBVS method, and the stereoscopic saliency map S is a combination of the two:

S = w_d · S_d + w_c · S_c

where S_d is the saliency map obtained from the depth map with the GBVS method, S_c is the saliency map obtained from the color map with the spectral residual method, and w_d and w_c are their respective weights; in this example both are set to 0.5.
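The spectral residual method of document 5 is simple enough to sketch in pure NumPy; the combination step then follows the formula above. GBVS is not implemented here, so a placeholder depth-saliency map stands in for S_d, and the exact filter sizes are assumptions rather than the patent's settings.

```python
import numpy as np

def spectral_residual_saliency(img):
    """Spectral residual saliency (Hou & Zhang) for a grayscale image.
    The 3x3 box filter on the log-amplitude spectrum is an assumption."""
    f = np.fft.fft2(img.astype(np.float64))
    log_amp = np.log(np.abs(f) + 1e-12)
    phase = np.angle(f)
    # local average of the log-amplitude spectrum via circular shifts
    avg = sum(np.roll(np.roll(log_amp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    residual = log_amp - avg
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / (sal.max() + 1e-12)    # normalize to [0, 1]

def stereo_saliency(color, depth_sal, w_c=0.5, w_d=0.5):
    """S = w_d*S_d + w_c*S_c; depth_sal stands in for the GBVS map S_d."""
    return w_c * spectral_residual_saliency(color) + w_d * depth_sal

rng = np.random.default_rng(0)
color = rng.random((64, 64))
depth_sal = rng.random((64, 64))        # placeholder for a GBVS depth map
S = stereo_saliency(color, depth_sal)
```

With both component maps normalized to [0, 1] and weights of 0.5 each, the combined map also stays in [0, 1].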
And (5) centered on the pixel positions of the SIFT matching point pairs obtained in step (3), windows are taken from the original color image and from the 3D rendered image under test, respectively. Of overlapping windows, only one is retained. In this example, the window size is set to 11 × 11.
And (6) the SSIM calculation is performed within each matching window obtained in step (5) to measure the distortion in that window. A matching window pair is treated as signals X and Y, respectively:

SSIM(X, Y) = [(2 μ_X μ_Y + C_1)(2 σ_XY + C_2)] / [(μ_X² + μ_Y² + C_1)(σ_X² + σ_Y² + C_2)]

where μ_X and μ_Y are the average luminances of signals X and Y, σ_X and σ_Y are their standard deviations, and σ_XY is the covariance between X and Y. C_1 and C_2 are constants set small enough not to bias the score while still preventing a zero denominator.
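The formula above transcribes directly into code for one matching window pair. The constants follow the commonly used K1 = 0.01, K2 = 0.03 choices for 8-bit data from Wang et al.; the patent itself only requires them to be "small enough", so these values are an assumption.

```python
import numpy as np

def ssim_window(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of a single matching window pair (signals X and Y)."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()          # average luminances
    vx, vy = x.var(), y.var()            # variances
    cov = ((x - mx) * (y - my)).mean()   # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

# 11x11 windows, as in the example; identical windows give SSIM = 1
win = np.arange(121, dtype=np.float64).reshape(11, 11)
perfect = ssim_window(win, win)
```

For identical windows the numerator and denominator coincide term by term, so the score is exactly 1; any luminance, contrast, or structure distortion pulls it below 1.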
Step (7), the weight of each matching window pair, i.e. the weight of the SSIM value of that window pair, is defined via the mean saliency value of the window taken in the original color image on the stereoscopic saliency map obtained in step (4):

w_i = s̄_i / ( (1/N) Σ_j s̄_j )

where s̄_i is the mean value of the i-th matching window pair on the stereoscopic saliency map, (1/N) Σ_j s̄_j is the average of the mean saliency values over all matching window pairs, and N is the number of matching window pairs obtained in step (5).
Step (8), all SSIM values obtained in step (6) are averaged with these weights to obtain the final objective quality:

Q = (1/N) Σ_{i=1}^{N} w_i · SSIM_i

where N is the number of matching windows after outlier removal, and SSIM_i and w_i are, respectively, the SSIM value and weight of the i-th matching window pair.
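Steps (7) and (8) combine into a single saliency-weighted pooling; note that with w_i = s̄_i / mean(s̄), the final score algebraically reduces to Σ s̄_i · SSIM_i / Σ s̄_j, so salient windows dominate the result:

```python
import numpy as np

def pooled_quality(ssim_vals, window_saliency):
    """Saliency-weighted pooling of per-window SSIM values
    (steps (7) and (8)): w_i = s_i / mean(s), Q = mean(w_i * SSIM_i)."""
    s = np.asarray(window_saliency, dtype=np.float64)
    q = np.asarray(ssim_vals, dtype=np.float64)
    w = s / s.mean()                     # saliency factors as weights
    return float(np.mean(w * q))

# two highly salient, high-quality windows and one low-saliency, distorted one
ssim_vals = [0.9, 0.9, 0.3]
saliency  = [1.0, 1.0, 0.1]
Q = pooled_quality(ssim_vals, saliency)
```

Here the plain mean of the SSIM values would be 0.7, but because the distorted window has low saliency, the pooled score stays close to the quality of the attended regions, which is the intended effect of the stereoscopic attention weighting.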
The objective quality of the 3D rendered image is obtained by integrating all of the above steps. The scheme comprehensively accounts for the reference conditions of practical applications and the stereoscopic visual attention mechanism, and yields results that match the human visual system.
Claims (1)
1. A 3D rendered image objective quality evaluation method based on a stereoscopic visual attention mechanism, characterized by comprising the following steps:
step S1: rendering the original color image and the original depth image into a reference virtual viewpoint image for reference;
step S2: extracting SIFT points from the original color image and from the reference virtual viewpoint image, respectively;
step S3: establishing matching relationships between the SIFT points using the feature values of the SIFT points obtained in step S2 to form matching point pairs, and performing outlier removal to reduce error;
step S4: performing saliency detection on the original color image and the depth image with a stereoscopic visual attention model to obtain a saliency map of the stereoscopic image;
step S5: centered on the pixel positions of the SIFT matching point pairs obtained in step S3, taking windows from the original color image and from the 3D rendered image under test, respectively, and retaining only one of any windows with overlapping pixels to avoid repeated calculation;
step S6: performing the SSIM calculation within each matching window obtained in step S5 to measure the various distortions in that window;
step S7: defining the weight of each matching window pair obtained in step S5, i.e. the weight of the SSIM value of that window pair, as the mean saliency value of the window taken in the original color image on the stereoscopic saliency map obtained in step S4, divided by the mean of all such values;
step S8: computing the weighted average of all SSIM values obtained in step S6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310565886.XA CN104243970A (en) | 2013-11-14 | 2013-11-14 | 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104243970A true CN104243970A (en) | 2014-12-24 |
Family
ID=52231194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310565886.XA Pending CN104243970A (en) | 2013-11-14 | 2013-11-14 | 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104243970A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106097256A (en) * | 2016-05-31 | 2016-11-09 | 南京邮电大学 | A kind of video image fuzziness detection method based on Image Blind deblurring |
CN106341677A (en) * | 2015-07-07 | 2017-01-18 | 中国科学院深圳先进技术研究院 | Virtual viewpoint video quality evaluation method |
CN106469448A (en) * | 2015-06-26 | 2017-03-01 | 康耐视公司 | Carry out automatic industrial inspection using 3D vision |
CN107122787A (en) * | 2017-02-14 | 2017-09-01 | 北京理工大学 | A kind of image scaling quality evaluating method of feature based fusion |
CN110276082A (en) * | 2019-06-06 | 2019-09-24 | 百度在线网络技术(北京)有限公司 | Translation processing method and device based on dynamic window |
CN113724182A (en) * | 2020-05-21 | 2021-11-30 | 无锡科美达医疗科技有限公司 | No-reference video quality evaluation method based on expansion convolution and attention mechanism |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102073985B (en) * | 2010-12-23 | 2012-05-09 | 清华大学 | Method and device for objectively evaluating scaled image quality by matching pixel points |
Non-Patent Citations (2)
Title |
---|
MARC DECOMBAS ET AL.: "A New Object Based Quality Metric Based On SIFT And SSIM", 《2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012)》 * |
YANG Yazhou: "Image quality evaluation based on local invariant features", Journal of Computer Applications *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105389554B (en) | Living body determination method and equipment based on recognition of face | |
CN109118470B (en) | Image quality evaluation method and device, terminal and server | |
CN105279372B (en) | A kind of method and apparatus of determining depth of building | |
CN101996407B (en) | Colour calibration method for multiple cameras | |
CN102665086B (en) | Method for obtaining parallax by using region-based local stereo matching | |
CN109523506B (en) | Full-reference stereo image quality objective evaluation method based on visual salient image feature enhancement | |
CN107578403A (en) | The stereo image quality evaluation method of binocular view fusion is instructed based on gradient information | |
CN104243970A (en) | 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity | |
CN101610425B (en) | Method for evaluating stereo image quality and device | |
CN109345502B (en) | Stereo image quality evaluation method based on disparity map stereo structure information extraction | |
CN109255358B (en) | 3D image quality evaluation method based on visual saliency and depth map | |
CN107274483A (en) | A kind of object dimensional model building method | |
CN105654142B (en) | Based on natural scene statistics without reference stereo image quality evaluation method | |
CN110189294B (en) | RGB-D image significance detection method based on depth reliability analysis | |
CN109242834A (en) | It is a kind of based on convolutional neural networks without reference stereo image quality evaluation method | |
CN105404888A (en) | Saliency object detection method integrated with color and depth information | |
CN111597933B (en) | Face recognition method and device | |
WO2022126674A1 (en) | Method and system for evaluating quality of stereoscopic panoramic image | |
CN107590444A (en) | Detection method, device and the storage medium of static-obstacle thing | |
CN107016698A (en) | Based on tapered plane smooth binocular solid matching process and device | |
CN112802081A (en) | Depth detection method and device, electronic equipment and storage medium | |
Jeong et al. | Visual comfort assessment of stereoscopic images using deep visual and disparity features based on human attention | |
CN104038752B (en) | Multi-view point video rectangular histogram color correction based on three-dimensional Gaussian mixed model | |
CN109978928B (en) | Binocular vision stereo matching method and system based on weighted voting | |
CN108062765A (en) | Binocular image processing method, imaging device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20141224 |