KR20050084940A

KR20050084940A - A method for generating a quality oriented significance map for assessing the quality of an image or video

Info

Publication number: KR20050084940A
Application number: KR1020057007945A
Authority: KR
Inventors: 쫑캉 루; 웨이시 린; 수수 야오; 이 핑 옹
Original assignee: 에이전시 포 사이언스, 테크놀로지 앤드 리서치
Priority date: 2005-05-04
Filing date: 2002-11-06
Publication date: 2005-08-29

Abstract

A method for generating a quality oriented significance map for assessing the quality of an image or video, comprising the steps of extracting features of the image or video, determining a perceptual quality requirement of at least one extracted feature, and integrating the extracted features and the perceptual quality requirement of the at least one extracted feature to form an array of significance level values, thereby generating the quality oriented significance map.

Description

A method for generating a quality oriented significance map for assessing the quality of an image or video

Visual distortion metrics play an important role in monitoring the quality of broadcast images and video, controlling compression efficiency, and improving image enhancement processes. There are broadly two classes of quality or distortion assessment approaches. The first class is based on mathematically defined measures such as the widely used mean squared error (MSE) and peak signal-to-noise ratio (PSNR). The second class measures visual distortion by simulating characteristics of the human visual system (HVS).

In the first class of approaches, the MSE is defined as

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(c_{i,j} - \hat{c}_{i,j}\right)^{2}$$

where $c_{i,j}$ and $\hat{c}_{i,j}$ are the pixel values of the original image and the distorted image, respectively, and $M \times N$ is the image size. The PSNR is defined as

$$\mathrm{PSNR} = 10\log_{10}\frac{P^{2}}{\mathrm{MSE}}$$

where $P$ is the peak pixel value (255 for 8-bit images). The advantages of the first class of approaches are mathematical simplicity and low computational complexity. For this reason, the first class of approaches is widely adopted.
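The two first-class measures can be illustrated with a short NumPy sketch. The function names `mse` and `psnr` and the sample arrays are hypothetical, and the default peak value of 255 assumes 8-bit pixels; this is an illustration of the standard definitions, not code from the patent.

```python
import numpy as np

def mse(reference, distorted):
    """Mean squared error between two equally sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    dis = np.asarray(distorted, dtype=np.float64)
    if ref.shape != dis.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((ref - dis) ** 2))

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit pixel values."""
    err = mse(reference, distorted)
    if err == 0.0:
        return float("inf")  # identical images have no noise
    return float(10.0 * np.log10(peak ** 2 / err))

# Hypothetical 2x2 example images.
original = np.array([[10, 20], [30, 40]], dtype=np.uint8)
degraded = np.array([[12, 20], [30, 36]], dtype=np.uint8)
print(mse(original, degraded))
print(psnr(original, degraded))
```

Both functions treat every pixel equally, which is exactly the limitation the significance map is meant to address.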

The second class of approaches, however, aims at perceptual results closer to human vision and therefore at better accuracy in visual assessment and information processing. Nevertheless, because the HVS is incompletely understood and physiological and/or psychological findings are slow to be incorporated into HVS models, the performance of the second class of approaches is still unsatisfactory.

There is physiological and psychological evidence that an observer viewing an image or video does not attend to all of its visual information but concentrates only on certain areas. Such visual attention information from the observer is used in many HVS applications, for example to compute the visual perceptual search process or to assess the quality of an image or video.

Visual attention can be implemented by either a bottom-up process or a top-down process. In the bottom-up process, visual attention is based on stimuli from the visual features of the image/video, and a significance map for the image/video is formed from these stimuli. Examples of visual-feature-based stimuli include illumination, color, motion, and shape. In the top-down process, a significance map for the image/video is formed from prior or domain knowledge, or from cues in other known information such as sound.

[1] discloses a method for measuring the distortion of an image by combining three factors: loss of correlation, luminance distortion, and contrast distortion.

[2] proposes a no-reference quality metric 100, shown in FIG. 1. A distorted image/video 101 is received by an artifact extraction unit 102, which detects the distribution of blurring and blockiness in the image/video 101. A discrimination unit 104 then determines the characteristics of the blurring and blockiness and generates an output signal 105 representing the distortion value of the distorted image/video 101.

The methods of [1] and [2] belong to the first class of approaches and therefore do not provide results as close to human perception as the second class of approaches.

[3] proposes a metric 200, shown in FIG. 2, based on video decompression and spatial/temporal masking. A reference image/video 201 and a distorted image/video 202 are received by signal decompression units 203 and 204, respectively. The corresponding decompressed signals 205 and 206 are received by contrast gain control units 207 and 208, respectively, for spatial/temporal masking. The resulting processed signals 209 and 210 are processed by a detection and pooling unit 211, which generates an output signal 212 representing the distortion value of the distorted image/video 202.

In [4], a neural network is used to measure the quality of an image/video by combining multiple visual features, as shown in FIG. 3. A reference image/video 301 and a distorted image/video 302 are input to a plurality of feature extraction units 303, which extract various features of the images/videos 301 and 302. The extracted features are received by a neural network 304, which generates a distortion value 305 for the distorted image/video 302.

[5] discloses a method for evaluating the perceptual quality of a video by assigning different weights to several visual stimuli.

References [4] and [5] treat the entire image or video uniformly, so unimportant parts of the image/video are processed as well, which is computationally inefficient.

[6] uses several bottom-up visual stimuli to determine regions of high visual attention in an image/video. Features determined from these bottom-up stimuli are weighted and accumulated to form a significance map indicating the regions of high visual attention. Because only bottom-up features are considered, this method does not yield a very good quality assessment of the image/video. Moreover, a region of high visual attention does not always mean that the region must be coded at high quality.

[7] discloses a method similar to [6] but using both bottom-up and top-down visual stimuli to determine regions of high visual attention in an image/video. The features determined from the bottom-up and top-down stimuli are integrated using a Bayes network, which must be trained before the integration. As noted above, a region of high visual attention does not always mean that the region must be coded at high quality. Furthermore, using a Bayes network to integrate the features of the image/video is complex, because the network must be trained before it can integrate the features.

Therefore, a more accurate yet robust method of assessing the quality or distortion of an image or video is desired.

FIG. 1 is a block diagram of a general no-reference metric for measuring perceptual image/video distortion.

FIG. 2 is a block diagram of Winkler's full-reference metric for measuring perceptual image/video distortion.

FIG. 3 is a block diagram of Yao's full-reference metric for measuring perceptual image/video distortion.

FIG. 4 shows a general system for monitoring the quality of video for a broadcast system.

FIG. 5 shows a quality oriented significance map according to the invention.

FIG. 6 is a block diagram of a general system for generating a quality oriented significance map according to a preferred embodiment of the invention.

FIG. 7 shows a detailed implementation of the quality oriented significance map according to a preferred embodiment of the invention.

FIG. 8 is a block diagram of a distortion metric incorporating the quality oriented significance map according to the invention.

FIG. 9 is a block diagram of the general no-reference metric for measuring perceptual image/video distortion, incorporating the quality oriented significance map according to the invention.

FIG. 10 is a block diagram of Winkler's full-reference metric for measuring perceptual image/video distortion, incorporating the quality oriented significance map according to the invention.

FIG. 11 is a block diagram of Yao's full-reference metric for measuring perceptual image/video distortion, incorporating the quality oriented significance map according to the invention.

It is an object of the invention to provide a method that can improve the performance of existing methods of assessing the quality or distortion of an image or video.

This object is achieved by the features of the independent claims. Additional features result from the dependent claims.

The invention relates to a method for generating a quality oriented significance map for assessing the quality of an image or video, comprising the steps of extracting features of the image or video, determining a perceptual quality requirement of at least one extracted feature, and integrating the extracted features and the perceptual quality requirement of the at least one extracted feature to form an array of significance level values, thereby generating the quality oriented significance map.

Furthermore, at least one of the extracted features is used to determine the perceptual quality requirement of the image/video based on that feature. In other words, the importance of the image/video quality as perceived by an observer is determined from the extracted features.

The significance level values obtained by integrating the extracted features with the perceptual quality requirement of the at least one extracted feature form a 3-D array for an image and a 4-D array for a video. This array of significance level values is used as the quality oriented significance map for assessing the quality or distortion of the image or video.

It should be noted that a region of high visual attention in an image/video does not always coincide with a region that requires high quality. In other words, high visual attention in a region does not always require that the region be coded at high quality, and vice versa.

Because perceptual quality information is used to determine the significance level values, the resulting significance map closely follows the perceptual quality requirements of the image/video. A more accurate significance map for assessing the quality of an image or video is therefore obtained than with any of the prior art that uses a map based only on visual attention.

The significance map generated according to the invention can be used in existing distortion metrics of both the first and the second class of approaches, thereby improving the accuracy of the image/video quality assessment process.

According to the invention, the features of the image or video are extracted using visual-feature-based information and knowledge-based information. In other words, both a bottom-up process (visual feature based) and a top-down process (knowledge based) are used. These processes determine which features of the image/video attract visual attention, and those attention-attracting features are extracted accordingly. The extracted features include motion, illumination, color, contrast, orientation, texture, and so on. Existing image/video descriptors, for example MPEG-7 descriptors, may also be used.

According to the invention, object motion in a video or a sequence of images is separated into a relative motion vector and an absolute motion vector. Relative motion is the motion of the object relative to the background or other objects, and absolute motion is the actual motion of the object within the image or video frame. Based on the determined relative and absolute motion vectors, a quality level value of the object (pixels or region) is determined. The determined quality level value is integrated with the other features extracted from the image/video to form the array of significance level values.

Object motion analysis can be separated into two steps: global motion estimation and motion mapping. Global motion estimation provides an estimate of the motion of the image or video camera, and motion mapping extracts the relative and absolute motion vectors of the object.

It should be noted that other features of the image/video can also be used to determine quality level values of a pixel or region of the image/video. Examples of such features are face detection, human detection, and texture analysis. The quality level values determined from these other features may be integrated with the quality level values obtained from motion analysis to produce the quality oriented significance levels.

According to a preferred embodiment of the invention, all extracted features and the determined quality level values of the at least one feature are integrated using a nonlinear mapping function to form the array of significance level values.

Using a nonlinear mapping function has the advantages of low computational requirements and simple implementation. In addition, an algorithm or system based on a nonlinear mapping function requires no training, in contrast to the Bayes network used in the system disclosed in [7].

It should be noted that in alternative embodiments other techniques, such as neural networks or fuzzy rules, may be used to integrate the extracted features with the determined quality level values of the at least one feature.

According to another preferred embodiment of the invention, coupling effects arising from the integration of the extracted features are taken into account when forming the array of significance level values. Because coupling effects are used, different extracted features that can each be treated as a significant effect are not integrated by simply adding them linearly. Different combinations of extracted features give rise to different coupling effects.

Specifically, the quality oriented significance map according to this further preferred embodiment of the invention can be obtained using the equation

$$m_{s,i,j,t} = \sum_{n=1}^{N} f^{n}_{s,i,j,t} - \sum_{k} c_{Lk}\, g_{1}\!\left(f^{L}_{s,i,j,t},\, f^{k}_{s,i,j,t}\right)$$

where $m_{s,i,j,t}$ is the element of the quality oriented significance map at scale $s$, position $(i, j)$ and time $t$; $f^{n}_{s,i,j,t}$ is the n-th extracted feature; $c_{Lk}$ is a coupling coefficient representing the coupling effect of combining $f^{L}_{s,i,j,t}$ and $f^{k}_{s,i,j,t}$; $n$ is the index of an extracted feature; $k$ is another index of an extracted feature, with $1 < k < N$ and $k \neq L$; $N$ is the total number of extracted features; $g_{1}$ is the nonlinear coupling mapping function defined by $g_{1}(x, y) = \min(x, y)$; and $L$ is the index of the maximum of $f^{n}_{s,i,j,t}$, that is, $L = \arg\max_{n}\left(f^{n}_{s,i,j,t}\right)$.
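The coupling-based integration can be sketched as follows for a single scale and time instant. The equation itself did not survive in this text, so the form used here (the sum of all features minus, for each non-dominant feature k, the coupling coefficient c_Lk times min(f^L, f^k)) is an assumption inferred from the variable definitions above. The function name, array layout and sample values are hypothetical.

```python
import numpy as np

def integrate_with_coupling(features, coupling):
    """Combine N feature maps into one significance map.

    features: array of shape (N, H, W), one map per extracted feature.
    coupling: array of shape (N, N); coupling[L, k] plays the role of c_Lk.
    Assumed form: m = sum_n f^n - sum_{k != L} c_Lk * min(f^L, f^k),
    where L is the index of the strongest feature at each position.
    """
    f = np.asarray(features, dtype=np.float64)
    c = np.asarray(coupling, dtype=np.float64)
    dominant = np.argmax(f, axis=0)   # L at each position, shape (H, W)
    f_dom = np.max(f, axis=0)         # f^L at each position
    total = f.sum(axis=0)             # plain linear sum of all features
    penalty = np.zeros_like(total)
    for k in range(f.shape[0]):
        c_lk = c[dominant, k]                # c_Lk looked up per position
        overlap = np.minimum(f_dom, f[k])    # g1(x, y) = min(x, y)
        penalty += np.where(dominant == k, 0.0, c_lk * overlap)
    return total - penalty

# Hypothetical example: two features on a 1x2 map, symmetric coupling of 0.4.
feature_maps = np.array([[[0.8, 0.2]], [[0.5, 0.6]]])
coupling_coeffs = np.array([[0.0, 0.4], [0.4, 0.0]])
print(integrate_with_coupling(feature_maps, coupling_coeffs))
```

With a coupling coefficient of zero the result reduces to the linear sum, so the coefficients control how strongly overlapping features are discounted.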

In an alternative preferred embodiment of the invention, the extracted features are integrated by applying a nonlinear mapping function to a weighted sum of the extracted features.

Specifically, the quality oriented significance map according to this alternative preferred embodiment of the invention is obtained using the equation

$$m_{s,i,j,t} = g_{2}\!\left(w_{1} f^{1}_{s,i,j,t} + w_{2} f^{2}_{s,i,j,t} + \cdots + w_{N} f^{N}_{s,i,j,t}\right)$$

where $w_{n}$ is the weight of the n-th extracted feature and $g_{2}$ is a nonlinear mapping function. In the definition of $g_{2}$ (its defining equation is not reproduced in this text), $\alpha$ is a parameter that provides the nonlinearity and $c$ is a constant.

According to a preferred embodiment of the invention, the method for generating a quality oriented significance map further comprises a post-processing step applied to the generated map. The post-processing step improves the quality of the generated significance map by removing any noise that may be present. It may also be used for other operations, including smoothing or expanding the significance map and removing artifacts present in it.

In particular, according to a preferred embodiment of the invention, Gaussian smoothing is used to remove impulse noise caused by errors during feature extraction.
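A minimal sketch of such a Gaussian smoothing step, using only NumPy, is given below. The kernel width (`sigma`), the kernel radius and the edge-replication padding are assumptions chosen for illustration; the text does not specify them.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def smooth_map(sig_map, sigma=1.0, radius=2):
    """Separable Gaussian smoothing of a 2-D significance map."""
    m = np.asarray(sig_map, dtype=np.float64)
    kernel = gaussian_kernel(sigma, radius)
    # Replicate border values so the output keeps the input size.
    padded = np.pad(m, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, rows)

# Hypothetical map with a single impulse, as an extraction error might produce.
noisy_map = np.zeros((5, 5))
noisy_map[2, 2] = 1.0
print(smooth_map(noisy_map))
```

The isolated impulse is spread over its neighbourhood and its peak is reduced, while a map with constant values passes through unchanged.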

The embodiments of the invention described above apply not only to the method but also to a device, a computer readable medium, and a computer program.

An image/video source 401 is encoded in an encoder unit 402, and the encoded image/video 403 is transmitted over a wide area network (WAN) 404. The transmitted image/video 405 from the WAN 404 is received and decoded by a decoder unit 406.

A distortion metric unit 410 receives the decoded image/video 407 from the decoder unit 406, with or without a reference image/video 408. The reference image/video 408 is generated by an image/video source unit 409, which may be the same source unit used to generate the original image/video 401; alternatively, the reference image/video 408 may simply be extracted from the original image/video 401 that is encoded in the encoder unit 402.

In general, distortion metrics can be classified into full-reference (FR), reduced-reference (RR) and no-reference (NR) models. FR metrics perform pixel-by-pixel and frame-by-frame comparisons between a distorted image sequence and its undistorted counterpart. Reduced-reference metrics compute a few statistics from the distorted image sequence and compare them with the corresponding stored statistics of the undistorted image sequence. The statistics are selected and correlated by conventional regression analysis. No-reference metrics require no information from the undistorted image sequence; they perform feature extraction on the distorted sequence to find artifacts such as MPEG block boundaries, point-like noise, or image blur.

The distortion metric unit 410 either compares the reference image/video 408 with the decoded image/video 407 (for FR and RR) or analyzes the artifacts of the decoded image/video 407 (for NR) in order to assess the quality of the decoded image/video 407. The output signal 411 generated by the distortion metric unit 410 represents the quality of the decoded image/video 407.

The distortion metric unit 410 may be implemented using any of the distortion metrics described above (see FIGS. 1, 2 and 3).

According to the invention, a quality oriented significance map, in particular a Hierarchical Quality-oriented Significance Map (HQSM), is proposed. The HQSM is generated based on both the visual attention and the perceptual quality requirements of the image/video.

For the bottom-up process, a map of the significance level values of the pixels or regions of the image/video can be determined from visual features based on the following rules:

1. The fixation position of the observer's eyes is not always locked onto a region of high visual attention, but eye movements follow regions of high visual attention;

2. Different features of the image/video are not added linearly to produce a cumulative effect;

3. The observer's eyes also see the world outside the region of focus or attention;

4. The selection of image/video features may be space based or object based; and

5. The integration and selection of the stimuli that attract visual attention depend on a hierarchy of "winner-takes-all" (WTA) processes at any one moment.

It should be noted that the last rule, rule 5, holds only for a single observer at a particular moment. For a group of observers, the attention region can be represented by a statistical map. Even for a single observer, more than one significant region is obtained when the image/video is observed over a period of time; these significant regions can likewise be represented by a statistical map.

For the top-down process, another map of significance level values of the pixels or regions of the image/video may be defined using domain or prior knowledge from other media. For example, the sound of an airplane will cause the observer to focus attention on the airplane object in the image/video.

The significance maps generated above are integrated to form the HQSM. The HQSM according to the invention is a three-dimensional array for an image, or a four-dimensional array for a video, as illustrated in FIG. 5.

The HQSM can be expressed using the following equation:

$$M = \{\, m_{s,i,j,t} \mid 0 \le i < W_{s},\ 0 \le j < L_{s},\ 0 \le t < N_{t} \,\} \qquad (1)$$

where $M$ denotes the HQSM, $m_{s,i,j,t}$ denotes the map element of the HQSM at scale $s$, position $(i, j)$ and time $t$, $W_{s}$ is the width of the image or of a frame of the video, $L_{s}$ is the height of the image or of a frame of the video, and $N_{t}$ is the time interval of the video (applicable to video only).

A high value of the map element $m_{s,i,j,t}$ represents a high significance level of the corresponding pixel or region of the image/video, so a high weight should be assigned to the distortion measurement of that pixel or region, and vice versa.
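One simple way to apply such weights is a significance-weighted squared error, sketched below for a single frame at one scale. Normalizing by the sum of the weights is an assumption made for illustration; the function name and sample values are hypothetical and do not reproduce any specific metric from the document.

```python
import numpy as np

def weighted_mse(reference, distorted, significance):
    """Squared error in which each pixel is weighted by its significance value."""
    ref = np.asarray(reference, dtype=np.float64)
    dis = np.asarray(distorted, dtype=np.float64)
    w = np.asarray(significance, dtype=np.float64)
    if not (ref.shape == dis.shape == w.shape):
        raise ValueError("all inputs must have the same shape")
    total_weight = w.sum()
    if total_weight <= 0.0:
        raise ValueError("significance map must contain positive weights")
    # Pixels with high significance contribute more to the final score.
    return float(np.sum(w * (ref - dis) ** 2) / total_weight)

# Hypothetical frame: the large error falls on the most significant pixel.
ref_frame = np.array([[0.0, 0.0], [0.0, 0.0]])
dis_frame = np.array([[4.0, 0.0], [0.0, 2.0]])
sig_frame = np.array([[3.0, 1.0], [1.0, 1.0]])
print(weighted_mse(ref_frame, dis_frame, sig_frame))
```

With uniform weights the function reduces to the ordinary MSE, so the significance map only redistributes how much each region counts.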

The generation of the HQSM comprises three steps, as shown in FIG. 6. Visual features of the image/video 601 are extracted in a feature extraction unit 602 based on the following stimuli:

1. Visual attention stimuli, for example motion, illumination, color, contrast, orientation, texture, etc.

2. Knowledge-based stimuli, for example face, human, shape, etc.

3. User-defined stimuli

It should be noted that existing image/video descriptors, such as MPEG-7 descriptors, may be incorporated for the feature extraction.

The extracted features 603 are received by a decision unit 604, which integrates the extracted features 603 to generate a gross HQSM 605. The gross HQSM 605 is further processed by a post-processing unit 606, which improves its quality, to generate the final HQSM 607 according to a preferred embodiment of the invention.

FIG. 7 shows the generation of the HQSM according to a preferred embodiment of the invention in detail. The different features extracted according to the preferred embodiment are summarized below.

Motion analysis

Object motion in a video or a sequence of images can be separated into two vectors: a relative motion vector and an absolute motion vector. Relative motion is the motion of the object relative to the background or other objects. Absolute motion is the exact displacement of the object within the image or video.

Motion analysis can be separated into global motion estimation, which determines the motion of the camera used for the image/video, and motion mapping, which extracts the relative and absolute motion vectors.

Global motion can be estimated using a three-parameter method, which is modeled as

$$\begin{pmatrix} \Delta X \\ \Delta Y \end{pmatrix} = C_f \begin{pmatrix} X \\ Y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$

where $(\Delta X, \Delta Y)$ is the estimated motion vector of the pixel or region $(X, Y)$ of the video, $C_f$ is the zoom factor, and $(t_x, t_y)$ is the translation vector.

Note that the estimated motion vector $(\Delta X, \Delta Y)$ is also the absolute motion vector.

The three-parameter method is preferred because it is less sensitive to noise than other modeling methods, such as the six-parameter model or the four-parameter model.

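The error of the global motion estimation can be defined as follows:

$$\varepsilon = \sum \left[ (\Delta X - C_f X - t_x)^2 + (\Delta Y - C_f Y - t_y)^2 \right] \qquad (3)$$

The values of $C_f$, $t_x$ and $t_y$ can be obtained by minimizing this error, which yields the three equations given below:

$$\frac{\partial \varepsilon}{\partial C_f} = 0, \qquad \frac{\partial \varepsilon}{\partial t_x} = 0, \qquad \frac{\partial \varepsilon}{\partial t_y} = 0 \qquad (4)$$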

A relaxation algorithm can be used to determine the minimizing values of $C_f$, $t_x$ and $t_y$. It is summarized in the following steps:

1. Select the pixels or regions of the image/video that exhibit large variations;

2. Determine the $(C_f, t_x, t_y)$ that satisfies Equation 4 over the selected pixels;

3. Evaluate the error $\varepsilon$ for every pixel using Equation 3;

4. Select the pixels of the image/video whose error lies within the range $[\varepsilon - \Delta, \varepsilon + \Delta]$;

5. Repeat steps 2 and 3 until $(C_f, t_x, t_y)$ is smaller than a predetermined value.
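
The relaxation steps above can be sketched in code. The following Python sketch is illustrative only. It assumes the three-parameter model (ΔX, ΔY) = C_f·(X, Y) + (t_x, t_y) with a squared-residual error, interprets step 4 as keeping the pixels whose error does not exceed a threshold `delta`, and interprets step 5 as stopping once the parameters change by less than `tol`. The function names and the parameters `delta`, `tol` and `max_iter` are hypothetical and do not appear in the specification.

```python
def fit_global_motion(points, vectors):
    """Closed-form least-squares fit of (dX, dY) = C_f * (X, Y) + (t_x, t_y)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    mean_u = sum(v[0] for v in vectors) / n
    mean_v = sum(v[1] for v in vectors) / n
    num = sum((p[0] - mean_x) * (v[0] - mean_u) + (p[1] - mean_y) * (v[1] - mean_v)
              for p, v in zip(points, vectors))
    den = sum((p[0] - mean_x) ** 2 + (p[1] - mean_y) ** 2 for p in points)
    c_f = num / den if den else 0.0
    return c_f, mean_u - c_f * mean_x, mean_v - c_f * mean_y


def pixel_error(point, vector, params):
    """Squared residual of one motion vector under the global model (assumed error)."""
    c_f, t_x, t_y = params
    r_x = vector[0] - c_f * point[0] - t_x
    r_y = vector[1] - c_f * point[1] - t_y
    return r_x * r_x + r_y * r_y


def relax_global_motion(points, vectors, delta=1.0, tol=1e-6, max_iter=20):
    """Alternate between fitting the model and re-selecting low-error pixels."""
    params = fit_global_motion(points, vectors)
    for _ in range(max_iter):
        kept = [i for i in range(len(points))
                if pixel_error(points[i], vectors[i], params) <= delta]
        if len(kept) < 2:          # too few pixels left to fit three parameters
            break
        new_params = fit_global_motion([points[i] for i in kept],
                                       [vectors[i] for i in kept])
        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        if change < tol:           # parameters have stabilized
            break
    return params
```

Under these assumptions, pixels belonging to independently moving objects produce large residuals and drop out of the fit, so the remaining pixels describe the camera motion alone.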

After $(C_f, t_x, t_y)$ has been obtained, the relative motion can be determined using the following equation:

$$\begin{pmatrix} \Delta X_R \\ \Delta Y_R \end{pmatrix} = \begin{pmatrix} \Delta X - C_f X - t_x \\ \Delta Y - C_f Y - t_y \end{pmatrix}$$

The relationship between the attention level and the relative motion vector is a nonlinear, monotonically increasing function. The attention level increases as the relative motion increases. Once the relative motion reaches a certain value, the attention level does not increase with any further increase in relative motion. The relationship between the relative motion vector and the attention level can therefore be expressed as

$$f_r(x_r) = \begin{cases} a \cdot x_r^{\,b}, & 0 \le x_r \le 10 \\ 1, & x_r > 10 \end{cases}$$

where $x_r$ is the magnitude of the relative motion vector, and $a$ and $b$ are parameters with $a > 0$, $b < 1$ and $a \cdot 10^{b} = 1$.

Similarly, the relationship between the attention level and the absolute motion is a nonlinear function. As the absolute motion increases, the attention level first increases correspondingly and thereafter decreases. The relationship between the attention level and the absolute motion vector can be defined as follows:

$$f_a(x_a) = c \cdot x_a \cdot e^{-d x_a} \qquad (7)$$

where $x_a$ is the magnitude of the absolute motion vector, and $c$ and $d$ are parameters assigned such that $\max[f_a(x_a)] = 1$.

As can be seen from Equation 7, $f_a(x)$ is maximal at $x = 1/d$, and therefore $c = d \cdot e$, where $e$ is the base of the natural logarithm.
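
The two attention curves can be illustrated numerically. The sketch below is a minimal example under stated assumptions: it takes the relative-motion curve to be a power law a·x^b that saturates at 1 once x reaches 10 (consistent with a > 0, b < 1 and a·10^b = 1), and the absolute-motion curve to be c·x·e^(−d·x) (consistent with a maximum of 1 at x = 1/d when c = d·e). The function names and the default values of `b` and `d` are hypothetical.

```python
import math


def relative_motion_attention(x_r, b=0.5, saturation=10.0):
    """Monotonically increasing attention that saturates at 1 (assumed form a * x**b)."""
    a = 1.0 / saturation ** b      # enforces a * 10**b == 1 for saturation = 10
    if x_r >= saturation:
        return 1.0
    return a * x_r ** b


def absolute_motion_attention(x_a, d=0.1):
    """Attention that rises, peaks at x = 1/d, then falls (assumed form c * x * exp(-d x))."""
    c = d * math.e                 # c = d * e makes the peak value equal to 1
    return c * x_a * math.exp(-d * x_a)
```

With `d = 0.1` the absolute-motion curve peaks at a motion magnitude of 10 and falls off on either side, which mirrors the rise-then-fall behavior described above.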

The overall motion attention level can then be determined as follows.

The relationship between relative motion, absolute motion, attention level and perceptual quality level is summarized in Table 1.

Table 1

| Relative motion | Absolute motion | Attention level | Quality level |
|---|---|---|---|
| Low | Low | Low | Low |
| High | Low | High | High |
| Low | High | Low | Low |
| High | High | High | Medium |

As can be seen from Table 1, an object with high absolute motion attracts the visual attention of the observer. However, the observer will not care about the quality of the object. For example, the observer will follow the motion of a flying ball in a video or sequence of images, but will not pay much attention to the shape (quality) of the flying ball. When the relative motion of the flying ball is high and its absolute motion is low, the observer pays much attention to the shape (quality) of the flying ball.

An important point to note is that the attention level is not always the same as the perceptual quality level. Because the perceptual quality requirement is used, in addition to the visual attention level, to form the array of the significance map according to the invention, the HQSM is more accurate and robust in assessing the quality of an image/video than any of the prior art.

Illumination mapping

High illumination or contrast in a region of an image/video usually generates high visual attention. For example, spotlight illumination on a stage draws the visual attention of the audience. Illumination can be estimated by applying a Gaussian smoothing filter to the image/video. Other means of illumination estimation may also be used.

Color mapping / skin color mapping

Color mapping is similar to illumination mapping, except that the difference in values between other regions of the image/video may also be used to determine the value of the current pixel or region.

Skin color attracts visual attention in many situations, and skin color can be detected in the Cb-Cr domain. In particular, a lookup table may be used to assign a possible color value to each pixel or region of the image/video. Skin color is detected when the pixel value falls within the ranges 77 < Cb < 127 and 133 < Cr < 173.
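
The Cb-Cr range test can be expressed in a few lines. The sketch below is illustrative: the specification gives only the Cb and Cr thresholds, so the conversion from 8-bit RGB is an assumption (the full-range BT.601 conversion used by JPEG/JFIF), and the function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to (Cb, Cr) with the full-range BT.601 (JFIF) formulas (assumed)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(r, g, b):
    """Apply the thresholds stated in the text: 77 < Cb < 127 and 133 < Cr < 173."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return 77 < cb < 127 and 133 < cr < 173
```

A binary skin map for a frame could then be built by applying `is_skin` to every pixel; a lookup table indexed by (Cb, Cr), as mentioned above, would be a faster equivalent.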

Face detection

Face detection detects face-like regions in the image/video. A human face in an image/video is likely to attract high visual attention from the observer. Skin color and shape information are useful for face detection.

Eye/mouth detection

Within a face region, the eyes and mouth usually attract higher visual attention than other parts of the face. Face detection and shape information can be used for eye/mouth detection.

Shape analysis/mapping

Shape analysis is useful for determining the shapes of objects in the image/video that attract visual attention. Information from shape analysis is also useful for detecting other information such as faces, eyes/mouths, captions, etc. Shape analysis can be performed by applying the watershed algorithm to an image/video frame and splitting the image into smaller regions. Merge-split methods and shape description/clustering methods, as described in [9], can then be applied to determine the shapes of the objects in the image/video.

Human body detection

A human body can be detected using the information obtained from shape analysis, face detection and eye/mouth detection.

Caption detection

Captions in an image/video carry important information and therefore attract high visual attention. Captions can be detected using the method disclosed in [8].

Texture analysis/mapping

Texture has a negative effect on the overall value of the significance level values and, consequently, on the generated HQSM. In other words, texture decreases the overall value of the map elements of the HQSM. Specifically, the following holds for the feature that represents the texture of the image/video.

Because the negative effect of the texture of the image/video is taken into account when forming the array of significance level values, the overall accuracy of the significance map is increased. The HQSM generated according to the invention is therefore more accurate than the significance maps generated by any of the prior art.

User-defined attention

For this feature, the significance level of some or all of the pixels or regions of the image/video is defined manually based on other information, such as audio, deliberate focus of attention on a particular object, etc.

Note that although only some feature extraction techniques are described, the invention is not limited to specific feature extraction methods, and other features of the image/video may additionally be incorporated into the method of generating the HQSM according to the invention.

After all the features of the image/video have been extracted, they are integrated in the decision unit 604. According to the preferred embodiment of the invention, a nonlinear mapping function is used to integrate the extracted features.

The coupling effects that result from combining any pair of extracted features differ from one another. A model for integrating a pair of extracted features that takes the coupling effect into account is given by

where $m_{s,i,j,t}$ is an element of the quality oriented significance map; $c^{12}$ is the coupling coefficient representing the coupling effect; $f^{1}_{s,i,j,t}$ and $f^{2}_{s,i,j,t}$ are the pair of extracted features; $n$ denotes the n-th extracted feature; and $g_1$ is a nonlinear function.

This nonlinear function is preferably defined as follows.

In another preferred embodiment of the invention, three or more extracted features are integrated using the following equation.

where $f^{n}_{s,i,j,t}$ is the n-th extracted feature; $c^{Lk}$ is the coupling coefficient representing the coupling effect of combining $f^{L}_{s,i,j,t}$ and $f^{k}_{s,i,j,t}$; $n$ is the index of the extracted feature; $k$ is another index of the extracted feature such that $1 < k < N$ and $k \neq L$; $N$ is the total number of extracted features; and $L$ is the index of the maximum value of $f^{n}_{s,i,j,t}$, denoted as

$$L = \arg\max_{n}\left(f^{n}_{s,i,j,t}\right)$$

Note from Equation 12 that only the coupling effects between the extracted feature with the maximum value and the other extracted features are considered. The coupling effects among the other extracted features are ignored.

In an alternative preferred embodiment of the invention, the extracted features are integrated using the following equation:

$$m_{s,i,j,t} = g_2\left(w_1 f^{1}_{s,i,j,t} + w_2 f^{2}_{s,i,j,t} + \dots + w_n f^{n}_{s,i,j,t}\right)$$

where $w_1, w_2, \dots, w_n$ are the weights of the extracted features and $g_2$ is a nonlinear mapping function.
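
The weighted integration can be sketched as follows. This is a minimal illustration, assuming that each feature is a two-dimensional map of equal size and that the weighted sum is passed element by element through a caller-supplied nonlinear function. The name `integrate_weighted` is hypothetical, and the form of `g2` is deliberately left to the caller rather than assumed here.

```python
def integrate_weighted(feature_maps, weights, g2):
    """Combine feature maps element-wise as g2(w_1 * f_1 + ... + w_n * f_n)."""
    if len(feature_maps) != len(weights):
        raise ValueError("one weight is required per feature map")
    rows = len(feature_maps[0])
    cols = len(feature_maps[0][0])
    return [
        [g2(sum(w * f[i][j] for w, f in zip(weights, feature_maps)))
         for j in range(cols)]
        for i in range(rows)
    ]
```

Passing the mapping as a callable keeps the sketch independent of any particular choice of nonlinearity.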

This nonlinear mapping function is preferably as follows.

where $\alpha$ is a parameter with value $\alpha = 2$ that provides the nonlinearity, and $C$ is a constant with value $C = 1$ that takes into account that the observer's eyes also see the world outside the focus or attention region.

In alternative embodiments, other techniques, such as neural networks or fuzzy rules, may be used to integrate the extracted features to form the significance map 605.

The significance map 605 generated from the integration of the extracted features is received by the post-processing unit 606, which further improves its quality to form the final HQSM 607.

In the post-processing unit 606, a Gaussian smoothing filter may be applied to the generated significance map 605 to remove impulse noise caused by errors in the feature extraction process 602.
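
The Gaussian smoothing step can be illustrated with a small separable filter. The sketch below is an assumption-laden example rather than the filter of the embodiment: the kernel width `sigma`, the `radius`, the clamped border handling and all function names are hypothetical choices.

```python
import math


def gaussian_kernel(sigma=1.0, radius=2):
    """Normalized one-dimensional Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(grid, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    result = []
    for row in grid:
        n = len(row)
        result.append([
            sum(w * row[min(max(j + k - radius, 0), n - 1)]
                for k, w in enumerate(kernel))
            for j in range(n)
        ])
    return result


def transpose(grid):
    return [list(column) for column in zip(*grid)]


def gaussian_smooth(grid, sigma=1.0, radius=2):
    """Separable 2-D Gaussian smoothing: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = smooth_rows(grid, kernel)
    return transpose(smooth_rows(transpose(horizontal), kernel))
```

An isolated spike in the significance map is spread over its neighborhood and strongly attenuated, while uniform regions are left unchanged.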

The HQSM generated according to the invention may be applied to both the first-class and the second-class approaches. Specifically, the HQSM may be incorporated into the MSE as given by

where $MSE_{modified}$ is the modified MSE incorporating the HQSM.

The PSNR resulting from the modified MSE is accordingly given by

where $PSNR_{modified}$ is the modified PSNR value incorporating the HQSM.
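
One way to incorporate a significance map into MSE and PSNR is sketched below. This is a hedged illustration and not necessarily the exact formula of the embodiment: it assumes that each squared pixel error is multiplied by the corresponding significance value, that the sum is normalized by the number of pixels, and that the PSNR uses a peak value of 255 for 8-bit images. The function names are hypothetical.

```python
import math


def weighted_mse(original, distorted, significance):
    """Mean of squared pixel errors, each scaled by its significance value (assumed form)."""
    total = 0.0
    count = 0
    for row_o, row_d, row_m in zip(original, distorted, significance):
        for o, d, m in zip(row_o, row_d, row_m):
            total += m * (o - d) ** 2
            count += 1
    return total / count


def psnr_from_mse(mse, peak=255.0):
    """Standard PSNR in decibels for a given MSE and peak signal value."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)
```

With a significance map of all ones, `weighted_mse` reduces to the ordinary MSE, so the modified measure differs from the conventional one only where the map assigns unequal importance to pixels.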

After the HQSM has been generated according to the invention, it may be applied to any existing distortion metric to improve the accuracy of the distortion measurement or quality assessment.

FIG. 8 shows how the generated HQSM 801 can be incorporated into an existing distortion metric 802. Note that the HQSM 801 is separate from the processing of the image/video by the distortion metric 802, and that the outputs of the HQSM 801 and the distortion metric 802 are integrated in an integration unit 803.

FIG. 9, FIG. 10 and FIG. 11 show how the HQSM can be applied to the distortion metrics shown in FIG. 1, FIG. 2 and FIG. 3. Because the application of the HQSM to a distortion metric is independent of the processing of the image/video by that metric, the HQSM may be applied to the distortion metric at any stage of the quality assessment/distortion measurement process (as indicated by the dotted lines).

Experiments were performed to evaluate the performance of the HQSM according to the invention together with existing distortion metrics.

In these experiments, the HQSM is generated using features extracted based on illumination mapping, motion analysis, skin color mapping and face detection. The generated HQSM is applied to the PSNR method and to the distortion metrics disclosed in [1] (Wang's metric) and [3] (Winkler's metric). Two video sequences, denoted "harp" and "autumn leaves", are used as test video sequences for assessing video quality.

The results of the experiments are summarized in Table 2.

Table 2

| Video sequence | PSNR | HQSM-based PSNR | Wang's metric | HQSM-based Wang's metric | Winkler's metric | HQSM-based Winkler's metric |
|---|---|---|---|---|---|---|
| harp | 0.8118 | 0.85 | 0.6706 | 0.6853 | 0.6912 | 0.7412 |
| autumn leaves | 0.1324 | 0.5441 | 0.9324 | 0.9265 | 0.8235 | 0.8647 |

As can be seen from the results in Table 2, the distortion metrics incorporating the HQSM give better performance in video quality assessment. The only exception is Wang's metric on the video sequence "autumn leaves".

This is attributed to the already high Spearman correlation value for the video sequence "autumn leaves". Moreover, the quality level value for "autumn leaves" obtained with Wang's metric is already very high (the maximum value being 1), and the subjective ratings of the video sequence by the group of observers therefore vary considerably in this case.
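
For readers unfamiliar with the measure, the sketch below shows how a Spearman rank correlation between the scores of a metric and subjective ratings could be computed. It assumes, as the discussion above suggests but does not state explicitly, that the values reported in Table 2 are correlations of this kind. The function names are hypothetical, and tied values receive the average of their ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank sequences."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 means the metric orders the test material in nearly the same way as the observers, which leaves little room for further improvement.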

Therefore, the HQSM generated according to the invention can improve the performance of existing video quality assessment methods.

The embodiments of the invention described above apply not only to a method but also to a device, a computer readable medium and a computer program.

While embodiments of the invention have been described, they are merely illustrative of the principles of the invention. Other embodiments and configurations may be devised without departing from the spirit of the invention and the scope of the appended claims.

The following documents are cited in this specification:

[1] Z. Wang, A. C. Bovik, "A universal image quality index", IEEE Signal Processing Letters, Vol. 9, No. 3, March 2002, pp. 81-84.

[2] Z. Wang, H. R. Sheikh and A. C. Bovik, "No Reference perceptual quality assessment of JPEG compressed images", IEEE International Conference on Image Processing, 2002.

[3] Stefan Winkler, "Vision Models and Quality Metrics for Image Processing Applications", Ph.D. Thesis #2313, Swiss Federal Institute of Technology, Lausanne, Switzerland, 2000.

[4] S. Yao, et al., "Perceptual visual quality evaluation with multi-features", submitted to IEE Electronics Letters.

[5] WO 99/21173

[6] US Patent Publication No. 2000/0126891

[7] EP 1 109 132

[8] US 6,243,419

[9] Miroslaw Bober, "MPEG-7 Visual Shape Descriptors", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6, June 2001.

Claims

1. A method for generating a quality oriented significance map for assessing the quality of an image or video, the method comprising the steps of:

extracting features of the image or video;

determining a perceptual quality requirement of at least one extracted feature; and

integrating the extracted features and the perceptual quality requirement of the at least one extracted feature to form an array of significance level values, thereby generating the quality oriented significance map.

2. The method of claim 1, wherein the features of the image or video are extracted using visual feature-based information and knowledge-based information.

3. The method of claim 2, wherein absolute motion and relative motion are determined and used to determine a quality level value of a pixel or region of the image or video, the determined quality level value being a perceptual quality requirement used for generating the quality oriented significance map.

4. The method of claim 2, wherein the extracted features and the perceptual quality requirement of the at least one extracted feature are integrated using a nonlinear mapping function to form the array of significance level values.

5. The method of claim 4, wherein the coupling effects resulting from the integration of the extracted features are used when forming the array of significance level values.

6. The method of claim 5, wherein the quality oriented significance map is obtained using the following equation:

wherein

$m_{s,i,j,t}$ is an element of the quality oriented significance map at scale $s$, position $(i, j)$ and time $t$;

$f^{n}_{s,i,j,t}$ is the n-th extracted feature;

$c^{Lk}$ is the coupling coefficient representing the coupling effect of combining $f^{L}_{s,i,j,t}$ and $f^{k}_{s,i,j,t}$;

$n$ is the index of the extracted feature;

$k$ is another index of the extracted feature such that $1 < k < N$ and $k \neq L$;

$N$ is the total number of extracted features; and

$L$ is the index of the maximum value of $f^{n}_{s,i,j,t}$, denoted by $L = \arg\max_{n}\left(f^{n}_{s,i,j,t}\right)$.

7. The method of claim 6, wherein the nonlinear coupling mapping function is defined by the following equation:

8. The method of claim 4, wherein the integration of the extracted features is performed by determining a weight for each of the extracted features, adding the weighted extracted features, and applying a nonlinear mapping function to the accumulated features, thereby forming the array of visual significance level values.

9. The method of claim 8, wherein the quality oriented significance map is obtained using the following equation:

$$m_{s,i,j,t} = g_2\left(w_1 f^{1}_{s,i,j,t} + w_2 f^{2}_{s,i,j,t} + \dots + w_n f^{n}_{s,i,j,t}\right)$$

wherein

$f^{n}_{s,i,j,t}$ is the n-th extracted feature;

$n$ is the index of the extracted feature; and

$g_2$ is a nonlinear mapping function.

10. The method of claim 9, wherein the nonlinear mapping function is defined by the following equation:

wherein

$\alpha$ is a parameter for providing nonlinearity, and

$C$ is a constant.

11. The method of claim 1, wherein the generated quality oriented significance map is further processed in a post-processing step to improve the quality of the generated quality oriented significance map.

12. The method of claim 11, wherein the post-processing step is performed using a Gaussian smoothing technique.

13. A device for generating a quality oriented significance map for assessing the quality of an image or video, the device comprising:

a feature extraction unit for extracting features of the image or video;

a determination unit for determining a perceptual quality requirement of at least one extracted feature; and

an integration unit for integrating the extracted features and the perceptual quality requirement of the at least one extracted feature to form an array of significance level values, thereby generating the quality oriented significance map.

14. A computer readable medium having a program recorded thereon, wherein the program causes a computer to execute a procedure for generating a quality oriented significance map for assessing the quality of an image or video, the procedure comprising:

extracting features of the image or video; and

integrating the extracted features and a perceptual quality requirement of at least one extracted feature to form an array of significance level values, thereby generating the quality oriented significance map.

15. A computer program element which causes a computer to execute a procedure for generating a quality oriented significance map for assessing the quality of an image or video, the procedure comprising:

extracting features of the image or video;