KR20200064930A

KR20200064930A - Method and apparatus for measuring video quality based on detection of difference of perceptual sensitivity regions

Info

Publication number: KR20200064930A
Application number: KR1020190153898A
Authority: KR
Inventors: 정세윤; 이대열; 조승현; 고현석; 김연희; 김종호; 석진욱; 이주영; 임웅; 김휘용; 최진수
Original assignee: 한국전자통신연구원
Priority date: 2018-11-29
Filing date: 2019-11-27
Publication date: 2020-06-08
Also published as: KR102401340B1

Abstract

Disclosed are a method and an apparatus for measuring video quality based on a cognitive sensitive region. The video quality can be measured based on the cognitive sensitive region and a change in the cognitive sensitive region. The cognitive sensitive region includes a spatial cognitive sensitive region, a temporal cognitive sensitive region, and a spatiotemporal cognitive sensitive region. A cognitive weight can be applied to the detected cognitive sensitive region and to the detected change in the cognitive sensitive region. Distortion is calculated based on the cognitive sensitive region and the change in the cognitive sensitive region, and a result of measuring the quality of a video is generated in accordance with the calculated distortion.

Description

METHOD AND APPARATUS FOR MEASURING VIDEO QUALITY BASED ON DETECTION OF DIFFERENCE OF PERCEPTUAL SENSITIVITY REGIONS

The following embodiments relate to a method and apparatus for measuring video quality and, more particularly, disclose a video quality measurement method and apparatus based on detection of a change in a cognitive sensitive region.

In the field of video processing or video compression, an index indicating the quality of a video is required to verify the performance of a technology under development or to perform optimization within the technology.

The better the index reflects subjective image quality, the better the subjective image quality the technology can provide.

Subjective video quality measurement obtains a quality measurement value for a video by statistically processing the individual quality scores collected in a quality evaluation experiment involving multiple evaluators. Both the visual cost and the economic cost of such subjective video quality measurement are considered relatively high.

An automatic video quality measurement method can measure quality at a relatively low visual cost and economic cost. The purpose of an automatic video quality measurement method is to replace subjective quality evaluation.

For this replacement, the automatic video quality measurement method must provide high measurement reliability. Typically, measurement reliability may be evaluated using Pearson's Correlation Coefficient (PCC), Spearman's Rank Correlation Coefficient (SRCC), or the like.
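
The two reliability measures named above can be illustrated with a minimal sketch. It assumes that subjective scores and objective metric outputs are available as two equal-length lists; the function names `pearson`, `ranks`, and `spearman` and the sample data are hypothetical and are not part of the disclosure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC): the PCC of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: subjective scores and the outputs of an objective metric.
subjective = [1.2, 2.5, 3.1, 4.0, 4.8]
objective = [10.0, 20.0, 35.0, 70.0, 150.0]
print(pearson(subjective, objective), spearman(subjective, objective))
```

In this sample the objective values rise monotonically but not linearly with the subjective scores, so the SRCC is higher than the PCC.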

One embodiment may provide a method and apparatus for measuring the quality of a video based on a cognitive sensitive region.

One embodiment may provide a method and apparatus for measuring the quality of a video based on a change in a cognitive sensitive region.

One embodiment may provide a method and apparatus for deriving feature information of an image using a cognitive weight.

In one aspect, an image processing apparatus is provided that includes a communication unit that receives a reference video and a comparison video, and a processing unit that generates a result of image quality measurement for the comparison video, wherein the processing unit calculates distortion using feature information of the reference video, feature information of the comparison video, a cognitive sensitive region of the reference video, a cognitive sensitive region of the comparison video, and a change between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video, and generates the result of the image quality measurement based on the distortion.

The feature information of the reference video may include one or more of spatial feature information of the reference video, temporal feature information of the reference video, and spatiotemporal feature information of the reference video.

The feature information of the comparison video may include one or more of spatial feature information of the comparison video, temporal feature information of the comparison video, and spatiotemporal feature information of the comparison video.

The processing unit may extract the spatiotemporal feature information from a spatiotemporal slice of an input video.

The input video may be one or more of the reference video and the comparison video.

The spatiotemporal slice may be a structure in which a Group of Pictures (GOP) of the input video is spatially divided.
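
One possible reading of a spatially divided GOP is sketched below. The text does not specify the geometry of the division, so the non-overlapping rectangular tiles, the nested-list frame layout, and the name `spatiotemporal_slices` are all assumptions made for illustration.

```python
def spatiotemporal_slices(gop, block_h, block_w):
    """Split a GOP into spatial tiles that each span every frame of the GOP.

    gop is a list of frames; each frame is a list of pixel rows.
    Returns a dict mapping (top, left) to a list of per-frame tiles.
    """
    height, width = len(gop[0]), len(gop[0][0])
    slices = {}
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            # Cut the same rectangle out of every frame in the GOP.
            slices[(top, left)] = [
                [row[left:left + block_w] for row in frame[top:top + block_h]]
                for frame in gop
            ]
    return slices

# Hypothetical GOP: 2 frames of 4x4 pixels with distinct values.
gop = [[[f * 16 + y * 4 + x for x in range(4)] for y in range(4)]
       for f in range(2)]
tiles = spatiotemporal_slices(gop, 2, 2)
print(len(tiles), tiles[(0, 2)])
```

Each returned tile keeps the temporal axis intact, so a feature extractor can look at how one image region evolves over the GOP.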

The processing unit may detect a horizontal feature and a vertical feature of the spatial feature information from an image of the input video.

The processing unit may detect, from the image, a feature for directions other than the horizontal direction and the vertical direction.

The processing unit may derive a region of high cognitive sensitivity from the image using an edge detection method.

The edge detection method may be a Sobel operation.

The horizontal feature and the vertical feature may be a horizontal-vertical edge map.

The feature for directions other than the horizontal direction and the vertical direction may be an edge map from which information on horizontal edges and vertical edges is excluded.
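
The split into a horizontal-vertical edge map and an edge map without horizontal and vertical edges can be sketched with the standard 3x3 Sobel kernels. The rule used here to decide which map a pixel belongs to (an angular tolerance around the two axes, `angle_tol_deg`) is an assumption for illustration; the text does not state the classification rule, and the name `sobel_edge_maps` is hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edge_maps(image, angle_tol_deg=15.0):
    """Return (hv_map, non_hv_map) of gradient magnitudes.

    A pixel is written to hv_map when its gradient direction lies within
    angle_tol_deg of the horizontal or vertical axis (assumed rule), and to
    non_hv_map otherwise. Border pixels are left at 0.0.
    """
    h, w = len(image), len(image[0])
    hv = [[0.0] * w for _ in range(h)]
    non_hv = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the gradient direction into the range [0, 90] degrees.
            angle = math.degrees(math.atan2(abs(gy), abs(gx)))
            if angle <= angle_tol_deg or angle >= 90.0 - angle_tol_deg:
                hv[y][x] = mag
            else:
                non_hv[y][x] = mag
    return hv, non_hv

# Hypothetical test images: a vertical step edge and a diagonal edge.
step = [[0, 0, 10, 10] for _ in range(4)]
diagonal = [[10 if x > y else 0 for x in range(5)] for y in range(5)]
hv_step, non_hv_step = sobel_edge_maps(step)
hv_diag, non_hv_diag = sobel_edge_maps(diagonal)
```

With these inputs the step edge lands only in the horizontal-vertical map, while the centre of the diagonal edge lands only in the other map.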

The processing unit may update the horizontal-vertical edge map using a first cognitive weight.

The processing unit may update, using a second cognitive weight, the edge map from which information on horizontal edges and vertical edges is excluded.

The first cognitive weight and the second cognitive weight may reflect a change in a spatial cognitive sensitive region.

The cognitive sensitive region of the reference video may include one or more of a spatial cognitive sensitive region of the reference video, a temporal cognitive sensitive region of the reference video, and a spatiotemporal cognitive sensitive region of the reference video.

The cognitive sensitive region of the comparison video may include one or more of a spatial cognitive sensitive region of the comparison video, a temporal cognitive sensitive region of the comparison video, and a spatiotemporal cognitive sensitive region of the comparison video.

The processing unit may generate a spatial complexity map by calculating the spatial complexities of pixels or first blocks of an image of an input video.

The input video may be the reference video or the comparison video.

The processing unit may generate a flatness map by calculating the background flatnesses of second blocks of the image of the input video.
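
The two block-based maps can be sketched as follows. This passage does not define how spatial complexity or background flatness is computed, so the sketch assumes per-block pixel variance as the complexity measure and a variance threshold as the flatness test; the names `block_variance_map` and `flatness_map` and the `var_threshold` parameter are hypothetical.

```python
def block_variance_map(image, block):
    """Variance of pixel values in each non-overlapping block x block tile
    (an assumed stand-in for spatial complexity)."""
    h, w = len(image), len(image[0])
    out = []
    for top in range(0, h - block + 1, block):
        row_out = []
        for left in range(0, w - block + 1, block):
            vals = [image[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            mean = sum(vals) / len(vals)
            row_out.append(sum((v - mean) ** 2 for v in vals) / len(vals))
        out.append(row_out)
    return out

def flatness_map(image, block, var_threshold=1.0):
    """1.0 where a block's variance is below var_threshold, else 0.0
    (an assumed stand-in for background flatness)."""
    return [[1.0 if v < var_threshold else 0.0 for v in row]
            for row in block_variance_map(image, block)]

# Hypothetical 2x4 image: a flat left block and a textured right block.
image = [[5, 5, 0, 10],
         [5, 5, 10, 0]]
complexity = block_variance_map(image, 2)
flatness = flatness_map(image, 2)
```

The block sizes for the two maps may differ, which is consistent with the text distinguishing first blocks from second blocks.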

The change may be a difference between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video.

The distortion may include one or more of spatial distortion, temporal distortion, and spatiotemporal distortion.

The processing unit may extract representative spatial information of the reference video using the spatial feature information of the reference video, the spatial cognitive sensitive region of the reference video, and a change in the spatial cognitive sensitive region.

The processing unit may extract representative spatial information of the comparison video using the spatial feature information of the comparison video, the spatial cognitive sensitive region of the comparison video, and the change in the spatial cognitive sensitive region.

The processing unit may calculate the spatial distortion.

The spatial distortion may be a difference between the representative spatial information of the reference video and the representative spatial information of the comparison video.

The change in the spatial cognitive sensitive region may be a change between the spatial cognitive sensitive region of the reference video and the spatial cognitive sensitive region of the comparison video.

The processing unit may generate combined spatial information of the reference video by combining the spatial feature information of the reference video, the spatial cognitive sensitive region of the reference video, and the change in the spatial cognitive sensitive region.

The processing unit may extract representative spatial information of each image of the reference video from the combined spatial information.

The processing unit may extract the representative spatial information of the reference video from the representative spatial information of each image of the reference video.
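
The chain from combined spatial information to a per-image representative value, then to a per-video representative value, and finally to a spatial distortion can be sketched as below. The combination rule (element-wise product), both pooling rules (arithmetic mean), and the use of an absolute difference are assumptions chosen for illustration, not operations stated in the text; all function names are hypothetical.

```python
def combine(feature_map, sensitivity_map, change_map):
    """Element-wise product of three same-sized maps (assumed combination)."""
    return [[f * s * c for f, s, c in zip(f_row, s_row, c_row)]
            for f_row, s_row, c_row in zip(feature_map, sensitivity_map, change_map)]

def frame_representative(combined_map):
    """Mean over all positions of one image's combined map (assumed pooling)."""
    values = [v for row in combined_map for v in row]
    return sum(values) / len(values)

def video_representative(per_frame_values):
    """Mean of the per-image representatives (assumed temporal pooling)."""
    return sum(per_frame_values) / len(per_frame_values)

def spatial_distortion(ref_representative, cmp_representative):
    """Absolute difference of the two representative values (assumed form)."""
    return abs(ref_representative - cmp_representative)

# Hypothetical one-row maps for a single image.
combined = combine([[1.0, 2.0]], [[1.0, 0.5]], [[2.0, 2.0]])
per_frame = frame_representative(combined)
```

The same structure would apply to the comparison video, with the distortion computed from the two video-level values.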

The processing unit may extract representative temporal information of the reference video using the temporal feature information of the reference video, the temporal cognitive sensitive region of the reference video, and a change in the temporal cognitive sensitive region.

The processing unit may extract representative temporal information of the comparison video using the temporal feature information of the comparison video, the temporal cognitive sensitive region of the comparison video, and the change in the temporal cognitive sensitive region.

The processing unit may calculate the temporal distortion.

The temporal distortion may be a difference between the representative temporal information of the reference video and the representative temporal information of the comparison video.

The change in the temporal cognitive sensitive region may be a change between the temporal cognitive sensitive region of the reference video and the temporal cognitive sensitive region of the comparison video.

The processing unit may generate combined temporal information of the reference video by combining the temporal feature information of the reference video, the temporal cognitive sensitive region of the reference video, and the change in the temporal cognitive sensitive region.

The processing unit may extract representative temporal information of each image of the reference video from the combined temporal information.

The processing unit may extract the representative temporal information of the reference video from the representative temporal information of each image of the reference video.

In another aspect, an image processing method is provided that includes: extracting feature information of a reference video; detecting a cognitive sensitive region of the reference video; extracting feature information of a comparison video; detecting a cognitive sensitive region of the comparison video; calculating a change between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video; calculating distortion using the feature information of the reference video, the feature information of the comparison video, the cognitive sensitive region of the reference video, the cognitive sensitive region of the comparison video, and the change; and generating a result of image quality measurement based on the distortion.

In still another aspect, a computer-readable recording medium storing a program for performing the image processing method is provided.

In still another aspect, an image processing apparatus is provided that includes a communication unit that receives a reference video and a comparison video, and a processing unit that runs a cognitive quality measurement deep neural network, wherein the processing unit detects a cognitive sensitive region of the reference video and a cognitive sensitive region of the comparison video and calculates a change between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video, wherein the cognitive sensitive region of the reference video, the cognitive sensitive region of the comparison video, and the change are input to the cognitive quality measurement deep neural network, and wherein the cognitive quality measurement deep neural network generates a result of image quality measurement using the cognitive sensitive region of the reference video, the cognitive sensitive region of the comparison video, and the change.

A method and apparatus for measuring the quality of a video based on a cognitive sensitive region are provided.

A method and apparatus for measuring the quality of a video based on a change in a cognitive sensitive region are provided.

A method and apparatus for deriving feature information of an image using a cognitive weight are provided.

FIG. 1 shows an image processing apparatus according to an embodiment.
FIG. 2 shows a configuration for image quality measurement according to an embodiment.
FIG. 3 is a flowchart of an image quality measurement method according to an embodiment.
FIG. 4 shows the structure of a spatial/temporal feature information extraction unit according to an example.
FIG. 5 is a flowchart of a spatial/temporal feature information extraction method according to an example.
FIG. 6 shows the structure of a spatial/temporal cognitive sensitive region detection unit according to an example.
FIG. 7 is a flowchart of a spatial/temporal cognitive sensitive region detection method according to an example.
FIG. 8 shows the structure of a spatial/temporal cognitive sensitive region change calculation unit according to an example.
FIG. 9 is a flowchart of a spatial/temporal cognitive sensitive region change calculation method according to an example.
FIG. 10 shows the structure of a distortion calculation unit according to an example.
FIG. 11 is a flowchart of a distortion calculation method according to an example.
FIG. 12 shows the structure of a spatial distortion calculation unit according to an example.
FIG. 13 is a flowchart of spatial distortion calculation according to an example.
FIG. 14 shows the structure of a spatial feature information extraction unit according to an example.
FIG. 15 is a flowchart of spatial feature information extraction according to an example.
FIG. 16 shows the structure of a spatial cognitive sensitive region detection unit according to an example.
FIG. 17 is a flowchart of spatial cognitive sensitive region detection according to an example.
FIG. 18 shows measurement of the spatial complexity of an image according to an example.
FIG. 19 shows the structure of a temporal distortion calculation unit according to an example.
FIG. 20 is a flowchart of temporal distortion calculation according to an example.
FIG. 21 shows the structure of a temporal feature information extraction unit according to an example.
FIG. 22 is a flowchart of temporal feature information extraction according to an example.
FIG. 23 shows the structure of a temporal cognitive sensitive region detection unit according to an example.
FIG. 24 is a flowchart of temporal cognitive sensitive region detection according to an example.
FIG. 25 shows a configuration for image quality measurement using a cognitive quality measurement deep neural network according to an embodiment.
FIG. 26 is a flowchart of an image quality measurement method using a cognitive quality measurement deep neural network according to an embodiment.

The detailed description of exemplary embodiments below refers to the accompanying drawings, which show specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. It should be understood that the various embodiments differ from one another but need not be mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the present invention. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the embodiment. Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the exemplary embodiments, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled.

In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

The terms used in the embodiments are for describing the embodiments and are not intended to limit the present invention. In the embodiments, the singular form includes the plural form unless the text specifically states otherwise. As used in the specification, "comprises" and/or "comprising" means that the stated component, step, operation, and/or element does not exclude the presence or addition of one or more other components, steps, operations, and/or elements, and that additional configurations may be included in the practice of the exemplary embodiments or in the scope of their technical spirit. When a component is referred to as being "connected" or "coupled" to another component, it should be understood that the two components may be directly connected or coupled to each other, or that another component may be present between them.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be named a second component without departing from the scope of rights, and similarly a second component may be named a first component.

In addition, the components shown in the embodiments are illustrated independently to indicate different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of description. For example, at least two of the components may be combined into one component, and one component may be divided into a plurality of components. Embodiments in which components are combined and embodiments in which a component is divided are also included in the scope of rights as long as they do not depart from the essence.

In addition, some components may not be essential components that perform essential functions but optional components that merely improve performance. The embodiments may be implemented with only the components essential to realizing the essence of the embodiments, and a structure that excludes optional components, such as components used only to improve performance, is also included in the scope of rights.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice them. In describing the embodiments, a detailed description of a related known configuration or function is omitted when it is judged that the description could obscure the gist of this specification.

Existing Full Reference (FR) quality metrics such as Mean Squared Error (MSE), Peak Signal to Noise Ratio (PSNR), and Structural SIMilarity (SSIM) measure the difference between a reference video and a comparison video, and have the advantage of being intuitive.

However, the value measured by such a full-reference quality metric can differ greatly from perceptual quality, so the metric may have the drawback of low measurement reliability when images with different characteristics are compared.

Furthermore, when methods using such a full-reference quality metric are applied to a video, a result value is generated for each image of the video by measuring that image, and the average of the per-image result values may become the result value for the video. Therefore, when a full-reference quality metric is applied to a video, its measurement reliability may degrade further.
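
The per-image measurement followed by averaging can be illustrated with MSE and PSNR, whose definitions are standard. The sketch assumes 8-bit single-channel images stored as nested lists; the function names `mse`, `psnr`, and `video_psnr` are hypothetical.

```python
import math

def mse(ref_img, cmp_img):
    """Mean squared error between two same-sized single-channel images."""
    n = len(ref_img) * len(ref_img[0])
    return sum((a - b) ** 2
               for r_row, c_row in zip(ref_img, cmp_img)
               for a, b in zip(r_row, c_row)) / n

def psnr(ref_img, cmp_img, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref_img, cmp_img)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def video_psnr(ref_frames, cmp_frames, peak=255.0):
    """Video-level score taken as the plain average of per-image PSNR values."""
    scores = [psnr(r, c, peak) for r, c in zip(ref_frames, cmp_frames)]
    return sum(scores) / len(scores)

# Hypothetical one-row images.
ref_img = [[10, 10]]
cmp_img = [[12, 8]]
print(mse(ref_img, cmp_img), psnr(ref_img, cmp_img))
```

Because every image contributes equally to the average and every pixel contributes equally to the MSE, this kind of score has no notion of where or when a viewer is most sensitive to distortion.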

Several automatic video quality measurement methods have been proposed, and Video Quality Model-Variable Frame Delays (VQM-VFD) has even been adopted as an international standard. However, these methods also fail to provide sufficiently high reliability and are therefore not widely used.

The automatic video quality measurement methods described above do not make effective use of visual perception characteristics. Accordingly, the performance of an automatic video quality measurement method can be greatly improved by making effective use of visual perception characteristics.

도 1은 일 실시예에 따른 영상 처리 장치를 나타낸다.1 shows an image processing apparatus according to an embodiment.

비디오 처리 장치(100)는 처리부(110), 통신부(120), 메모리(130), 저장소(140) 및 버스(190) 중 적어도 일부를 포함할 수 있다. 처리부(110), 통신부(120), 메모리(130) 및 저장소(140) 등과 같은 비디오 처리 장치(100)의 구성요소들은 버스(190)를 통해 서로 간에 통신할 수 있다.The video processing apparatus 100 may include at least a part of the processing unit 110, the communication unit 120, the memory 130, the storage 140, and the bus 190. Components of the video processing apparatus 100, such as the processing unit 110, the communication unit 120, the memory 130, and the storage 140, may communicate with each other through the bus 190.

처리부(110)는 메모리(130) 또는 저장소(140)에 저장된 프로세싱(processing) 명령어(instruction)들을 실행하는 반도체 장치일 수 있다. 예를 들면, 처리부(110)는 적어도 하나의 하드웨어 프로세서(processor)일 수 있다.The processing unit 110 may be a semiconductor device that executes processing instructions stored in the memory 130 or the storage 140. For example, the processing unit 110 may be at least one hardware processor.

처리부(110)는 비디오 처리 장치(100)의 동작을 위해 요구되는 작업을 처리할 수 있다. 처리부(110)는 실시예들에서 설명된 처리부(110)의 동작 또는 단계의 코드를 실행(execute)할 수 있다.The processing unit 110 may process a task required for the operation of the video processing apparatus 100. The processing unit 110 may execute (execute) the code of the operation or step of the processing unit 110 described in the embodiments.

처리부(110)는 후술될 실시예에서 설명될 정보의 생성, 저장 및 출력을 수행할 수 있으며, 기타 비디오 처리 장치(100)에서 이루어지는 단계의 동작을 수행할 수 있다.The processor 110 may generate, store, and output information to be described in the embodiments to be described later, and may perform operations of steps performed in other video processing devices 100.

통신부(120)는 네트워크(199)에 연결될 수 있다. 비디오 처리 장치(100)의 동작을 위해 요구되는 데이터 또는 정보를 수신할 수 있으며, 비디오 처리 장치(100)의 동작을 위해 요구되는 데이터 또는 정보를 전송할 수 있다. 통신부(120)는 네트워크(199)를 통해 다른 장치로 데이터를 전송할 수 있고, 다른 장치로부터 데이터를 수신할 수 있다. 예를 들면, 통신부(120)는 네트워크 칩(chip) 또는 포트(port)일 수 있다.The communication unit 120 may be connected to the network 199. Data or information required for the operation of the video processing apparatus 100 may be received, and data or information required for the operation of the video processing apparatus 100 may be transmitted. The communication unit 120 may transmit data to other devices through the network 199 and receive data from other devices. For example, the communication unit 120 may be a network chip or a port.

메모리(130) 및 저장소(140)는 다양한 형태의 휘발성 또는 비휘발성 저장 매체일 수 있다. 예를 들어, 메모리(130)는 롬(ROM)(131) 및 램(RAM)(132) 중 적어도 하나를 포함할 수 있다. 저장소(140)는 램, 플레시(flash) 메모리 및 하드 디스크(hard disk) 등과 같은 내장형의 저장 매체를 포함할 수 있고, 메모리 카드 등과 같은 탈착 가능한 저장 매체를 포함할 수 있다.The memory 130 and the storage 140 may be various types of volatile or nonvolatile storage media. For example, the memory 130 may include at least one of a ROM (ROM) 131 and a RAM (RAM) 132. The storage 140 may include a built-in storage medium such as RAM, flash memory, and hard disk, and may include a removable storage medium such as a memory card.

비디오 처리 장치(100)의 기능 또는 동작은 처리부(110)가 적어도 하나의 프로그램 모듈을 실행함에 따라 수행될 수 있다. 메모리(130) 및/또는 저장소(140)는 적어도 하나의 프로그램 모듈을 저장할 수 있다. 적어도 하나의 프로그램 모듈은 처리부(110)에 의해 실행되도록 구성될 수 있다.The function or operation of the video processing apparatus 100 may be performed as the processing unit 110 executes at least one program module. The memory 130 and/or the storage 140 may store at least one program module. At least one program module may be configured to be executed by the processing unit 110.

At least some of the components of the video processing apparatus 100 described above may be implemented as at least one program module.

For example, the units and the neural network described below may be program modules executed by the processing unit 110.

The program modules may be included in the video processing apparatus 100 in the form of an operating system, application modules, libraries, and other program modules, and may be physically stored on various known storage devices. At least some of these program modules may also be stored in a remote storage device capable of communicating with the video processing apparatus 100. The program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures that perform a specific operation or task according to an embodiment or that implement a specific abstract data type.

The video processing apparatus 100 may further include a user interface (UI) input device 150 and a UI output device 160. The UI input device 150 may receive user input required for the operation of the video processing apparatus 100. The UI output device 160 may output information or data resulting from the operation of the video processing apparatus 100.

The video processing apparatus 100 may further include a sensor 170. The sensor 170 may generate a video by continuously capturing images.

Hereinafter, the terms "image" and "frame" may be used with the same meaning and may be used interchangeably.

FIG. 2 illustrates a configuration for image quality measurement according to an embodiment.

The processing unit 110 may include a spatial/temporal feature information extraction unit 210, a spatial/temporal cognitive sensitive region detection unit 220, a spatial/temporal cognitive sensitive region change calculation unit 230, a distortion calculation unit 240, and an index calculation unit 250.

The video processing apparatus 100 may automatically measure the perceived image quality of a video. To provide high measurement reliability when measuring the perceived image quality of the video, the video processing apparatus 100 may detect cognitively sensitive regions in the video and may use the detected cognitively sensitive regions for the image quality measurement. Hereinafter, "image quality" may mean perceived image quality.

In a subjective image quality assessment, each evaluator may separately rate the image quality of the reference video and the image quality of the comparison video. The difference between the rating that each evaluator gives to the reference video and the rating that the evaluator gives to the comparison video may then be derived. The mean opinion score (MOS) may be the average of these per-evaluator differences.
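The averaging described above can be sketched in a few lines of Python. The function name `mean_opinion_score`, the 0 to 100 rating scale, and the sign convention (reference rating minus comparison rating) are assumptions made for illustration; the description states only that the differences are averaged.

```python
from statistics import mean


def mean_opinion_score(reference_scores, comparison_scores):
    """Average, over evaluators, of (reference rating - comparison rating).

    Assumption: index i in both lists belongs to the same evaluator.
    """
    if len(reference_scores) != len(comparison_scores):
        raise ValueError("each evaluator must rate both videos")
    if not reference_scores:
        raise ValueError("at least one evaluator is required")
    differences = [r - c for r, c in zip(reference_scores, comparison_scores)]
    return mean(differences)


# Three hypothetical evaluators rating on a 0-100 scale.
print(mean_opinion_score([80, 90, 70], [60, 75, 65]))
```

A larger value under this convention indicates that the evaluators saw a larger quality drop from the reference video to the comparison video.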

In doing so, each evaluator may give relatively more weight to the image quality of cognitively sensitive regions when rating the reference video, and likewise when rating the comparison video.

In general, the cognitively sensitive regions in the reference video and the cognitively sensitive regions in the comparison video may differ from each other.

In addition, when image quality is assessed for a pair consisting of a reference video and an evaluation video, as in the Double Stimulus Continuous Quality Scale (DSCQS) method, the change in sensitive regions between the two videos may also be taken into account. To reflect these perceptual characteristics, the embodiment may introduce the spatial/temporal cognitive sensitive region detection unit 220 and the spatial/temporal cognitive sensitive region change calculation unit 230. Image quality measurement that uses the spatial/temporal cognitive sensitive region detection unit 220 and the spatial/temporal cognitive sensitive region change calculation unit 230 may provide higher measurement reliability.

The functions and operations of the spatial/temporal feature information extraction unit 210, the spatial/temporal cognitive sensitive region detection unit 220, the spatial/temporal cognitive sensitive region change calculation unit 230, the distortion calculation unit 240, and the index calculation unit 250 are described in detail below.

In the embodiments, the term "spatial/temporal" may mean "spatial, temporal, and spatio-temporal" or "at least one of spatial, temporal, and spatio-temporal".

In addition, in the embodiments, spatial features and temporal features may be combined into spatio-temporal features.

In addition, in the embodiments, "spatial information" may relate to an image of a video. That is, "spatial information" may relate to an individual image of the video rather than to the video as a whole. In the embodiments, "video" may be replaced with "image". For example, "reference video" may be replaced with "reference image of the reference video", and "comparison video" may be replaced with "comparison image of the comparison video".

FIG. 3 is a flowchart of an image quality measurement method according to an embodiment.

Before step 310, the communication unit 120 may receive the reference video and the comparison video. Alternatively, the communication unit 120 may receive one of the reference video and the comparison video, and the sensor 170 may capture the other.

In step 310, the spatial/temporal feature information extraction unit 210 may extract spatial/temporal feature information of the reference video.

The spatial/temporal feature information may mean one or more of spatial feature information, temporal feature information, and spatio-temporal feature information. The spatial/temporal feature information may also be referred to simply as feature information.

In step 320, the spatial/temporal cognitive sensitive region detection unit 220 may detect the spatial/temporal cognitive sensitive regions of the reference video.

The spatial/temporal cognitive sensitive region may mean one or more of a spatial cognitive sensitive region, a temporal cognitive sensitive region, and a spatio-temporal cognitive sensitive region. The spatial/temporal cognitive sensitive region may also be referred to simply as a cognitive sensitive region.

The spatial/temporal cognitive sensitive region detection unit 220 may distinguish, in the reference video, regions that are spatially/temporally less sensitive from regions that are spatially/temporally sensitive.

For example, a spatially/temporally less sensitive region may be a region in which the spatial/temporal masking effect is large.

The spatial/temporal masking effect may mean one or more of a spatial masking effect, a temporal masking effect, and a spatio-temporal masking effect. The spatial/temporal masking effect may also be referred to simply as a masking effect.

In other words, the spatial/temporal cognitive sensitive region detection unit 220 may measure the sensitivity of each region of the reference video.
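As a rough illustration of measuring a per-region sensitivity, the sketch below uses local pixel variance as a stand-in for the spatial masking effect: a highly textured block masks distortion more strongly and is therefore given a lower sensitivity. The variance-based model, the function name `block_sensitivity`, and the block size are assumptions made for illustration only; the description does not specify how the detection unit 220 computes sensitivity.

```python
from statistics import pvariance


def block_sensitivity(frame, block=2):
    """Return a 2D map holding one sensitivity value per block x block region.

    Assumption: sensitivity = 1 / (1 + variance), so flat regions (weak
    masking) score 1.0 and textured regions (strong masking) score lower.
    """
    height, width = len(frame), len(frame[0])
    sens_map = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            pixels = [p for r in frame[top:top + block] for p in r[left:left + block]]
            row.append(1.0 / (1.0 + pvariance(pixels)))
        sens_map.append(row)
    return sens_map


# A 2x4 frame: the left block is flat, the right block is textured.
frame = [[5, 5, 0, 10],
         [5, 5, 10, 0]]
print(block_sensitivity(frame))
```

The same function could be applied to each image of the comparison video to obtain a second map for comparison.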

In step 330, the spatial/temporal feature information extraction unit 210 may extract spatial/temporal feature information of the comparison video.

In step 340, the spatial/temporal cognitive sensitive region detection unit 220 may detect the spatial/temporal cognitive sensitive regions of the comparison video.

The spatial/temporal cognitive sensitive region detection unit 220 may distinguish, in the comparison video, regions that are spatially/temporally less sensitive from regions that are spatially/temporally sensitive.

In other words, the spatial/temporal cognitive sensitive region detection unit 220 may measure the sensitivity of each region of the comparison video.

In step 350, the spatial/temporal cognitive sensitive region change calculation unit 230 may calculate the change between the spatial/temporal cognitive sensitive regions of the reference video and the spatial/temporal cognitive sensitive regions of the comparison video.

Here, the terms "change" and "difference" may be used with the same meaning and may be used interchangeably. That is, the change may mean the difference between the spatial/temporal cognitive sensitive regions of the reference video and the spatial/temporal cognitive sensitive regions of the comparison video. The change may also mean the difference between mutually corresponding cognitive sensitive regions of the reference video and the comparison video.

In addition, "the change between the spatial/temporal cognitive sensitive regions of the reference video and the spatial/temporal cognitive sensitive regions of the comparison video" may also be interpreted as "the change in the spatial/temporal cognitive sensitive regions of the reference video and in the spatial/temporal cognitive sensitive regions of the comparison video".

The spatial/temporal cognitive sensitive region change calculation unit 230 may detect regions whose sensitivity has changed.

Regions whose sensitivity has changed may be 1) regions that have high sensitivity in the reference video and low sensitivity in the comparison video, and 2) regions that have low sensitivity in the reference video and high sensitivity in the comparison video.

Alternatively, if the sensitivity of a region in the reference video differs from the sensitivity of the same region in the comparison video, the sensitivity of that region may be regarded as having changed.

Here, a region that has high sensitivity in the reference video and low sensitivity in the comparison video may be a region in which cognitively important information has been lost.

Conversely, a region that has low sensitivity in the reference video and high sensitivity in the comparison video may be a region in which cognitively important information has been generated. A representative example of such a region is a blocking artifact, which appears in the compressed video when a video is compressed at a high compression ratio.

In other words, the spatial/temporal cognitive sensitive region change calculation unit 230 may detect regions in which cognitively important information has been lost and regions in which cognitively important information has been generated.
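The two kinds of changed region described for step 350 can be illustrated with a small classifier over per-region sensitivity maps. The single `threshold` that separates "high" from "low" sensitivity, the label strings, and the function name are hypothetical; the description does not state how high and low sensitivity are decided.

```python
def classify_sensitivity_changes(ref_map, cmp_map, threshold=0.5):
    """Label each region as 'lost', 'generated', or 'unchanged'.

    ref_map and cmp_map are 2D lists of per-region sensitivities with the
    same shape. 'lost' means high in the reference and low in the comparison;
    'generated' means low in the reference and high in the comparison.
    """
    labels = []
    for ref_row, cmp_row in zip(ref_map, cmp_map):
        row = []
        for ref_s, cmp_s in zip(ref_row, cmp_row):
            ref_high = ref_s >= threshold
            cmp_high = cmp_s >= threshold
            if ref_high and not cmp_high:
                row.append("lost")
            elif cmp_high and not ref_high:
                row.append("generated")
            else:
                row.append("unchanged")
        labels.append(row)
    return labels


reference = [[0.9, 0.1], [0.8, 0.2]]
comparison = [[0.2, 0.7], [0.9, 0.1]]
print(classify_sensitivity_changes(reference, comparison))
```

Under this sketch, a blocking artifact introduced by heavy compression would show up as a "generated" region.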

In step 360, the distortion calculation unit 240 may calculate the spatial/temporal distortion using the result from the spatial/temporal feature information extraction unit 210, the result from the spatial/temporal cognitive sensitive region detection unit 220, and the result from the spatial/temporal cognitive sensitive region change calculation unit 230.

Here, the result from the spatial/temporal feature information extraction unit 210 may be the spatial/temporal feature information of the reference video and the spatial/temporal feature information of the comparison video.

The result from the spatial/temporal cognitive sensitive region detection unit 220 may be the spatial/temporal cognitive sensitive regions of the reference video and the spatial/temporal cognitive sensitive regions of the comparison video.

The result from the spatial/temporal cognitive sensitive region change calculation unit 230 may be the change between the spatial/temporal cognitive sensitive regions of the reference video and the spatial/temporal cognitive sensitive regions of the comparison video.

The spatial/temporal distortion may mean one or more of spatial distortion, temporal distortion, and spatio-temporal distortion. The spatial/temporal distortion may also be referred to simply as distortion.

In step 370, the index calculation unit 250 may generate the result of the image quality measurement for the comparison video based on the distortion output by the distortion calculation unit 240. For example, the index calculation unit 250 may convert the distortion output by the distortion calculation unit 240 into the result of the image quality measurement.

Here, the result from the distortion calculation unit 240 may be the spatial/temporal distortion.
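The data flow of steps 310 to 370 can be summarized as a short Python skeleton. Every stage is passed in as a callable, and the toy stages in the usage part are placeholders invented for this illustration; they are not the computations of the units 210 to 250, which the description leaves open at this point. It is also an assumption here that detection operates on the video itself rather than on the extracted feature information.

```python
def measure_quality(reference, comparison, extract, detect, change, distort, to_index):
    """Run steps 310-370 with caller-supplied (hypothetical) stage functions."""
    ref_features = extract(reference)                   # step 310
    ref_regions = detect(reference)                     # step 320
    cmp_features = extract(comparison)                  # step 330
    cmp_regions = detect(comparison)                    # step 340
    region_change = change(ref_regions, cmp_regions)    # step 350
    distortion = distort(                               # step 360
        (ref_features, cmp_features),
        (ref_regions, cmp_regions),
        region_change,
    )
    return to_index(distortion)                         # step 370


# Toy stages on 1-D "videos" (lists of numbers), purely to exercise the flow.
score = measure_quality(
    reference=[1.0, 2.0, 3.0],
    comparison=[1.0, 2.5, 2.0],
    extract=lambda video: list(video),
    detect=lambda video: [abs(v) for v in video],
    change=lambda a, b: [abs(x - y) for x, y in zip(a, b)],
    distort=lambda feats, regions, delta: sum(
        abs(r - c) * (1.0 + d) for r, c, d in zip(feats[0], feats[1], delta)
    ),
    to_index=lambda distortion: 1.0 / (1.0 + distortion),
)
print(score)
```

Passing the stages as parameters keeps the skeleton independent of how each unit is realized, for example as a program module or as a neural network.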

FIG. 4 illustrates the structure of the spatial/temporal feature information extraction unit according to an example.

The spatial/temporal feature information extraction unit 210 may include a spatial feature information extraction unit 410, a temporal feature information extraction unit 420, and a spatio-temporal feature information extraction unit 430.

In FIGS. 2 and 3, a single spatial/temporal feature information extraction unit 210 is described as extracting the spatial/temporal feature information from both the reference video and the comparison video. Alternatively, two spatial/temporal feature information extraction units may be provided, one for the reference video and one for the comparison video.

The embodiment describes the case in which a single spatial/temporal feature information extraction unit 210 is used for both the reference video and the comparison video.

The functions and operations of the spatial feature information extraction unit 410, the temporal feature information extraction unit 420, and the spatio-temporal feature information extraction unit 430 are described in detail below.

FIG. 5 is a flowchart of a spatial/temporal feature information extraction method according to an example.

Step 310, described above with reference to FIG. 3, may include steps 510, 520, and 530. In this case, the input video may be the reference video.

Step 330, described above with reference to FIG. 3, may include steps 510, 520, and 530. In this case, the input video may be the comparison video.

In step 510, the spatial feature information extraction unit 410 may extract spatial feature information from each image of the input video. That is, the spatial feature information extraction unit 410 may extract information image by image.

In step 520, the temporal feature information extraction unit 420 may extract temporal feature information from a plurality of images of the input video. The plurality of images may be a group of pictures (GOP) or a group of features. The temporal feature information extraction unit 420 may extract information in units of a plurality of images of the input video.
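Steps 510 and 520 differ mainly in the unit they operate on: a single image versus a group of images. The sketch below makes that difference concrete with two deliberately simple features. The choice of features (mean absolute horizontal gradient per image, mean absolute difference between consecutive images per group) and both function names are assumptions for illustration; the description does not name specific features here.

```python
def spatial_feature(frame):
    """Per-image feature: mean absolute horizontal gradient of one frame.

    Assumes each row has at least two pixels.
    """
    diffs = [abs(row[x + 1] - row[x]) for row in frame for x in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def temporal_feature(group):
    """Per-group feature: mean absolute difference between consecutive frames.

    Assumes the group (for example, a GOP) holds at least two frames.
    """
    diffs = [
        abs(b - a)
        for prev, curr in zip(group, group[1:])
        for prev_row, curr_row in zip(prev, curr)
        for a, b in zip(prev_row, curr_row)
    ]
    return sum(diffs) / len(diffs)


gop = [
    [[0, 2, 4], [1, 1, 1]],
    [[0, 2, 6], [1, 3, 1]],
]
print([spatial_feature(frame) for frame in gop])  # one value per image
print(temporal_feature(gop))                      # one value per group
```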

In step 530, the spatio-temporal feature information extraction unit 430 may extract spatio-temporal feature information from a spatio-temporal slice of the input video. A spatio-temporal slice may be a structure obtained by spatially partitioning a GOP. The spatio-temporal feature information extraction unit 430 may extract information in units of spatio-temporal slices of the input video.

Alternatively, the spatio-temporal feature information extraction unit 430 may be configured as a combination of the spatial feature information extraction unit 410 and the temporal feature information extraction unit 420.
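One way to picture the spatio-temporal slice used in step 530 is to cut every image of a GOP into the same grid of tiles and stack the co-located tiles over time. The sketch below does this for frames stored as 2D lists; the tile size parameters and the function name `spatiotemporal_slices` are assumptions, since the description says only that the GOP is spatially partitioned.

```python
def spatiotemporal_slices(gop, tile_h, tile_w):
    """Split a GOP (list of 2D frames) into slices of co-located tiles.

    Each returned slice holds the same tile_h x tile_w window from every
    frame of the GOP, so it spans the full temporal extent of the group.
    """
    height = len(gop[0])
    width = len(gop[0][0])
    slices = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            slices.append([
                [row[left:left + tile_w] for row in frame[top:top + tile_h]]
                for frame in gop
            ])
    return slices


# Two 2x4 frames split into two slices of 2x2 tiles each.
gop = [
    [[0, 1, 2, 3], [4, 5, 6, 7]],
    [[10, 11, 12, 13], [14, 15, 16, 17]],
]
for s in spatiotemporal_slices(gop, tile_h=2, tile_w=2):
    print(s)
```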

FIG. 6 illustrates the structure of the spatial/temporal cognitive sensitive region detection unit according to an example.

The spatial/temporal cognitive sensitive region detection unit 220 may include a spatial cognitive sensitive region detection unit 610, a temporal cognitive sensitive region detection unit 620, and a spatio-temporal cognitive sensitive region detection unit 630.

In FIGS. 2 and 3, a single spatial/temporal cognitive sensitive region detection unit 220 is described as detecting the spatial/temporal cognitive sensitive regions of both the reference video and the comparison video. Alternatively, two spatial/temporal cognitive sensitive region detection units may be provided, one for the reference video and one for the comparison video.

The embodiment describes the case in which a single spatial/temporal cognitive sensitive region detection unit 220 is used for both the reference video and the comparison video.

In a subjective image quality assessment, each evaluator may separately rate the image quality of the reference video and the image quality of the comparison video. When rating each video, the evaluator may give relatively more weight to the cognitively sensitive regions within that video. To exploit this characteristic, the spatial/temporal cognitive sensitive region detection unit 220 may detect the cognitively sensitive regions in the reference video and in the comparison video, respectively.

The functions and operations of the spatial cognitive sensitive region detection unit 610, the temporal cognitive sensitive region detection unit 620, and the spatio-temporal cognitive sensitive region detection unit 630 are described in detail below.

FIG. 7 is a flowchart of a spatial/temporal cognitive sensitive region detection method according to an example.

Step 320, described above with reference to FIG. 3, may include steps 710, 720, and 730. In this case, the input video may be the reference video.

Step 340, described above with reference to FIG. 3, may include steps 710, 720, and 730. In this case, the input video may be the comparison video.

In step 710, the spatial cognitive sensitive region detection unit 610 may detect a spatial cognitive sensitive region from an image of the input video. The spatial cognitive sensitive region detection unit 610 may extract information for each image of the input video.

In step 720, the temporal cognitive sensitive region detection unit 620 may detect a temporal cognitive sensitive region from a plurality of images of the input video. The plurality of images may be a group of pictures (GOP) or a group of features. The temporal cognitive sensitive region detection unit 620 may extract information in units of a plurality of images of the input video.

In step 730, the spatio-temporal cognitive sensitive region detection unit 630 may detect a spatio-temporal cognitive sensitive region from a spatio-temporal slice of the input video. A spatio-temporal slice may be a structure obtained by spatially partitioning a GOP. The spatio-temporal cognitive sensitive region detection unit 630 may extract information in units of spatio-temporal slices of the input video.

Alternatively, the spatio-temporal cognitive sensitive region detection unit 630 may be configured as a combination of the spatial cognitive sensitive region detection unit 610 and the temporal cognitive sensitive region detection unit 620.

FIG. 8 illustrates the structure of the spatial/temporal cognitive sensitive region change calculation unit according to an example.

The spatial/temporal cognitive sensitive region change calculation unit 230 may include a spatial cognitive sensitive region change calculation unit 810, a temporal cognitive sensitive region change calculation unit 820, and a spatio-temporal cognitive sensitive region change calculation unit 830.

In a subjective image quality assessment that rates a pair consisting of a reference video and an evaluation video, as in DSCQS, the evaluator may give relatively more weight to changes in the cognitively sensitive regions between the two videos. To exploit this characteristic, the spatial/temporal cognitive sensitive region change calculation unit 230 may detect a change, such as a difference, between the cognitively sensitive regions of the reference video and the cognitively sensitive regions of the comparison video.

The functions and operations of the spatial cognitive sensitive region change calculation unit 810, the temporal cognitive sensitive region change calculation unit 820, and the spatio-temporal cognitive sensitive region change calculation unit 830 are described in detail below.

FIG. 9 is a flowchart of a spatial/temporal cognitive sensitive region change calculation method according to an example.

Step 350, described above with reference to FIG. 3, may include steps 910, 920, and 930.

In step 910, the spatial cognitive sensitive region change calculation unit 810 may calculate the change in the spatial cognitive sensitive region from images of the reference video and the comparison video. The spatial cognitive sensitive region change calculation unit 810 may extract information for each image of the reference video and the comparison video.

In step 920, the temporal cognitive sensitive region change calculation unit 820 may calculate the change in the temporal cognitive sensitive region from a plurality of images of the reference video and the comparison video. The plurality of images may be a group of pictures (GOP) or a group of features. The temporal cognitive sensitive region change calculation unit 820 may extract information in units of a plurality of images of the reference video and the comparison video.

In step 930, the spatio-temporal cognitive sensitive region change calculation unit 830 may calculate the change in the spatio-temporal cognitive sensitive region from spatio-temporal slices of the reference video and the comparison video. A spatio-temporal slice may be a structure obtained by spatially partitioning a GOP. The spatio-temporal cognitive sensitive region change calculation unit 830 may extract information in units of spatio-temporal slices of the reference video and the comparison video.

Alternatively, the spatio-temporal cognitive sensitive region change calculation unit 830 may be configured as a combination of the spatial cognitive sensitive region change calculation unit 810 and the temporal cognitive sensitive region change calculation unit 820.

FIG. 10 illustrates the structure of the distortion calculation unit according to an example.

The distortion calculation unit 240 may include a spatial distortion calculation unit 1010 and a temporal distortion calculation unit 1020.

The functions and operations of the spatial distortion calculation unit 1010 and the temporal distortion calculation unit 1020 are described in detail below.

FIG. 11 is a flowchart of a distortion calculation method according to an example.

In step 1110, the spatial distortion calculation unit 1010 may calculate the spatial distortion using the result from the spatial/temporal feature information extraction unit 210, the result from the spatial/temporal cognitive sensitive region detection unit 220, and the result from the spatial/temporal cognitive sensitive region change calculation unit 230.

In step 1120, the temporal distortion calculation unit 1020 may calculate the temporal distortion using the result from the spatial/temporal feature information extraction unit 210, the result from the spatial/temporal cognitive sensitive region detection unit 220, and the result from the spatial/temporal cognitive sensitive region change calculation unit 230.

FIG. 12 shows the structure of a spatial distortion calculation unit according to an example.

The spatial distortion calculation unit 1010 may include a reference video representative spatial information extraction unit 1210, a comparison video representative spatial information extraction unit 1220, and a representative spatial information difference calculation unit 1230.

The reference video representative spatial information extraction unit 1210 may include a reference video spatial information combining unit 1211, a spatial pooling unit 1212, and a temporal pooling unit 1213.

The comparison video representative spatial information extraction unit 1220 may include a comparison video spatial information combining unit 1221, a spatial pooling unit 1222, and a temporal pooling unit 1223.

The functions and operations of the reference video representative spatial information extraction unit 1210, the comparison video representative spatial information extraction unit 1220, and the representative spatial information difference calculation unit 1230 are described in detail below.

FIG. 13 is a flowchart of spatial distortion calculation according to an example.

For convenience of description, the embodiment is described for the case where the spatial/temporal feature information extraction unit 210 includes only the spatial feature information extraction unit 410. Alternatively, the spatial/temporal feature information extraction unit 210 may include two or more of the spatial feature information extraction unit 410, the temporal feature information extraction unit 420, and the spatio-temporal feature information extraction unit 430.

Step 1110, described above with reference to FIG. 11, may include steps 1310, 1320, and 1330.

At step 1310, the reference video representative spatial information extraction unit 1210 may extract representative spatial information of the reference video using 1) the spatial feature information of the reference video, extracted from the reference video, 2) the spatial cognitive sensitive region of the reference video, detected from the reference video, and 3) the change between the spatial cognitive sensitive region of the reference video and the spatial cognitive sensitive region of the comparison video, calculated from the reference video and the comparison video.

Step 1310 may include steps 1311, 1312, and 1313.

At step 1311, the reference video spatial information combining unit 1211 may generate combined spatial information of the reference video by combining 1) the spatial feature information of the reference video, 2) the spatial cognitive sensitive region of the reference video, and 3) the change in the spatial cognitive sensitive region. The change in the spatial cognitive sensitive region may mean the change between the spatial cognitive sensitive region of the reference video and the spatial cognitive sensitive region of the comparison video.

In other words, the combined spatial information of the reference video may be the spatial feature information of the reference video in which the cognitive importance of each region is reflected.

The reference video spatial information combining unit 1211 may use 1) the spatial cognitive sensitive region of the reference video and 2) the change in the spatial cognitive sensitive region so that the spatial feature information of the reference video reflects the cognitive importance of each region. In other words, the reference video spatial information combining unit 1211 may cause the spatial feature information of the reference video to reflect cognitive importance on a per-region basis.

At step 1312, the spatial pooling unit 1212 may extract representative spatial information of the reference video from the combined spatial information of the reference video, which is the result from the reference video spatial information combining unit 1211.

The spatial pooling unit 1212 may extract representative spatial information of each image of the reference video from the spatial feature information of the reference video in which the cognitive importance of each region is reflected.

For example, the representative spatial information of an image may be the average of the pieces of spatial information of the image.

For example, the representative spatial information of an image may be the standard deviation of the pieces of spatial information of the image.

For example, the spatial pooling unit 1212 may extract the representative spatial information of an image using the spatial pooling used in an automatic image quality measurement method.

At step 1313, the temporal pooling unit 1213 may extract representative spatial information of the reference video (or of a GOP of the reference video) from the representative spatial information of each image of the reference video (or of the GOP of the reference video).

For example, the representative spatial information of the reference video (or of the GOP of the reference video) may be the average of the pieces of spatial information of the reference video (or of the GOP of the reference video).

For example, the representative spatial information of the reference video (or of the GOP of the reference video) may be the standard deviation of the pieces of spatial information of the reference video (or of the GOP of the reference video).

For example, the temporal pooling unit 1213 may extract the representative spatial information of the reference video (or of the GOP of the reference video) using the temporal pooling used in an automatic image quality measurement method.
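The two pooling stages of steps 1312 and 1313 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `spatial_pool` and `temporal_pool`, the `mode` parameter, the use of the population standard deviation, and the sample values are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def spatial_pool(frame_map, mode="mean"):
    """Collapse one image's 2-D map of combined spatial information to a scalar."""
    values = [v for row in frame_map for v in row]
    # Assumption: "standard deviation" is taken as the population standard deviation.
    return fmean(values) if mode == "mean" else pstdev(values)


def temporal_pool(per_frame_values, mode="mean"):
    """Collapse the per-image scalars of a video (or GOP) to one representative value."""
    return fmean(per_frame_values) if mode == "mean" else pstdev(per_frame_values)


# Two tiny 2x2 "images" of combined spatial information (made-up numbers).
frames = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
per_frame = [spatial_pool(f) for f in frames]   # one scalar per image
representative = temporal_pool(per_frame)       # one scalar per video or GOP
```

Either stage could use the mean or the standard deviation independently, since the text presents both only as examples.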

At step 1320, the comparison video representative spatial information extraction unit 1220 may extract representative spatial information of the comparison video using 1) the spatial feature information of the comparison video, extracted from the comparison video, 2) the spatial cognitive sensitive region of the comparison video, detected from the comparison video, and 3) the change between the spatial cognitive sensitive region of the reference video and the spatial cognitive sensitive region of the comparison video, calculated from the reference video and the comparison video.

Step 1320 may include steps 1321, 1322, and 1323.

At step 1321, the comparison video spatial information combining unit 1221 may generate combined spatial information of the comparison video by combining 1) the spatial feature information of the comparison video, 2) the spatial cognitive sensitive region of the comparison video, and 3) the change in the spatial cognitive sensitive region. The change in the spatial cognitive sensitive region may mean the change between the spatial cognitive sensitive region of the reference video and the spatial cognitive sensitive region of the comparison video.

In other words, the combined spatial information of the comparison video may be the spatial feature information of the comparison video in which the cognitive importance of each region is reflected.

The comparison video spatial information combining unit 1221 may use 1) the spatial cognitive sensitive region of the comparison video and 2) the change in the spatial cognitive sensitive region so that the spatial feature information of the comparison video reflects the cognitive importance of each region. In other words, the comparison video spatial information combining unit 1221 may cause the spatial feature information of the comparison video to reflect cognitive importance on a per-region basis.

At step 1322, the spatial pooling unit 1222 may extract representative spatial information of the comparison video from the combined spatial information of the comparison video, which is the result from the comparison video spatial information combining unit 1221.

The spatial pooling unit 1222 may extract representative spatial information of each image of the comparison video from the spatial feature information of the comparison video in which the cognitive importance of each region is reflected.

For example, the spatial pooling unit 1222 may extract the representative spatial information of an image using the spatial pooling used in an automatic image quality measurement method.

At step 1323, the temporal pooling unit 1223 may extract representative spatial information of the comparison video (or of a GOP of the comparison video) from the representative spatial information of each image of the comparison video (or of the GOP of the comparison video).

For example, the representative spatial information of the comparison video (or of the GOP of the comparison video) may be the average of the pieces of spatial information of the comparison video (or of the GOP of the comparison video).

For example, the representative spatial information of the comparison video (or of the GOP of the comparison video) may be the standard deviation of the pieces of spatial information of the comparison video (or of the GOP of the comparison video).

For example, the temporal pooling unit 1223 may extract the representative spatial information of the comparison video (or of the GOP of the comparison video) using the temporal pooling used in an automatic image quality measurement method.

At step 1330, the representative spatial information difference calculation unit 1230 may calculate the difference between the representative spatial information of the reference video and the representative spatial information of the comparison video.

The spatial distortion described above with reference to FIG. 3 may be the difference between the representative spatial information of the reference video (or of an image of the reference video) and the representative spatial information of the comparison video (or of an image of the comparison video).
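Steps 1310 to 1330 can be read together as a small pipeline: weight each feature map, pool it, and compare the two pooled values. The sketch below assumes element-wise multiplication for the combining step, mean pooling in both stages, and an absolute difference for step 1330; the text fixes none of these choices, and every function name and number is hypothetical.

```python
def weight_map(feature, weight):
    """Combine a feature map with a per-pixel weight map (assumed: element-wise product)."""
    return [[f * w for f, w in zip(frow, wrow)] for frow, wrow in zip(feature, weight)]


def mean_of_map(m):
    """Spatial pooling by averaging all values of one map."""
    values = [v for row in m for v in row]
    return sum(values) / len(values)


def representative_spatial_info(features, weights):
    """Per-image weighting and spatial pooling, then temporal pooling by averaging."""
    per_frame = [mean_of_map(weight_map(f, w)) for f, w in zip(features, weights)]
    return sum(per_frame) / len(per_frame)


def spatial_distortion(ref_features, ref_weights, comp_features, comp_weights):
    """Assumed form of step 1330: absolute difference of the two representative values."""
    ref_rep = representative_spatial_info(ref_features, ref_weights)
    comp_rep = representative_spatial_info(comp_features, comp_weights)
    return abs(ref_rep - comp_rep)


# One-image "videos" with 2x2 feature maps and weight maps (made-up numbers).
ref_features = [[[4.0, 4.0], [4.0, 4.0]]]
ref_weights = [[[1.0, 1.0], [1.0, 1.0]]]
comp_features = [[[2.0, 2.0], [2.0, 2.0]]]
comp_weights = [[[1.0, 1.0], [0.5, 0.5]]]

distortion = spatial_distortion(ref_features, ref_weights, comp_features, comp_weights)
```

Here the reference pools to 4.0 and the comparison to 1.5, so the sketch reports a distortion of 2.5.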

FIG. 14 shows the structure of a spatial feature information extraction unit according to an example.

The spatial feature information extraction unit 410 may include a horizontal/vertical feature detection unit 1410 and a non-horizontal/vertical direction feature detection unit 1420.

The functions and operations of the horizontal/vertical feature detection unit 1410 and the non-horizontal/vertical direction feature detection unit 1420 are described in detail below.

FIG. 15 is a flowchart of spatial feature information extraction according to an example.

Step 510, described above with reference to FIG. 5, may include steps 1510 and 1520.

At step 1510, the horizontal/vertical feature detection unit 1410 may detect the horizontal/vertical features of the spatial feature information from each image of the input video.

The horizontal/vertical features may mean one or more of a horizontal feature and a vertical feature.

In general, a region having high contrast, such as the boundary of an object, may have high cognitive sensitivity.

Based on this property, the horizontal/vertical feature detection unit 1410 may derive regions having high cognitive sensitivity from an image using an edge detection method such as the Sobel operation used in image processing.

The Sobel operation is only an example; another operation that extracts regions having high contrast sensitivity may be used as the edge detection method.

The horizontal/vertical feature detection unit 1410 may apply the edge detection method separately in the horizontal direction and in the vertical direction. The horizontal/vertical feature detection unit 1410 may derive information about horizontal edge values by applying the edge detection method to the image in the horizontal direction. In addition, the horizontal/vertical feature detection unit 1410 may derive information about vertical edge values by applying the edge detection method to the image in the vertical direction.

The horizontal/vertical feature detection unit 1410 may derive final edge information using the horizontal edge values and the vertical edge values.

For example, the final edge information may be the square root of the sum of the square of the horizontal edge value and the square of the vertical edge value.
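The directional edge values and the final edge information can be illustrated with the standard 3x3 Sobel kernels. This is only a sketch: which kernel counts as "horizontal" follows one common convention, border pixels are simply left at zero, and the names `apply3x3` and `sobel_edges` are made up for the example.

```python
import math

# Standard 3x3 Sobel kernels. Assumed convention: SOBEL_H measures intensity
# change along the horizontal axis, SOBEL_V along the vertical axis.
SOBEL_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply3x3(image, kernel, y, x):
    """Weighted sum of the 3x3 neighborhood centered on (y, x)."""
    return sum(kernel[j][i] * image[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))


def sobel_edges(image):
    """Return (horizontal values, vertical values, final edge information)."""
    h, w = len(image), len(image[0])
    gh = [[0.0] * w for _ in range(h)]
    gv = [[0.0] * w for _ in range(h)]
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels stay 0 (assumption)
        for x in range(1, w - 1):
            gh[y][x] = float(apply3x3(image, SOBEL_H, y, x))
            gv[y][x] = float(apply3x3(image, SOBEL_V, y, x))
            # Square root of the sum of the squared directional values.
            mag[y][x] = math.hypot(gh[y][x], gv[y][x])
    return gh, gv, mag


# A 3x3 image with an intensity step between the second and third columns.
image = [
    [0, 0, 10],
    [0, 0, 10],
    [0, 0, 10],
]
gh, gv, mag = sobel_edges(image)
```

For this image only the center pixel is computed, and all of its edge energy falls in the horizontal-direction value.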

At step 1520, the non-horizontal/vertical direction feature detection unit 1420 may detect, from each image of the input video, the features of the spatial feature information in directions other than the horizontal and vertical directions.

To account for the fact that human vision is more sensitive to horizontal/vertical edges and that blocking artifacts caused by encoding appear in the form of horizontal/vertical edges, the horizontal/vertical feature detection unit 1410 may separate, from the detection result generated by the edge detection method, the edges that are strong in the horizontal/vertical components. The horizontal/vertical features detected by the horizontal/vertical feature detection unit 1410 may be these separated edges that are strong in the horizontal/vertical components.

Alternatively, the horizontal/vertical feature detection unit 1410 may separate, from the final edge information, the regions in which the vertical edge value derived as an intermediate result is greater than or equal to a threshold and the regions in which the horizontal edge value is greater than or equal to a threshold. The horizontal/vertical features detected by the horizontal/vertical feature detection unit 1410 may be these separated regions. The non-horizontal/vertical direction features detected by the non-horizontal/vertical direction feature detection unit 1420 may be the remainder of the final edge information outside the separated regions.

For example, the horizontal/vertical features may be a Horizontal-Vertical edge Map (HVM). The features for directions other than horizontal/vertical may be an Edge Map (EM) from which the information about horizontal/vertical edges has been excluded.
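The threshold-based alternative can be sketched as a split of the final edge information into two maps. The code follows a literal reading of the text: a pixel goes to the HVM when either directional edge value reaches the threshold, and to the EM otherwise. Comparing absolute values, using a single shared threshold, and the function name `split_edge_maps` are assumptions.

```python
def split_edge_maps(gh, gv, magnitude, threshold):
    """Split final edge information into an HVM and an EM.

    gh, gv    : horizontal and vertical edge values (same shape as magnitude)
    magnitude : final edge information
    threshold : assumed to be shared by both directions
    """
    h, w = len(magnitude), len(magnitude[0])
    hvm = [[0.0] * w for _ in range(h)]
    em = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            strong = abs(gh[y][x]) >= threshold or abs(gv[y][x]) >= threshold
            if strong:
                hvm[y][x] = magnitude[y][x]   # separated horizontal/vertical region
            else:
                em[y][x] = magnitude[y][x]    # remainder of the final edge information
    return hvm, em


# One row, two pixels: a strong directional response and a weak one (made-up numbers).
gh = [[50.0, 5.0]]
gv = [[0.0, 5.0]]
magnitude = [[50.0, 7.07]]
hvm, em = split_edge_maps(gh, gv, magnitude, threshold=20.0)
```

Under this reading every pixel of the final edge information lands in exactly one of the two maps.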

FIG. 16 shows the structure of a spatial cognitive sensitive region detection unit according to an example.

The spatial cognitive sensitive region detection unit 610 may include a spatial complexity (spatial randomness) calculation unit 1610 and a background flatness calculation unit 1620.

The functions and operations of the spatial complexity calculation unit 1610 and the background flatness calculation unit 1620 are described in detail below.

FIG. 17 is a flowchart of spatial cognitive sensitive region detection according to an example.

Step 710, described above with reference to FIG. 7, may include steps 1710 and 1720.

At step 1710, the spatial complexity calculation unit 1610 may calculate the spatial complexity of a pixel or block of an image of the input video. The spatial complexity calculation unit 1610 may calculate the spatial complexity of the image of the input video in units of pixels or blocks.

The spatial complexity calculation unit 1610 may calculate the spatial complexity of a pixel or block using the similarity between the pixel or block and its surroundings.

For convenience of description, the result of the processing by the spatial complexity calculation unit 1610 may be referred to as a Spatial Randomness Map (SRM). In other words, the spatial complexity calculation unit 1610 may generate the SRM by calculating the spatial complexities of the pixels or first blocks of an image of the input video. The SRM may include the spatial complexities of the pixels or first blocks of the image of the input video.

At step 1720, the background flatness calculation unit 1620 may calculate the background flatness of a pixel or block of an image of the input video. The background flatness calculation unit 1620 may calculate the background flatness of the image of the input video in units of blocks.

The size of the second block used as the unit in the background flatness calculation unit 1620 may be larger than the size of the first block used as the unit in the spatial complexity calculation unit 1610.

The result of the processing by the background flatness calculation unit 1620 may be referred to as a Smoothness Map (SM). In other words, the background flatness calculation unit 1620 may generate the SM by calculating the background flatnesses of the second blocks of an image of the input video. The SM may include the background flatnesses of the pixels or blocks of the image of the input video.

FIG. 18 illustrates the measurement of the spatial complexity of an image according to an example.

The operation of the spatial complexity calculation unit 1610 is described below.

FIG. 18 illustrates a prediction model used to measure spatial complexity. In an image, a central region Y may be predicted from peripheral regions X. In FIG. 18, the central region Y is drawn as a solid circle, and the peripheral regions X are drawn as circles filled with diagonal hatching. Each arrow represents a prediction: the region from which an arrow starts may be used to predict the region to which the arrow points.

For example, a region may be a pixel or a block. A region may also be the unit in which the spatial complexity calculation unit 1610 measures spatial complexity.

When the region is a block, the spatial complexity calculation unit 1610 may downsample the image of the input video so that each block corresponds to one pixel of the spatial complexity map.
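The text does not say how the downsampling is done. One simple possibility, shown here purely as an assumption, is to replace each non-overlapping block by its mean so that one block maps to one pixel; the function name `block_downsample` and the handling of partial blocks are likewise hypothetical.

```python
def block_downsample(image, block_w, block_h):
    """Reduce each non-overlapping block_w x block_h block to a single value.

    Assumption: the block mean is used, and partial blocks at the right and
    bottom borders are dropped.
    """
    h, w = len(image), len(image[0])
    out = []
    for by in range(0, h - block_h + 1, block_h):
        row = []
        for bx in range(0, w - block_w + 1, block_w):
            block = [image[y][x]
                     for y in range(by, by + block_h)
                     for x in range(bx, bx + block_w)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 2x4 image split into two 2x2 blocks (made-up numbers).
image = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
]
small = block_downsample(image, block_w=2, block_h=2)
```

After this step the per-pixel procedure described next can be applied to the downsampled image without change.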

The following describes the case where the region is a pixel, that is, the case where the spatial complexity calculation unit 1610 measures spatial complexity in units of pixels.

As illustrated in FIG. 18, the spatial complexity calculation unit 1610 may predict the central region Y from the peripheral region X. This prediction may be expressed as Equation 1 below. The peripheral region X may mean the set of neighboring pixels surrounding the central region Y.

[Equation 1]

u may denote a spatial position. H may be a transformation matrix that provides the optimal prediction.

The symbol "^" may indicate that a value is generated by prediction.

For example, H may be obtained through minimum mean error optimization, as shown in Equation 2 below.

[Equation 2]

R_XY may be the cross-correlation matrix of X and Y. R_X may be the correlation matrix of X.

R_X^{-1} may be the inverse matrix of R_X.

R_X^{-1} may be obtained using an approximated pseudo-inverse matrix technique, as shown in Equation 3 below.

[Equation 3]

Finally, the spatial complexity calculation unit 1610 may obtain the SRM using Equation 4 below. SR(u) may represent the degree of spatial randomness.

[Equation 4]

u may be a vector representing the position (x, y) of a pixel.

Through Equation 4 and the related equations, the spatial complexity calculation unit 1610 may numerically determine the degree of spatial randomness within the input image on a per-pixel basis.
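The equation images for Equations 1 to 4 are not reproduced in this text. Forms consistent with the definitions given around them (linear prediction of Y from X, H obtained from R_XY and R_X^{-1}, and SR(u) as the size of the prediction residual) would be as follows. This is a reconstruction under those assumptions, not the published equations; in particular, the symbols U_m and \Lambda_m in Equation 3 (the eigenvectors and eigenvalues of R_X that are retained by the approximation) do not appear in the surrounding text.

```latex
\begin{align}
\hat{Y}(u) &= H\,X(u) \tag{1} \\
H^{*} &= R_{XY}\,R_{X}^{-1} \tag{2} \\
\hat{R}_{X}^{-1} &= U_m\,\Lambda_m^{-1}\,U_m^{T} \tag{3} \\
SR(u) &= \left|\, Y(u) - R_{XY}\,R_{X}^{-1}\,X(u) \,\right| \tag{4}
\end{align}
```

Read this way, a pixel that its neighbors predict well has a small SR(u), and a pixel in an irregular texture has a large one.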

The operation of the background flatness calculation unit 1620 is described below.

The background flatness calculation unit 1620 may calculate background flatness in units of blocks.

When the unit used in the calculation of the SRM is the first block, the second block, which is the unit of calculation of the background flatness calculation unit 1620, may need to be at least twice as large as the first block in both width and height. In other words, when the size of the first block, which is the unit of calculation of the spatial complexity calculation unit 1610, is (w, h), the size of the second block, which is the unit of calculation of the background flatness calculation unit 1620, may be larger than (2w, 2h).

In calculating the background flatness of a block, the background flatness calculation unit 1620 may use the pixel values of the SRM pixels that correspond to the inside of the block. The background flatness of the block may be calculated according to Equation 5 below.

[Equation 5]

N_lc may denote the number of pixels in the block under calculation whose spatial complexity (spatial randomness) is lower than a threshold (that is, the number of low-complexity pixels). W_b² may denote the area of the block (that is, the number of pixels in the block).
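The equation image for Equation 5 is not reproduced here. Given the definitions of N_lc and W_b², a natural reading is that the background flatness of a block is the fraction of its pixels with low spatial complexity, N_lc / W_b². The sketch below implements that assumed reading over square, non-overlapping blocks; the function names and the threshold value are hypothetical.

```python
def block_flatness(srm, top, left, block_size, threshold):
    """Assumed reading of Equation 5: N_lc / W_b^2 for one square block of the SRM."""
    n_lc = sum(1
               for y in range(top, top + block_size)
               for x in range(left, left + block_size)
               if srm[y][x] < threshold)          # low-complexity pixels
    return n_lc / (block_size * block_size)


def smoothness_map(srm, block_size, threshold):
    """Build an SM with one flatness value per non-overlapping block."""
    h, w = len(srm), len(srm[0])
    return [[block_flatness(srm, top, left, block_size, threshold)
             for left in range(0, w - block_size + 1, block_size)]
            for top in range(0, h - block_size + 1, block_size)]


# A 2x2 SRM treated as a single block (made-up numbers).
srm = [
    [0.1, 0.9],
    [0.2, 0.05],
]
sm = smoothness_map(srm, block_size=2, threshold=0.5)
```

Three of the four SRM values fall below the threshold, so the block's flatness is 0.75 under this reading.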

The operations of the reference video spatial information combining unit 1211 and the comparison video spatial information combining unit 1221 are described below. Hereinafter, the reference video spatial information combining unit 1211 and the comparison video spatial information combining unit 1221 are referred to collectively as the spatial information combining unit. The description of the spatial information combining unit may be applied to each of the reference video spatial information combining unit 1211 and the comparison video spatial information combining unit 1221. The input video to the spatial information combining unit may be interpreted as the reference video for the reference video spatial information combining unit 1211 and as the comparison video for the comparison video spatial information combining unit 1221.

The spatial information combining unit may combine three pieces of input information into a single piece of information. The three pieces of input information may be 1) the spatial feature information of the input video, 2) the spatial cognitive sensitive region of the input video, and 3) the change in the spatial cognitive sensitive region.

The spatial information combining unit may further emphasize the spatial features of cognitively sensitive regions and may weaken the spatial features of cognitively insensitive regions. In addition, a change in the spatial cognitive sensitive region between the reference video and the comparison video may mean that important information has been lost or that artifacts that significantly affect image quality have occurred. Therefore, when the spatial cognitive sensitive region has changed between the reference video and the comparison video, the spatial information combining unit may emphasize the spatial features of the region in which the spatial cognitive sensitive region has changed.

The inputs to the spatial information combining unit may be the above-described 1) HVM (that is, the horizontal/vertical component features), 2) EM (that is, the features for directions other than horizontal/vertical), 3) SRM (that is, the spatial complexity), and 4) SM (that is, the background flatness). When the inputs to the spatial information combining unit are the HVM, EM, SRM, and SM, Equations 6 to 9 below may hold.

[수식 6][Equation 6]

[수식 7][Equation 7]

[수식 8][Equation 8]

[수식 9][Equation 9]

Equations 6 and 7 may represent the combining of spatial information for an image of the reference video.

Equations 8 and 9 may represent the combining of spatial information for an image of the comparison video.

ref may denote an image of the reference video. comp may denote an image of the comparison video. x and y may denote the coordinates of a pixel of the image. SVW may be the cognitive weight (or spatial visual weighting) for the existing spatial information.

The symbol "'" may denote updated values. For example, EM'_ref(x, y) may denote the updated EM value for a pixel. Here, the pixel may be the pixel at coordinates (x, y) in an image of the reference video.

Alternatively, the symbol "'" may denote a value that has been updated by applying the SVW. For example, EM'_ref may be a weighted EM generated by assigning weights to the EM for an image of the reference video.

The symbol "·" may denote a product. Specifically, the symbol "·" may denote multiplying the values at the same position (that is, the matrix elements) of two maps (that is, matrices).

The spatial information combining unit may update the EM for the reference video by applying the cognitive weight to the EM for the reference video.

The spatial information combining unit may update the HVM for the reference video by applying the cognitive weight to the HVM for the reference video.

The spatial information combining unit may update the EM for the comparison video by applying the cognitive weight to the EM for the comparison video.

The spatial information combining unit may update the HVM for the comparison video by applying the cognitive weight to the HVM for the comparison video.
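The update of a feature map by a cognitive weight can be illustrated with a minimal sketch. It assumes that the feature map (EM or HVM) and the weight map (SVW) are two-dimensional maps of the same size and that the update is the element-wise product denoted by "·". The function name `apply_visual_weight` and the sample values are hypothetical and are not taken from this description.

```python
from typing import List

Map = List[List[float]]  # a 2-D map indexed as map[y][x]


def apply_visual_weight(feature: Map, svw: Map) -> Map:
    """Return the element-wise product of a feature map and a weight map."""
    same_rows = len(feature) == len(svw)
    if not same_rows or any(len(f) != len(w) for f, w in zip(feature, svw)):
        raise ValueError("feature map and weight map must have the same shape")
    return [
        [f * w for f, w in zip(f_row, w_row)]
        for f_row, w_row in zip(feature, svw)
    ]


# Hypothetical 2x2 EM map of a reference image and its weight map (SVW).
em_ref = [[1.0, 2.0], [3.0, 4.0]]
svw_ref = [[0.5, 1.0], [2.0, 0.0]]

# Plays the role of the updated map EM'_ref in this sketch.
em_ref_updated = apply_visual_weight(em_ref, svw_ref)
```

The same function could be applied to the HVM and to the maps of the comparison video, since the four updates differ only in their inputs.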

For example, the cognitive weight for the spatial feature information of an image of the reference video, used in Equations 6 and 7, may be defined as in Equation 10 below.

[Equation 10]

"w" may denote cognitive weights.

For convenience of description, the same cognitive weight is used in Equations 6 and 7; however, different cognitive weights may be used in Equation 6 and Equation 7, respectively.

For example, the cognitive weight for the spatial feature information of an image of the comparison video, used in Equations 8 and 9, may be defined as in Equation 11 below.

[Equation 11]

For convenience of description, the same cognitive weight is used in Equations 8 and 9; however, different cognitive weights may be used in Equation 8 and Equation 9, respectively.

The cognitive weights of Equations 10 and 11 may not reflect the change in the spatial cognitive sensitive region. Equations 12 and 13 below may represent updated cognitive weights into which the change in the spatial cognitive sensitive region has been combined. In other words, Equations 12 and 13 may represent cognitive weights that reflect the change in the spatial cognitive sensitive region.

[Equation 12]

[Equation 13]

In Equations 10, 11, 12, and 13 above, the same cognitive weight has been described as being applied to the image of the reference video and the image of the comparison video. Alternatively, different cognitive weights may be applied to the image of the reference video and the image of the comparison video, respectively.

FIG. 19 illustrates the structure of a temporal distortion calculation unit according to an example.

The temporal distortion calculation unit 1020 may include a reference video representative temporal information calculation unit 1910, a comparison video representative temporal information calculation unit 1920, and a representative temporal information difference calculation unit 1930.

The reference video representative temporal information calculation unit 1910 may include a reference video temporal information combining unit 1911, a spatial pooling unit 1912, and a temporal pooling unit 1913.

The comparison video representative temporal information calculation unit 1920 may include a comparison video temporal information combining unit 1921, a spatial pooling unit 1922, and a temporal pooling unit 1923.

The functions and operations of the reference video representative temporal information calculation unit 1910, the comparison video representative temporal information calculation unit 1920, and the representative temporal information difference calculation unit 1930 are described in detail below.

FIG. 20 is a flowchart of temporal distortion calculation according to an example.

For convenience of description, the embodiment is described for the case where the spatial/temporal feature information extraction unit 210 includes only the temporal feature information extraction unit 420. Alternatively, the spatial/temporal feature information extraction unit 210 may include two or more of the spatial feature information extraction unit 410, the temporal feature information extraction unit 420, and the spatiotemporal feature information extraction unit 430.

Step 1120, described above with reference to FIG. 11, may include steps 2010, 2020, and 2030.

In step 2010, the reference video representative temporal information calculation unit 1910 may calculate the representative temporal information of the reference video using 1) the temporal feature information of the reference video extracted from the reference video, 2) the temporal cognitive sensitive region of the reference video detected from the reference video, and 3) the change, calculated from the reference video and the comparison video, between the temporal cognitive sensitive region of the reference video and the temporal cognitive sensitive region of the comparison video.

Step 2010 may include steps 2011, 2012, and 2013.

In step 2011, the reference video temporal information combining unit 1911 may generate combined temporal information of the reference video by combining 1) the temporal feature information of the reference video, 2) the temporal cognitive sensitive region of the reference video, and 3) the change in the temporal cognitive sensitive region. The change in the temporal cognitive sensitive region may mean the change between the temporal cognitive sensitive region of the reference video and the temporal cognitive sensitive region of the comparison video.

In other words, the combined temporal information of the reference video may be the temporal feature information of the reference video in which the cognitive importance of each region is reflected.

The reference video temporal information combining unit 1911 may use 1) the temporal cognitive sensitive region of the reference video and 2) the change in the temporal cognitive sensitive region to make the temporal feature information of the reference video reflect the cognitive importance of each region. In other words, through the reference video temporal information combining unit 1911, the temporal feature information of the reference video may reflect cognitive importance on a per-region basis.
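The combining in step 2011 can be illustrated with a minimal sketch. The description does not state the combining formula at this point, so the rule used here is an assumption made only for illustration: each temporal feature value is scaled by the sensitivity of its pixel and is further boosted where the sensitive region has changed. The function name `combine_temporal_info`, the parameter `change_gain`, and the sample values are all hypothetical.

```python
from typing import List

Map = List[List[float]]  # a 2-D map indexed as map[y][x]


def combine_temporal_info(feature: Map, sensitivity: Map, change: Map,
                          change_gain: float = 1.0) -> Map:
    """Weight temporal features by region sensitivity and by region change.

    Assumed rule (not from the description):
        combined = feature * sensitivity * (1 + change_gain * change)
    """
    return [
        [f * s * (1.0 + change_gain * c) for f, s, c in zip(f_row, s_row, c_row)]
        for f_row, s_row, c_row in zip(feature, sensitivity, change)
    ]


# Hypothetical 1x2 maps: the second pixel is less sensitive, but its
# sensitive region changed between the reference and comparison videos.
feature = [[2.0, 4.0]]
sensitivity = [[1.0, 0.5]]
change = [[0.0, 1.0]]
combined = combine_temporal_info(feature, sensitivity, change, change_gain=0.5)
```

Any rule that emphasizes sensitive regions and changed regions would serve the same purpose; the multiplicative form is chosen here only because it mirrors the element-wise product used for the spatial maps.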

In step 2012, the spatial pooling unit 1912 may extract representative temporal information from the combined temporal information of the reference video, which is the output of the reference video temporal information combining unit 1911.

The spatial pooling unit 1912 may extract the representative temporal information of each image of the reference video from the temporal feature information of the reference video in which the cognitive importance of each region is reflected.

For example, the representative temporal information of an image may be the mean of the temporal information values of the image.

For example, the representative temporal information of an image may be the standard deviation of the temporal information values of the image.

For example, the spatial pooling unit 1912 may extract the representative temporal information of an image using spatial pooling as used in automatic image quality measurement methods.

In step 2013, the temporal pooling unit 1913 may extract the representative temporal information of the reference video (or of a GOP of the reference video) from the representative temporal information of each image of the reference video (or of the GOP of the reference video).

For example, the representative temporal information of the reference video (or of a GOP of the reference video) may be the mean of the temporal information values of the reference video (or of the GOP of the reference video).

For example, the representative temporal information of the reference video (or of a GOP of the reference video) may be the standard deviation of the temporal information values of the reference video (or of the GOP of the reference video).

For example, the temporal pooling unit 1913 may extract the representative temporal information of the reference video (or of a GOP of the reference video) using temporal pooling as used in automatic image quality measurement methods.
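The two pooling stages can be sketched as follows. The sketch assumes that the combined temporal information of each image is a two-dimensional map and uses the mean and the standard deviation mentioned above as the pooling functions. The population standard deviation is an assumption, since the description does not say which variant is meant. The function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List

Map = List[List[float]]  # combined temporal information of one image


def spatial_pool(frame_map: Map, method: str = "mean") -> float:
    """Reduce one image's map to a single representative value."""
    values = [v for row in frame_map for v in row]
    return mean(values) if method == "mean" else pstdev(values)


def temporal_pool(per_image_values: List[float], method: str = "mean") -> float:
    """Reduce per-image representative values to one value per video or GOP."""
    return mean(per_image_values) if method == "mean" else pstdev(per_image_values)


# Hypothetical GOP of two 2x2 maps.
gop = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
per_image = [spatial_pool(frame) for frame in gop]  # one value per image
representative = temporal_pool(per_image)           # one value per GOP
```

Keeping the two stages as separate functions reflects the structure of the pooling units and allows the pooling method of each stage to be chosen independently.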

In step 2020, the comparison video representative temporal information calculation unit 1920 may calculate the representative temporal information of the comparison video using 1) the temporal feature information of the comparison video extracted from the comparison video, 2) the temporal cognitive sensitive region of the comparison video detected from the comparison video, and 3) the change, calculated from the reference video and the comparison video, between the temporal cognitive sensitive region of the reference video and the temporal cognitive sensitive region of the comparison video.

Step 2020 may include steps 2021, 2022, and 2023.

In step 2021, the comparison video temporal information combining unit 1921 may generate combined temporal information of the comparison video by combining 1) the temporal feature information of the comparison video, 2) the temporal cognitive sensitive region of the comparison video, and 3) the change in the temporal cognitive sensitive region. The change in the temporal cognitive sensitive region may mean the change between the temporal cognitive sensitive region of the reference video and the temporal cognitive sensitive region of the comparison video.

In other words, the combined temporal information of the comparison video may be the temporal feature information of the comparison video in which the cognitive importance of each region is reflected.

The comparison video temporal information combining unit 1921 may use 1) the temporal cognitive sensitive region of the comparison video and 2) the change in the temporal cognitive sensitive region to make the temporal feature information of the comparison video reflect the cognitive importance of each region. In other words, through the comparison video temporal information combining unit 1921, the temporal feature information of the comparison video may reflect cognitive importance on a per-region basis.

In step 2022, the spatial pooling unit 1922 may extract representative temporal information from the combined temporal information of the comparison video, which is the output of the comparison video temporal information combining unit 1921.

The spatial pooling unit 1922 may extract the representative temporal information of each image of the comparison video from the temporal feature information of the comparison video in which the cognitive importance of each region is reflected.

For example, the spatial pooling unit 1922 may extract the representative temporal information of an image using spatial pooling as used in automatic image quality measurement methods.

In an embodiment, step 2022 performed by the spatial pooling unit 1922 may be omitted.

In step 2023, the temporal pooling unit 1923 may extract the representative temporal information of the comparison video (or of a GOP of the comparison video) from the representative temporal information of each image of the comparison video (or of the GOP of the comparison video).

For example, the representative temporal information of the comparison video (or of a GOP of the comparison video) may be the mean of the temporal information values of the comparison video (or of the GOP of the comparison video).

For example, the representative temporal information of the comparison video (or of a GOP of the comparison video) may be the standard deviation of the temporal information values of the comparison video (or of the GOP of the comparison video).

For example, the temporal pooling unit 1923 may extract the representative temporal information of the comparison video (or of a GOP of the comparison video) using temporal pooling as used in automatic image quality measurement methods.

In step 2030, the representative temporal information difference calculation unit 1930 may calculate the difference between the representative temporal information of the reference video and the representative temporal information of the comparison video.

The temporal distortion described above with reference to FIG. 3 may be the difference between the representative temporal information of the reference video and the representative temporal information of the comparison video.
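Step 2030 can be illustrated with a short sketch. It assumes that one representative value is available per GOP for each video and that the difference is taken as an absolute difference. The description only says "difference", so the absolute value is an assumption. The function name `temporal_distortion` and the sample values are hypothetical.

```python
from typing import List


def temporal_distortion(ref_values: List[float],
                        comp_values: List[float]) -> List[float]:
    """Absolute difference of representative temporal information, per GOP."""
    if len(ref_values) != len(comp_values):
        raise ValueError("both videos must provide the same number of GOP values")
    return [abs(r - c) for r, c in zip(ref_values, comp_values)]


# Hypothetical representative temporal information for three GOPs of each video.
ref_info = [3.0, 2.5, 4.0]
comp_info = [2.0, 2.5, 5.5]
distortion = temporal_distortion(ref_info, comp_info)
```

A signed difference could be used instead if the direction of the change matters to the later index calculation.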

FIG. 21 illustrates the structure of a temporal feature information extraction unit according to an example.

The temporal feature information extraction unit 420 may include a vertical motion information extraction unit 2110 and a horizontal motion information extraction unit 2120.

The functions and operations of the vertical motion information extraction unit 2110 and the horizontal motion information extraction unit 2120 are described in detail below.

FIG. 22 is a flowchart of temporal feature information extraction according to an example.

Step 520, described above with reference to FIG. 5, may include steps 2210 and 2220.

In step 2210, the vertical motion information extraction unit 2110 may extract the vertical motion information of the temporal feature information from a plurality of images of an input video.

According to human cognitive characteristics, a person may be sensitive to abrupt motion between frames. To take this cognitive characteristic into account, the vertical motion information extraction unit 2110 may extract the vertical motion information as in Equation 14 below.

[Equation 14]

v_x(i, j, t) may be the motion speed in the x-axis direction. R is the frame rate in frames per second. R may be 1/Δt.

Equation 14 may reflect judder distortion, which is a cognitive characteristic, in the motion vector. By extracting motion information using Equation 14, the vertical motion information extraction unit 2110 may reflect judder distortion, which is a cognitive characteristic.

In step 2220, the horizontal motion information extraction unit 2120 may extract the horizontal motion information of the temporal feature information from a plurality of images of the input video.

The horizontal motion information extraction unit 2120 may extract the horizontal motion information by substituting the motion speed in the y-axis direction into Equation 14 described above.
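The relation R = 1/Δt can be illustrated with a small sketch that converts a per-frame displacement into a motion speed. This is not Equation 14, which is not reproduced in this text, and it contains no judder term. It only shows how a speed such as v_x could be obtained from a displacement and the frame rate, under the assumption that the displacement is measured in pixels between two consecutive frames. The function name `motion_speed` is hypothetical.

```python
def motion_speed(displacement_px: float, frame_rate: float) -> float:
    """Speed in pixels per second for a displacement between consecutive frames.

    With delta_t = 1 / frame_rate, speed = displacement / delta_t,
    which equals displacement * frame_rate.
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return displacement_px * frame_rate


# Hypothetical example: a block moves 2 pixels between frames at 30 frames/s.
v_x = motion_speed(2.0, 30.0)
```

The same function applies to either axis, which matches the description that the other motion component is obtained by substituting the speed along the other axis.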

FIG. 23 illustrates the structure of a temporal cognitive sensitive region detection unit according to an example.

The temporal cognitive sensitive region detection unit 620 may include a spatial complexity (spatial randomness) calculation unit 2310 and a background flatness calculation unit 2320.

The functions and operations of the spatial complexity calculation unit 2310 and the background flatness calculation unit 2320 are described in detail below.

FIG. 24 is a flowchart of temporal cognitive sensitive region detection according to an example.

Step 720, described above with reference to FIG. 7, may include steps 2410 and 2420.

The temporal cognitive sensitive region detection unit 620 may detect a temporal cognitive sensitive region for a GOP. The GOP may consist of two images: a previous image and a current image.

In step 2410, the spatial complexity calculation unit 2310 may calculate the spatial complexity of the current image of a GOP of the input video.

Here, the method by which the spatial complexity calculation unit 1610 of the above-described spatial cognitive sensitive region detection unit 610 calculates spatial complexity may also be used by the spatial complexity calculation unit 2310.

Alternatively, the spatial complexity calculation unit 2310 may calculate the spatial complexity of the previous image of the GOP of the input video.

Alternatively, the spatial complexity calculation unit 2310 may calculate the spatial complexity of a weighted average of the current image and the previous image of the GOP of the input video.

In step 2420, the background flatness calculation unit 2320 may calculate the background flatness of the current image of the GOP of the input video.

Here, the method by which the background flatness calculation unit 1620 of the above-described spatial cognitive sensitive region detection unit 610 calculates background flatness may also be used by the background flatness calculation unit 2320.

Alternatively, the background flatness calculation unit 2320 may calculate the background flatness of the previous image of the GOP of the input video.

Alternatively, the background flatness calculation unit 2320 may calculate the background flatness of a weighted average of the current image and the previous image of the GOP of the input video.
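The weighted average of the two images of a GOP can be sketched as follows. The sketch assumes that both images are two-dimensional maps of the same size and that the two weights sum to 1. The function name `blend_gop`, the parameter `current_weight`, and the sample values are hypothetical. The spatial complexity or background flatness calculation itself is not shown and would be applied to the returned image.

```python
from typing import List

Image = List[List[float]]  # a 2-D image indexed as image[y][x]


def blend_gop(previous: Image, current: Image,
              current_weight: float = 0.5) -> Image:
    """Per-pixel weighted average of the previous and current images."""
    if not 0.0 <= current_weight <= 1.0:
        raise ValueError("current_weight must be in [0, 1]")
    previous_weight = 1.0 - current_weight
    return [
        [previous_weight * p + current_weight * c for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(previous, current)
    ]


# Hypothetical 1x2 images; the current image receives three times the weight.
previous_image = [[0.0, 10.0]]
current_image = [[4.0, 20.0]]
blended = blend_gop(previous_image, current_image, current_weight=0.75)
```

Setting `current_weight` to 1.0 or 0.0 reduces this to the cases in which only the current image or only the previous image is used.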

FIG. 25 illustrates a configuration for image quality measurement using a cognitive image quality measurement deep neural network according to an embodiment.

The spatial/temporal feature information extraction unit 210, the distortion calculation unit 240, and the index calculation unit 250 described above with reference to FIG. 2 may be replaced with a cognitive image quality measurement deep neural network 2510. The processing unit 210 may run the cognitive image quality measurement deep neural network 2510.

The cognitive image quality of the comparison video may be measured by the cognitive image quality measurement deep neural network 2510 instead of by the units described above with reference to FIG. 2. Here, the reference video and the comparison video may be input to the cognitive image quality measurement deep neural network 2510, together with 1) the spatial/temporal cognitive sensitive region of the reference video, 2) the spatial/temporal cognitive sensitive region of the comparison video, and 3) the change between the spatial/temporal cognitive sensitive region of the reference video and the spatial/temporal cognitive sensitive region of the comparison video.

In other words, when the cognitive image quality measurement deep neural network 2510 measures cognitive image quality, the information used for the measurement is not generated internally by the network from the input reference video and comparison video alone. Instead, this information may be provided by the spatial/temporal cognitive sensitive region detection unit 220 and the spatial/temporal cognitive sensitive region change calculation unit 230 described in the embodiments.

Because this information is supplied from outside the network, the cognitive image quality measurement deep neural network 2510 may learn the cognitive characteristics of the reference video and the comparison video better during training and, once training is complete, may provide higher measurement reliability when evaluating cognitive image quality.

The functions and operations of the cognitive image quality measurement deep neural network 2510 are described in detail below.

FIG. 26 is a flowchart of an image quality measurement method using a cognitive image quality measurement deep neural network according to an embodiment.

Step 2610 may correspond to step 320 described above with reference to FIG. 3. Redundant descriptions are omitted.

Step 2620 may correspond to step 340 described above with reference to FIG. 3. Redundant descriptions are omitted.

Step 2630 may correspond to step 350 described above with reference to FIG. 3. Redundant descriptions are omitted.

In step 2640, 1) the reference video, 2) the comparison video, 3) the spatial/temporal cognitive sensitive region of the reference video, 4) the spatial/temporal cognitive sensitive region of the comparison video, and 5) the change between the spatial/temporal cognitive sensitive region of the reference video and the spatial/temporal cognitive sensitive region of the comparison video may be input to the cognitive image quality measurement deep neural network 2510.

The cognitive image quality measurement deep neural network 2510 may generate the result of the image quality measurement using 1) the reference video, 2) the comparison video, 3) the spatial/temporal cognitive sensitive region of the reference video, 4) the spatial/temporal cognitive sensitive region of the comparison video, and 5) the change between the spatial/temporal cognitive sensitive region of the reference video and the spatial/temporal cognitive sensitive region of the comparison video.
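One way to present the five inputs to a network is to stack them as channels of a single per-pixel tensor. The description does not state how the inputs are arranged, so the channel stacking below is an assumption made only for illustration, shown for a single image with single-channel maps. The function name `stack_inputs` and the sample values are hypothetical, and the network itself is not modeled.

```python
from typing import List

Map = List[List[float]]  # a 2-D map indexed as map[y][x]


def stack_inputs(ref: Map, comp: Map, ref_region: Map,
                 comp_region: Map, region_change: Map) -> List[List[List[float]]]:
    """Stack five same-sized maps into a height x width x 5 nested list."""
    planes = [ref, comp, ref_region, comp_region, region_change]
    height, width = len(ref), len(ref[0])
    for plane in planes:
        if len(plane) != height or any(len(row) != width for row in plane):
            raise ValueError("all input maps must have the same shape")
    return [
        [[plane[y][x] for plane in planes] for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 1x2 inputs for one image of each video.
network_input = stack_inputs(
    ref=[[0.9, 0.1]],
    comp=[[0.8, 0.2]],
    ref_region=[[1.0, 0.0]],
    comp_region=[[1.0, 1.0]],
    region_change=[[0.0, 1.0]],
)
```

With this layout, each pixel carries its values from both videos together with the externally computed region information, so the network does not have to derive that information on its own.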

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose computers or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. In addition, the processing device may access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but a person having ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and may be stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

실시예에 따른 방법은 다양한 컴퓨터 수단을 통하여 수행될 수 있는 프로그램 명령 형태로 구현되어 컴퓨터 판독 가능 매체에 기록될 수 있다.The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium.

컴퓨터 판독 가능한 기록 매체는 본 발명에 따른 실시예들에서 사용되는 정보를 포함할 수 있다. 예를 들면, 컴퓨터 판독 가능한 기록 매체는 비트스트림을 포함할 수 있고, 비트스트림은 본 발명에 따른 실시예들에서 설명된 정보를 포함할 수 있다.The computer-readable recording medium may include information used in embodiments according to the present invention. For example, a computer-readable recording medium may include a bitstream, and the bitstream may include information described in embodiments according to the present invention.

The computer-readable recording medium may include a non-transitory computer-readable medium.

The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from that of the described method, and/or if components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from that of the described method, or are replaced or substituted by other components or equivalents.

100: encoding apparatus
110: processing unit
120: communication unit
170: sensor

Claims

An image processing apparatus comprising:
a communication unit that receives a reference video and a comparison video; and
a processing unit that generates a result of image quality measurement for the comparison video,
wherein the processing unit calculates distortion using feature information of the reference video, feature information of the comparison video, a cognitive sensitive region of the reference video, a cognitive sensitive region of the comparison video, and a change between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video, and generates the result of the image quality measurement based on the distortion.

The image processing apparatus of claim 1, wherein
the feature information of the reference video includes at least one of spatial feature information of the reference video, temporal feature information of the reference video, and spatiotemporal feature information of the reference video, and
the feature information of the comparison video includes at least one of spatial feature information of the comparison video, temporal feature information of the comparison video, and spatiotemporal feature information of the comparison video.

The image processing apparatus of claim 2, wherein
the processing unit extracts the spatiotemporal feature information from a spatiotemporal slice of an input video,
the input video is at least one of the reference video and the comparison video, and
the spatiotemporal slice is a spatial division of a group of pictures (GOP) of the input video.

The image processing apparatus of claim 2, wherein
the processing unit detects a horizontal feature and a vertical feature of the spatial feature information from an image of an input video, and
the processing unit detects, from the image, a feature for directions other than the horizontal direction and the vertical direction.

The image processing apparatus of claim 4, wherein
the processing unit derives a region having high cognitive sensitivity from the image using an edge detection method.

The image processing apparatus of claim 5, wherein
the edge detection method is a Sobel operation.

The image processing apparatus of claim 5, wherein
the horizontal feature and the vertical feature are a horizontal-vertical edge map, and
the feature for the directions other than the horizontal direction and the vertical direction is an edge map from which information on horizontal edges and vertical edges is excluded.
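For illustration only, the two edge maps recited above can be sketched with a Sobel operation as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `sobel_edge_maps`, the magnitude threshold, and the angle tolerance used to decide whether a gradient is axis-aligned are all hypothetical and do not come from the specification.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edge_maps(image, magnitude_threshold=1.0, angle_tolerance_deg=10.0):
    """Return (hv_map, non_hv_map) of gradient magnitudes for a 2-D luma image.

    `image` is a list of rows of numbers. Border pixels are left at 0.
    The threshold and tolerance are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    hv_map = [[0.0] * width for _ in range(height)]
    non_hv_map = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            magnitude = math.hypot(gx, gy)
            if magnitude < magnitude_threshold:
                continue  # too weak to count as an edge
            # Fold the gradient angle into [0, 90] degrees; values near 0 or 90
            # mean the gradient is aligned with an image axis.
            angle = math.degrees(math.atan2(abs(gy), abs(gx)))
            if min(angle, 90.0 - angle) <= angle_tolerance_deg:
                hv_map[y][x] = magnitude      # horizontal-vertical edge map
            else:
                non_hv_map[y][x] = magnitude  # edge map excluding H/V edges
    return hv_map, non_hv_map
```

In this sketch each edge pixel is assigned to exactly one of the two maps, so that a separate cognitive weight could later be applied to each map.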

The image processing apparatus of claim 7, wherein
the processing unit updates the horizontal-vertical edge map using a first cognitive weight, and
the processing unit updates the edge map from which the information on the horizontal edges and the vertical edges is excluded using a second cognitive weight.

The image processing apparatus of claim 8, wherein
the first cognitive weight and the second cognitive weight reflect a change in a spatial cognitive sensitive region.

The image processing apparatus of claim 1, wherein
the cognitive sensitive region of the reference video includes at least one of a spatial cognitive sensitive region of the reference video, a temporal cognitive sensitive region of the reference video, and a spatiotemporal cognitive sensitive region of the reference video, and
the cognitive sensitive region of the comparison video includes at least one of a spatial cognitive sensitive region of the comparison video, a temporal cognitive sensitive region of the comparison video, and a spatiotemporal cognitive sensitive region of the comparison video.

The image processing apparatus of claim 10, wherein
the processing unit generates a spatial complexity map by calculating spatial complexities of pixels or first blocks of an image of an input video,
the input video is the reference video or the comparison video, and
the processing unit generates a flatness map by calculating background flatnesses of second blocks of the image of the input video.

The image processing apparatus of claim 1, wherein
the change is a difference between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video.
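For illustration only, such a difference can be sketched as an element-wise operation on two sensitivity maps of equal size. The function name `sensitive_region_change`, the representation of a cognitive sensitive region as a 2-D map of numbers, and the use of the absolute (rather than signed) difference are assumptions made for this sketch.

```python
def sensitive_region_change(reference_map, comparison_map):
    """Element-wise absolute difference between two sensitivity maps.

    Each map is a list of rows of numbers, where a larger value is assumed
    to mean higher cognitive sensitivity at that position.
    """
    if len(reference_map) != len(comparison_map) or any(
        len(ref_row) != len(cmp_row)
        for ref_row, cmp_row in zip(reference_map, comparison_map)
    ):
        raise ValueError("sensitivity maps must have the same shape")
    return [
        [abs(ref - cmp) for ref, cmp in zip(ref_row, cmp_row)]
        for ref_row, cmp_row in zip(reference_map, comparison_map)
    ]
```

Positions where the result is non-zero are those at which the sensitive region of the comparison video departs from that of the reference video.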

The image processing apparatus of claim 12, wherein
the distortion includes one or more of spatial distortion, temporal distortion, and spatiotemporal distortion.

The image processing apparatus of claim 13, wherein
the processing unit extracts representative spatial information of the reference video using spatial feature information of the reference video, a spatial cognitive sensitive region of the reference video, and a change of the spatial cognitive sensitive region,
the processing unit extracts representative spatial information of the comparison video using spatial feature information of the comparison video, a spatial cognitive sensitive region of the comparison video, and the change of the spatial cognitive sensitive region,
the processing unit calculates the spatial distortion,
the spatial distortion is a difference between the representative spatial information of the reference video and the representative spatial information of the comparison video, and
the change of the spatial cognitive sensitive region is a change between the spatial cognitive sensitive region of the reference video and the spatial cognitive sensitive region of the comparison video.

The image processing apparatus of claim 14, wherein
the processing unit generates combined spatial information of the reference video by combining the spatial feature information of the reference video, the spatial cognitive sensitive region of the reference video, and the change of the spatial cognitive sensitive region,
the processing unit extracts representative spatial information of each image of the reference video from the combined spatial information, and
the processing unit extracts the representative spatial information of the reference video from the representative spatial information of each image of the reference video.

The image processing apparatus of claim 13, wherein
the processing unit extracts representative temporal information of the reference video using temporal feature information of the reference video, a temporal cognitive sensitive region of the reference video, and a change of the temporal cognitive sensitive region,
the processing unit extracts representative temporal information of the comparison video using temporal feature information of the comparison video, a temporal cognitive sensitive region of the comparison video, and the change of the temporal cognitive sensitive region,
the processing unit calculates the temporal distortion,
the temporal distortion is a difference between the representative temporal information of the reference video and the representative temporal information of the comparison video, and
the change of the temporal cognitive sensitive region is a change between the temporal cognitive sensitive region of the reference video and the temporal cognitive sensitive region of the comparison video.

The image processing apparatus of claim 16, wherein
the processing unit generates combined temporal information of the reference video by combining the temporal feature information of the reference video, the temporal cognitive sensitive region of the reference video, and the change of the temporal cognitive sensitive region,
the processing unit extracts representative temporal information of each image of the reference video from the combined temporal information, and
the processing unit extracts the representative temporal information of the reference video from the representative temporal information of each image of the reference video.

An image processing method comprising:
extracting feature information of a reference video;
detecting a cognitive sensitive region of the reference video;
extracting feature information of a comparison video;
detecting a cognitive sensitive region of the comparison video;
calculating a change between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video;
calculating distortion using the feature information of the reference video, the feature information of the comparison video, the cognitive sensitive region of the reference video, the cognitive sensitive region of the comparison video, and the change; and
generating a result of image quality measurement based on the distortion.
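For illustration only, the order of the steps of this method can be sketched as follows. This is a minimal sketch under stated assumptions: the names `representative_info` and `measure_distortion` are hypothetical, the feature extractor and the sensitive-region detector are supplied by the caller as plain functions, and the particular choices of combining the three inputs by multiplication and pooling by the arithmetic mean are assumptions not taken from the specification.

```python
from statistics import fmean


def representative_info(features, sensitivity, change):
    """Pool one video's per-frame maps into a single representative value.

    Combination by product and pooling by mean are illustrative assumptions.
    """
    per_frame = []
    for feat, sens, chg in zip(features, sensitivity, change):
        combined = [
            f * s * (1.0 + c)
            for f_row, s_row, c_row in zip(feat, sens, chg)
            for f, s, c in zip(f_row, s_row, c_row)
        ]
        per_frame.append(fmean(combined))  # representative value of one image
    return fmean(per_frame)                # representative value of the video


def measure_distortion(reference, comparison, extract_features,
                       detect_sensitive_region):
    """Run the steps in order and return the distortion as a number.

    `reference` and `comparison` are lists of frames; each frame is a list
    of rows of numbers. The two callables map a frame to a same-sized map.
    """
    ref_feat = [extract_features(frame) for frame in reference]
    ref_sens = [detect_sensitive_region(frame) for frame in reference]
    cmp_feat = [extract_features(frame) for frame in comparison]
    cmp_sens = [detect_sensitive_region(frame) for frame in comparison]
    # Change between the sensitive regions of the two videos, per position.
    change = [
        [[abs(r - c) for r, c in zip(r_row, c_row)]
         for r_row, c_row in zip(r_map, c_map)]
        for r_map, c_map in zip(ref_sens, cmp_sens)
    ]
    ref_info = representative_info(ref_feat, ref_sens, change)
    cmp_info = representative_info(cmp_feat, cmp_sens, change)
    return abs(ref_info - cmp_info)
```

In this sketch, a result of image quality measurement would then be derived from the returned distortion, for example by mapping a smaller distortion to a higher quality score.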

A computer-readable recording medium storing a program for performing the image processing method of claim 18.

An image processing apparatus comprising:
a communication unit that receives a reference video and a comparison video; and
a processing unit that drives a deep neural network for cognitive image quality measurement,
wherein the processing unit detects a cognitive sensitive region of the reference video and a cognitive sensitive region of the comparison video, and calculates a change between the cognitive sensitive region of the reference video and the cognitive sensitive region of the comparison video,
the cognitive sensitive region of the reference video, the cognitive sensitive region of the comparison video, and the change are input to the deep neural network for cognitive image quality measurement, and
the deep neural network for cognitive image quality measurement generates a result of image quality measurement using the cognitive sensitive region of the reference video, the cognitive sensitive region of the comparison video, and the change.