KR20170042235A - Method and apparatus for adaptive encoding and decoding based on image quality

Info

Publication number: KR20170042235A
Application number: KR1020160127739A
Authority: KR
Inventors: 정세윤; 김휘용; 김종호; 임성창; 양 상린; 씨 제이 쿠오 씨; 황 친
Original assignee: 한국전자통신연구원
Priority date: 2015-10-08
Filing date: 2016-10-04
Publication date: 2017-04-18
Also published as: KR102602690B1

Abstract

A method and apparatus for adaptive encoding and decoding based on image quality are provided. An encoding apparatus can determine an optimal frames-per-second (FPS) value for a video and encode the video at the determined FPS, thereby providing improved temporal scalability. A decoding apparatus can select the frames to be played back from among the frames of the video according to a minimum required image quality satisfaction level. Through this frame selection, the decoding apparatus can also provide improved temporal scalability. The degree of image quality degradation can thus be predicted.

Description

TECHNICAL FIELD [0001] The present invention relates to a method and apparatus for adaptive encoding and decoding based on image quality.

The following embodiments relate to an image decoding method, a decoding apparatus, an encoding method, and an encoding apparatus, and more particularly to a method and apparatus for adaptively performing encoding and decoding based on the image quality of an image.

With the continuous development of the information and communication industry, broadcasting services with High Definition (HD) resolution have spread worldwide. Through this spread, many users have become accustomed to high-resolution, high-quality images and/or video.

To satisfy users' demand for high image quality, many organizations are accelerating the development of next-generation imaging devices. User interest has grown not only in High Definition TV (HDTV) and Full HD (FHD) TV but also in Ultra High Definition (UHD) TV, which has four or more times the resolution of FHD TV. With this growing interest, image encoding/decoding techniques for images with higher resolution and quality are required.

In new video services such as UHD, the need for High Frame Rate (HFR) video is increasing. For example, a high frame rate may be a frame rate of 60 frames per second (FPS) or more.

However, providing such HFR video increases the amount of video data. In addition, as the amount of video data increases, cost problems and technical problems may arise in transmitting and storing the video.

Fortunately, due to the characteristics of human vision, HFR is not required in all situations. For example, for video with almost no motion, most people do not perceive a quality difference or quality degradation between 30 FPS video and 60 FPS video.

That is, depending on the content of the video, once the FPS is at or above a certain threshold value, most people may hardly perceive any difference in image quality even if the FPS is raised further, owing to human perceptual characteristics.

One embodiment may provide a method and apparatus for predicting the degree of image quality degradation that occurs when HFR video is converted to video with a lower frame rate.

One embodiment may provide a method and apparatus for reducing the bit rate required for encoding by predicting the degree of image quality degradation.

One embodiment may provide a method and apparatus for minimizing image quality degradation, when the frame rate of a video is converted to a lower rate through temporal scalability (TS) or the like, by using information generated by predicting the degree of image quality degradation.

One embodiment may provide a method and apparatus for minimizing image quality degradation by also taking information related to image quality degradation into account when applying temporal scalability.

In one aspect, there is provided a video decoding method comprising: determining whether to decode a frame based on selection information of the frame; and decoding the frame when it is determined that the frame is to be decoded.

The determination may be performed for each of a plurality of frames.

The plurality of frames may be the frames of a group of pictures (GOP).

The selection information may be related to the proportion of people who do not perceive a degradation in the quality of the video even when playback of the video including the frame is set so that the frame is excluded from decoding.

The selection information may be related to the proportion of people who do not perceive a degradation in the quality of the video even when the frames per second (FPS) for playback of the video including the frame is determined so that the frame is excluded from decoding.

Whether to decode the frame may be determined based on a comparison between the value of the selection information and the value of playback information.

The playback information may be information related to playback of the video including the frame.

The playback information may be information related to the frames per second (FPS) of playback of the video including the frame.

If the value of the selection information is greater than the value of the playback information, it may be determined that the frame is not decoded.

The selection information may be included in Supplemental Enhancement Information (SEI) for the frame.

The determination of whether to decode may be applied in common to other frames, among a plurality of frames including the frame, that have the same temporal identifier as the frame.
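The decoding decision described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Frame` type, its field names, and the sample values are hypothetical, and the selection value and the playback value are assumed to be expressed on the same scale (here, a percentage of viewers). A frame is skipped when its selection information value exceeds the playback information value, and the decision is made once per temporal identifier so that frames sharing an identifier are treated alike.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int              # position within the GOP (hypothetical field)
    temporal_id: int        # temporal identifier of the frame
    selection_value: float  # assumed: % of viewers who would not notice its removal


def should_decode(selection_value: float, playback_value: float) -> bool:
    """Decode unless the selection value is greater than the playback value."""
    return not (selection_value > playback_value)


def select_frames(frames, playback_value):
    """Return the frames to decode, deciding once per temporal identifier."""
    decision_by_tid = {}
    selected = []
    for frame in frames:
        if frame.temporal_id not in decision_by_tid:
            decision_by_tid[frame.temporal_id] = should_decode(
                frame.selection_value, playback_value
            )
        if decision_by_tid[frame.temporal_id]:
            selected.append(frame)
    return selected


# Hypothetical GOP of four frames and a required satisfaction level of 75%.
gop = [
    Frame(0, 0, 0.0),
    Frame(1, 2, 90.0),
    Frame(2, 1, 60.0),
    Frame(3, 2, 90.0),
]
decoded = select_frames(gop, playback_value=75.0)
print([f.index for f in decoded])
```

In this example the two frames with temporal identifier 2 are skipped together, because the decision made for the first of them is reused for the second.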

In another aspect, there is provided a video decoding apparatus comprising: a control unit that determines whether to decode a frame based on selection information of the frame; and a decoding unit that decodes the frame when it is determined that the frame is to be decoded.

In yet another aspect, there is provided a video encoding method comprising: generating selection information for a frame of a video; and encoding the video based on the selection information.

Encoding the video may comprise: determining whether to encode the frame based on the selection information for the frame; and encoding the frame when it is determined that the frame is to be encoded.

The determination of whether to encode may be applied in common to other frames, among a plurality of frames including the frame, that have the same temporal identifier as the frame.

The selection information may be related to the proportion of people who do not perceive a degradation in the quality of the video even when the frame is excluded from encoding of the video.

The proportion may be calculated by machine learning.

The proportion may be determined based on a feature vector of the frame.

There may be a plurality of pieces of selection information.

The plurality of pieces of selection information may be calculated for each frames-per-second (FPS) value of video playback.

The bitstream generated by encoding the video may include the selection information.

The selection information may be applied in common to other frames, among a plurality of frames including the frame, that have the same temporal identifier as the frame.

The selection information may be related to the proportion of people who do not perceive a degradation in the quality of the played-back video even when playback of the video is set so that the frame is excluded from decoding.

There are provided a method and apparatus for predicting, for each unit interval of HFR video, how much the image quality of the unit interval degrades when the frame rate of that unit interval is lowered.

There are provided a method and apparatus for reducing the data rate of HFR video by reducing the frame rate of each unit interval while keeping the perceived quality within a quality difference range input by the user or a predefined allowable quality difference range. For example, the predefined allowable quality difference range may be the range in which 80% of people do not perceive a quality difference.
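As a rough illustration of the idea above, the sketch below picks, for one unit interval, the lowest candidate frame rate whose predicted satisfaction ratio still meets the allowed threshold (80% in the example). The function name, the candidate frame rates, and the satisfaction values are hypothetical; in the described approach the ratios would come from a quality satisfaction predictor rather than a hard-coded table.

```python
def pick_frame_rate(satisfaction_by_fps, original_fps, threshold=80.0):
    """Return the lowest FPS whose predicted satisfaction meets the threshold.

    satisfaction_by_fps maps a candidate FPS to the predicted percentage of
    viewers who would not notice a quality difference at that FPS.
    Falls back to the original FPS when no lower rate qualifies.
    """
    candidates = [
        fps
        for fps, ratio in satisfaction_by_fps.items()
        if fps < original_fps and ratio >= threshold
    ]
    return min(candidates) if candidates else original_fps


# Hypothetical predictions for two unit intervals of a 60 FPS video.
static_scene = {30: 95.0, 15: 82.0}
fast_scene = {30: 70.0, 15: 40.0}

print(pick_frame_rate(static_scene, original_fps=60))  # a lower rate qualifies
print(pick_frame_rate(fast_scene, original_fps=60))    # no lower rate qualifies
```

Taking the minimum qualifying rate is one possible policy; a different policy (for example, preferring the highest satisfaction among qualifying rates) would fit the same interface.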

There are provided a method and apparatus that, in applying TS, take into account image quality degradation prediction information, which predicts how much the image quality degrades.

There are provided a method and apparatus for applying TS only to unit intervals that fall within a quality degradation range input by the user or a predefined quality degradation range, by taking the image quality degradation prediction information into account.

FIG. 1 is a block diagram showing the configuration of an embodiment of an encoding apparatus to which the present invention is applied.
FIG. 2 is a block diagram showing the configuration of an embodiment of a decoding apparatus to which the present invention is applied.
FIG. 3 illustrates the operation of SURP according to an embodiment.
FIG. 4 shows image quality evaluation results according to an example.
FIG. 5 illustrates a procedure for extracting a feature vector according to an embodiment.
FIG. 6 shows a prediction model for measuring spatial randomness according to an example.
FIG. 7 shows an SRM according to an example.
FIG. 8 shows an SM according to an example.
FIG. 9 shows an SCM according to an example.
FIG. 10 shows a VSM according to an example.
FIG. 11 shows a SIM according to an example.
FIG. 12 shows the relationship between the frames used to generate a temporal prediction model and the prediction target frame according to an example.
FIG. 13A shows the first of three consecutive frames according to an example.
FIG. 13B shows the second of three consecutive frames according to an example.
FIG. 13C shows the third of three consecutive frames according to an example.
FIG. 13D shows a Temporal Randomness Map (TRM) for three consecutive frames according to an example.
FIG. 13E shows a SpatioTemporal Influence Map (STIM) for three consecutive frames according to an example.
FIG. 13F shows a Weighted SpatioTemporal Influence Map (WSTIM) for three consecutive frames according to an example.
FIG. 14 shows image quality satisfaction for each FPS according to an example.
FIG. 15 shows the optimal frame rate determined for each GOP according to an example.
FIG. 16 is a flowchart of a method of encoding a GOP using an optimal frame rate according to an example.
FIG. 17 is a flowchart of a method of determining an optimal frame rate according to an example.
FIG. 18 shows a hierarchical structure of frames having temporal identifiers according to an example.
FIG. 19 shows a message including an image quality satisfaction value according to an example.
FIG. 20 shows a message including an index into an image quality satisfaction table according to an example.
FIG. 21 shows a message including all of the image quality satisfaction values according to an example.
FIG. 22 shows a method of generating image quality satisfaction information according to an example.
FIG. 23A shows a configuration that maintains image quality satisfaction of 75% or more according to an example.
FIG. 23B shows a configuration that determines whether to change the FPS based on an image quality satisfaction of 75% according to an example.
FIG. 24 is a structural diagram of an encoding apparatus according to an embodiment.
FIG. 25 is a flowchart of an encoding method according to an embodiment.
FIG. 26 is a flowchart of a method of encoding a video according to an example.
FIG. 27 is a structural diagram of a decoding apparatus according to an embodiment.
FIG. 28 is a flowchart of a decoding method according to an embodiment.

The following detailed description of exemplary embodiments refers to the accompanying drawings, which show specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. It should be understood that the various embodiments, although different from one another, need not be mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the embodiment. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the exemplary embodiments is limited only by the appended claims, properly construed, together with the full range of equivalents to which those claims are entitled.

In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

When a component is said to be "connected" or "coupled" to another component, the two components may be directly connected or coupled to each other, but it should be understood that another component may exist between them. In addition, describing an exemplary embodiment as "including" a specific component does not exclude components other than that component; it means that additional components may be included in the practice of the exemplary embodiments or within the scope of their technical idea.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights, a first component may be named a second component, and similarly a second component may be named a first component.

The components shown in the embodiments are illustrated independently to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of description. For example, at least two of the components may be combined into one component, and one component may be divided into a plurality of components. Embodiments in which components are integrated and embodiments in which a component is separated are also included in the scope of rights as long as they do not depart from the essence.

Some components may not be essential components that perform essential functions but merely optional components for improving performance. The embodiments may be implemented with only the components essential to realizing the essence of the embodiments, and a structure that excludes optional components, such as components used only to improve performance, is also included in the scope of rights.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice them. In describing the embodiments, detailed descriptions of related well-known configurations or functions are omitted when they are judged likely to obscure the gist of this specification.

Hereinafter, an image may mean a single picture constituting a video, or may denote the video itself. For example, "encoding and/or decoding of an image" may mean "encoding and/or decoding of a video" and may also mean "encoding and/or decoding of one of the images constituting a video".

Hereinafter, the numbers used in the embodiments are merely examples and may be replaced with values other than those specified in the embodiments.

In the embodiments, information, operations, and functions applied to encoding may also be applied to decoding in a corresponding manner.

If the frame rate can be adjusted adaptively and automatically by exploiting the characteristics of human vision described above, the amount of video data can be reduced without degradation in perceptual quality for the majority of viewers. If these visual characteristics can be applied to HFR video so that the video is converted, interval by interval, to an optimal frame rate, the increase in the amount of video data can be minimized. The cost problems and technical problems that arise in transmitting and storing HFR video can thereby be resolved.

For adaptive and automatic adjustment of the frame rate, it must be possible to predict accurately how large a quality difference appears when video is changed from HFR to a lower frame rate. Such a prediction technique can be useful in the video encoding preprocessing stage, the encoding process, the transmission process, and/or the decoding process.

With conventional techniques, however, the quality difference that occurs when HFR video is converted to low-frame-rate video cannot be predicted accurately.

A conventional technique related to quality adjustment is temporal scalability (TS). TS adjusts the frame rate of a video uniformly, without considering how the quality of the video will change when TS is applied. Consequently, applying TS to a video can significantly degrade its quality.

The embodiments below describe 1) a method of predicting image quality satisfaction, 2) a method of changing video to an optimal lower frame rate by predicting image quality satisfaction, and 3) a method of using information generated by predicting image quality satisfaction in TS.

1) Method of predicting image quality satisfaction: A quality satisfaction predictor is described that predicts the degree of quality difference arising when HFR is changed to a lower frame rate.

2) Method of changing video to an optimal lower frame rate by predicting image quality satisfaction: An encoder is described that uses the quality satisfaction predictor to change fixed-HFR video to an optimal lower frame rate for each basic decision unit and then encodes it.

3) Method of using information generated by predicting image quality satisfaction in TS: The encoded video stream generated by encoding a video may include quality satisfaction information generated by the quality satisfaction predictor. The quality satisfaction information may be used for TS during transmission or decoding of the encoded video stream. This information makes possible an improved TS that was not previously achievable: when TS is applied, how the quality of the video will change can be taken into account, which allows quality degradation to be minimized.

First, the terms used in the embodiments are described.

Unit: A "unit" may denote a unit of image encoding and decoding. The terms "unit" and "block" may have the same meaning and may be used interchangeably.

A unit (or block) may be an M×N array of samples, where M and N are each positive integers. A unit often means a two-dimensional array of samples. A sample may be a pixel or a pixel value.

In image encoding and decoding, a unit may be a region produced by partitioning one image. One image may be partitioned into a plurality of units. In image encoding and decoding, predefined processing may be performed on a unit depending on the unit's type. Depending on function, unit types may be classified into macro unit, coding unit (CU), prediction unit (PU), transform unit (TU), and so on. A unit may be further partitioned into sub-units smaller than the unit.

Block partition information may include information on the depth of a unit. The depth information may indicate the number of times and/or the degree to which the unit is partitioned.

A unit may be hierarchically partitioned into a plurality of sub-units, each having depth information, based on a tree structure. In other words, a unit and a sub-unit produced by partitioning that unit correspond to a node and a child node of that node, respectively. Each partitioned sub-unit may have depth information. Because the depth information of a unit indicates the number of times and/or the degree to which the unit has been partitioned, the partition information of a sub-unit may also include information on the size of the sub-unit.

In the tree structure, the top node may correspond to the initial, unpartitioned unit. The top node may be called the root node and may have the minimum depth value. Here, the top node may have a depth of level 0.

A node with a depth of level 1 may represent a unit produced by partitioning the initial unit once. A node with a depth of level 2 may represent a unit produced by partitioning the initial unit twice.

A node with a depth of level n may represent a unit produced by partitioning the initial unit n times.

A leaf node may be the lowest node and may be a node that cannot be partitioned further. The depth of a leaf node may be the maximum level. For example, the predefined value of the maximum level may be 3.
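The depth bookkeeping can be illustrated with a small sketch. It assumes, purely for illustration, a quadtree in which each split divides a square unit into four equal sub-units, and a 64x64 root unit; the text above only requires a tree structure and fixes neither choice. The class name, its fields, and the constant name are hypothetical. In the sketch, any unit that has not been split counts as a leaf, and a unit at the maximum level cannot be split at all.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 3  # example maximum level mentioned in the text


@dataclass
class Unit:
    x: int
    y: int
    size: int
    depth: int = 0
    children: list = field(default_factory=list)

    def split(self):
        """Split into four sub-units one level deeper (assumed quadtree)."""
        if self.depth >= MAX_DEPTH:
            raise ValueError("a unit at the maximum depth cannot be split")
        half = self.size // 2
        self.children = [
            Unit(self.x + dx, self.y + dy, half, self.depth + 1)
            for dy in (0, half)
            for dx in (0, half)
        ]
        return self.children

    def leaves(self):
        """Yield the units that have not been split any further."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


root = Unit(0, 0, 64)        # depth 0: the initial, unpartitioned unit
first = root.split()         # four units at depth 1
second = first[0].split()    # four units at depth 2
for leaf in root.leaves():
    print(leaf.depth, leaf.size)
```

Under the quadtree assumption the size of a unit follows directly from its depth (the root size halved once per level), which is one way depth information can carry size information.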

변환 유닛(Transform Unit): 변환 유닛은 변환, 역변환, 양자화, 역양자화, 변환 계수 부호화, 및 변환 계수 복호화 등과 같은 잔차 신호(residual signal) 부호화 및/또는 잔여 신호 복호화에 있어서의 기본 유닛일 수 있다. 하나의 변환 유닛은 더 작은 크기를 갖는 다수의 변환 유닛들 분할될 수 있다.Transform Unit: A transform unit may be a base unit in residual signal coding and / or residual signal decoding such as transform, inverse transform, quantization, inverse quantization, transform coefficient coding, and transform coefficient decoding . One conversion unit can be divided into a plurality of conversion units having a smaller size.

Prediction unit: A prediction unit may be the basic unit for performing prediction or compensation. A prediction unit may be split into multiple partitions. Each of these partitions may also be a basic unit for performing prediction or compensation. A partition generated by splitting a prediction unit may itself be a prediction unit.

Reconstructed neighbor unit: A reconstructed neighbor unit may be a unit in the vicinity of the encoding target unit or the decoding target unit that has already been encoded or decoded and then reconstructed. A reconstructed neighbor unit may be a spatially adjacent unit or a temporally adjacent unit of the target unit.

Prediction unit partition: A prediction unit partition may refer to the form into which a prediction unit is split.

Parameter set: A parameter set may correspond to header information among the structures in a bitstream. For example, parameter sets may include a sequence parameter set, a picture parameter set, and an adaptation parameter set.

Rate-distortion optimization: The encoding apparatus may use rate-distortion optimization to provide high coding efficiency by choosing among combinations of coding unit size, prediction mode, prediction unit size, motion information, transform unit size, and the like.

To select the optimal combination among these combinations, the rate-distortion optimization scheme may calculate the rate-distortion cost of each combination. The rate-distortion cost may be calculated using Equation 1 below. In general, the combination with the minimum rate-distortion cost may be selected as the optimal combination under the rate-distortion optimization scheme.

[Equation 1]  D + λ·R

D may denote distortion. D may be the mean of the squared differences (mean square error) between the original transform coefficients and the reconstructed transform coefficients within a transform block.

R may denote rate. R may denote the bit rate obtained using the related context information.

λ may denote the Lagrangian multiplier. R may include not only the bits of coding parameter information such as the prediction mode, motion information, and the coded block flag, but also the bits generated by encoding the transform coefficients.

To calculate exact values of D and R, the encoding apparatus performs processes such as inter prediction and/or intra prediction, transform, quantization, entropy encoding, inverse quantization, and inverse transform, and these processes may greatly increase the complexity of the encoding apparatus.
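The selection step can be sketched as follows, assuming the usual Lagrangian form D + λ·R for the rate-distortion cost. The `Candidate` class, the candidate labels and all numeric values are hypothetical; a real encoder would obtain D and R by actually running prediction, transform, quantization and entropy coding for each combination.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical label for one combination of coding options
    distortion: float  # D, for example a mean square error
    rate_bits: float   # R, bits for coding parameters plus transform coefficients


def rd_cost(candidate, lam):
    """Rate-distortion cost D + lambda * R (assumed form of Equation 1)."""
    return candidate.distortion + lam * candidate.rate_bits


def select_best(candidates, lam):
    """Return the combination with the minimum rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


candidates = [
    Candidate("large_blocks", distortion=40.0, rate_bits=10.0),
    Candidate("medium_blocks", distortion=25.0, rate_bits=30.0),
    Candidate("small_blocks", distortion=10.0, rate_bits=80.0),
]

best_low_lambda = select_best(candidates, lam=0.1)   # distortion dominates
best_high_lambda = select_best(candidates, lam=2.0)  # rate dominates
```

With a small λ the low-distortion candidate wins, and with a large λ the low-rate candidate wins, which is the trade-off the multiplier controls.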

Reference picture: A reference picture may be an image used for inter prediction or motion compensation. A reference picture may be a picture containing the reference unit that the target unit refers to for inter prediction or motion compensation. "Picture" and "image" may have the same meaning, and the two terms may be used interchangeably.

Reference picture list: A reference picture list may be a list containing the reference images used for inter prediction or motion compensation. The types of reference picture list may include List Combined (LC), List 0 (L0), and List 1 (L1).

Motion vector (MV): A motion vector may be a two-dimensional vector used in inter prediction. For example, an MV may be expressed in the form (mv_x, mv_y), where mv_x may denote the horizontal component and mv_y may denote the vertical component.

An MV may represent the offset between the target picture and the reference picture.

Search range: The search range may be the two-dimensional area in which the search for an MV is performed during inter prediction. For example, the size of the search range may be M×N, where M and N may each be a positive integer.

Basic decision unit: The basic decision unit may denote the basic unit that is the object of processing in the embodiments. A basic decision unit may consist of a plurality of frames and may be a group of pictures (GOP).

Group of pictures (GOP): In the embodiments, GOP may have the same meaning as the term GOP generally used in video encoding. That is, the first frame of a GOP may be an I frame, and the other frames of the GOP may be frames that refer to the I frame directly or indirectly.

Frame rate: The frame rate may denote the temporal resolution of a video. The frame rate may indicate how many pictures are displayed per second.

Frames per second (FPS): FPS may be the unit of frame rate. For example, "30 FPS" may indicate that 30 frames are displayed per second.

High frame rate (HFR): HFR may denote a frame rate at or above a predetermined reference value. For example, by the Korean and US standards a video with 60 FPS or more may be an HFR video, and by the European standard a video with 50 FPS or more may be an HFR video. In the embodiments, a 60 FPS video is used as the example of an HFR video.

Optimal frame rate: The optimal frame rate may be a frame rate determined according to the content of the video. The optimal frame rate may be determined for each basic decision unit. Even if the frame rate of a video or a basic decision unit is raised above the optimal frame rate, people may not perceive any quality difference resulting from the increase. In other words, the optimal frame rate may be the minimum frame rate at which people do not perceive a loss of video quality relative to HFR. Equivalently, when the FPS of an HFR video is reduced step by step and the quality difference between the original HFR video and the reduced-FPS video is measured at each step, the optimal frame rate may be the FPS immediately before quality degradation occurs. Here, the occurrence of quality degradation may mean that at least a proportion of people set by the user, or a preset proportion, perceives the quality difference.
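The stepwise definition can be sketched as a simple search. This is a minimal illustration that assumes the FPS is halved at each step and that the share of viewers who notice degradation at each reduced FPS is already known; the function name `optimal_fps`, the dictionary of measurements and the threshold values are hypothetical.

```python
def optimal_fps(source_fps, noticed_percent_by_fps, threshold_percent):
    """Return the lowest FPS reached before degradation occurs.

    noticed_percent_by_fps maps a reduced FPS to the percentage of viewers
    who perceive a quality difference relative to the source video.
    Degradation "occurs" once that percentage reaches threshold_percent.
    """
    best = source_fps
    fps = source_fps / 2
    while fps in noticed_percent_by_fps:
        if noticed_percent_by_fps[fps] >= threshold_percent:
            break            # degradation occurs at this step, stop before it
        best = fps
        fps /= 2
    return best


# Hypothetical measurements for one basic decision unit of a 60 FPS video.
measurements = {30: 10.0, 15: 45.0, 7.5: 90.0}

strict = optimal_fps(60, measurements, threshold_percent=25.0)
lenient = optimal_fps(60, measurements, threshold_percent=50.0)
```

A stricter threshold keeps the frame rate higher, so `strict` stops at 30 FPS while `lenient` continues down to 15 FPS.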

Homogeneous video: A homogeneous video may be a video in which all of the basic decision units that make up the video have identical or similar characteristics.

Perceived quality degradation ratio: The perceived quality degradation ratio denotes the proportion of people who can perceive quality degradation when the frame rate of an HFR video is reduced for a basic decision unit. Typically, the FPS may be changed in steps, being halved at each step. The unit of the perceived quality degradation ratio may be %.

Perceived quality degradation ratio predictor: The perceived quality degradation ratio predictor may be a predictor that predicts the perceived quality degradation ratio.

Satisfied user ratio: The satisfied user ratio may be the proportion of people who, when the FPS of an HFR video is reduced for a basic decision unit, are satisfied with the quality of the reduced-frame-rate video relative to the quality of the HFR video. Alternatively, the satisfied user ratio may be the proportion of people who, when the FPS of an HFR video is reduced for a basic decision unit, do not perceive a difference between the quality of the HFR video and the quality of the reduced-FPS video.

Typically, the FPS may be changed in steps, being halved at each step. Here, the proportion of people may be the value 1 - a/b expressed as a percentage. a may be the number of people who judged, in a subjective quality assessment, that the quality of the HFR video and the quality of the reduced-FPS video differ. b may be the total number of people. For example, suppose five people in total perform a subjective quality assessment of a 60 FPS video and a 30 FPS video, three of the five give the 60 FPS video and the 30 FPS video the same quality score, and the remaining two give the 30 FPS video a lower quality score than the 60 FPS video. The satisfied user ratio may then be 1 - 2/5 = 60%, and the perceived quality degradation ratio may be 40%. That is, the satisfied user ratio and the perceived quality degradation ratio may sum to 100%.
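The worked example above can be reproduced with a short sketch. The function names are hypothetical; `num_noticed` plays the role of a and `num_total` the role of b.

```python
def satisfied_user_ratio(num_noticed, num_total):
    """Percentage of assessors who did NOT notice a difference: (1 - a/b) * 100."""
    if num_total <= 0 or not 0 <= num_noticed <= num_total:
        raise ValueError("need 0 <= num_noticed <= num_total and num_total > 0")
    return 100.0 * (num_total - num_noticed) / num_total


def perceived_degradation_ratio(num_noticed, num_total):
    """Percentage of assessors who noticed a difference: (a/b) * 100."""
    if num_total <= 0 or not 0 <= num_noticed <= num_total:
        raise ValueError("need 0 <= num_noticed <= num_total and num_total > 0")
    return 100.0 * num_noticed / num_total


# Five assessors, two of whom rated the 30 FPS video lower than the 60 FPS video.
sur = satisfied_user_ratio(num_noticed=2, num_total=5)
degradation = perceived_degradation_ratio(num_noticed=2, num_total=5)
```

Here `sur` is 60% and `degradation` is 40%, so the two values sum to 100% as stated.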

Satisfied user ratio predictor (SURP): The SURP may be a predictor that predicts the satisfied user ratio. The perceived quality degradation ratio predictor and the SURP may be similar in configuration and operation, and the two may differ only in their output values.

Selection information: In an embodiment, the image quality information may be the perceived quality degradation ratio or the satisfied user ratio. The term "selection information" may have the same meaning as the term "perceptual quality information", and the two terms may be used interchangeably.

Basic operations of the encoding apparatus and the decoding apparatus

FIG. 1 is a block diagram showing the configuration of an embodiment of an encoding apparatus to which the present invention is applied.

The encoding apparatus 100 may be a video encoding apparatus or an image encoding apparatus. A video may include one or more images. The encoding apparatus 100 may encode the one or more images of a video sequentially in time order.

Referring to FIG. 1, the encoding apparatus 100 may include an inter prediction unit 110, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.

The encoding apparatus 100 may encode an input image in intra mode and/or inter mode. The input image may be referred to as the current image, that is, the image currently being encoded.

The encoding apparatus 100 may also generate a bitstream containing the encoded information by encoding the input image, and may output the generated bitstream.

When intra mode is used, the switch 115 may be switched to intra. When inter mode is used, the switch 115 may be switched to inter.

The encoding apparatus 100 may generate a prediction block for an input block of the input image. After the prediction block is generated, the encoding apparatus 100 may encode the residual between the input block and the prediction block. The input block may be referred to as the current block, that is, the block currently being encoded.

When the prediction mode is intra mode, the intra prediction unit 120 may use the pixel values of already-encoded blocks around the current block as reference pixels. The intra prediction unit 120 may perform spatial prediction for the current block using the reference pixels and may generate prediction samples for the current block through the spatial prediction.

The inter prediction unit 110 may include a motion prediction unit and a motion compensation unit.

When the prediction mode is inter mode, the motion prediction unit may, during motion prediction, search the reference image for the region that best matches the current block and may derive a motion vector for the current block and the region found. The reference image may be stored in the reference picture buffer 190; it may be stored there once encoding and/or decoding of the reference image has been processed.

The motion compensation unit may generate a prediction block by performing motion compensation using the motion vector. Here, the motion vector may be a two-dimensional vector used for inter prediction. The motion vector may also represent the offset between the current image and the reference image.
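The search and compensation steps can be sketched with an exhaustive block-matching search. This is an illustration only: the sum of absolute differences (SAD) as the matching criterion, the full search over a square range, and all function names are assumptions, not details stated in the embodiment.

```python
def get_block(frame, x, y, size):
    """Extract a size x size block whose top-left corner is at (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def motion_search(current, reference, x, y, size, search_range):
    """Find the motion vector (mv_x, mv_y) that minimizes the SAD."""
    target = get_block(current, x, y, size)
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = (0, 0), None
    for mv_y in range(-search_range, search_range + 1):
        for mv_x in range(-search_range, search_range + 1):
            rx, ry = x + mv_x, y + mv_y
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference image
            cost = sad(target, get_block(reference, rx, ry, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mv_x, mv_y), cost
    return best_mv, best_cost


def motion_compensate(reference, x, y, size, mv):
    """Build the prediction block by reading the reference at the MV offset."""
    return get_block(reference, x + mv[0], y + mv[1], size)


# Synthetic 8x8 reference, and a current frame whose content is the reference
# shifted by 2 samples horizontally and 1 sample vertically.
reference = [[r * 16 + c for c in range(8)] for r in range(8)]
current = [[reference[min(r + 1, 7)][min(c + 2, 7)] for c in range(8)]
           for r in range(8)]

mv, cost = motion_search(current, reference, x=2, y=2, size=2, search_range=2)
prediction = motion_compensate(reference, 2, 2, 2, mv)
```

For this synthetic input the search recovers the shift exactly, so the prediction block equals the current block and the residual is zero.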

The subtractor 125 may generate a residual block, which is the difference between the input block and the prediction block. The residual block may also be referred to as the residual signal.

The transform unit 130 may generate transform coefficients by performing a transform on the residual block and may output the generated transform coefficients. Here, a transform coefficient may be a coefficient value generated by transforming the residual block. When transform skip mode is applied, the transform unit 130 may skip the transform of the residual block.

A quantized transform coefficient level may be generated by applying quantization to a transform coefficient. In the embodiments below, a quantized transform coefficient level may also be referred to as a transform coefficient.

The quantization unit 140 may generate quantized transform coefficient levels by quantizing the transform coefficients according to a quantization parameter, and may output the generated quantized transform coefficient levels. In doing so, the quantization unit 140 may quantize the transform coefficients using a quantization matrix.

The entropy encoding unit 150 may generate a bitstream by performing entropy encoding according to a probability distribution, based on the values calculated by the quantization unit 140 and/or the coding parameter values calculated during encoding. The entropy encoding unit 150 may output the generated bitstream.

In addition to the pixel information of an image, the entropy encoding unit 150 may entropy-encode information needed to decode the image. For example, the information needed to decode the image may include syntax elements.

A coding parameter may be information required for encoding and/or decoding. Coding parameters may include information that is encoded by the encoding apparatus and transmitted to the decoding apparatus, and may include information that can be inferred during encoding or decoding. A syntax element is one example of information transmitted to the decoding apparatus.

For example, coding parameters may include values or statistics such as the prediction mode, motion vector, reference picture index, coded block pattern, presence or absence of a residual signal, transform coefficients, quantized transform coefficients, quantization parameter, block size, and block partition information. The prediction mode may indicate an intra prediction mode or an inter prediction mode.

The residual signal may mean the difference between the original signal and the prediction signal. Alternatively, the residual signal may be a signal generated by transforming the difference between the original signal and the prediction signal, or a signal generated by transforming and quantizing that difference. A residual block may be a residual signal in units of blocks.

When entropy encoding is applied, a small number of bits may be assigned to symbols with a high probability of occurrence and a large number of bits to symbols with a low probability of occurrence. Because symbols are represented through this assignment, the size of the bit string for the symbols to be encoded may be reduced. Entropy encoding may therefore improve the compression performance of image encoding.
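The bit-allocation principle can be illustrated numerically. The sketch below uses the information-theoretic ideal code length of -log2(p) bits for a symbol with probability p; the symbol names and probabilities are hypothetical, and a practical entropy coder only approximates these lengths.

```python
import math


def ideal_code_length(probability):
    """Ideal number of bits for a symbol that occurs with this probability."""
    return -math.log2(probability)


def average_bits(probabilities):
    """Expected bits per symbol when each symbol gets its ideal length."""
    return sum(p * ideal_code_length(p) for p in probabilities.values())


# Hypothetical symbol distribution: "a" is frequent, "c" and "d" are rare.
symbol_probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

entropy_coded_bits = average_bits(symbol_probabilities)
fixed_length_bits = math.ceil(math.log2(len(symbol_probabilities)))
```

With this distribution the frequent symbol needs 1 bit and the rare symbols 3 bits each, giving an average of 1.75 bits per symbol compared with 2 bits for a fixed-length code.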

For entropy encoding, coding methods such as exponential Golomb coding, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC) may be used. For example, the entropy encoding unit 150 may perform entropy encoding using a Variable Length Coding/Code (VLC) table. For example, the entropy encoding unit 150 may derive a binarization method for a target symbol. The entropy encoding unit 150 may also derive a probability model for a target symbol/bin. The entropy encoding unit 150 may perform entropy encoding using the derived binarization method or probability model.
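Of the methods listed, exponential Golomb coding is simple enough to sketch in full. The example below implements the order-0 code for non-negative integers, with bit strings represented as Python strings for readability; the function names are hypothetical, and which order or signed mapping the embodiment would use is not stated in the text.

```python
def exp_golomb_encode(value):
    """Order-0 exponential Golomb code word for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]                 # binary digits of value + 1
    return "0" * (len(binary) - 1) + binary     # zero prefix, then the digits


def exp_golomb_decode(bits):
    """Decode a single order-0 exponential Golomb code word."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    payload = bits[leading_zeros:2 * leading_zeros + 1]
    return int(payload, 2) - 1


code_words = [exp_golomb_encode(v) for v in range(5)]
```

The resulting code words are "1", "010", "011", "00100" and "00101", so smaller values, which are assumed to be more frequent, receive shorter codes.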

Because the encoding apparatus 100 performs encoding through inter prediction, the encoded current image may be used as a reference image for other image(s) processed later. The encoding apparatus 100 may therefore decode the encoded current image again and store the decoded image as a reference image. For this decoding, inverse quantization and inverse transform may be applied to the encoded current image.

The quantized coefficients may be inversely quantized in the inverse quantization unit 160 and inversely transformed in the inverse transform unit 170. The inversely quantized and inversely transformed coefficients may be added to the prediction block by the adder 175. A reconstructed block may be generated by adding the inversely quantized and inversely transformed coefficients to the prediction block.
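The encode-then-reconstruct loop can be sketched on a four-sample block. To keep the example short it assumes transform skip mode, so the residual is quantized directly, and it uses a uniform scalar quantizer with a hypothetical step size; real codecs derive the step from the quantization parameter and may apply a quantization matrix.

```python
def quantize(residual, step):
    """Map each residual value to an integer level."""
    return [round(r / step) for r in residual]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [level * step for level in levels]


def reconstruct(prediction, levels, step):
    """Adder stage: prediction plus the inversely quantized residual."""
    return [p + d for p, d in zip(prediction, dequantize(levels, step))]


input_block = [53, 59, 57, 67]
prediction = [50, 50, 60, 60]
step = 4  # hypothetical quantization step size

residual = [s - p for s, p in zip(input_block, prediction)]  # subtractor stage
levels = quantize(residual, step)                            # sent to the entropy coder
reconstructed = reconstruct(prediction, levels, step)        # stored for later reference
```

The reconstructed block differs from the input block by at most half a step per sample, and it is this reconstructed block, not the original, that the decoder will also have available as reference data.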

The reconstructed block may pass through the filter unit 180. The filter unit 180 may apply at least one of a deblocking filter, Sample Adaptive Offset (SAO), and an Adaptive Loop Filter (ALF) to the reconstructed block or the reconstructed picture. The filter unit 180 may also be referred to as an adaptive in-loop filter.

The deblocking filter may remove block distortion that occurs at the boundaries between blocks. SAO may add an appropriate offset value to pixel values to compensate for coding errors. ALF may perform filtering based on values obtained by comparing the reconstructed image with the original image. The reconstructed block that has passed through the filter unit 180 may be stored in the reference picture buffer 190.

FIG. 2 is a block diagram showing the configuration of an embodiment of a decoding apparatus to which the present invention is applied.

The decoding apparatus 200 may be a video decoding apparatus or an image decoding apparatus.

Referring to FIG. 2, the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, an inter prediction unit 250, an adder 255, a filter unit 260, and a reference picture buffer 270.

The decoding apparatus 200 may receive the bitstream output from the encoding apparatus 100. The decoding apparatus 200 may decode the bitstream in intra mode and/or inter mode. The decoding apparatus 200 may also generate a reconstructed image through decoding and may output the generated reconstructed image.

For example, switching to intra mode or inter mode according to the prediction mode used for decoding may be performed by a switch. When the prediction mode used for decoding is intra mode, the switch may be switched to intra. When the prediction mode used for decoding is inter mode, the switch may be switched to inter.

The decoding apparatus 200 may obtain a reconstructed residual block from the input bitstream and may generate a prediction block. Once the reconstructed residual block and the prediction block are obtained, the decoding apparatus 200 may generate a reconstructed block by adding the reconstructed residual block and the prediction block.

The entropy decoding unit 210 may generate symbols by performing entropy decoding on the bitstream based on a probability distribution. The generated symbols may include symbols in the form of quantized coefficients. Here, the entropy decoding method may be similar to the entropy encoding method described above. For example, the entropy decoding method may be the inverse process of the entropy encoding method described above.

The quantized coefficients may be inversely quantized in the inverse quantization unit 220, and the inversely quantized coefficients may be inversely transformed in the inverse transform unit 230. As a result of the inverse quantization and inverse transform of the quantized coefficients, a reconstructed residual block may be generated. In doing so, the inverse quantization unit 220 may apply a quantization matrix to the quantized coefficients.

When intra mode is used, the intra prediction unit 240 may generate a prediction block by performing spatial prediction using the pixel values of already-coded blocks around the current block.

The inter prediction unit 250 may include a motion compensation unit. When inter mode is used, the motion compensation unit may generate a prediction block by performing motion compensation using a motion vector and a reference image. The reference image may be stored in the reference picture buffer 270.

The reconstructed residual block and the prediction block may be added by the adder 255. The adder 255 may generate a reconstructed block by adding the reconstructed residual block and the prediction block.

The reconstructed block may pass through the filter unit 260. The filter unit 260 may apply at least one of a deblocking filter, SAO, and ALF to the reconstructed block or the reconstructed picture. The filter unit 260 may output a reconstructed image. The reconstructed image may be stored in the reference picture buffer 270 and used for inter prediction.

Satisfied User Ratio Predictor (SURP)

FIG. 3 illustrates the operation of the SURP according to an embodiment.

The input of the SURP 300 may be a basic decision unit. As shown in FIG. 3, the basic decision unit may be a GOP. The output of the SURP 300 may be the predicted satisfied user ratio for the GOP with a changed FPS.

For example, the input GOP may be a GOP of eight frames at 60 FPS.

To calculate the predicted satisfied user ratio, the SURP 300 may change the FPS of the GOP. For example, the SURP 300 may change a GOP of X FPS into a GOP of Y FPS. For example, Y may be aX, where a may be 1/2. Here, "1/2" is merely an example value; a may take a value of the form 1/2ⁿ, such as 1/4, 1/8, or 1/16, where n may be an integer of 1 or more.

X may denote the FPS of the GOP before the change, and Y may denote the FPS of the GOP after the change. When X = 2ⁿ, Y may be one of X/2, X/2², ..., X/2ⁿ. For example, when X is 60, Y may be 30. That is, the GOP may be converted from 60 FPS to 30 FPS.
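One way to picture the FPS change is uniform frame dropping, sketched below. The text does not state how the SURP 300 performs the conversion, so keeping every 2ⁿ-th frame is an assumption made purely for illustration, as are the function names.

```python
def target_fps(source_fps, n):
    """Y = X / 2**n for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be an integer of 1 or more")
    return source_fps / 2 ** n


def reduce_gop_fps(gop_frames, n):
    """Keep every 2**n-th frame of the GOP (assumed conversion method)."""
    if n < 1:
        raise ValueError("n must be an integer of 1 or more")
    return gop_frames[::2 ** n]


source_gop = list(range(8))               # eight frame indices of a 60 FPS GOP
target_gop = reduce_gop_fps(source_gop, n=1)
fps_after = target_fps(60, n=1)
```

For the eight-frame example, halving the frame rate keeps frames 0, 2, 4 and 6 and yields a 30 FPS GOP.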

The predicted satisfied user ratio may be a predicted value of the proportion of people who, assuming the FPS of the GOP is changed from 60 to 30, do not perceive a difference between the quality of the GOP before the change and the quality of the GOP after the change. Alternatively, the predicted satisfied user ratio may be a predicted value of the proportion of people who, under the same assumption, are also satisfied with the quality of the GOP after the change relative to the quality of the GOP before the change.

The SURP 300 may generate the predicted satisfied user ratio through machine learning.

Hereinafter, the GOP before the FPS change may be referred to as the source GOP, and the GOP after the FPS change may be referred to as the target GOP. Likewise, the video before the FPS change may be referred to as the source video, and the video after the FPS change may be referred to as the target video.

기계 학습을 통한 예측 화질 만족도의 생성에 대한 고려 사항.Considerations for the generation of predicted image quality satisfaction through machine learning.

실시예에서, SURP(300)는 GOP 별로 FPS를 변경할 때, FPS의 변경에 의해 얼만큼의 화질 저하가 발생하는 가를 측정하기 위해 기계 학습(machine learning)을 사용할 수 있다.In an embodiment, SURP 300 may use machine learning to measure how much image degradation occurs by changing the FPS when changing the FPS per GOP.

Machine learning requires training data. That is, a ground truth for machine learning may be required. Since one of the goals of the embodiment is to maintain the image quality perceived by humans, data on perceptual quality may be acquired through subjective image quality evaluation experiments.

Here, acquiring data on perceptual quality through subjective image quality evaluation may be only one embodiment. Instead of measuring perceptual quality directly through subjective evaluation, the SURP 300 may measure it using a measure that models perceptual quality. For example, the measure may be Structural SIMilarity (SSIM) or Video Quality Metric (VQM).

통상적으로, 주관적 화질로서 최종적으로 획득되는 인지 화질의 값로서, 주로 평균 의견 점수(Mean Opinion Score; MOS) 값이 사용된다. MOS를 실시예의 주관적 화질 평가에 적용함에 있어서, 아래의 1) 내지 3)과 같은 문제가 발생할 수 있다.Typically, the Mean Opinion Score (MOS) value is mainly used as the value of the perceived image quality finally obtained as the subjective image quality. In applying the MOS to the subjective image quality evaluation of the embodiment, the following problems 1) to 3) may occur.

1) First, in the conventional approach in which users assign scores, the scoring criteria may differ from user to user. Moreover, during the evaluation, the same person may often assign different scores to the same video.

2) Second, because the MOS values of different videos differ from one another, it may be difficult to use them directly for machine learning. To address this, the SURP 300 may use relative perceptual quality information instead of MOS. One example of such information is the Difference MOS (DMOS). If the difference between the MOS value of the source video and that of the target video is used in the subjective image quality evaluation, the evaluation results can be used for machine learning. However, methods for such evaluation, such as the Double Stimulus Continuous Quality Scale (DSCQS) and the Double Stimulus Impairment Scale (DSIS), may require the source video to be evaluated in every evaluation item. This may increase the time required for the image quality evaluation experiment.

3) 세 번째로, 종래의 화질 측정 방법은 각 비디오 시퀀스 별로 8초 이상의 측정이 이루어지도록 권고한다. 즉, 화질 측정의 결과는 시퀀스 레벨의 결과라고 볼 수 있다. 기본 결정 단위에 대한 판별을 위한 기계 학습이 이러한 결과를 이용하여 이루어진다면, 좋은 성능이 획득되지 못할 수 있다. 3) Third, the conventional picture quality measurement method recommends that a measurement of 8 seconds or more is performed for each video sequence. That is, the result of the image quality measurement can be regarded as a result of the sequence level. If machine learning for the determination of the basic decision unit is made using these results, good performance may not be obtained.

따라서, 일 실시예에서는, 기계 학습에 적합한 그라운드 트루스를 획득하기 위한 주관적 화질 평가 방볍 및 화질 평가에 사용되는 비디오 시퀀스의 조건이 제안된다.
Therefore, in one embodiment, a subjective picture quality evaluation method for obtaining a ground truth suitable for machine learning and a condition of a video sequence used for picture quality evaluation are proposed.

기계 학습Machine learning

SURP(300)는 기본 결정 단위 별로 최적의 FPS를 결정할 수 있다. 기본 결정 단위 별로 FPS가 결정되기 위해서는, 이러한 결정을 위한 그라운드 트루스의 데이터가 획득되어야 할 수 있다.The SURP 300 can determine an optimal FPS for each basic decision unit. In order for the FPS to be determined by the basic decision unit, the ground truth data for such a decision may have to be obtained.

이하의 설명에서는, 편의상 GOP가 기본 결정 단위인 것으로 간주되었으며, GOP의 크기는 8인 것으로 간주되었다. 말하자면, 하나의 GOP는 8개의 프레임들을 포함할 수 있다. 이러한 기본 결정 단위 및 GOP의 크기에 대한 가정은 단지 예시적인 것이다.In the following description, a GOP is regarded as a basic decision unit for convenience, and the size of the GOP is regarded as 8. That is to say, one GOP may contain 8 frames. The assumptions about the size of these basic decision units and GOPs are merely exemplary.

그라운드 트루스의 획득 및 주관적 화질 평가 등에 관련된 아래의 설명은 다른 종류의 기본 결정 단위 및 다른 크기의 기본 결정 단위에도 동일 또는 유사한 방식으로 적용될 수 있다.The following description related to the acquisition and subjective image quality evaluation of the ground truth can be applied to the basic decision units of different kinds and basic decision units of different sizes in the same or similar manner.

Subjective image quality evaluation can be performed only at the sequence level, whereas machine learning requires evaluation results in units of GOPs. To bridge this gap, the length of the video sequences used for evaluation may be adjusted so that all GOPs of a video sequence have the same or similar content characteristics. That is, the characteristics of all GOPs of the video may be similar to one another. In the embodiment, a video whose GOPs all have similar characteristics may be referred to as a homogeneous video.

호모지니어스 비디오의 길이는 기정의된 값으로 제한될 수 있다. 예를 들면, 호모지니어스 비디오의 길이는 5초가 초과하지 않도록 조정될 수 있다.The length of the homogeneous video can be limited to a predetermined value. For example, the length of the homogeneous video may be adjusted to not exceed 5 seconds.

이러한 유사한 특성들의 GOP들로 구성된 호모지니어스 비디오가 사용되기 때문에, 시퀀스 레벨에서 획득된 화질 평가 결과가 GOP의 단위의 화질 정보로 사용되는 것이 가능할 수 있다. 또한, SURP(300)는 이러한 화질 정보를 사용하는 기계 학습을 통해 SURP(300)의 성능을 향상시킬 수 있다.
Since homogeneous video composed of GOPs of these similar characteristics is used, it may be possible that the image quality evaluation result obtained at the sequence level is used as the image quality information of the unit of the GOP. In addition, the SURP 300 can improve the performance of the SURP 300 through machine learning using the image quality information.

FIG. 4 shows image quality evaluation results according to an example.

As an example of subjective image quality evaluation, the source video may be a 60 FPS HFR video with HD resolution. The target video may be a 30 FPS video.

The SURP 300 may generate the target video by lowering the temporal resolution of the source video to 1/n. n may be 2. Alternatively, n may be 2^m, where m is an integer of 1 or more.

The SURP 300 may generate the target video by applying a frame drop to the source video. A frame drop may be the removal of predefined frames from the source video. For example, a frame drop may remove the even-numbered frames of the source video.
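The frame drop described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `drop_frames` and the representation of frames as list elements are assumptions made here. With m = 1 it removes the even-numbered frames (counting from 1), and larger m gives the 1/2^m reductions.

```python
def drop_frames(frames, m=1):
    """Keep every (2**m)-th frame, starting from the first frame.

    For m = 1 this removes the even-numbered frames (frames are
    numbered from 1), halving the temporal resolution.
    """
    if m < 1:
        raise ValueError("m must be an integer of 1 or more")
    step = 2 ** m
    return frames[::step]


# Hypothetical source GOP: 8 frames of a 60 FPS video, numbered 1..8.
source_gop = list(range(1, 9))
print(drop_frames(source_gop, m=1))  # [1, 3, 5, 7] -> 30 FPS target
print(drop_frames(source_gop, m=2))  # [1, 5]       -> 15 FPS target
```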

주관적 화질 평가 방법으로서, 페어와이즈 비교(pairwise comparison) 방식이 사용될 수 있다. As a subjective image quality evaluation method, a pairwise comparison method can be used.

The pairwise comparison method may show two videos to the evaluator for each evaluation item and ask which of the two has the better image quality. The evaluator may select the video with the better image quality for each evaluation item. Alternatively, for each evaluation item, the pairwise comparison method may ask whether the first video has the better image quality, the second video has the better image quality, or the two videos have the same image quality. That is, the evaluator may select one of three items for each evaluation item. The embodiment uses the method with three items.

Selecting one of three items per evaluation item can yield more accurate evaluation results when 1) the image qualities of the two videos are similar and 2) relatively few evaluators take part in the experiment.

화질 평가에 있어서, 각 평가 문항 별로 고 FPS의 비디오 및 저 FPS의 비디오의 보여지는 순서는 랜덤(random)하게 조정될 수 있다. 이러한 랜덤한 조정은 평가자가 2개의 비디오들 중 어떤 것이 더 큰 FPS의 비디오인 가에 대한 선입견을 가지지 못하게 할 할 수 있다.In the picture quality evaluation, the displayed order of the high FPS video and low FPS video for each evaluation item can be adjusted randomly. This random adjustment can prevent the evaluator from having a preconceived notion of which of the two videos is a larger FPS video.

In addition, if the length of the video sequence is at or below a predetermined reference value, the two videos may each be shown to the evaluator twice, or more than twice, per evaluation item. For example, video A and video B may be shown to the evaluator in the order A, B, A, B.

In FIG. 4, each bar on the x-axis represents a video subject to image quality evaluation. The string below the x-axis is the name of the video. The y-axis shows the proportions of evaluators who selected each of the three items. The three items are "the image quality of the high-quality (i.e., 60 FPS) video is better", "the image qualities of both videos are the same", and "the image quality of the low-quality (i.e., 30 FPS) video is better". The proportion for each item is the share of all evaluators who selected that item, and may be expressed as a percentage (%).

Selecting "the image qualities of both videos are the same" or "the image quality of the low-quality video is better" may be regarded as indicating that the image quality was maintained after the conversion. That is, the converted video also satisfied the evaluator. Only the evaluators who selected "the image quality of the high-quality video is better" may be regarded as having perceived a difference between the image qualities of the two videos.

Therefore, the sum of the proportion for "the image qualities of both videos are the same" and the proportion for "the image quality of the low-quality video is better" may be used as the "image quality satisfaction", which indicates the proportion of people who do not perceive a difference between the image qualities of the two videos. The unit of "image quality satisfaction" may be percent (%).
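As a minimal sketch of this definition, the image quality satisfaction of one video can be computed from the evaluators' choices. The item labels (`"high_better"`, `"same"`, `"low_better"`), the function name, and the vote counts are illustrative assumptions, not values from the experiment in FIG. 4.

```python
from collections import Counter

HIGH_BETTER = "high_better"  # "the high-quality video is better"
SAME = "same"                # "both videos are the same"
LOW_BETTER = "low_better"    # "the low-quality video is better"


def quality_satisfaction(votes):
    """Return the image quality satisfaction in percent.

    It is the share of evaluators who chose SAME or LOW_BETTER,
    i.e. who did not perceive the high-FPS video as better.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    satisfied = counts[SAME] + counts[LOW_BETTER]
    return 100.0 * satisfied / len(votes)


# Hypothetical result for one video: 20 evaluators.
votes = [HIGH_BETTER] * 5 + [SAME] * 12 + [LOW_BETTER] * 3
print(quality_satisfaction(votes))  # 75.0
```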

도 4에서는, 17 개의 비디오 시퀀스들이 화질 차이를 인식하지 못하는 사람의 비율이 높은 순으로 정렬되었다. 도 4에 따르면, 7 개의 비디오 시퀀스에 대해서는 참가자의 75% 이상의 사람들이 고 화질 비디오 및 저 화질 비디오의 화질들 간의 차이를 느끼지 못한다. In FIG. 4, 17 video sequences are arranged in descending order of the percentage of people who do not recognize the image quality difference. According to Fig. 4, for seven video sequences, more than 75% of participants do not feel the difference between the image quality of high-definition video and low-definition video.

Similarly, various target videos may be generated from an HFR source video. For example, the FPS of the target videos may be 1/4, 1/8, and so on, of the FPS of the source video. Image quality satisfaction may also be measured for each of these target videos against the source video.

The results of this subjective image quality evaluation may be used as the ground truth for machine learning. The quality of the ground truth data may improve as the number of evaluators increases. It may also improve as the number of video sequences evaluated increases.

When DSCQS rather than pairwise comparison is used for the subjective image quality evaluation, the following correspondences may apply. 1) The DSCQS case in which the opinion score of the high-quality video is higher than that of the low-quality video may correspond to the pairwise comparison item "the image quality of the high-quality video is better". 2) The DSCQS case in which the opinion score of the high-quality video equals that of the low-quality video may correspond to the pairwise comparison item "the image qualities of both videos are the same". 3) The DSCQS case in which the opinion score of the high-quality video is lower than that of the low-quality video may correspond to the pairwise comparison item "the image quality of the low-quality video is better". Through these correspondences, image quality satisfaction can be measured with DSCQS in the same way as with the pairwise comparison described above.
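The three correspondences can be sketched as a small mapping from a pair of DSCQS opinion scores to the pairwise comparison items. The function names, item labels, and sample scores below are hypothetical and serve only to illustrate the mapping.

```python
def dscqs_to_item(score_high, score_low):
    """Map DSCQS opinion scores of the high- and low-quality video
    to the corresponding pairwise comparison item."""
    if score_high > score_low:
        return "high_better"
    if score_high == score_low:
        return "same"
    return "low_better"


def satisfaction_from_dscqs(score_pairs):
    """Image quality satisfaction (percent) from (high, low) score pairs,
    one pair per evaluator."""
    if not score_pairs:
        raise ValueError("score_pairs must not be empty")
    items = [dscqs_to_item(high, low) for high, low in score_pairs]
    satisfied = sum(item != "high_better" for item in items)
    return 100.0 * satisfied / len(items)


# Hypothetical scores from four evaluators.
pairs = [(80, 70), (60, 60), (55, 65), (90, 50)]
print(satisfaction_from_dscqs(pairs))  # 50.0
```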

특징 벡터Feature vector

SURP(300)는 기계 학습을 이용하는 예측기일 수 있다. 기계 학습의 일 예로, SURP(300)는 서포트 벡터 머신(Support Vector Machine; SVM)을 이용할 수 있다. The SURP 300 may be a predictor using machine learning. As an example of the machine learning, the SURP 300 can use a support vector machine (SVM).

비디오의 FPS가 변경되었을 때 비디오의 화질의 저하를 예측하기 위해서는 비디오의 화질에 영향을 미치는 특징이 특징 벡터로서 추출될 수 있어야 한다. 이하에서는 공간적 마스킹 효과(spatial masking effect), 시간적 마스킹 효과(temporal masking effect), 주목(salient) 영역 및 대비 민감도(constract sensitivity) 등의 인간의 시각 특성을 반영하는 다양한 특징 벡터 추출 방법이 설명된다.In order to predict the deterioration of the video quality when the FPS of the video is changed, features that affect the video quality should be extracted as the feature vector. Various feature vector extraction methods that reflect human visual characteristics such as spatial masking effect, temporal masking effect, salient region, and contrast sensitivity are described below.

SVM은, 입력 프레임의 전체 보다는, 예측을 위해 요구되는 입력 프레임의 정보를 보다 더 잘 표현하는 특징 벡터를 일반적으로 사용할 수 있다. 즉, SVM의 예측 성능은 어떤 정보를 특징 벡터로서 사용하는 가에 의해 좌우될 수 있다.The SVM can generally use a feature vector that better represents the information of the input frame required for prediction, rather than the entirety of the input frame. That is, the prediction performance of the SVM can be influenced by what information is used as a feature vector.

In the embodiment, so that the feature vector reflects the characteristics of perceptual quality, the feature vector may be designed to take into account whether each part of the video has a small or a large effect on perceptual quality.

예를 들면, 비디오 중 인지 화질에 미치는 영향이 적은 부분은 공간적 마스킹 효과가 큰 부분 및 시간적 마스킹 효과가 큰 부분일 수 있다. 비디오 중 인지 화질에 미치는 영향이 큰 부분은 대비 민감도(Contrast Sensitivity; CS)가 높은 부분일 수 있다. 또한, 시각적 주목도(Visual Saliency; VS) 또한 인지 화질에 큰 영향을 미칠 수 있다. 따라서, 특징 벡터의 설계에 있어서 VS도 고려될 수 있다.
For example, a part of the video which has little influence on the picture quality can be a part having a large spatial masking effect and a part having a large temporal masking effect. A large part of the video that affects the image quality may be a part with high Contrast Sensitivity (CS). In addition, Visual Saliency (VS) can also have a significant impact on perceived quality. Therefore, VS in the design of the feature vector can also be considered.

도 5는 일 실시예에 따른 특징 벡터(feature vector)의 추출의 절차를 설명한다.FIG. 5 illustrates a procedure of extraction of a feature vector according to an embodiment.

단계(510)에서, SURP(300)는 각 프레임에 대해 공간적 임의 맵(Spatial Randomness Map; SRM)을 생성할 수 있다.At step 510, the SURP 300 may generate a Spatial Randomness Map (SRM) for each frame.

공간적 마스킹(spatial masking) 효과를 특징 벡터에 반영하기 위해 SRM이 사용될 수 있다. 시각적으로 잘 인지되지 않는 영역은 주로 임의(Randomness)가 높은 영역일 수 있다. 따라서, 공간적 임의(Spatial Randoness; SR)의 계산을 통해, 프레임 중 공간적 마스킹 영역이 결정될 수 있다. SRM can be used to reflect the spatial masking effect to the feature vector. The area that is not visually perceived can be a region with high randomness. Thus, through the calculation of the spatial randomness (SR), the spatial masking region in the frame can be determined.

The characteristics of a region with high SR may differ from those of its neighboring regions. This difference indicates that such a region is difficult to predict from the information of the neighboring regions. Therefore, to measure SR, the central region Y may be predicted from the neighboring region X, as shown in FIG. 6.

FIG. 6 shows a prediction model for measuring spatial randomness according to an example.

도 6에서는 중앙의 Y 픽셀이 주변의 X 픽셀들로부터 예측되는 것이 도시되었다.
In FIG. 6, it is shown that the center Y pixel is predicted from the surrounding X pixels.

다시 도 5를 참조한다.See FIG. 5 again.

Y에 대한 최적 예측은 아래의 수학식 2와 같이 표현될 수 있다.The optimal prediction for Y can be expressed as Equation (2) below.

u는 공간적 위치를 나타낼 수 있다. H는 주변의 X 값들로부터 Y 값에 대한 최적의 예측을 제공하는 변환 행렬일 수 있다. 최소 평균 에러의 최적화 방법이 사용될 경우, H는 아래의 수학식 3과 같이 표현될 수 있다.u can represent a spatial position. H may be a transformation matrix that provides an optimal prediction of the Y value from the surrounding X values. If an optimization method of minimum mean error is used, H can be expressed as Equation 3 below.

R_xy may be the cross-correlation matrix of X and Y.

R_x may be the autocorrelation matrix of X.

근사된 슈도인버스 행렬(approximated pseudoinverse matrix) 기법을 이용하여, R_x의 역행렬은 아래의 수학식 4에 따라 획득될 수 있다.Using an approximated pseudoinverse matrix technique, the inverse of R _x can be obtained according to Equation (4) below.

전술된 수학식 2, 수학식 3 및 수학식 4에 따라 아래의 수학식 5이 성립할 수 있다.The following equation (5) can be established according to the above-mentioned expressions (2), (3) and (4).

SURP(300)는 수학식 5에 기반하여 프레임의 각 픽셀 별로 SRM을 획득할 수 있다.
The SURP 300 may obtain the SRM for each pixel of the frame based on Equation (5).
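The equations themselves are not reproduced in this text, so the following is only a sketch of the kind of computation the definitions suggest. It assumes the usual least-squares form H = R_xy · R_x⁻¹ with the inverse taken as a pseudoinverse, and takes the spatial randomness as the absolute prediction error |Y − H·X|. The 8-pixel neighborhood, the edge padding, and the use of a single H estimated over the whole frame are simplifying assumptions made here, not details from the patent.

```python
import numpy as np


def spatial_randomness_map(frame):
    """Sketch of an SRM: per-pixel error of predicting each pixel (Y)
    from its 8 neighbors (X) with a linear predictor H."""
    f = np.asarray(frame, dtype=np.float64)
    h, w = f.shape
    padded = np.pad(f, 1, mode="edge")  # assumption: replicate borders

    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]
    # X: 8 x (h*w) neighbor values, Y: 1 x (h*w) center values.
    X = np.stack([padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w].ravel()
                  for dy, dx in offsets])
    Y = f.ravel()[np.newaxis, :]

    n = X.shape[1]
    R_xy = (Y @ X.T) / n              # cross-correlation of X and Y
    R_x = (X @ X.T) / n               # autocorrelation of X
    H = R_xy @ np.linalg.pinv(R_x)    # pseudoinverse in place of the inverse

    return np.abs(Y - H @ X).reshape(h, w)


# A flat frame is perfectly predictable, random noise is not.
flat = np.full((8, 8), 100.0)
noise = np.random.default_rng(0).random((8, 8)) * 255.0
print(spatial_randomness_map(flat).max() < spatial_randomness_map(noise).mean())
```

Larger values in the returned map mark pixels that are poorly predicted from their neighbors, which corresponds to the bright pixels of an SRM.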

FIG. 7 shows an SRM according to an example.

도 7에서, 밝은 픽셀은 주변의 픽셀로부터 예측이 잘 되지 않는 픽셀을 나타낼 수 있다. 말하자면, 밝은 픽셀은 공간적 임의가 높은 영역, 즉 공간적 마스킹 효과가 큰 영역을 나타낼 수 있다.
In Figure 7, a bright pixel may represent a pixel that is not predictable from surrounding pixels. That is to say, a bright pixel can represent a region having a high spatial randomness, that is, a region having a large spatial masking effect.

다시 도 5를 참조한다.See FIG. 5 again.

단계(515)에서, SURP(300)는 각 프레임에 대해 에지 맵(Edge Map: EM)을 생성할 수 있다.In step 515, the SURP 300 may generate an edge map (EM) for each frame.

SURP(300)는 각 프레임에 대해 소벨 에지(sobel edge) 연산을 사용하여 에지 맵을 생성할 수 있다.SURP 300 may generate an edge map using a sobel edge operation for each frame.
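A Sobel edge map can be sketched with NumPy alone, as below. The text only states that a Sobel edge operation is used. Taking the gradient magnitude as the edge value and replicating border pixels are assumptions of this sketch.

```python
import numpy as np


def sobel_edge_map(frame):
    """Edge map as the Sobel gradient magnitude of a grayscale frame."""
    f = np.asarray(frame, dtype=np.float64)
    p = np.pad(f, 1, mode="edge")  # assumption: replicate borders

    # Horizontal gradient: right column minus left column, weights 1-2-1.
    gx = ((p[:-2, 2:] + 2.0 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2.0 * p[1:-1, :-2] + p[2:, :-2]))
    # Vertical gradient: bottom row minus top row, weights 1-2-1.
    gy = ((p[2:, :-2] + 2.0 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2.0 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy)


# A vertical step edge: left half 0, right half 1.
step = np.zeros((4, 6))
step[:, 3:] = 1.0
print(sobel_edge_map(step)[1])  # non-zero only next to the step
```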

단계(520)에서, SURP(300)는 각 프레임에 대해 평탄 맵(Smoothness Map; SM)을 생성할 수 있다.In step 520, the SURP 300 may generate a smoothness map (SM) for each frame.

SM은 프레임의 배경이 평탄한 영역인지 여부를 판별하기 위해 사용될 수 있다.The SM may be used to determine whether the background of the frame is a flat area.

The SURP 300 may calculate a smoothness value for each block of the SRM. The smoothness value of each block may be calculated by Equation (6) below.

N_tc may be the number of pixels in the block whose spatial randomness is lower than a reference value. Here, spatial randomness lower than the reference value may indicate low complexity.

W_b² may represent the area of the block, i.e., the number of pixels in the block. For example, W_b may be 32, which indicates that 32x32 blocks are used.

SURP(300)는 각 블록의 내부의 모든 픽셀들의 값을 상기의 블록의 평탄 값으로 설정함으로써 SM을 생성할 수 있다.
The SURP 300 can generate the SM by setting the values of all the pixels inside each block to the flatness values of the blocks.
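Based on the definitions of N_tc and W_b², the smoothness value of a block can be read as the ratio N_tc / W_b². The sketch below builds an SM on that reading. The threshold value is a hypothetical parameter, and dividing by the actual tile size (so that partial border blocks are handled) is an assumption of this sketch.

```python
import numpy as np


def smoothness_map(srm, block=32, threshold=0.1):
    """Fill each block with the fraction of its pixels whose spatial
    randomness is below `threshold` (N_tc / W_b^2 for full blocks)."""
    srm = np.asarray(srm, dtype=np.float64)
    h, w = srm.shape
    sm = np.zeros_like(srm)
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = srm[y:y + block, x:x + block]
            n_low = np.count_nonzero(tile < threshold)
            # Every pixel of the block gets the block's smoothness value.
            sm[y:y + block, x:x + block] = n_low / tile.size
    return sm


# Tiny example with 2x2 blocks instead of 32x32.
srm = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 1, 0, 0],
                [1, 1, 0, 1]], dtype=float)
print(smoothness_map(srm, block=2, threshold=0.5))
```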

도 8은 일 예에 따른 SM을 도시한다.Figure 8 shows an SM according to an example.

SM에서 블록 별로 동일한 픽셀 값이 설정될 수 있다.In the SM, the same pixel value can be set for each block.

SM에서, 블록이 더 밝을수록, 배경은 더 평탄할 수 있다. 말하자면, 밝은 블록은 평탄한 배경을 나타낼 수 있다. 밝은 블록의 에지는 어두운 블록의 에지에 비해 더 잘 인지될 수 있다.
In SM, the brighter the block, the smoother the background. That is to say, a bright block can represent a flat background. The edge of the bright block can be perceived better than the edge of the dark block.

다시 도 5를 참조한다.See FIG. 5 again.

단계(525)에서, SURP(300)는 각 프레임에 대해 공간적 대비 맵(Spatial Contrast Map; SCM)을 생성할 수 있다.In step 525, the SURP 300 may generate a spatial contrast map (SCM) for each frame.

자극 민감도(stimulus sensitivity)는 공간적 대비(Spatial Contrast)와 관련될 수 있다. 공간적 대비를 특징 벡터에 반영하기 위해 SCM이 사용될 수 있다.Stimulus sensitivity can be related to spatial contrast. SCM can be used to reflect the spatial contrast in the feature vector.

SCM은 에지 및 평탄의 측정에 의해 획득될 수 있다. 평탄한(smooth) 배경 영역에서는 자극에 대해서 민감한 반응이 발생할 수 있다. 여기에서, 자극은 에지를 의미할 수 있다. 반면, 낮은 평탄을 갖는 배경 영역에서의 에지는 마스킹 효과로 인해 자극에 대한 낮은 민감도를 가질 수 있다. SCM은 이러한 특성을 반영하기 위한 맵일 수 있다.The SCM can be obtained by measurement of edge and flatness. Sensitive responses to stimuli can occur in the smooth background area. Here, the stimulus can mean an edge. On the other hand, edges in background areas with low flatness may have low sensitivity to stimulation due to masking effects. The SCM may be a map to reflect these characteristics.

SCM은 SM 및 EM의 서로 대응하는 픽셀들의 픽셀 값들의 곱들일 수 있다. SURP(300)는 SM 및 EM의 동일한 위치의 픽셀들의 픽셀 값의 곱을 SCM의 픽셀 값으로 설정할 수 있다.The SCM may be the product of pixel values of corresponding pixels of SM and EM. The SURP 300 may set the product of the pixel values of the pixels at the same position of SM and EM to the pixel value of the SCM.

SURP(300)는 아래의 수학식 7에 따라 SCM을 생성할 수 있다.The SURP 300 may generate the SCM according to Equation (7) below.

Y는 SCM, EM 및 SM의 각각에서의 동일한 위치의 픽셀일 수 있다.
Y may be a pixel at the same position in each of SCM, EM and SM.

FIG. 9 shows an SCM according to an example.

In FIG. 9, a bright pixel may indicate a strong edge on a highly smooth background. That is, a bright pixel may be a pixel representing a strong edge or a pixel representing a highly smooth background.

다시 도 5를 참조한다.See FIG. 5 again.

단계(530)에서, SURP(300)는 각 프레임에 대해 시각적 주목도 맵(Visual Saliency Map; VSM)을 생성할 수 있다.In step 530, the SURP 300 may generate a visual saliency map (VSM) for each frame.

인간의 시각 특성 상, 인간이 관심을 가지는 영역에 대한 자극은 인간이 관심을 가지지 않는 영역에 대한 자극보다 더 큰 영향을 미칠 수 있다. 이러한 시각 특징이 특징 벡터에 반영되도록, 시각적 주목도의 정보가 사용될 수 있다.In terms of human visual characteristics, stimulation of areas of human interest may have a greater impact than stimulation of areas of no human interest. Information of visual attention can be used so that such a visual feature is reflected in the feature vector.

As an example of a VSM, the graph-based visual saliency method proposed by Harel and Perona may be used.

FIG. 10 shows a VSM according to an example.

In FIG. 10, a bright region may indicate a region that draws a person's attention when the person views the image of the frame. That is, a bright region may be a region of high saliency. A dark region may indicate a region that does not draw a person's attention when the person views the image of the frame. That is, a dark region may be a region of low saliency.

다시 도 5를 참조한다.See FIG. 5 again.

단계(535)에서, SURP(300)는 각 프레임에 대해 공간적 영향 맵(Spatial Influence Map; SIM)을 생성할 수 있다.At step 535, the SURP 300 may generate a Spatial Influence Map (SIM) for each frame.

The SIM may be a map that reflects not only the neighborhood information of edges, which are visual stimuli, but also whether each pixel belongs to a visually salient region.

The SIM may consist of the products of the pixel values of corresponding pixels of the SCM and the VSM. The SURP 300 may set the product of the pixel values at the same position in the SCM and the VSM as the pixel value of the SIM.

The SURP 300 may generate the SIM according to Equation (8) below.

Y는 SIM, SCM 및 VSM의 각각에서의 동일한 위치의 픽셀일 수 있다.
Y may be a pixel at the same position in each of the SIM, SCM, and VSM.
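Both the SCM and the SIM are described as per-pixel products of two maps, which can be sketched as element-wise multiplication. The function names and the small example maps are illustrative assumptions. The EM, SM and VSM inputs would come from the earlier steps.

```python
import numpy as np


def spatial_contrast_map(em, sm):
    """SCM: product of the EM and SM values at each pixel position."""
    em = np.asarray(em, dtype=np.float64)
    sm = np.asarray(sm, dtype=np.float64)
    if em.shape != sm.shape:
        raise ValueError("EM and SM must have the same shape")
    return em * sm


def spatial_influence_map(scm, vsm):
    """SIM: product of the SCM and VSM values at each pixel position."""
    scm = np.asarray(scm, dtype=np.float64)
    vsm = np.asarray(vsm, dtype=np.float64)
    if scm.shape != vsm.shape:
        raise ValueError("SCM and VSM must have the same shape")
    return scm * vsm


# Hypothetical 2x2 maps.
em = np.array([[4.0, 0.0], [2.0, 1.0]])
sm = np.array([[1.0, 1.0], [0.5, 0.0]])
vsm = np.array([[0.5, 0.5], [1.0, 1.0]])
scm = spatial_contrast_map(em, sm)
print(spatial_influence_map(scm, vsm))
```

A pixel ends up bright in the SIM only when it has a strong edge, a smooth background, and high saliency at the same time, because any factor near zero suppresses the product.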

FIG. 11 shows a SIM according to an example.

다시 도 5를 참조한다.See FIG. 5 again.

SURP(300)는 기본 결정 단위의 프레임들의 공간적 특성 및 시간적 특성에 기반하여 특징 벡터를 획득할 수 있다. 전술된 단계들(510, 515, 520, 525, 530 및 535)에서는 프레임의 공간적 특성을 반영하여 특징 벡터를 획득하는 과정이 설명되었다. 아래의 단계들(540, 545, 550 및 555)에서는 프레임의 시간적 특성을 반영하여 특징 벡터를 획득하는 과정이 설명된다.The SURP 300 can acquire the feature vector based on the spatial characteristics and the temporal characteristics of the frames of the basic decision unit. In the above-described steps 510, 515, 520, 525, 530, and 535, the process of acquiring the feature vector by reflecting the spatial characteristics of the frame has been described. In the following steps 540, 545, 550, and 555, a process of acquiring a feature vector by reflecting temporal characteristics of a frame will be described.

A region with a high temporal masking effect may be a region with irregular motion or sudden motion. That is, a region with high randomness may be a region with a high temporal masking effect, and such regions can be identified by detecting regions of high randomness in the frames.

시간적 특성에 기반하여 특징 벡터를 획득함으로써, SURP(300)는 불규칙한 움직임 또는 갑작스러운 움직임에 민감하게 반응하는 인간의 시각적 특성을 특징 벡터의 획득에 대하여 반영할 수 있다.By acquiring a feature vector based on temporal characteristics, the SURP 300 can reflect the human visual characteristics that are sensitive to irregular motions or sudden motions to the acquisition of feature vectors.

As described above, spatial randomness may be calculated by a model that predicts a pixel from its neighboring pixels. Similarly, temporal randomness may be calculated by a model that predicts the current frames from previous frames.

시간적 임의를 계산하기 위해, 우선 GOP의 프레임들의 입력 구간은 2 개의 구간들로 분할될 수 있다.In order to calculate the temporal randomness, the input interval of the frames of the GOP may be divided into two intervals.

구간은 아래의 수학식 9와 같이 표현될 수 있다.The interval can be expressed as Equation (9) below.

Y may represent an interval. k may represent the first frame of the interval, and l may represent the last frame of the interval. That is, Y_k^l indicates that the interval consists of the k-th frame through the l-th frame.

수학식 9가 행렬로서 저장될 때, 행렬의 열들은 프레임들에 각각 대응할 수 있다. 말하자면, 구간을 행렬로서 저장하는 것은 아래의 1) 및 2)의 단계로 이루어질 수 있다.When equation (9) is stored as a matrix, the columns of the matrix may correspond to the frames, respectively. That is to say, storing the interval as a matrix can be made by the following steps 1) and 2).

1) Each frame of the interval may be arranged as a one-dimensional row vector. For example, the second row of the frame may be concatenated to the first row of the frame, and the remaining rows of the frame may be concatenated in order.

2) 행 벡터는 열 벡터의 형태로 전치(transpose)될 수 있다. 프레임들의 열 백터들로 구성된 행렬이 저장될 수 있다.2) The row vector may be transposed in the form of a column vector. A matrix of column vectors of frames may be stored.
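The two steps above can be sketched with NumPy: each frame is flattened row by row, and the flattened frames become the columns of the matrix. The function name and the tiny 2x2 frames are illustrative assumptions.

```python
import numpy as np


def interval_matrix(frames):
    """Store an interval of frames as a matrix with one column per frame.

    Step 1: flatten each frame in row-major order, which concatenates
            its rows into one vector.
    Step 2: use each flattened frame as a column of the matrix.
    """
    columns = [np.asarray(frame, dtype=np.float64).ravel() for frame in frames]
    return np.stack(columns, axis=1)


# Two hypothetical 2x2 frames.
frames = [np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]])]
print(interval_matrix(frames))
# [[1. 5.]
#  [2. 6.]
#  [3. 7.]
#  [4. 8.]]
```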

When the length of each interval is d, the two intervals can be expressed as Y_{k}^{l-d} and Y_{k+d}^{l}.

단계(540)에서, SURP(300)는 시간적 예측 모델(Temporal Prediction Model; TPM)을 생성할 수 있다.In step 540, the SURP 300 may generate a Temporal Prediction Model (TPM).

FIG. 12 below illustrates a case in which the size of the GOP is 8 and the value of d is 4.

도 12은 일 예에 따른 시간적 예측 모델의 생성에 사용되는 프레임 및 예측 대상 프레임 간의 관계를 도시한다.FIG. 12 shows a relationship between a frame used for generation of the temporal prediction model and a prediction target frame according to an example.

도 12에서, "프레임 0"은 제1 프레임을 나타낼 수 있고, "프레임 1"은 제2 프레임을 나타낼 수 있다. 말하자면 "프레임 n"은 제n+1 프레임일 수 있다.In Fig. 12, "frame 0" may represent the first frame and "frame 1" may represent the second frame. That is to say, "frame n" may be the (n + 1) -th frame.

도 12에 따르면, "프레임 0"부터 "프레임 3"까지가 첫 번째 구간일 수 있고, "프레임 4"부터 "프레임 7"까지가 두 번째 구간일 수 있다.According to Fig. 12, "frame 0" to "frame 3" may be the first section, and "frame 4" to "frame 7" may be the second section.

첫 번째 구간의 프레임들은 TPM을 위해 사용될 수 있으며, TPM에 의해 두 번째 구간의 프레임들이 예측될 수 있다.
The frames of the first interval may be used for the TPM, and the frames of the second interval may be predicted by the TPM.
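
The split illustrated in FIG. 12 can be sketched as follows. The helper name `split_gop` is hypothetical. The slicing follows the index notation of the two intervals (frames k to l-d, and frames k+d to l), which for a GOP of size 8 with d = 4 gives two non-overlapping intervals of four frames each.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_gop(frames: Sequence[T], d: int) -> Tuple[List[T], List[T]]:
    """Return (interval used to build the TPM, interval predicted by the TPM)."""
    if d <= 0 or d >= len(frames):
        raise ValueError("d must be between 1 and len(frames) - 1")
    model_interval = list(frames[: len(frames) - d])  # frames k .. l-d
    target_interval = list(frames[d:])                # frames k+d .. l
    return model_interval, target_interval


gop = [f"frame {n}" for n in range(8)]
first, second = split_gop(gop, d=4)
# first  -> frame 0 .. frame 3
# second -> frame 4 .. frame 7
```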

다시 도 5를 참조한다.See FIG. 5 again.

TPM은 GOP의 첫 번째 구간을 이용하여 생성될 수 있다. 예를 들면, 크기가 8인 GOP에 대해, TPM은 제1 프레임, 제2 프레임, 제3 프레임 및 제4 프레임을 이용하여 생성될 수 있다.The TPM can be generated using the first section of the GOP. For example, for a GOP of size 8, the TPM may be generated using the first frame, the second frame, the third frame, and the fourth frame.

SURP(300)는 아래의 수학식 10에 기반하여 시간적 예측을 수행할 수 있다.The SURP 300 may perform temporal prediction based on Equation (10) below.

여기에서, A는 최적 예측을 제공하는 변환 행렬일 수 있다.Here, A may be a transformation matrix providing optimal prediction.

T는 예측 행렬일 수 있다.T may be a prediction matrix.

A는 슈도 인버스(pseudo inverse)를 이용하여 아래의 수학식 11과 같이 계산될 수 있다.A can be calculated as shown in Equation (11) using a pseudo inverse.

그러나, 수학식 11의 각 행렬들은 매우 크기 때문에, 수학식 11의 적용은 실질적으로 불가능할 수 있다.However, since each matrix of Equation (11) is very large, the application of Equation (11) may be practically impossible.

아래의 수학식 12의 상태 시퀀스(state sequence) 표현을 사용함에 따라 행렬 A가 보다 용이하게 계산될 수 있다.The matrix A can be calculated more easily by using the state sequence representation of Equation (12) below.

여기에서, Y는 X의 상태 행렬(state matrix)일 수 있다.Here, Y may be a state matrix of X.

C는 부호화(encoding) 행렬일 수 있다.C may be an encoding matrix.

W는 바이어스 행렬일 수 있다.W may be a bias matrix.

아래의 수학식 13은 Y에 대한 특이 값 분해(singular value decomposition)를 나타낼 수 있다. 이 때, W는 영 행렬로 가정될 수 있다.Equation (13) below can represent a singular value decomposition for Y. At this time, W can be assumed to be a zero matrix.

수학식 12 및 수학식 13을 비교하면, SURP(300)는 아래의 수학식 14에 따라 최적 상태 벡터(optimal state vector)를 도출할 수 있다.Comparing Equations (12) and (13), the SURP 300 can derive an optimal state vector according to Equation (14) below.

SURP는 아래의 수학식 15에 따라 시간적 예측 에러인 시간적 임의를 획득할 수 있다.SURP can obtain a temporal randomness which is a temporal prediction error according to the following equation (15).

FIGS. 13A to 13C show three consecutive frames according to an example.

FIG. 13A shows the first of the three consecutive frames according to an example.

FIG. 13B shows the second of the three consecutive frames according to an example.

FIG. 13C shows the third of the three consecutive frames according to an example.

다음으로, 3개의 프레임들에 대한 맵들이 도시된다.Next, maps for three frames are shown.

FIG. 13D shows a Temporal Randomness Map (TRM) for the three consecutive frames according to an example.

도 13e는 일 예에 따른 연속된 3개의 프레임들에 대한 공간시간적 영향 맵(SpatioTemporal Influence Map; STIM)을 나타낸다.FIG. 13E shows a Spatio Temporal Influence Map (STIM) for three consecutive frames according to an example.

도 13f는 일 예에 따른 연속된 3개의 프레임들에 대한 가중치가 부여된 공간시간적 영향 맵(Weighted SpatioTemporal Influence Map; WSTIM)을 나타낸다.FIG. 13F shows a weighted Spatio Temporal Influence Map (WSTIM) for three consecutive frames according to an example.

TRM, STIM 및 WSTIM에 대해서 아래에서 상세하게 설명된다.
TRM, STIM and WSTIM are described in detail below.

다시 도 5를 참조한다.See FIG. 5 again.

단계(545)에서, SURP(300)는 시간적 임의 맵(Temporal Randomness Map; TRM)을 생성할 수 있다.In step 545, the SURP 300 may generate a Temporal Randomness Map (TRM).

TRM은 제4 프레임, 제5 프레임, 제6 프레임 및 제7 프레임에 대한 맵일 수 있다.TRM may be a map for the fourth frame, the fifth frame, the sixth frame and the seventh frame.

TRM에서, 밝은 픽셀은 시간적 임의가 큰 영역을 나타낼 수 있다. 시간적 임의가 큰 영역은 시각적인 인지 효과가 적은 영역일 수 있다.In TRM, bright pixels can represent regions with large temporal randomness. A region having a large temporal randomness may be a region having a small visual cognitive effect.

단계(550)에서, SURP(300)는 공간시간적 영향 맵(SpatioTemporal Influence Map; STIM)을 생성할 수 있다.In step 550, the SURP 300 may generate a Spatio Temporal Influence Map (STIM).

STIM은 제4 프레임, 제5 프레임, 제6 프레임 및 제8 프레임에 대한 맵일 수 있다.The STIM may be a map for the fourth frame, the fifth frame, the sixth frame and the eighth frame.

사람이 비디오를 볼 때, 사람의 시각적 특성에 따라 사람은 공간적 시각적 자극 및 시간적 시각적 자극을 함께 인지할 수 있다. 따라서, 특징 벡터에서도 공간적 특성 및 시간적 특성이 동시에 반영될 필요가 있다.When a person views a video, one can perceive spatial visual stimuli and temporal visual stimuli together, depending on the visual characteristics of the person. Therefore, it is also necessary that the spatial characteristic and the temporal characteristic are simultaneously reflected in the feature vector.

To reflect both the spatial characteristics and the temporal characteristics, the STIM may be defined as shown in Equation (16) below.

Y는 특정한 위치의 픽셀을 나타낼 수 있다.Y can represent a pixel at a specific position.

STIM은 SIM 및 TRM의 서로 대응하는 픽셀들의 픽셀 값들의 곱들일 수 있다. SURP(300)는 SIM 및 TRM의 동일한 위치의 픽셀들의 픽셀 값의 곱을 STIM의 픽셀 값으로 설정할 수 있다.The STIM may be the product of pixel values of corresponding pixels of SIM and TRM. The SURP 300 may set the product of the pixel values of the pixels at the same position of the SIM and the TRM to the pixel value of the STIM.
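
The pixel-wise product described above can be sketched as follows. This is an illustration only, assuming both maps are given as equally sized lists of rows; the function name `stim` and the sample values are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def stim(sim: Map2D, trm: Map2D) -> Map2D:
    """Multiply the SIM and the TRM pixel by pixel to obtain the STIM."""
    if len(sim) != len(trm) or any(len(a) != len(b) for a, b in zip(sim, trm)):
        raise ValueError("SIM and TRM must have the same dimensions")
    return [[s * t for s, t in zip(sim_row, trm_row)]
            for sim_row, trm_row in zip(sim, trm)]


sim_map = [[0.5, 1.0], [2.0, 0.0]]
trm_map = [[2.0, 3.0], [0.5, 4.0]]
stim_map = stim(sim_map, trm_map)  # -> [[1.0, 3.0], [1.0, 0.0]]
```

A pixel therefore receives a large STIM value only when both its SIM value and its TRM value are large.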

단계(555)에서, SURP(300)는 가중치가 부여된 공간시간적 영향 맵(Weighted SpatioTemporal Influence Map; WSTIM)을 생성할 수 있다.In step 555, the SURP 300 may generate a Weighted Spatio Temporal Influence Map (WSTIM).

WSTIM은 제4 프레임, 제5 프레임, 제6 프레임 및 제8 프레임에 대한 맵일 수 있다.The WSTIM may be a map for the fourth frame, the fifth frame, the sixth frame and the eighth frame.

WSTIM은 전체의 자극에서 상대적으로 큰 자극을 강조할 수 있다.WSTIM can emphasize relatively large stimuli in the whole stimulus.

The SURP 300 may set the value obtained by dividing the value of a pixel of the SIM by the average value of the pixels of the SIM as the pixel value of the WSTIM.

WSTIM은 프레임들 내에 작은 물체의 빠른 움직임의 자극이 있는 경우에 효과적일 수 있다.The WSTIM may be effective when there is stimulation of fast movement of small objects within the frames.
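
The weighting step can be sketched as a division of every pixel by the mean of the map. The text names the SIM as the map being divided; the hypothetical helper `weighted_map` below accepts any map, and its handling of an all-zero map is an assumption made only so that the sketch never divides by zero.

```python
from typing import List

Map2D = List[List[float]]


def weighted_map(values: Map2D) -> Map2D:
    """Divide each pixel by the mean pixel value of the whole map."""
    pixels = [p for row in values for p in row]
    mean = sum(pixels) / len(pixels)
    if mean == 0:
        # Assumption: an all-zero map stays all-zero.
        return [[0.0 for _ in row] for row in values]
    return [[p / mean for p in row] for row in values]


weighted = weighted_map([[1, 1], [1, 5]])  # mean is 2 -> [[0.5, 0.5], [0.5, 2.5]]
```

In the sample, the single large value ends up well above 1 while the rest fall below 1, which illustrates how a relatively large stimulus is emphasized against the overall level.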

In step 560, the SURP 300 may perform the function of a feature representer. The SURP 300 may output the result of at least one of the TRM, the STIM and the WSTIM.

The SURP 300 may calculate, for each block of a predefined size in the frame, the average of at least one of the TRM, the STIM and the WSTIM. The SURP 300 may output the block averages sorted in descending order. For example, the predefined size may be 64x64.

The aforementioned maps may be expressed in the form of a matrix representing data such as an image. A feature representer process that expresses these maps in the form of a vector suitable for use as a feature vector is therefore required.

The SURP 300 may divide each of the maps into blocks of a predefined size. For example, if the video is HD video, the size of the block may be 64x64. The SURP 300 may calculate, for each block, the average value of the pixels of the block. In addition, to remove any dependency on position, the SURP 300 may sort the blocks of the map in order from the block having the largest average value to the block having the smallest average value.

이러한 과정을 통해 각 맵은 히스토그램과 같은 1차원 배열의 형태로 표현될 수 있다.Through this process, each map can be expressed in the form of a one-dimensional array such as a histogram.

비디오가 HD 비디오인 경우, 1차원 배열의 크기는 507일 수 있다.If the video is HD video, the size of the one-dimensional array may be 507.

The one-dimensional array may be an array sorted by the average values of the blocks. Thus, in general, large values are placed at the front of the one-dimensional array and progressively smaller values follow toward the end. Therefore, even if only a leading part of the values of the one-dimensional array is used as the feature vector, the difference in performance may not be large compared with using all of the values. Accordingly, instead of using all of the values of the one-dimensional array as the feature vector, the SURP 300 may use only a leading part, which holds the large values, as the feature vector. Here, the part may be a predefined length or a predefined ratio. For example, the SURP 300 may use only the first 100 values of the one-dimensional array.
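
The feature representer described above (block averages, descending sort, truncation) can be sketched as follows. The function names are hypothetical, and the treatment of partial blocks at the map border (averaging whatever pixels fall inside the block) is an assumption, since the text does not specify it.

```python
from typing import List

Map2D = List[List[float]]


def block_means(values: Map2D, block: int = 64) -> List[float]:
    """Average each block x block tile; border tiles may be smaller (assumption)."""
    height, width = len(values), len(values[0])
    means = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [values[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            means.append(sum(tile) / len(tile))
    return means


def map_to_feature(values: Map2D, block: int = 64, keep: int = 100) -> List[float]:
    """Sort the block means in descending order and keep the leading entries."""
    ordered = sorted(block_means(values, block), reverse=True)
    return ordered[:keep]


toy_map = [[1, 1, 4, 4],
           [1, 1, 4, 4],
           [9, 9, 2, 2],
           [9, 9, 2, 2]]
feature = map_to_feature(toy_map, block=2, keep=3)  # -> [9.0, 4.0, 2.0]
```

Because the block means are sorted before use, the resulting vector does not depend on where in the frame a strong stimulus occurs.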

모델의 학습Model learning

SURP(300)는 SVM을 이용할 수 있다. SURP(300)의 기계 학습은 아래의 1) 내지 3)의 작업들일 수 있다.The SURP 300 can use the SVM. The machine learning of the SURP 300 may be the operations of 1) to 3) below.

1) SURP(300)는 GOP로부터 특징 벡터를 추출할 수 있다.1) The SURP 300 can extract a feature vector from a GOP.

2) SURP(300)는 추출된 특징 벡터를 SVM의 입력으로서 사용하여 출력 예측을 수행할 수 있다.2) The SURP 300 can perform the output prediction using the extracted feature vector as the input of the SVM.

3) SURP(300)는 출력 예측에 의해 출력된 값이 그라운드 트루스에 최대한 가까운 값을 갖도록 SVM을 생성할 수 있다.3) The SURP 300 may generate the SVM such that the value output by the output prediction has a value as close as possible to the ground truth.

아래에서는, 60 FPS의 비디오가 30 FPS의 비디오로 변경될 때의 화질의 저하를 예측하기 위해 모델을 학습하는 일 예가 설명된다.Below is an example of learning a model to predict degradation in image quality when 60 FPS video is changed to 30 FPS video.

우선, 호모지니어스 비디오의 데이터 세트에 대한 주관적 화질 평가를 통해 그라운드 트루스가 이미 획득되었다.First, the ground truth was already obtained through the subjective image quality evaluation of the data set of the homogeneous video.

Next, the SURP 300 may obtain each of the TRM, the STIM and the WSTIM for the 60 FPS video. The SURP 300 may also obtain each of the TRM, the STIM and the WSTIM under the internal assumption that the FPS of the video is 30. Further, the SURP 300 may convert the TRM, STIM and WSTIM of the 60 FPS video into one-dimensional arrays through the feature representer, and may likewise convert the TRM, STIM and WSTIM of the 30 FPS video into one-dimensional arrays through the feature representer.

Next, the SURP 300 may train the SVM such that the feature vector of the 60 FPS video and the feature vector of the 30 FPS video have a difference corresponding to the ground truth.

Next, once the learning is complete, the SURP 300 may, without further changing the internals of the SVM, use a 60 FPS GOP to extract the feature vector of the 60 FPS GOP and the feature vector of the 30 FPS GOP, respectively.

The SURP 300 may use the feature vector of the 60 FPS GOP and the feature vector of the 30 FPS GOP to predict the degree of difference between the image quality of the 60 FPS GOP and the image quality of the 30 FPS GOP, and may finally generate and output the image quality satisfaction based on the predicted degree of difference.

여기에서, 60 FPS 및 30 FPS는 단지 예시적인 것이다. SURP(300)는, 전술된 60 FPS 및 30 FPS에서의 화질 만족도의 생성과 유사하게, 다른 FPS들 간의 변환에 대해서도 화질 만족도를 생성 및 출력할 수 있다. 예를 들면, FPS는 60에서 15로 변경될 수 있다. 또는, FPS는 120에서 60으로 변경될 수 있다.Here, 60 FPS and 30 FPS are merely illustrative. SURP 300 can generate and output picture quality satisfaction for conversions between different FPSs, similar to the above-described generation of picture quality satisfaction at 60 FPS and 30 FPS. For example, the FPS may be changed from 60 to 15. Alternatively, the FPS may be changed from 120 to 60.

To generate the feature vector of the SVM, only one of the aforementioned SIM, TRM, STIM and WSTIM may be used, or two or more of the maps may be used together.

In addition, multiple SVMs may be combined in a cascade. For example, an SVM that uses the TRM as its feature vector may be applied first, and then an SVM that uses the STIM and the result of the preceding SVM as its feature vector may be applied. Finally, an SVM that uses the WSTIM and the result of the preceding SVM as its feature vector may be applied.

최적의 프레임 율의 결정Determining the Optimum Frame Rate

이하에서는 SURP(300)를 이용하여 기본 결정 단위 별로 최적 프레임 율을 결정하고, 결정된 최적 프레임 율을 사용하여 비디오에 대한 최적의 부호화를 수행하는 방법이 설명된다.Hereinafter, a method for determining the optimum frame rate for each basic decision unit using the SURP 300 and performing optimal encoding for video using the determined optimal frame rate will be described.

For a basic decision unit of the video, the SURP 300 may predict the minimum FPS of the basic decision unit that satisfies a given minimum image quality satisfaction or a predefined image quality satisfaction, and may provide the basic decision unit converted so as to meet the minimum image quality satisfaction. The SURP 300 may encode the basic decision unit converted in this way.

도 14는 일 예에 따른 FPS 별 화질 만족도를 나타낸다.FIG. 14 shows image quality satisfaction by FPS according to an example.

전술된 것과 같이 SURP(300)는 특정한 FPS로 변환된 GOP의 화질 만족도를 예측할 수 있고, 예측된 화질 만족도를 출력할 수 있다. 전술된 기능을 응용하면, SURP(300)는 비디오의 GOP들의 각각에 대해, 복수의 FPS들의 복수의 화질 만족도들을 예측 및 출력할 수 있다.As described above, the SURP 300 can predict the image quality satisfaction of the GOP converted into the specific FPS and output the predicted image quality satisfaction. Applying the functions described above, the SURP 300 can predict and output a plurality of image quality satisfactions of a plurality of FPSs for each of the GOPs of the video.

GOP에 대해 복수의 FPS들의 복수의 화질 만족도들을 예측함에 따라, 복수의 예측된 화질 만족도들이 부호화에 사용될 수 있다.By predicting a plurality of image quality satisfactions of a plurality of FPSs for a GOP, a plurality of predicted image quality levels can be used for encoding.

In FIG. 14, the first row lists the plurality of GOPs of the video. The first column lists the FPS to which a GOP is converted. For each GOP, the image quality satisfaction at the converted FPS is shown as a percentage.

도 14에서 소스 비디오는 60 FPS일 수 있다.In Fig. 14, the source video may be 60 FPS.

SURP(300)는 타겟 비디오의 복수의 FPS들에 대해 순차적으로 기계 학습을 수행할 수 있다. 먼저, 비디오의 FPS가 30으로 변환된 경우에 대해, SURP(300)는 기계 학습을 수행할 수 있다. 또한, 비디오의 FPS가 15로 변환된 경우에 대해, SURP(300)는 기계 학습을 수행할 수 있다. 또한, 비디오의 FPS가 7.5로 변환된 경우에 대해, SURP(300)는 기계 학습을 수행할 수 있다. 말하자면, SURP(300)는 타겟 비디오의 FPS들의 각각에 대해 기계 학습을 수행할 수 있다.The SURP 300 may sequentially perform machine learning on a plurality of FPSs of the target video. First, when the FPS of the video is converted to 30, the SURP 300 can perform the machine learning. Also, for the case where the FPS of the video is converted to 15, the SURP 300 can perform machine learning. Further, for the case where the FPS of the video is converted to 7.5, the SURP 300 can perform machine learning. That is, the SURP 300 may perform machine learning for each of the FPSs of the target video.

Once the machine learning has been performed, the SURP 300 may sequentially predict, for each of the plurality of FPSs, the image quality satisfaction of each GOP of the video. For example, when the FPS of the target video is 30, the image quality satisfaction of the three GOPs of the target video may be 80%, 90% and 50%. When the FPS of the target video is 15, the image quality satisfaction of the three GOPs may be 70%, 60% and 45%.

이러한 예측을 통해, SURP(300)는 복수의 FPS들에 대해서 GOP의 화질 만족도들을 계산할 수 있다.Through this prediction, the SURP 300 can calculate the image quality satisfaction of a GOP for a plurality of FPSs.

또한, SURP(300)는 소스 비디오의 GOP의 화질 만족도는 100%인 것으로 간주할 수 있다.Also, the SURP 300 can regard the image quality satisfaction of the GOP of the source video as 100%.

In FIG. 14, the last row may indicate the requirement. The requirement may indicate the required picture quality or the minimum image quality satisfaction. That is, the last row may indicate that the image quality satisfaction should be 75% or higher.

도 15는 일 예에 따른 GOP 별로 결정된 최적 프레임 율을 나타낸다.FIG. 15 shows an optimum frame rate determined for each GOP according to an example.

For each GOP, the SURP 300 may determine the optimal FPS that satisfies the requirement. Here, the optimal FPS may be the FPS of the GOP whose image quality satisfaction is the lowest among those that are greater than or equal to the value of the requirement. That is, the optimal FPS may be the lowest FPS that satisfies the requirement. If none of the FPSs of the target video generated by the conversion yields an image quality satisfaction that meets the requirement, the optimal FPS may be the FPS of the source video.

도 15에서는 복수의 GOP들의 각각에 대해 결정된 최적 프레임 율이 도시되었다.In Fig. 15, the optimal frame rate determined for each of a plurality of GOPs is shown.

For example, according to FIG. 14, the image quality satisfaction values of the first GOP for the plurality of FPSs may be 80%, 70% and 50%. Among these, the lowest value that meets the requirement is 80%, at 30 FPS. Thus, the optimal frame rate of the first GOP may be 30 FPS. The image quality satisfaction values of the second GOP for the plurality of FPSs may be 90%, 80% and 50%. Among these, the lowest value that meets the requirement is 80%, at 15 FPS. Thus, the optimal frame rate of the second GOP may be 15 FPS. The image quality satisfaction values of the third GOP for the plurality of FPSs may be 50%, 45% and 40%. Since none of these values meets the requirement, the third GOP cannot be converted. Thus, the optimal frame rate of the third GOP may be 60 FPS, which is the FPS of the source video.

복수의 GOP들의 각 GOP는 각 GOP의 최적 프레임 율로 변환될 수 있고, 비디오의 변환된 GOP들은 부호화될 수 있다.
Each GOP of a plurality of GOPs can be converted to an optimal frame rate of each GOP, and the converted GOPs of the video can be encoded.
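
The per-GOP decision in the FIG. 14 and FIG. 15 example can be sketched as follows. The function name `optimal_fps` is hypothetical; the satisfaction values, the 75% requirement and the 60 FPS source rate are the ones given in the example, with the three values of each GOP taken to correspond to 30, 15 and 7.5 FPS in that order.

```python
from typing import Dict, List


def optimal_fps(satisfaction_by_fps: Dict[float, float],
                required: float,
                source_fps: float) -> float:
    """Return the lowest FPS whose predicted satisfaction meets the requirement.

    Falls back to the source FPS when no converted FPS meets it.
    """
    candidates = [fps for fps, sat in satisfaction_by_fps.items() if sat >= required]
    return min(candidates) if candidates else source_fps


gops: List[Dict[float, float]] = [
    {30: 80, 15: 70, 7.5: 50},  # first GOP
    {30: 90, 15: 80, 7.5: 50},  # second GOP
    {30: 50, 15: 45, 7.5: 40},  # third GOP
]
rates = [optimal_fps(g, required=75, source_fps=60) for g in gops]
# rates -> [30, 15, 60]
```

The result matches the optimal frame rates stated for the three GOPs.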

도 16은 일 예에 따른 최적 프레임 율을 사용하는 GOP의 부호화 방법의 흐름도이다.16 is a flowchart of a method of encoding a GOP using an optimum frame rate according to an example.

우선, 입력 비디오는 60 FPS인 것으로 예시된다. 입력 비디오의 GOP들이 순차적으로 단계들(1610, 1620, 1630, 1640, 1650 및 1660)에 의해 처리될 수 있다.First, input video is illustrated as being 60 FPS. The GOPs of the input video may be processed by steps 1610, 1620, 1630, 1640, 1650 and 1660 sequentially.

단계(1610)에서, SURP(300)는 60 FPS의 GOP를 30 FPS의 GOP로 변환할 수 있고, 30 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 1610, the SURP 300 may convert a 60 FPS GOP to a 30 FPS GOP and calculate a picture quality rating of a 30 FPS GOP.

단계(1620)에서, SURP(300)는 60 FPS의 GOP를 15 FPS의 GOP로 변환할 수 있고, 15 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 1620, the SURP 300 may convert a 60 FPS GOP to a 15 FPS GOP and calculate a picture quality rating of a 15 FPS GOP.

단계(1630)에서, SURP(300)는 60 FPS의 GOP를 7.5 FPS의 GOP로 변환할 수 있고, 7.5 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 1630, the SURP 300 may convert a GOP of 60 FPS to a GOP of 7.5 FPS and calculate a picture quality of a GOP of 7.5 FPS.
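
The text does not state how a 60 FPS GOP is converted to a lower FPS in steps 1610 to 1630. One simple possibility, assumed here purely for illustration, is to keep every n-th frame. The helper name `decimate_gop` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def decimate_gop(frames: Sequence[T], source_fps: float, target_fps: float) -> List[T]:
    """Keep every (source_fps / target_fps)-th frame of the GOP (assumed method)."""
    ratio = source_fps / target_fps
    step = int(ratio)
    if step < 1 or step != ratio:
        raise ValueError("target_fps must divide source_fps evenly")
    return list(frames[::step])


gop_60 = list(range(8))                   # frame numbers 0 .. 7
gop_30 = decimate_gop(gop_60, 60, 30)     # -> [0, 2, 4, 6]
gop_15 = decimate_gop(gop_60, 60, 15)     # -> [0, 4]
gop_7_5 = decimate_gop(gop_60, 60, 7.5)   # -> [0]
```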

단계(1640)에서, SURP(300)는 최적 프레임 율을 결정할 수 있다. In step 1640, the SURP 300 may determine an optimal frame rate.

SURP(300)는 요구 화질을 수신할 수 있다. 요구 화질은 전술된 최소 화질 만족도를 나타낼 수 있다.The SURP 300 can receive the requested image quality. The required image quality may indicate the minimum image quality satisfaction described above.

SURP(300)는 기정의된 FPS로 변경된 GOP들 중 최소 화질 만족도를 충족시키는 최적의 GOP를 선택할 수 있다. 여기에서, 최적의 GOP는 가장 낮은 FPS를 갖는 GOP일 수 있다.The SURP 300 can select an optimal GOP satisfying the minimum image quality satisfaction among the GOPs changed to the predetermined FPS. Here, the optimal GOP may be the GOP having the lowest FPS.

Alternatively, the SURP 300 may determine, among the FPSs of the changed GOPs, the optimal frame rate that satisfies the minimum image quality satisfaction. The SURP 300 may select the GOP whose image quality satisfaction is the smallest among those of the changed GOPs that are greater than or equal to the minimum image quality satisfaction, or may select the FPS of that GOP. The optimal frame rate may be the selected FPS or the FPS of the selected GOP.

In step 1650, the SURP 300 may select the FPS of the GOP having the optimal frame rate. The SURP 300 may convert the FPS of the GOP to the optimal frame rate. Alternatively, the SURP 300 may select the GOP having the optimal frame rate from among the GOP of the input video and the GOPs changed by steps 1610, 1620 and 1630.

In step 1660, the SURP 300 may encode the GOP selected in step 1650.

도 17은 일 예에 따른 최적 프레임 율의 결정 방법의 흐름도이다.17 is a flowchart of a method of determining an optimal frame rate according to an example.

도 16을 참조하여 전술된 단계(1640)는 아래의 단계들(1710, 1720 및 1730)을 포함할 수 있다.Step 1640 described above with reference to FIG. 16 may include the following steps 1710, 1720, and 1730.

SURP(300)는 변경된 GOP들에 대해, 높은 FPS의 GOP로부터 낮은 FPS의 GOP의 순서로, 순차적으로 GOP가 요구 화질을 충족시키는지 여부를 검사할 수 있다. SURP(300)는 요구 화질을 충족시키지 못하는 GOP의 바로 이전의 GOP를 선택할 수 있다.The SURP 300 can check whether the GOP satisfies the requested image quality sequentially, in the order of the GOP of the high FPS to the GOP of the low FPS, for the changed GOPs. The SURP 300 can select the immediately preceding GOP of the GOP that does not satisfy the required image quality.

If the first of the changed GOPs does not satisfy the required image quality, the GOP of the source video may be selected. That is, if none of the changed GOPs satisfies the required image quality, the GOP of the source video may be selected.

If the last of the changed GOPs satisfies the required image quality, the last GOP may be selected. That is, if all of the changed GOPs satisfy the required image quality, the GOP having the lowest FPS may be selected.

In step 1710, the SURP 300 may check whether the 30 FPS GOP satisfies the required image quality. If the 30 FPS GOP does not satisfy the required image quality, the SURP 300 may select 60 FPS as the optimal frame rate and may provide the 60 FPS GOP. If the 30 FPS GOP satisfies the required image quality, step 1720 may be performed.

In step 1720, the SURP 300 may check whether the 15 FPS GOP satisfies the required image quality. If the 15 FPS GOP does not satisfy the required image quality, the SURP 300 may select 30 FPS as the optimal frame rate and may provide the 30 FPS GOP. If the 15 FPS GOP satisfies the required image quality, step 1730 may be performed.

In step 1730, the SURP 300 may check whether the 7.5 FPS GOP satisfies the required image quality. If the 7.5 FPS GOP does not satisfy the required image quality, the SURP 300 may select 15 FPS as the optimal frame rate and may provide the 15 FPS GOP. If the 7.5 FPS GOP satisfies the required image quality, the SURP 300 may select 7.5 FPS as the optimal frame rate and may provide the 7.5 FPS GOP.
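
The sequential check of steps 1710 to 1730 can be sketched as a loop from the highest converted FPS to the lowest that stops at the first failure. The function name `sequential_optimal_fps` and the callable `satisfaction` (standing in for the predicted image quality satisfaction of the GOP at a given FPS) are hypothetical; the sample values are those of the second GOP in the FIG. 14 example.

```python
from typing import Callable, Sequence


def sequential_optimal_fps(source_fps: float,
                           candidate_fps: Sequence[float],
                           satisfaction: Callable[[float], float],
                           required: float) -> float:
    """Check candidates from high FPS to low FPS and stop at the first failure."""
    selected = source_fps  # used when even the highest converted FPS fails
    for fps in sorted(candidate_fps, reverse=True):
        if satisfaction(fps) < required:
            break          # keep the FPS checked just before this one
        selected = fps
    return selected


table = {30: 90, 15: 80, 7.5: 50}
best = sequential_optimal_fps(60, [30, 15, 7.5], table.__getitem__, required=75)
# best -> 15
```

Unlike a search over all candidates, this loop never evaluates an FPS below the first one that fails, so the result equals the lowest satisfying FPS only when the satisfaction does not increase as the FPS decreases.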

도 18은 일 예에 따른 시간적 식별자(temporal identifier)를 갖는 프레임들의 계층적인 구조를 나타낸다.FIG. 18 shows a hierarchical structure of frames having temporal identifiers according to an example.

In FIG. 18, nine frames are shown. The nine frames may be "frame 0" to "frame 8".

각 프레임들은 시간적 식별자(Temporal Identifier; TI)를 가질 수 있다. 또한, 각 프레임은 I 프레임, P 프레임 또는 B 프레임일 수 있다.Each frame may have a Temporal Identifier (TI). Further, each frame may be an I frame, a P frame, or a B frame.

As described above, when a GOP is converted to a plurality of FPSs through the SURP 300, the image quality satisfaction of each of the converted GOPs can be predicted. When the bitstream generated by encoding the video includes image quality satisfaction information, an improved bitstream can be provided that supports selective temporal scalability (TS) while maintaining the image quality satisfaction.

에이치이브이씨(High Efficiency Video Coding; HEVC) 규격에서는, 각 프레임의 TI는 엔에이엘(Network Abstraction Layer; NAL) 헤더를 통해 전송될 수 있다. 예를 들면, NAL 헤더의 TI 정보 "nuh_temporal_id_plus1"은 프레임의 TI를 나타낼 수 있다.In the High Efficiency Video Coding (HEVC) standard, the TI of each frame can be transmitted through a Network Abstraction Layer (NAL) header. For example, TI information "nuh_temporal_id_plus1" of the NAL header can indicate the TI of a frame.
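
As an illustration of how the TI can be read from the NAL header, the sketch below parses the two-byte HEVC NAL unit header, whose fields are forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits) and nuh_temporal_id_plus1 (3 bits). The class and function names are hypothetical, and the sample bytes are made up for the example.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    nal_unit_type: int
    nuh_layer_id: int
    temporal_id: int  # nuh_temporal_id_plus1 - 1


def parse_hevc_nal_header(header: bytes) -> NalHeader:
    """Extract the fields of a two-byte HEVC NAL unit header."""
    if len(header) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    b0, b1 = header[0], header[1]
    if b0 & 0x80:
        raise ValueError("forbidden_zero_bit must be 0")
    nal_unit_type = (b0 >> 1) & 0x3F
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    temporal_id_plus1 = b1 & 0x07
    if temporal_id_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(nal_unit_type, nuh_layer_id, temporal_id_plus1 - 1)


hdr = parse_hevc_nal_header(bytes([0x02, 0x04]))
# hdr -> NalHeader(nal_unit_type=1, nuh_layer_id=0, temporal_id=3)
```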

NAL 헤더의 시간적 식별자 정보를 통해, 화질 만족도 정보를 제공하는 TS가 제공될 수 있다.Through the temporal identifier information of the NAL header, a TS providing the picture quality satisfaction information can be provided.

GOP의 크기가 8인 경우, 도 18의 계층적인 예측 구조가 사용될 때, TI의 값이 3인 홀수 번호의 프레임들이 GOP에서 제거될 경우, GOP의 FPS가 1/2이 될 수 있다. 즉, GOP의 FPS가 60에서 30으로 변경될 수 있다.When the size of the GOP is 8, when the hierarchical prediction structure of FIG. 18 is used, if the odd numbered frames having a TI value of 3 are removed from the GOP, the FPS of the GOP may be 1/2. That is, the FPS of the GOP may be changed from 60 to 30.

추가로, TI의 값이 2인 "프레임 2" 및 "프레임 6"이 제거될 경우, GOP의 FPS가 1/4이 될 수 있다. 즉, GOP의 FPS가 60에서 15로 변경될 수 있다.In addition, if "frame 2" and "frame 6" with a TI value of 2 are removed, the FPS of the GOP may be 1/4. That is, the FPS of the GOP may be changed from 60 to 15.
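
The effect of removing temporal layers can be sketched as follows. The text states that the odd-numbered frames have a TI of 3 and that "frame 2" and "frame 6" have a TI of 2; assigning a TI of 1 to "frame 4" and a TI of 0 to "frame 0" and "frame 8" is an assumption based on a conventional dyadic hierarchy. The function names are hypothetical, and the GOP size is assumed to be a power of two.

```python
from typing import List


def temporal_id(frame_index: int, gop_size: int = 8) -> int:
    """TI of a frame in a dyadic hierarchy (gop_size assumed to be a power of two)."""
    levels = gop_size.bit_length() - 1  # 3 for a GOP of size 8
    position = frame_index % gop_size
    if position == 0:
        return 0
    trailing_zeros = (position & -position).bit_length() - 1
    return levels - trailing_zeros


def drop_layers(frame_indices: List[int], min_dropped_ti: int, gop_size: int = 8) -> List[int]:
    """Keep only the frames whose TI is below min_dropped_ti."""
    return [n for n in frame_indices if temporal_id(n, gop_size) < min_dropped_ti]


frames = list(range(8))
half_rate = drop_layers(frames, 3)     # TI 3 removed       -> [0, 2, 4, 6]
quarter_rate = drop_layers(frames, 2)  # TI 2 and 3 removed -> [0, 4]
```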

TS의 제공에 있어서, TI만이 사용될 경우 비디오는 고정된 FPS를 가질 수 있다. 예를 들면, 60 FPS의 비디오의 재생에 있어서, TI가 3인 프레임이 재생에서 제외될 경우, 비디오 전체에 대해 고정된 30 FPS가 적용될 수 있다. 이렇게, TI 만을 고려하는 TS가 사용될 경우, 비디오의 중간에 빠른 움직임의 객체가 존재할 경우, 빠른 움직임의 객체에 대해 시청자가 인지할 수 있는 큰 화질 저하가 발생할 수 있다.
In the provision of TS, video may have a fixed FPS if only TI is used. For example, in the playback of 60 FPS video, if a frame with TI of 3 is excluded from playback, a fixed 30 FPS may be applied to the entire video. Thus, when a TS considering only TI is used, if there is a fast moving object in the middle of the video, a large quality deterioration that a viewer can perceive about a fast moving object may occur.

개선된 시간적 스케일러빌러티Improved temporal scalability

이하에서는, SURP(300)에 의해 생성된 화질 만족도 정보를 사용하여 개선된 시간적 스케일러빌리트를 제공하는 부호화 방법 및 복호화 방법이 설명된다.Hereinafter, an encoding method and a decoding method for providing an improved temporal scaler rule using the image quality satisfaction information generated by the SURP 300 will be described.

전술된 SURP(300)을 이용하면, GOP 별로 TS가 적용될 수 있다. 말하자면, 전술된 SURP(300)을 이용하면 복수의 FPS들의 각 FPS에 대해 GOP의 화질 만족도들이 예측될 수 있다. GOP의 화질 예측도들을 이용하면 화질을 유지하면서도 TS가 제공될 수 있다.With the above-described SURP 300, a TS can be applied to each GOP. That is to say, using the above-described SURP 300, the image quality satisfaction of the GOP can be predicted for each FPS of a plurality of FPSs. Using the picture quality predictions of GOP, TS can be provided while maintaining image quality.

SURP(300)는 TI들의 각각에 대해, 각 TI에 대응하는 화질 만족도 정보를 결정할 수 있다. 화질 만족도 정보는 화질 만족도를 나타낼 수 있다. 여기에서, 특정한 TI에 대응하는 화질 만족도 정보는 특정한 TI의 이상의 TI(들)의 프레임들이 부호화, 복호화 또는 재생에서 제거되었을 경우의 화질에 대한 화질 만족도를 나타낼 수 있다.The SURP 300 may determine, for each of the TIs, image quality satisfaction information corresponding to each TI. The image quality satisfaction information can indicate the image quality satisfaction. Here, the image quality satisfaction information corresponding to a specific TI may indicate the image quality satisfaction with the image quality when the frames of the specific TI or more TI (s) are removed from the encoding, decoding, or reproduction.

For example, "frame x" may be included in or excluded from playback depending on the FPS at which the GOP is played. Depending on the FPS, "frame x" may be excluded from decoding or playback by TS. In this case, if y is the largest of the FPSs at which "frame x" is not included in playback, the image quality satisfaction at FPS y is z, and the TI of "frame x" is w, then the image quality satisfaction corresponding to TI w may be z.
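The mapping from per-FPS satisfaction to per-TI satisfaction described above can be sketched as follows. This is only an illustration: it assumes a dyadic hierarchy in which removing one TI level halves the FPS, and the satisfaction values 90, 60, and 30 are example numbers, not measured results. The function name is hypothetical.

```python
def satisfaction_by_ti(source_fps, max_ti, satisfaction_by_fps):
    """Map each TI w to the satisfaction z at FPS y, where y is the largest
    FPS at which frames of TI w are no longer played.

    Assumption: dyadic hierarchy, so y = source_fps / 2**(max_ti - w + 1).
    """
    result = {}
    for w in range(1, max_ti + 1):
        y = source_fps / 2 ** (max_ti - w + 1)
        result[w] = satisfaction_by_fps[y]
    return result


# Illustrative per-FPS satisfactions for one GOP of a 60 FPS source.
per_fps = {30: 90, 15: 60, 7.5: 30}
print(satisfaction_by_ti(60, 3, per_fps))  # {1: 30, 2: 60, 3: 90}
```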

Further, within a GOP, frames having the same TI may share the same image quality satisfaction information. Accordingly, depending on the required image quality satisfaction, the decoding apparatus 2700, described later, can adaptively select whether to play the frames having a specific TI.

For example, a required image quality satisfaction of Z may indicate that the frames of any TI whose corresponding image quality satisfaction is Z or below must be included in playback. It may also indicate that, when the required image quality satisfaction is Z, the frames of any TI whose corresponding image quality satisfaction is greater than Z may be excluded from playback.
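The inclusion rule for a required satisfaction Z can be written out as a short sketch. The per-TI values below are illustrative, the function name is hypothetical, and the base layer (TI 0) is left out of the map on the assumption that it is always played.

```python
def split_tis(satisfaction_by_ti, required):
    """Split TIs into those that must be played and those that may be skipped.

    A TI whose satisfaction is <= required must be played, because removing it
    would not reach the required satisfaction. A TI whose satisfaction is
    > required may be skipped.
    """
    must_play = {ti for ti, s in satisfaction_by_ti.items() if s <= required}
    may_skip = set(satisfaction_by_ti) - must_play
    return must_play, may_skip


must_play, may_skip = split_tis({1: 30, 2: 60, 3: 90}, required=75)
print(sorted(must_play), sorted(may_skip))  # [1, 2] [3]
```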

The image quality satisfaction corresponding to a TI may be included in a Supplemental Enhancement Information (SEI) message of the access unit of a frame, and may also be included in a NAL unit header. In this case, the SEI message or the NAL unit header may directly contain the value of the image quality satisfaction information. Alternatively, the SEI message or the NAL unit header may contain the value of an index into an image quality satisfaction table.

In a GOP, frames having the same TI may have the same image quality satisfaction information. Therefore, in a GOP, the image quality satisfaction information may be transmitted from the encoding apparatus 2400 to the decoding apparatus 2700 through SEI only in the first frame of each TI. Alternatively, the image quality satisfaction information for all TIs may be transmitted at once in the first frame of the GOP.

FIG. 19 shows a message including an image quality satisfaction according to an example.

In FIG. 19, the SEI message of the access unit of a frame may include image quality satisfaction information. That is, the image quality satisfaction information may be provided through a Supplemental Enhancement Information (SEI) message of the access unit of the frame.

In FIG. 19, "SEI_Temporal_ID_SURP" may indicate the data used to provide the image quality satisfaction information. "Surp_value" may indicate the image quality satisfaction information.

For example, a "Surp_value" of 70 may indicate that the image quality satisfaction is 70%.

FIG. 20 shows a message including an index of an image quality satisfaction table according to an example.

In FIG. 20, "SEI_Temporal_ID_SURP" may indicate the data used to provide the image quality satisfaction information. "Surp_value_idx" may be an index into the image quality satisfaction table.

The image quality satisfaction values used in a video or GOP may be predefined in an image quality satisfaction table. For example, the table may hold the values {90, 60, 30}, indicating that image quality satisfactions of 90, 60, and 30 are used. If the value of "Surp_value_idx" is 0, it may indicate that the value at index 0 of the table, that is, 90 or 90%, is the image quality satisfaction. If the value of "Surp_value_idx" is 1, it may indicate that the value at index 1 of the table, that is, 60 or 60%, is the image quality satisfaction.
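The table lookup can be shown with a few lines of code. The sketch below uses the example table {90, 60, 30} from the text; the function name and the range check are assumptions added for illustration.

```python
# Example image quality satisfaction table from the text.
SURP_TABLE = (90, 60, 30)


def satisfaction_from_index(surp_value_idx, table=SURP_TABLE):
    """Resolve a signalled Surp_value_idx into an image quality satisfaction (%)."""
    if not 0 <= surp_value_idx < len(table):
        raise ValueError("Surp_value_idx is outside the table")
    return table[surp_value_idx]


print(satisfaction_from_index(0))  # 90
print(satisfaction_from_index(1))  # 60
```

Signalling an index rather than the value itself keeps each message small when only a few satisfaction levels are in use.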

FIG. 21 shows a message including all of the image quality satisfactions according to an example.

In FIG. 21, one SEI message may include the image quality satisfaction information of all TIs of all frames of a GOP.

In FIG. 21, "Max_Temporl_ID" may represent the maximum number of TIs. "Temoporal ID[i]" may represent the index of the TI whose value is i or i1. Alternatively, "Temoporal ID[i]" may indicate the image quality satisfaction information of the i-th TI.

The image quality satisfaction information may indicate the image quality satisfaction value itself or an index into the image quality satisfaction table. In FIG. 21, "Temoporal ID[i]" is illustrated as indicating an index into the image quality satisfaction table.
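A message that carries a table index for every TI can be modelled roughly as below. This is a sketch only: the container class, its field names, and the `first_ti` parameter are hypothetical, and whether entry i refers to TI i or to another offset is treated here as a caller-supplied assumption. No actual bitstream syntax is implied.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SurpSeiAllTis:
    """Hypothetical in-memory form of a message carrying one table index per TI."""
    max_temporal_id: int    # number of TI entries carried in the message
    table_index: List[int]  # one satisfaction-table index per entry


def resolve_all(msg: SurpSeiAllTis, table: Sequence[int], first_ti: int = 0) -> Dict[int, int]:
    """Return {TI: satisfaction} by looking each signalled index up in the table."""
    if len(msg.table_index) != msg.max_temporal_id:
        raise ValueError("expected one table index per TI")
    return {ti: table[idx] for ti, idx in enumerate(msg.table_index, start=first_ti)}


msg = SurpSeiAllTis(max_temporal_id=3, table_index=[2, 1, 0])
print(resolve_all(msg, (90, 60, 30), first_ti=1))  # {1: 30, 2: 60, 3: 90}
```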

FIG. 22 shows a method of generating image quality satisfaction information according to an example.

FIG. 22 is a flowchart of a method of encoding a GOP including image quality satisfaction information according to an example.

First, the input video is assumed to be 60 FPS. The GOPs of the input video may be processed sequentially through steps 2210, 2220, 2230, 2240, and 2250.

In step 2210, the SURP 300 may convert the 60 FPS GOP into a 30 FPS GOP and may calculate the image quality satisfaction of the 30 FPS GOP.

In step 2220, the SURP 300 may convert the 60 FPS GOP into a 15 FPS GOP and may calculate the image quality satisfaction of the 15 FPS GOP.

In step 2230, the SURP 300 may convert the 60 FPS GOP into a 7.5 FPS GOP and may calculate the image quality satisfaction of the 7.5 FPS GOP.

In step 2240, the SURP 300 may encode the image quality satisfaction information for the frames of the GOP.

In step 2250, the SURP 300 may encode the GOP. The SURP 300 may encode the plurality of frames of the GOPs.

In the embodiment of FIG. 22, all frames of the GOPs may be encoded. Which of these frames are to be played may be determined at the encoding stage according to the minimum image quality satisfaction.
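The per-GOP loop of steps 2210 to 2230 can be sketched as follows. The `predict` callable is a hypothetical stand-in for the satisfaction predictor of the SURP 300, whose interface is not specified here. Converting a GOP to half the frame rate by keeping every other frame is also an assumption.

```python
from typing import Callable, Dict, Sequence


def satisfaction_per_fps(
    gop: Sequence,
    source_fps: float,
    predict: Callable[[Sequence, float], float],
    levels: int = 3,
) -> Dict[float, float]:
    """Halve the frame rate `levels` times and predict the satisfaction each time.

    For a 60 FPS source and levels=3 this visits 30, 15 and 7.5 FPS,
    mirroring steps 2210, 2220 and 2230.
    """
    result = {}
    frames = list(gop)
    fps = source_fps
    for _ in range(levels):
        fps /= 2
        frames = frames[::2]  # assumption: keep every other frame
        result[fps] = predict(frames, fps)
    return result


# Stub predictor used only to make the sketch runnable: it reports the
# number of frames left in the GOP instead of a real satisfaction value.
print(satisfaction_per_fps(range(8), 60, lambda frames, fps: len(frames)))
# {30.0: 4, 15.0: 2, 7.5: 1}
```

The resulting per-FPS values would then be encoded alongside the frames, as in steps 2240 and 2250.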

FIG. 23A shows a configuration that maintains an image quality satisfaction of 75% or more according to an example.

In FIG. 23A, a rectangle may represent a frame. In FIG. 23A, the first GOP may include "frame 0" through "frame 7". The second GOP may include "frame 8" through "frame 15". The "TI" inside a rectangle may indicate the value of the temporal identifier of the frame. The "I", "P", and "B" inside a rectangle may indicate the type of the frame. The number above or below a rectangle may indicate the image quality satisfaction of the frame.

The SURP 300 may convert each GOP of the video to the lowest FPS that still meets the minimum image quality satisfaction, and may encode each converted GOP. In this case, the encoded video can be transmitted at the minimum bit rate required to meet the minimum image quality satisfaction. This approach can be advantageous in terms of transmission.

FIG. 23A illustrates a case in which TS is applied such that an image quality satisfaction of at least 75% is guaranteed.

In FIG. 23A, the frames inside dotted lines may represent frames that are not encoded or transmitted. In FIG. 23A, in the first GOP, frames may be transmitted at 30 FPS so that the minimum image quality satisfaction of 75% is met. In the second GOP, frames may be transmitted at 15 FPS so that the minimum image quality satisfaction of 75% is met.
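Choosing the lowest FPS that still meets the minimum satisfaction can be sketched as below. The per-GOP satisfaction values are invented so that the outcome matches the 30 FPS and 15 FPS choices described for FIG. 23A; they are not values from the figure. The function name is hypothetical.

```python
def lowest_fps_meeting(satisfaction_by_fps, source_fps, min_satisfaction):
    """Return the lowest candidate FPS whose predicted satisfaction meets the minimum.

    Falls back to the source FPS when no reduced FPS qualifies.
    """
    candidates = [fps for fps, s in satisfaction_by_fps.items() if s >= min_satisfaction]
    return min(candidates, default=source_fps)


# Illustrative predictions for two GOPs of a 60 FPS source.
first_gop = {30: 80, 15: 60, 7.5: 35}
second_gop = {30: 92, 15: 78, 7.5: 50}

print(lowest_fps_meeting(first_gop, 60, 75))   # 30
print(lowest_fps_meeting(second_gop, 60, 75))  # 15
```

Because the choice is made per GOP, a GOP with fast motion can keep a higher frame rate than a static one while both stay above the same satisfaction floor.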

The SURP 300 may include the image quality satisfaction information of the TIs in the bitstream. The decoding apparatus 2700 may refer to the image quality satisfaction information to determine the lowest FPS that meets the minimum image quality satisfaction, and may receive the frames required for the determined FPS from the encoding apparatus. The decoder may play the received frames.

The decoding apparatus 2700 may refer to the image quality satisfaction information to determine, for a GOP, the lowest FPS that meets the minimum image quality satisfaction. The decoding apparatus 2700 may decode the frames required for the determined FPS. Here, the frames may be frames of the GOP. The decoding apparatus 2700 may receive the frames required for the determined FPS from the encoding apparatus. The decoder may play the received frames.

Through TS, the decoding apparatus 2700 may selectively obtain from the encoding apparatus 2400 only those frames of the video or GOP that it has determined to decode.

The decoding apparatus 2700 may selectively decode, among the frames of a GOP, the frames whose image quality satisfaction is at or below the minimum image quality satisfaction. By excluding from decoding the frames whose image quality satisfaction is greater than the minimum image quality satisfaction, the decoding apparatus 2700 can transmit and play video efficiently without degrading the image quality perceived by the viewer.

FIG. 23B shows a configuration that determines whether to change the FPS based on an image quality satisfaction of 75% according to an example.

In FIG. 23B, a rectangle may represent a frame. In FIG. 23B, the first GOP may include "frame 0" through "frame 7". The second GOP may include "frame 8" through "frame 15". The "TI" inside a rectangle may indicate the value of the temporal identifier of the frame. The "I", "P", and "B" inside a rectangle may indicate the type of the frame. The number above or below a rectangle may indicate the image quality satisfaction of the frame.

By default, the SURP 300 changes the FPS of a GOP to 30, but if the resulting image quality satisfaction is less than 75%, it may keep the FPS of the source video without changing the FPS of the GOP. That is, the SURP 300 may use TS adaptively based on the minimum image quality satisfaction. Such a function may be particularly required in environments where image quality is important.

In FIG. 23B, the first GOP may be converted to 30 FPS and then encoded and transmitted, and the second GOP may be encoded and transmitted while remaining at 60 FPS.
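The keep-or-reduce decision of this configuration is small enough to state directly in code. The default values follow the example in the text (60 FPS source, 30 FPS target, 75% threshold); the function name and the sample satisfaction values are hypothetical.

```python
def choose_gop_fps(satisfaction_at_target, source_fps=60, target_fps=30, min_satisfaction=75):
    """Reduce the GOP to target_fps unless that would fall below the minimum satisfaction."""
    if satisfaction_at_target < min_satisfaction:
        return source_fps  # keep the source frame rate for this GOP
    return target_fps


print(choose_gop_fps(80))  # 30
print(choose_gop_fps(60))  # 60
```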

FIG. 24 is a structural diagram of an encoding apparatus according to an embodiment.

The encoding apparatus 2400 may include a control unit 2410, an encoding unit 2420, and a communication unit 2430.

The control unit 2410 may correspond to the above-described SURP 300. For example, the functions of the SURP 300 may be performed by the control unit 2410. Alternatively, the control unit 2410 may execute the SURP 300.

The control unit 2410 may generate selection information for a frame of a moving picture.

The selection information may correspond to the above-described image quality satisfaction information. Alternatively, the selection information may include the image quality satisfaction information.

The selection information may be related to the proportion of people who do not perceive degradation in the picture quality of the moving picture even when the frame is excluded from the encoding of the video.

Alternatively, the selection information may be related to the proportion of people who do not perceive degradation in the picture quality of the played moving picture even when playback of the moving picture is set such that the frame is excluded from decoding.

There may be a plurality of pieces of selection information. The pieces of selection information may be calculated for each FPS at which the moving picture or the basic determination unit is played. The control unit 2410 may calculate the selection information for a plurality of FPSs.

The proportion of people who do not perceive degradation in the picture quality of the moving picture may be calculated by machine learning in the control unit 2410. As described above, this proportion may be determined based on a feature vector of the frame or of the basic determination unit that includes the frame.

The selection information may be applied in common to the other frames that, among a plurality of frames including the frame, have the same TI as the frame. The plurality of frames may be the frames of a basic determination unit. Alternatively, the plurality of frames may be the frames of a GOP.

The control unit 2410 may encode the moving picture based on the generated selection information.

For example, the control unit 2410 may encode the selection information and may include the encoded selection information in the bitstream of the encoded moving picture.

For example, the control unit 2410 may select, based on the selection information, the frames to be encoded from among the frames of the moving picture.

The encoding unit 2420 may correspond to the above-described encoding apparatus 100. For example, the functions of the encoding apparatus 100 may be performed by the encoding unit 2420. Alternatively, the encoding unit 2420 may include the encoding apparatus 100.

The encoding unit 2420 may encode a frame of the moving picture.

For example, the encoding unit 2420 may encode all frames of the moving picture.

For example, the encoding unit 2420 may encode the frames selected by the control unit 2410.

The communication unit 2430 may transmit the generated bitstream to the decoding apparatus 2700.

The bitstream may include information of the encoded moving picture and may include the encoded selection information.

The functions and operations of the control unit 2410, the encoding unit 2420, and the communication unit 2430 are described in detail below.

FIG. 25 is a flowchart of an encoding method according to an embodiment.

In step 2510, the control unit 2410 may generate selection information for a frame of a moving picture.

In step 2520, the control unit 2410 may encode the moving picture based on the selection information.

The encoding unit 2420 may encode a frame of the moving picture. The encoding unit 2420 may generate a bitstream by encoding the frame of the moving picture. The bitstream generated by encoding the moving picture may include the selection information.

In step 2530, the communication unit 2430 may transmit the bitstream including the information of the encoded moving picture to the decoding apparatus 2700.

FIG. 26 is a flowchart of a method of encoding a moving picture according to an example.

Step 2520, described above with reference to FIG. 25, may include the following steps 2610 and 2620.

In step 2610, the control unit 2410 may determine whether to encode a frame based on the selection information for the frame.

In step 2620, when it is determined that the frame is to be encoded, the encoding unit 2420 may encode the frame.

When it is determined that the frame is to be encoded, the control unit 2410 may request the encoding unit 2420 to encode the frame.

The determination of whether to encode the frame may be applied in common to the other frames that, among a plurality of frames including the frame, have the same TI as the frame. Here, the plurality of frames may be the frames of a basic determination unit. Alternatively, the plurality of frames may be the frames of a GOP.

FIG. 27 is a structural diagram of a decoding apparatus according to an embodiment.

The decoding apparatus 2700 may include a control unit 2710, a decoding unit 2720, and a communication unit 2730.

The communication unit 2730 may receive a bitstream from the encoding apparatus 2400.

The bitstream may include information of an encoded moving picture. The information of the encoded moving picture may include information of an encoded frame. The bitstream may include selection information of the frame.

The control unit 2710 may determine whether to decode the frame based on the selection information of the frame.

The control unit 2710 may perform those functions of the above-described SURP 300 that are applicable to decoding. Alternatively, the control unit 2710 may include at least part of the SURP 300.

The decoding unit 2720 may correspond to the above-described decoding apparatus 200. For example, the functions of the decoding apparatus 200 may be performed by the decoding unit 2720. Alternatively, the decoding unit 2720 may include the decoding apparatus 200.

When it is determined that the frame is to be decoded, the decoding unit 2720 may decode the frame.

For example, the decoding unit 2720 may decode all frames of the moving picture.

For example, the decoding unit 2720 may decode the frames selected by the control unit 2710.

The functions and operations of the control unit 2710, the decoding unit 2720, and the communication unit 2730 are described in detail below.

FIG. 28 is a flowchart of a decoding method according to an embodiment.

In step 2810, the communication unit 2730 may receive a bitstream. The communication unit 2730 may receive information of an encoded frame.

In step 2820, the control unit 2710 may determine whether to decode the frame based on the selection information of the frame.

The determination may be performed for each frame of a plurality of frames of the video. Here, the plurality of frames may be the frames of a basic determination unit. Alternatively, the plurality of frames may be the frames of a GOP.

The selection information may be related to the proportion of people who do not perceive degradation in the picture quality of the moving picture even when playback of the moving picture including the frame is determined such that the frame is excluded from decoding.

Alternatively, the selection information may be related to the proportion of people who do not perceive degradation in the picture quality of the moving picture even when the playback FPS of the moving picture including the frame is determined such that the frame is excluded from decoding.

The selection information may be included in the SEI for the frame.

Whether to decode the frame may be determined based on a comparison between the value of the selection information and the value of reproduction information.

The reproduction information may be information related to the playback of the moving picture including the frame. The reproduction information may correspond to the minimum image quality satisfaction. Alternatively, the reproduction information may include the minimum image quality satisfaction.

The reproduction information may be information related to the playback FPS of the moving picture including the frame.

For example, if the value of the selection information is greater than the value of the reproduction information, the control unit 2710 may determine not to decode the frame. Alternatively, if the value of the selection information is less than or equal to the value of the reproduction information, the control unit 2710 may determine to decode the frame.

The determination may be applied in common to the other frames that, among a plurality of frames including the frame, have the same TI as the frame. The plurality of frames may be the frames of a basic prediction unit. Alternatively, the plurality of frames may be the frames of a GOP.
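The comparison and its reuse across frames of the same TI can be sketched as follows. The `Frame` class, the TI pattern of the 8-frame GOP, and the selection values are all hypothetical. The sketch assumes a frame is decoded when its selection value is at or below the reproduction value, and that the base layer (TI 0) carries a selection value of 0 so that it is always decoded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frame:
    index: int
    ti: int
    selection_value: int  # image quality satisfaction signalled for this frame's TI


def frames_to_decode(gop: List[Frame], reproduction_value: int) -> List[Frame]:
    """Decide once per TI whether to decode, then apply that decision to every frame of the TI."""
    decision_by_ti: Dict[int, bool] = {}
    selected = []
    for frame in gop:
        if frame.ti not in decision_by_ti:
            # Greater than the reproduction value: skip. Otherwise: decode.
            decision_by_ti[frame.ti] = frame.selection_value <= reproduction_value
        if decision_by_ti[frame.ti]:
            selected.append(frame)
    return selected


# Illustrative 8-frame GOP with a dyadic TI pattern.
TI_OF_FRAME = [0, 3, 2, 3, 1, 3, 2, 3]
VALUE_OF_TI = {0: 0, 1: 30, 2: 60, 3: 90}
gop = [Frame(i, ti, VALUE_OF_TI[ti]) for i, ti in enumerate(TI_OF_FRAME)]

kept = frames_to_decode(gop, reproduction_value=75)
print([f.index for f in kept])  # [0, 2, 4, 6]
```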

The decoding apparatus 2700 ... any frame, among all the frames of the video, that has been determined to be decoded ...

In step 2830, when it is determined that a frame is to be decoded, the decoding unit 2720 may decode that frame.

In the above-described embodiments, the methods are described on the basis of flowcharts as a series of steps or units, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. In addition, those of ordinary skill in the art will understand that the steps shown in a flowchart are not exclusive, that other steps may be included, and that one or more steps of a flowchart may be deleted without affecting the scope of the present invention.

The embodiments of the present invention described above may be implemented in the form of program instructions that can be executed through various computer components and may be recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and constructed for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROM and DVD; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to specific matters such as concrete components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains may devise various modifications and variations from this description.

Therefore, the spirit of the present invention should not be confined to the embodiments described above, and not only the claims below but also everything modified equally or equivalently to those claims falls within the scope of the spirit of the present invention.

Claims

A moving picture decoding method comprising:
determining whether to decode a frame based on selection information of the frame; and
decoding the frame when it is determined that the frame is to be decoded.

The method according to claim 1,
wherein the determination is performed for each frame of a plurality of frames, and
the plurality of frames are frames of a group of pictures (GOP).

The method according to claim 1,
wherein the selection information is related to a ratio of persons who do not perceive a degradation in the picture quality of the moving picture even when reproduction of the moving picture including the frame is set so that the frame is excluded from decoding.

The method according to claim 1,
wherein the selection information is related to a ratio of persons who do not perceive a degradation in the picture quality of the moving picture even when the frames per second (FPS) of reproduction of the moving picture including the frame is determined so that the frame is excluded from decoding.

The method according to claim 1,
wherein whether to decode the frame is determined based on a comparison between the value of the selection information and the value of reproduction information, and
the reproduction information is information related to reproduction of the moving picture including the frame.

The method according to claim 5,
wherein the reproduction information is information related to the frames per second (FPS) of reproduction of the moving picture including the frame.

The method according to claim 1,
wherein it is determined that decoding is not performed on the frame if the value of the selection information is larger than the value of reproduction information, and
the reproduction information is information related to reproduction of the moving picture including the frame.
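
As an illustration only, the comparison recited above (a frame is not decoded when the value of its selection information is larger than the value of the reproduction information) can be sketched in Python. The names `Frame`, `selection_value`, `should_decode`, and `frames_to_decode`, as well as the numeric values, are hypothetical; the claims do not fix a data layout or a value scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int
    # Hypothetical field: the value carried as the frame's selection information.
    selection_value: float


def should_decode(frame: Frame, reproduction_value: float) -> bool:
    # Decoding is skipped when the selection value is larger than the
    # reproduction value; otherwise the frame is decoded.
    return not (frame.selection_value > reproduction_value)


def frames_to_decode(frames: List[Frame], reproduction_value: float) -> List[int]:
    # Return the indices of the frames that pass the comparison.
    return [f.index for f in frames if should_decode(f, reproduction_value)]


# Illustrative values only; they are not taken from the specification.
frames = [Frame(0, 0.0), Frame(1, 90.0), Frame(2, 50.0), Frame(3, 90.0)]
decoded = frames_to_decode(frames, reproduction_value=75.0)
```

In this sketch a frame whose selection value equals the reproduction value is still decoded, because the recited condition is "larger than" rather than "larger than or equal to".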

The method according to claim 1,
wherein the selection information is contained in Supplemental Enhancement Information (SEI) for the frame.

The method according to claim 1,
wherein the determination of whether to perform the decoding is applied in common to other frames that have the same temporal identifier as the frame, among a plurality of frames including the frame.
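
As a hedged sketch of the common application recited above, the following Python fragment makes the decode decision once per temporal identifier and reuses it for every later frame with the same identifier. The tuple layout, the function name `decide_by_temporal_id`, the assumption that the first frame seen for each temporal identifier carries a selection value, and the "larger than" comparison rule used for the decision are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (frame_index, temporal_id, selection_value or None).
FrameInfo = Tuple[int, int, Optional[float]]


def decide_by_temporal_id(frames: List[FrameInfo], reproduction_value: float) -> List[int]:
    decisions: Dict[int, bool] = {}  # temporal_id -> decode decision
    decoded: List[int] = []
    for index, temporal_id, selection_value in frames:
        if temporal_id not in decisions:
            # Assumption: the first frame seen for a temporal identifier
            # carries the selection value used for the whole layer.
            if selection_value is None:
                raise ValueError("first frame of a temporal layer needs a selection value")
            decisions[temporal_id] = not (selection_value > reproduction_value)
        # Every frame sharing the temporal identifier reuses the decision.
        if decisions[temporal_id]:
            decoded.append(index)
    return decoded


# Illustrative values only; they are not taken from the specification.
gop = [(0, 0, 0.0), (1, 2, 90.0), (2, 1, 50.0), (3, 2, None), (4, 0, None)]
result = decide_by_temporal_id(gop, reproduction_value=75.0)
```

Caching the decision per temporal identifier means the comparison runs once per temporal layer rather than once per frame.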

A moving picture decoding apparatus comprising:
a controller for determining whether to decode a frame based on selection information of the frame; and
a decoding unit for decoding the frame when decoding of the frame is determined.

A moving picture encoding method comprising:
generating selection information for a frame of a moving picture; and
performing encoding of the moving picture based on the selection information.

The method according to claim 11,
wherein the encoding of the moving picture comprises:
determining whether to encode the frame based on the selection information for the frame; and
performing encoding of the frame if encoding of the frame is determined.

The method according to claim 12,
wherein the determination of whether to perform the encoding is applied in common to other frames that have the same temporal identifier as the frame, among a plurality of frames including the frame.

The method according to claim 11,
wherein the selection information is related to a ratio of persons who do not perceive a degradation in the picture quality of the moving picture even when the frame is excluded from encoding of the moving picture.

The method according to claim 14,
wherein the ratio is calculated by machine learning.

The method according to claim 14,
wherein the ratio is determined based on a feature vector of the frame.

The method according to claim 11,
wherein a plurality of pieces of selection information are generated, and
the plurality of pieces of selection information are calculated for frames per second (FPS) of reproduction of the moving picture.

The method according to claim 11,
wherein a bitstream generated by encoding the moving picture includes the selection information.

The method according to claim 11,
wherein the selection information is applied in common to other frames that have the same temporal identifier as the frame, among a plurality of frames including the frame.

The method according to claim 11,
wherein the selection information is related to a ratio of persons who do not perceive a degradation in the picture quality of the reproduced moving picture even when reproduction of the moving picture is set so that the frame is excluded from decoding.