KR102602690B1

KR102602690B1 - Method and apparatus for adaptive encoding and decoding based on image quality

Info

Publication number: KR102602690B1
Application number: KR1020160127739A
Authority: KR
Inventors: 정세윤; 김휘용; 김종호; 임성창; 양 상린; 씨 제이 쿠오 씨; 황 친
Original assignee: 한국전자통신연구원
Priority date: 2015-10-08
Filing date: 2016-10-04
Publication date: 2023-11-16
Also published as: KR20170042235A

Abstract

A method and apparatus for adaptive encoding and decoding based on image quality are provided. The encoding apparatus can determine the optimal frames per second (FPS) for a video and encode the video according to the determined FPS. The encoding apparatus can also provide improved temporal scalability. The decoding apparatus can select the frames to be played from among the frames of the video according to the required minimum image quality satisfaction. Through this frame selection, the decoding apparatus can provide improved temporal scalability.

Description

Method and device for adaptive encoding and decoding based on image quality {METHOD AND APPARATUS FOR ADAPTIVE ENCODING AND DECODING BASED ON IMAGE QUALITY}

The following embodiments relate to a video decoding method, a decoding apparatus, an encoding method, and an encoding apparatus, and more specifically to a method and apparatus that adaptively perform encoding and decoding based on the image quality of a video.

Through the continued development of the information and communications industry, broadcasting services with High Definition (HD) resolution have spread worldwide. As a result, many users have become accustomed to high-resolution, high-quality images and/or videos.

To satisfy users' demand for high image quality, many organizations are accelerating the development of next-generation imaging devices. User interest has grown not only in High Definition TV (HDTV) and Full HD (FHD) TV but also in Ultra High Definition (UHD) TV, which has at least four times the resolution of FHD TV. With this growing interest, image encoding/decoding technology for images with higher resolution and higher quality is required.

In new video services such as UHD, the need for High Frame Rate (HFR) video is increasing. For example, a high frame rate may be a frame playback rate of 60 frames per second (FPS) or higher.

However, providing such HFR video increases the amount of video data. As the amount of video data increases, cost issues and technical problems may arise in transmitting and storing the video.

Fortunately, owing to the characteristics of human vision, HFR is not required in all situations. For example, for a video with almost no motion, most people do not perceive a difference or degradation in image quality between a 30 FPS video and a 60 FPS video.

In other words, depending on the content of the video, once the FPS is at or above a certain threshold value, most people may hardly perceive any difference in image quality even if the FPS is raised further, owing to human perceptual characteristics.

One embodiment may provide a method and apparatus for predicting the degree of image quality degradation that occurs when HFR video is converted to video with a lower frame rate.

One embodiment may provide a method and apparatus for reducing the bit rate required for encoding by predicting the degree of image quality degradation.

One embodiment may provide a method and apparatus for minimizing image quality degradation, when the frame rate of a video is converted to a lower rate through temporal scalability (TS) or the like, by using information generated by predicting the degree of image quality degradation.

One embodiment may provide a method and apparatus for minimizing image quality degradation by also considering information related to image quality degradation when applying temporal scalability.

In one aspect, a video decoding method is provided, comprising: determining whether to decode a frame based on selection information of the frame; and decoding the frame when it is determined that the frame is to be decoded.

The determination may be performed for each frame of a plurality of frames.

The plurality of frames may be the frames of a group of pictures (GOP).

The selection information may be related to the proportion of people who do not perceive a degradation in the image quality of the video even if playback of the video including the frame is set such that the frame is excluded from decoding.

The selection information may be related to the proportion of people who do not perceive a degradation in the image quality of the video even if the frames per second (FPS) of playback of the video including the frame is determined such that the frame is excluded from decoding.

Whether to decode the frame may be determined based on a comparison between the value of the selection information and the value of playback information.

The playback information may be information related to playback of the video including the frame.

The playback information may be information related to the frames per second (FPS) of playback of the video including the frame.

If the value of the selection information is greater than the value of the playback information, it may be determined that the frame is not decoded.

The selection information may be included in Supplemental Enhancement Information (SEI) for the frame.

The determination of whether to decode may be applied in common to the other frames, among a plurality of frames including the frame, that have the same temporal identifier as the frame.
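The decoder-side rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Frame` class, the function names, and the representation of both values as fractions between 0 and 1 are hypothetical. The selection information is read as the fraction of viewers who would not notice the frame being dropped, and the playback information as the minimum satisfaction the player requires.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Frame:
    index: int             # position of the frame within the GOP
    temporal_id: int       # temporal identifier (temporal layer) of the frame
    selection_info: float  # assumed: fraction of viewers who would not notice the drop


def should_decode(selection_info: float, playback_info: float) -> bool:
    """A frame is skipped only when selection_info is greater than playback_info."""
    return selection_info <= playback_info


def select_frames(gop: Iterable[Frame], playback_info: float) -> List[int]:
    """Return the indices of the frames to decode.

    The decision made for the first frame seen with a given temporal
    identifier is shared by every other frame with that identifier.
    """
    decision_by_tid: Dict[int, bool] = {}
    decoded: List[int] = []
    for frame in gop:
        if frame.temporal_id not in decision_by_tid:
            decision_by_tid[frame.temporal_id] = should_decode(
                frame.selection_info, playback_info
            )
        if decision_by_tid[frame.temporal_id]:
            decoded.append(frame.index)
    return decoded
```

With a four-frame GOP whose highest temporal layer carries a selection value of 0.9, a playback value of 0.75 would skip that layer, while a playback value of 0.95 would decode every frame.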

In another aspect, a video decoding apparatus is provided, comprising: a control unit that determines whether to decode a frame based on selection information of the frame; and a decoding unit that decodes the frame when it is determined that the frame is to be decoded.

In yet another aspect, a video encoding method is provided, comprising: generating selection information for a frame of a video; and encoding the video based on the selection information.

Encoding the video may comprise: determining whether to encode the frame based on the selection information for the frame; and encoding the frame when it is determined that the frame is to be encoded.

The determination of whether to encode may be applied in common to the other frames, among a plurality of frames including the frame, that have the same temporal identifier as the frame.

The selection information may be related to the proportion of people who do not perceive a degradation in the image quality of the video even if the frame is excluded from the encoding of the video.

The proportion may be calculated by machine learning.

The proportion may be determined based on a feature vector of the frame.

There may be a plurality of pieces of selection information.

The plurality of pieces of selection information may be calculated for each frames-per-second (FPS) value of playback of the video.

The bitstream generated by encoding the video may include the selection information.

The selection information may be applied in common to the other frames, among a plurality of frames including the frame, that have the same temporal identifier as the frame.

The selection information may be related to the proportion of people who do not perceive a degradation in the image quality of the played video even if playback of the video is set such that the frame is excluded from decoding.

A method and apparatus are provided for predicting, for each unit interval of an HFR video, how much the image quality of the unit interval degrades when the frame rate of the unit interval is lowered.

A method and apparatus are provided for reducing the data rate of an HFR video by reducing the frame rate of each unit interval while keeping the perceptual quality within an image quality difference range input by the user or within a predefined allowable image quality difference range. For example, the predefined allowable image quality difference range may be a range in which 80% of people do not perceive a difference in image quality.
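The per-interval frame rate reduction can be illustrated with a short sketch. It assumes that a predictor has already produced, for each unit interval, a mapping from candidate FPS values to the predicted fraction of viewers who perceive no difference from the original. The function names and the fallback to the highest candidate rate are hypothetical choices made for this illustration.

```python
from typing import List, Mapping


def pick_frame_rate(satisfaction_by_fps: Mapping[int, float],
                    min_satisfaction: float = 0.80) -> int:
    """Return the lowest FPS whose predicted satisfaction meets the threshold.

    If no candidate meets the threshold, fall back to the highest candidate
    FPS (an assumption of this sketch).
    """
    acceptable = [fps for fps, s in satisfaction_by_fps.items()
                  if s >= min_satisfaction]
    return min(acceptable) if acceptable else max(satisfaction_by_fps)


def pick_frame_rates(intervals: List[Mapping[int, float]],
                     min_satisfaction: float = 0.80) -> List[int]:
    """Apply the per-interval choice to every unit interval of the video."""
    return [pick_frame_rate(interval, min_satisfaction) for interval in intervals]
```

A low-motion interval whose predicted satisfaction at 30 FPS is 0.85 would be encoded at 30 FPS under the 80% criterion, whereas a high-motion interval whose satisfaction at 30 FPS is only 0.6 would stay at 60 FPS.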

A method and apparatus are provided that, when applying TS, take into account image quality degradation prediction information, which predicts how much the image quality will degrade.

By considering the image quality degradation prediction information, a method and apparatus are provided for applying TS only to unit intervals that fall within an image quality degradation range input by the user or a predefined image quality degradation range.

Figure 1 is a block diagram showing the configuration of an encoding apparatus to which the present invention is applied, according to an embodiment.
Figure 2 is a block diagram showing the configuration of a decoding apparatus to which the present invention is applied, according to an embodiment.
Figure 3 illustrates the operation of SURP according to an embodiment.
Figure 4 shows image quality evaluation results according to an example.
Figure 5 illustrates the procedure for extracting a feature vector according to an embodiment.
Figure 6 shows a prediction model for measuring spatial randomness according to an example.
Figure 7 shows an SRM according to an example.
Figure 8 shows an SM according to an example.
Figure 9 shows an SCM according to an example.
Figure 10 shows a VSM according to an example.
Figure 11 shows an SIM according to an example.
Figure 12 shows the relationship between the frames used to generate a temporal prediction model and the prediction target frame according to an example.
Figure 13a shows the first of three consecutive frames according to an example.
Figure 13b shows the second of three consecutive frames according to an example.
Figure 13c shows the third of three consecutive frames according to an example.
Figure 13d shows a Temporal Randomness Map (TRM) for three consecutive frames according to an example.
Figure 13e shows a SpatioTemporal Influence Map (STIM) for three consecutive frames according to an example.
Figure 13f shows a Weighted SpatioTemporal Influence Map (WSTIM) for three consecutive frames according to an example.
Figure 14 shows image quality satisfaction by FPS according to an example.
Figure 15 shows the optimal frame rate determined for each GOP according to an example.
Figure 16 is a flowchart of a method for encoding a GOP using an optimal frame rate according to an example.
Figure 17 is a flowchart of a method for determining an optimal frame rate according to an example.
Figure 18 shows a hierarchical structure of frames with temporal identifiers according to an example.
Figure 19 shows a message including image quality satisfaction according to an example.
Figure 20 shows a message including an index into an image quality satisfaction table according to an example.
Figure 21 shows a message including all of the image quality satisfaction values according to an example.
Figure 22 illustrates a method of generating image quality satisfaction information according to an example.
Figure 23a shows a configuration that maintains image quality satisfaction of 75% or more according to an example.
Figure 23b shows a configuration that determines whether to change the FPS based on an image quality satisfaction of 75% according to an example.
Figure 24 is a structural diagram of an encoding apparatus according to an embodiment.
Figure 25 is a flowchart of an encoding method according to an embodiment.
Figure 26 is a flowchart of a method for encoding a video according to an example.
Figure 27 is a structural diagram of a decoding apparatus according to an embodiment.
Figure 28 is a flowchart of a decoding method according to an embodiment.

The detailed description of the exemplary embodiments below refers to the accompanying drawings, which show specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. It should be understood that the various embodiments differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the embodiment. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the exemplary embodiments, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled.

In the drawings, similar reference numerals refer to identical or similar functions across the various aspects. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

When a component is said to be "connected" or "coupled" to another component, it should be understood that the two components may be directly connected or coupled to each other, or that another component may exist between them. In addition, a statement in the exemplary embodiments that a specific configuration is "included" does not exclude configurations other than that configuration; it means that additional configurations may be included in the practice of the exemplary embodiments or in the scope of their technical idea.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be named a second component without departing from the scope of rights, and similarly, a second component may be named a first component.

The components appearing in the embodiments are shown independently in order to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, each component is listed separately for convenience of description. For example, at least two of the components may be combined into one component, and one component may be divided into a plurality of components. Embodiments in which components are integrated and embodiments in which components are separated are also included in the scope of rights as long as they do not depart from the essence.

In addition, some components may not be essential components that perform essential functions but merely optional components for improving performance. The embodiments may be implemented with only the components essential to realizing the essence of the embodiments, and a structure that excludes optional components, such as components used only to improve performance, is also included in the scope of rights.

Hereinafter, the embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice them. In describing the embodiments, a detailed description of a related known configuration or function is omitted when it is judged that it could obscure the gist of this specification.

Hereinafter, an image may refer to a single picture constituting a video, or to the video itself. For example, "encoding and/or decoding of an image" may mean "encoding and/or decoding of a video," or "encoding and/or decoding of one of the images constituting the video."

Hereinafter, the numbers used in the embodiments are merely examples and may be replaced with values other than those specified in the embodiments.

In the embodiments, information, operations, and functions that apply to encoding may also apply to decoding in a corresponding manner.

If the frame rate can be adjusted adaptively and automatically by exploiting the human visual characteristics described above, the amount of video data can be reduced without degradation in perceptual quality for a large number of viewers. If human visual characteristics can be applied to HFR video so that the video is converted to the optimal frame playback rate for each interval, the increase in the amount of video data can be minimized. The cost issues and technical problems that arise in transmitting and storing HFR video can thereby be resolved.

Adaptive and automatic adjustment of the frame rate requires that the image quality difference that appears when a video is changed from HFR to a lower frame rate be predicted accurately. Such prediction technology can be used to advantage in the preprocessing, encoding, transmission, and/or decoding of video.

However, conventional technology cannot accurately predict the image quality difference that occurs when HFR video is converted to video with a lower frame rate.

Temporal Scalability (TS) is a conventional technology related to image quality adjustment. TS adjusts the frame rate of a video uniformly, without considering how the image quality of the video will change when TS is applied. Consequently, applying TS to a video may significantly degrade its image quality.

The embodiments below describe 1) a method of predicting image quality satisfaction, 2) a method of changing a video to an optimal lower frame rate through the prediction of image quality satisfaction, and 3) a method of utilizing the information generated by the prediction of image quality satisfaction for TS.

1) Method of predicting image quality satisfaction: an image quality satisfaction predictor is described that predicts the degree of image quality difference that occurs when HFR is changed to a lower frame rate.

2) Method of changing a video to an optimal lower frame rate through the prediction of image quality satisfaction: an encoder is described that uses the image quality satisfaction predictor to change a video with a fixed HFR to an optimal lower frame rate for each basic decision unit and then encodes it.

3) Method of utilizing the information generated by the prediction of image quality satisfaction for TS: the video bitstream generated by encoding the video may include image quality satisfaction information generated by the image quality satisfaction predictor. This image quality information can be used for TS during transmission or decoding of the video bitstream. The image quality satisfaction information enables an improved TS that was not previously possible, because how the image quality of the video will change when TS is applied can be taken into account. Through this consideration, an improved TS that minimizes image quality degradation can be provided.

First, the terms used in the embodiments are explained.

Unit: A "unit" may denote a unit of image encoding and decoding. The terms "unit" and "block" may have the same meaning and may be used interchangeably.

A unit (or block) may be an MxN array of samples, where M and N are each positive integers. A unit often refers to a two-dimensional array of samples. A sample may be a pixel or a pixel value.

In image encoding and decoding, a unit may be a region generated by partitioning one image. One image may be partitioned into a plurality of units. In image encoding and decoding, predefined processing may be performed on a unit depending on the type of the unit. Depending on function, unit types may be classified into Macro Unit, Coding Unit (CU), Prediction Unit (PU), Transform Unit (TU), and the like. One unit may be further partitioned into sub-units smaller than the unit.

Block partition information may include information about the depth of a unit. The depth information may indicate the number of times and/or the degree to which the unit is partitioned.

One unit may be hierarchically partitioned into a plurality of sub-units, each having depth information, based on a tree structure. In other words, a unit and a sub-unit generated by partitioning that unit may correspond to a node and a child node of that node, respectively. Each partitioned sub-unit may have depth information. Since the depth information of a unit indicates the number of times and/or the degree to which the unit has been partitioned, the partition information of a sub-unit may also include information about the size of the sub-unit.

In the tree structure, the topmost node may correspond to the initial, unpartitioned unit. The topmost node may be referred to as the root node. The topmost node may have the minimum depth value; here, the topmost node may have a depth of level 0.

A node with a depth of level 1 may represent a unit generated by partitioning the initial unit once. A node with a depth of level 2 may represent a unit generated by partitioning the initial unit twice.

A node with a depth of level n may represent a unit generated by partitioning the initial unit n times.

A leaf node may be the lowest node and may be a node that cannot be partitioned further. The depth of a leaf node may be the maximum level. For example, the predefined value of the maximum level may be 3.
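The relationship between partitioning, depth, and sub-unit size can be sketched as follows. The text only says that the partitioning follows a tree structure; the four-way (quadtree) split into equal squares used here is an assumption borrowed from common block-based codecs, and the `Unit` class and the `should_split` callback are hypothetical names for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_DEPTH = 3  # example maximum level given in the text


@dataclass
class Unit:
    x: int
    y: int
    size: int        # width and height of the square unit, in samples
    depth: int = 0   # level 0 is the root (the initial, unpartitioned unit)
    children: List["Unit"] = field(default_factory=list)


def split(unit: Unit, should_split: Callable[[Unit], bool]) -> Unit:
    """Recursively partition a unit into four equal sub-units (assumed quadtree)."""
    if unit.depth >= MAX_DEPTH or unit.size < 2 or not should_split(unit):
        return unit  # leaf node: not partitioned any further
    half = unit.size // 2
    for dy in (0, half):
        for dx in (0, half):
            child = Unit(unit.x + dx, unit.y + dy, half, unit.depth + 1)
            unit.children.append(split(child, should_split))
    return unit


def leaves(unit: Unit) -> List[Unit]:
    """Collect the leaf nodes of the partition tree."""
    if not unit.children:
        return [unit]
    return [leaf for child in unit.children for leaf in leaves(child)]
```

Under this assumption a sub-unit at depth n has a side length of the root size divided by 2^n, which is why depth information also conveys the size of a sub-unit.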

Transform Unit: A transform unit may be the basic unit in residual signal encoding and/or residual signal decoding, such as transform, inverse transform, quantization, inverse quantization, transform coefficient encoding, and transform coefficient decoding. One transform unit may be partitioned into a plurality of smaller transform units.

예측 유닛(Prediction Unit): 예측 유닛은 예측 또는 보상(compensation)의 수행에 있어서의 기본 단위일 수 있다. 예측 유닛은 분할에 의해 다수의 파티션(partition)들이 될 수 있다. 다수의 파티션들 또한 예측 또는 보상의 수행에 있어서의 기본 단위일 수 있다. 예측 유닛의 분할에 의해 생성된 파티션 또한 예측 유닛일 수 있다.Prediction Unit: A prediction unit may be a basic unit in performing prediction or compensation. A prediction unit may be divided into multiple partitions. Multiple partitions may also be the basic unit in performing prediction or compensation. A partition created by dividing a prediction unit may also be a prediction unit.

복원된 이웃 유닛(Reconstructed Neighbor Unit): 복원된 이웃 유닛은 부호화 대상 유닛 또는 복호화 대상 유닛의 주변에 이미 부호화 또는 복호화되어 복원된 유닛일 수 있다. 복원된 이웃 유닛은 대상 유닛에 대한 공간적(spatial) 인접 유닛 또는 시간적(temporal) 인접 유닛일 수 있다.Reconstructed Neighbor Unit: The reconstructed neighbor unit may be a unit that has already been encoded or decoded and restored around the encoding target unit or the decoding target unit. The restored neighboring unit may be a spatially adjacent unit or a temporally adjacent unit to the target unit.

예측 유닛 파티션: 예측 유닛 파티션은 예측 유닛이 분할된 형태를 의미할 수 있다.Prediction unit partition: Prediction unit partition may refer to the form in which the prediction unit is divided.

파라미터 세트(Parameter Set): 파라미터 세트는 비트스트림 내의 구조(structure) 중 헤더(header) 정보에 해당할 수 있다. 예를 들면, 파라미터 세트는 시퀀스 파라미터 세트(sequence parameter set), 픽쳐 파라미터 세트(picture parameter set) 및 적응 파라미터 세트(adaptation parameter set) 등을 포함할 수 있다.Parameter Set: The parameter set may correspond to header information among the structures in the bitstream. For example, the parameter set may include a sequence parameter set, a picture parameter set, an adaptation parameter set, etc.

Rate-distortion optimization: The encoding device may use rate-distortion optimization to provide high coding efficiency by using a combination of the coding unit size, prediction mode, prediction unit size, motion information, and transform unit size.

The rate-distortion optimization method may calculate the rate-distortion cost of each combination in order to select the optimal combination from among them. The rate-distortion cost may be calculated using Equation 1 below. In general, the combination with the minimum rate-distortion cost may be selected as the optimal combination in the rate-distortion optimization method.

[Equation 1]
D + λ × R

D may denote distortion. D may be the mean of the squared differences (mean square error) between the original transform coefficients and the reconstructed transform coefficients within a transform block.

R may denote the rate. R may denote the bit rate obtained using related context information.

λ may denote a Lagrangian multiplier. R may include not only the bits for coding parameter information such as the prediction mode, motion information, and coded block flag, but also the bits generated by encoding the transform coefficients.

To calculate accurate values of D and R, the encoding device performs processes such as inter prediction and/or intra prediction, transform, quantization, entropy encoding, inverse quantization, and inverse transform. These processes may greatly increase the complexity of the encoding device.
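The selection described above can be sketched as follows, assuming the rate-distortion cost takes the usual Lagrangian form D + λ × R. The `Candidate` type, the candidate names, and the numeric values are hypothetical and serve only to illustrate the comparison; in a real encoder D and R would come from actually coding each combination.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical label for one combination of coding choices
    distortion: float  # D, e.g. a mean square error
    rate_bits: float   # R, bits needed to code the parameters and coefficients

def mean_square_error(original: Sequence[float],
                      reconstructed: Sequence[float]) -> float:
    """D as the mean of squared differences between two coefficient lists."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def rd_cost(candidate: Candidate, lam: float) -> float:
    """Rate-distortion cost under the assumed form D + lambda * R."""
    return candidate.distortion + lam * candidate.rate_bits

def select_best(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Pick the combination with the minimum rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))

candidates = [
    Candidate("combination A", distortion=10.0, rate_bits=100.0),
    Candidate("combination B", distortion=14.0, rate_bits=40.0),
]
```

With λ = 0.1 the costs are 20.0 and 18.0, so combination B is selected; with λ = 0.01 the costs are 11.0 and 14.4, so combination A is selected. A larger λ therefore weights the rate more heavily relative to the distortion.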

Reference picture: A reference picture may be an image used for inter prediction or motion compensation. A reference picture may be a picture that contains a reference unit referred to by the target unit for inter prediction or motion compensation. "Picture" and "image" may have the same meaning, and the two terms may be used interchangeably.

Reference picture list: A reference picture list may be a list containing the reference pictures used for inter prediction or motion compensation. Types of reference picture lists may include List Combined (LC), List 0 (L0), and List 1 (L1).

Motion Vector (MV): A motion vector may be a two-dimensional vector used in inter prediction. For example, an MV may be expressed in the form (mv_x, mv_y), where mv_x may denote the horizontal component and mv_y may denote the vertical component.

An MV may indicate an offset between the target picture and a reference picture.

Search range: A search range may be a two-dimensional area in which the search for an MV is performed during inter prediction. For example, the size of the search range may be M×N, where M and N are each a positive integer.
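As an illustration of how an MV may be found within a search range, the following is a minimal full-search sketch. It assumes frames stored as lists of rows of luma samples, the sum of absolute differences (SAD) as the matching criterion, and a search window of (2·range_x + 1) × (2·range_y + 1) candidate offsets centered on the block position. These choices are assumptions for illustration; the text above does not prescribe a particular search strategy or cost measure.

```python
from typing import List, Tuple

Frame = List[List[int]]

def sad(cur: Frame, ref: Frame, bx: int, by: int,
        mv_x: int, mv_y: int, bsize: int) -> int:
    """Sum of absolute differences between the current block and the
    reference block displaced by (mv_x, mv_y)."""
    total = 0
    for y in range(bsize):
        for x in range(bsize):
            total += abs(cur[by + y][bx + x] - ref[by + mv_y + y][bx + mv_x + x])
    return total

def full_search(cur: Frame, ref: Frame, bx: int, by: int, bsize: int,
                range_x: int, range_y: int) -> Tuple[Tuple[int, int], int]:
    """Test every offset in the search range and return (best MV, best SAD)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, bsize)
    for mv_y in range(-range_y, range_y + 1):
        for mv_x in range(-range_x, range_x + 1):
            x0, y0 = bx + mv_x, by + mv_y
            # Skip candidates whose reference block falls outside the picture.
            if x0 < 0 or y0 < 0 or x0 + bsize > width or y0 + bsize > height:
                continue
            cost = sad(cur, ref, bx, by, mv_x, mv_y, bsize)
            if cost < best_cost:
                best_mv, best_cost = (mv_x, mv_y), cost
    return best_mv, best_cost
```

The returned pair (mv_x, mv_y) is the horizontal and vertical offset from the block position in the target picture to the best-matching area in the reference picture.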

Basic decision unit: A basic decision unit may denote the basic unit that is the target of processing in an embodiment. A basic decision unit may be a plurality of frames, and may be a Group Of Pictures (GOP).

Group Of Pictures (GOP): In the embodiments, the term GOP may have the same meaning as the GOP generally used in video encoding. That is, the first frame of a GOP may be an I frame, and the frames of the GOP other than the first frame may be frames that refer to the I frame directly or indirectly.

Frame rate: The frame rate may indicate the temporal resolution of a video. The frame rate may mean the number of pictures displayed per second.

Frames Per Second (FPS): FPS may be the unit of frame rate. For example, "30 FPS" may indicate that 30 frames are displayed per second.

High Frame Rate (HFR): HFR may indicate a frame rate equal to or greater than a predetermined reference value. For example, a video with 60 FPS or more may be an HFR video by Korean and American standards, and a video with 50 FPS or more may be an HFR video by European standards. In the embodiments, a 60 FPS video is used as the example of an HFR video.

Optimal Frame Rate: The optimal frame rate may be a frame rate determined according to the content of a video. The optimal frame rate may be determined for each basic decision unit. Even if the frame rate of a video or basic decision unit is increased beyond the optimal frame rate, people may not perceive any difference in picture quality resulting from the increase. In other words, the optimal frame rate may be the minimum frame rate at which people do not perceive a degradation in picture quality relative to the HFR video. Equivalently, when the FPS of an HFR video is reduced step by step and the picture quality difference between the original HFR video and the reduced-FPS video is measured at each step, the optimal frame rate may be the FPS immediately before picture quality degradation occurs. Here, the occurrence of picture quality degradation may mean that at least a ratio of people set by the user, or at least a preset ratio of people, perceive the difference in picture quality.
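The stepwise procedure in this definition can be sketched as below. The sketch assumes that the FPS is halved at each step and that a hypothetical callback `perception_rate(fps)` returns the fraction of people who perceive degradation at the reduced FPS; how that fraction is obtained (subjective tests or a predictor) is outside the sketch, and `min_fps` is an illustrative stopping bound not taken from the text.

```python
from typing import Callable

def optimal_frame_rate(source_fps: float,
                       perception_rate: Callable[[float], float],
                       threshold: float,
                       min_fps: float = 1.0) -> float:
    """Return the FPS immediately before degradation occurs.

    Degradation is taken to occur when the fraction of people perceiving a
    quality difference is at or above `threshold` (a value in [0, 1]).
    """
    best = source_fps
    candidate = source_fps / 2  # assumed step: halve the FPS each time
    while candidate >= min_fps:
        if perception_rate(candidate) >= threshold:
            break  # degradation occurs here, so keep the previous FPS
        best = candidate
        candidate /= 2
    return best

# Hypothetical perception rates for one basic decision unit.
rates = {30: 0.10, 15: 0.50, 7.5: 0.90}
lookup = lambda fps: rates.get(fps, 1.0)
```

With the hypothetical rates above and a threshold of 0.25, the search stops at 15 FPS, where half of the viewers notice degradation, and returns 30 FPS as the optimal frame rate for that unit.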

Homogeneous video: A homogeneous video may be a video in which all of the basic decision units constituting the video have identical or similar characteristics.

Picture quality degradation perception rate: The picture quality degradation perception rate indicates the proportion of people who can perceive picture quality degradation when the frame rate of an HFR video is reduced for each basic decision unit. Typically, the FPS may be changed stepwise by a factor of 1/2. The unit of the picture quality degradation perception rate may be %.

Picture quality degradation perception rate predictor: The picture quality degradation perception rate predictor may be a predictor that predicts the picture quality degradation perception rate.

Satisfied user ratio (picture quality satisfaction): The satisfied user ratio may be the proportion of people who, when the FPS of an HFR video is reduced for each basic decision unit, are satisfied with the picture quality of the reduced-frame-rate video relative to that of the HFR video. Alternatively, the satisfied user ratio may be the proportion of people who, when the FPS of an HFR video is reduced for each basic decision unit, cannot perceive a difference between the picture quality of the HFR video and that of the reduced-FPS video.

Typically, the FPS may be changed stepwise by a factor of 1/2. Here, the proportion of people may be the value 1 − a/b expressed as a percentage. a may be the number of people who, in a subjective picture quality assessment, judged that there is a difference between the picture quality of the HFR video and that of the reduced-FPS video. b may be the total number of people. For example, suppose that five people in total perform a subjective picture quality assessment of a 60 FPS video and a 30 FPS video, that three of the five give the same picture quality score to both videos, and that the remaining two give the 30 FPS video a lower score than the 60 FPS video. The satisfied user ratio may then be 1 − 2/5 = 60%. In this case, the picture quality degradation perception rate may be 40%. That is, the sum of the satisfied user ratio and the picture quality degradation perception rate may be 100%.
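The arithmetic of this example can be checked with a few lines of code. The function names and the score values are illustrative only; the sketch counts an evaluator toward a whenever the two scores given by that evaluator differ.

```python
from typing import Sequence, Tuple

def satisfied_user_ratio(a: int, b: int) -> float:
    """1 - a/b, where a evaluators out of b perceived a quality difference."""
    if b <= 0 or not 0 <= a <= b:
        raise ValueError("require b > 0 and 0 <= a <= b")
    return 1.0 - a / b

def degradation_perception_rate(a: int, b: int) -> float:
    """a/b, the complement of the satisfied user ratio."""
    if b <= 0 or not 0 <= a <= b:
        raise ValueError("require b > 0 and 0 <= a <= b")
    return a / b

def count_perceiving(scores: Sequence[Tuple[int, int]]) -> int:
    """Count evaluators whose (HFR score, reduced-FPS score) pair differs."""
    return sum(1 for hfr, reduced in scores if hfr != reduced)

# Five evaluators: three give equal scores, two score the 30 FPS video lower.
scores = [(5, 5), (4, 4), (5, 5), (5, 3), (4, 2)]
a, b = count_perceiving(scores), len(scores)
sur = satisfied_user_ratio(a, b)          # approximately 0.6, i.e. 60%
dpr = degradation_perception_rate(a, b)   # 0.4, i.e. 40%
```

The two values always sum to 1 (100%), which is why a predictor for one of them also determines the other.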

Satisfied User Ratio Predictor (SURP): The SURP may be a predictor that predicts the satisfied user ratio. The picture quality degradation perception rate predictor and the SURP may be similar in configuration and operation, and may differ only in their output values.

Selection information: In an embodiment, picture quality information may be the picture quality degradation perception rate or the satisfied user ratio. The term "selection information" may be used with the same meaning as the term "perceptual picture quality information", and the two terms may be used interchangeably.

Basic operations of the encoding device and the decoding device

FIG. 1 is a block diagram showing the configuration of an encoding device according to an embodiment to which the present invention is applied.

The encoding device 100 may be a video encoding device or an image encoding device. A video may include one or more images. The encoding device 100 may sequentially encode the one or more images of the video in temporal order.

Referring to FIG. 1, the encoding device 100 may include an inter prediction unit 110, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.

The encoding device 100 may encode an input image in intra mode and/or inter mode. The input image may be referred to as the current image, that is, the image currently being encoded.

In addition, the encoding device 100 may generate a bitstream containing the encoded information by encoding the input image, and may output the generated bitstream.

When the intra mode is used, the switch 115 may be switched to intra. When the inter mode is used, the switch 115 may be switched to inter.

The encoding device 100 may generate a prediction block for an input block of the input image. After the prediction block is generated, the encoding device 100 may encode the residual between the input block and the prediction block. The input block may be referred to as the current block, that is, the block currently being encoded.

When the prediction mode is the intra mode, the intra prediction unit 120 may use the pixel values of already encoded blocks neighboring the current block as reference pixels. The intra prediction unit 120 may perform spatial prediction for the current block using the reference pixels, and may generate prediction samples for the current block through the spatial prediction.

The inter prediction unit 110 may include a motion prediction unit and a motion compensation unit.

When the prediction mode is the inter mode, the motion prediction unit may search a reference image for the area that best matches the current block during the motion prediction process, and may derive a motion vector for the current block and the found area. The reference image may be stored in the reference picture buffer 190; it may be stored there once its encoding and/or decoding has been processed.

The motion compensation unit may generate a prediction block by performing motion compensation using the motion vector. Here, the motion vector may be a two-dimensional vector used for inter prediction. The motion vector may also indicate an offset between the current image and the reference image.

The subtractor 125 may generate a residual block, which is the difference between the input block and the prediction block. The residual block may also be referred to as a residual signal.

The transform unit 130 may generate transform coefficients by performing a transform on the residual block, and may output the generated transform coefficients. Here, a transform coefficient may be a coefficient value generated by performing a transform on the residual block. When the transform skip mode is applied, the transform unit 130 may skip the transform of the residual block.

A quantized transform coefficient level may be generated by applying quantization to a transform coefficient. Hereinafter, in the embodiments, a quantized transform coefficient level may also be referred to as a transform coefficient.

The quantization unit 140 may generate a quantized transform coefficient level by quantizing a transform coefficient according to a quantization parameter, and may output the generated quantized transform coefficient level. In doing so, the quantization unit 140 may quantize the transform coefficient using a quantization matrix.

The entropy encoding unit 150 may generate a bitstream by performing entropy encoding according to a probability distribution, based on the values calculated by the quantization unit 140 and/or the coding parameter values calculated during the encoding process. The entropy encoding unit 150 may output the generated bitstream.

In addition to the pixel information of an image, the entropy encoding unit 150 may perform entropy encoding on the information needed to decode the image. For example, the information needed to decode the image may include syntax elements.

A coding parameter may be information required for encoding and/or decoding. Coding parameters may include information that is encoded by the encoding device and transmitted to the decoding device, and may include information that can be inferred during the encoding or decoding process. A syntax element is one example of information transmitted to the decoding device.

For example, coding parameters may include values or statistics such as the prediction mode, motion vector, reference picture index, coded block pattern, presence or absence of a residual signal, transform coefficients, quantized transform coefficients, quantization parameter, block size, and block partition information. The prediction mode may indicate the intra prediction mode or the inter prediction mode.

A residual signal may mean the difference between the original signal and the prediction signal. Alternatively, a residual signal may be a signal generated by transforming the difference between the original signal and the prediction signal. Alternatively, a residual signal may be a signal generated by transforming and quantizing the difference between the original signal and the prediction signal. A residual block may be a residual signal in units of blocks.

When entropy encoding is applied, a small number of bits may be allocated to a symbol with a high probability of occurrence, and a large number of bits may be allocated to a symbol with a low probability of occurrence. Because symbols are represented through this allocation, the size of the bit string for the symbols to be encoded may be reduced. Entropy encoding may therefore improve the compression performance of image encoding.

For entropy encoding, coding methods such as exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC) may be used. For example, the entropy encoding unit 150 may perform entropy encoding using a Variable Length Coding/Code (VLC) table. For example, the entropy encoding unit 150 may derive a binarization method for a target symbol. The entropy encoding unit 150 may also derive a probability model for a target symbol/bin. The entropy encoding unit 150 may perform entropy encoding using the derived binarization method or probability model.
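As a concrete instance of allocating fewer bits to more probable symbols, the following sketch implements order-0 exponential Golomb coding of unsigned integers, which gives shorter codewords to smaller values and is therefore efficient when small values occur most often. Bit strings are represented as Python strings of "0" and "1" for readability; this representation and the function names are assumptions of the sketch, not part of the described device.

```python
from typing import Tuple

def exp_golomb_encode(value: int) -> str:
    """Order-0 exp-Golomb codeword for an unsigned integer.

    The codeword is the binary form of value + 1, preceded by as many
    zeros as that binary form has bits after its leading one.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary

def exp_golomb_decode(bits: str) -> Tuple[int, str]:
    """Decode one codeword from the front of `bits`.

    Returns the decoded value and the remaining, undecoded bits.
    """
    zeros = 0
    while zeros < len(bits) and bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    if end > len(bits):
        raise ValueError("truncated codeword")
    return int(bits[zeros:end], 2) - 1, bits[end:]

# Codewords for the values 0..3: "1", "010", "011", "00100".
codewords = [exp_golomb_encode(v) for v in range(4)]
```

The value 0 costs one bit while the value 3 costs five, so a source dominated by small values is represented with a shorter bit string than a fixed-length code would need.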

Because the encoding device 100 performs encoding through inter prediction, the encoded current image may be used as a reference image for other image(s) processed later. Accordingly, the encoding device 100 may decode the encoded current image again and store the decoded image as a reference image. For this decoding, inverse quantization and inverse transform may be applied to the encoded current image.

The quantized coefficients may be inversely quantized in the inverse quantization unit 160 and inversely transformed in the inverse transform unit 170. The inversely quantized and inversely transformed coefficients may be added to the prediction block by the adder 175. A reconstructed block may be generated by adding the inversely quantized and inversely transformed coefficients to the prediction block.
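The reason the encoder reconstructs its own output can be seen in a simplified sketch of the loop. The sketch assumes one-dimensional blocks, transform skip (the transform is omitted), and a uniform scalar quantizer with a hypothetical step size `step`; an actual encoder would apply a transform and a quantization parameter as described above.

```python
from typing import List, Tuple

def quantize(values: List[int], step: int) -> List[int]:
    """Uniform scalar quantization (assumed here for illustration)."""
    return [int(round(v / step)) for v in values]

def dequantize(levels: List[int], step: int) -> List[int]:
    """Inverse quantization matching `quantize`."""
    return [level * step for level in levels]

def reconstruct(levels: List[int], prediction: List[int], step: int) -> List[int]:
    """Adder stage: prediction block plus inversely quantized residual."""
    residual = dequantize(levels, step)
    return [p + r for p, r in zip(prediction, residual)]

def encode_block(block: List[int], prediction: List[int],
                 step: int) -> Tuple[List[int], List[int]]:
    """Return the quantized levels and the encoder-side reconstructed block."""
    residual = [b - p for b, p in zip(block, prediction)]  # subtractor
    levels = quantize(residual, step)
    return levels, reconstruct(levels, prediction, step)

block = [101, 105, 97, 90]
prediction = [98, 98, 98, 98]
levels, encoder_recon = encode_block(block, prediction, step=4)
# A decoder that receives only `levels` and forms the same prediction
# obtains the same reconstructed block as the encoder.
decoder_recon = reconstruct(levels, prediction, step=4)
```

Here the residual is [3, 7, -1, -8], the levels are [1, 2, 0, -2], and both sides reconstruct [102, 106, 98, 90]. Because the encoder stores this reconstructed block rather than the original as reference data, its later predictions are formed from the same samples that the decoder will have.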

The reconstructed block may pass through the filter unit 180. The filter unit 180 may apply at least one of a deblocking filter, Sample Adaptive Offset (SAO), and an Adaptive Loop Filter (ALF) to the reconstructed block or the reconstructed picture. The filter unit 180 may also be referred to as an adaptive in-loop filter.

The deblocking filter may remove block distortion occurring at the boundaries between blocks. SAO may add an appropriate offset value to a pixel value in order to compensate for coding errors. ALF may perform filtering based on a value obtained by comparing the reconstructed image with the original image. The reconstructed block that has passed through the filter unit 180 may be stored in the reference picture buffer 190.

FIG. 2 is a block diagram showing the configuration of a decoding device according to an embodiment to which the present invention is applied.

The decoding device 200 may be a video decoding device or an image decoding device.

Referring to FIG. 2, the decoding device 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, an inter prediction unit 250, an adder 255, a filter unit 260, and a reference picture buffer 270.

The decoding device 200 may receive the bitstream output from the encoding device 100. The decoding device 200 may decode the bitstream in intra mode and/or inter mode. The decoding device 200 may also generate a reconstructed image through decoding and output the generated reconstructed image.

For example, switching to the intra mode or the inter mode according to the prediction mode used for decoding may be performed by a switch. When the prediction mode used for decoding is the intra mode, the switch may be switched to intra. When the prediction mode used for decoding is the inter mode, the switch may be switched to inter.

The decoding device 200 may obtain a reconstructed residual block from the input bitstream and may generate a prediction block. Once the reconstructed residual block and the prediction block are obtained, the decoding device 200 may generate a reconstructed block by adding the reconstructed residual block and the prediction block.

The entropy decoding unit 210 may generate symbols by performing entropy decoding on the bitstream based on a probability distribution. The generated symbols may include symbols in the form of quantized coefficients. Here, the entropy decoding method may be similar to the entropy encoding method described above. For example, the entropy decoding method may be the inverse process of the entropy encoding method described above.

The quantized coefficients may be inversely quantized in the inverse quantization unit 220, and the inversely quantized coefficients may be inversely transformed in the inverse transform unit 230. As a result of inversely quantizing and inversely transforming the quantized coefficients, a reconstructed residual block may be generated. In doing so, the inverse quantization unit 220 may apply a quantization matrix to the quantized coefficients.

When the intra mode is used, the intra prediction unit 240 may generate a prediction block by performing spatial prediction using the pixel values of already coded blocks neighboring the current block.

The inter prediction unit 250 may include a motion compensation unit. When the inter mode is used, the motion compensation unit may generate a prediction block by performing motion compensation using a motion vector and a reference image. The reference image may be stored in the reference picture buffer 270.

The reconstructed residual block and the prediction block may be added by the adder 255. The adder 255 may generate a reconstructed block by adding the reconstructed residual block and the prediction block.

The reconstructed block may pass through the filter unit 260. The filter unit 260 may apply at least one of a deblocking filter, SAO, and ALF to the reconstructed block or the reconstructed picture. The filter unit 260 may output the reconstructed image. The reconstructed image may be stored in the reference picture buffer 270 and used for inter prediction.

화질 만족도 예측기(Satisfied User Ratio Predictor; SURP)Satisfied User Ratio Predictor (SURP)

도 3은 일 실시예에 따른 SURP의 동작을 설명한다.Figure 3 explains the operation of SURP according to one embodiment.

SURP(300)의 입력은 기본 결정 단위일 수 있다. 도 3에서 도시된 것과 같이, 기본 결정 단위는 GOP일 수 있다. SURP(300)의 출력은 변경된 FPS의 GOP에 대한 예측 화질 만족도일 수 있다.The input of SURP 300 may be the basic decision unit. As shown in FIG. 3, the basic decision unit may be GOP. The output of SURP 300 may be the predicted picture quality satisfaction for the GOP of the changed FPS.

예를 들면, 입력되는 GOP는 60 FPS의 8 개의 프레임들을 갖는 GOP일 수 있다.For example, the input GOP may be a GOP with 8 frames at 60 FPS.

예측 화질 만족도를 계산하기 위해, SURP(300)는 GOP의 FPS를 변경할 수 있다. 예를 들면, SURP(300)는 X FPS의 GOP를 Y FPS의 GOP로 변경할 수 있다. 예를 들면, Y는 aX이며, a는 1/2일 수 있다. 여기에서, "1/2"는 단지 예시적인 값으로, a는 1/4, 1/8 및 1/16 등과 같이 1/2ⁿ의 값을 가질 수 있다. n은 1 이상의 정수일 수 있다.To calculate the predicted picture quality satisfaction, SURP 300 may change the FPS of the GOP. For example, SURP 300 may change the GOP of X FPS to the GOP of Y FPS. For example, Y may be aX, and a may be 1/2. Here, “1/2” is just an exemplary value, and a may have a value of 1/2 ⁿ , such as 1/4, 1/8, and 1/16. n may be an integer of 1 or more.

X may represent the FPS of the GOP before the change, and Y may represent the FPS of the GOP after the change. When X=2ⁿ, Y may be one of X/2, X/2², ..., X/2ⁿ. For example, when the value of X is 60, the value of Y may be 30. That is, the GOP may be converted from 60 FPS to 30 FPS.
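The relation Y = aX with a = 1/2ⁿ can be illustrated with a short sketch. The function name `candidate_target_fps` and the parameter `max_halvings` are hypothetical and are introduced here only for illustration.

```python
def candidate_target_fps(source_fps: float, max_halvings: int = 3) -> list[float]:
    """Return the candidate target frame rates source_fps / 2**n for n = 1..max_halvings."""
    if max_halvings < 1:
        raise ValueError("max_halvings must be at least 1")
    return [source_fps / 2 ** n for n in range(1, max_halvings + 1)]


# A 60 FPS source GOP yields 30, 15 and 7.5 FPS candidates for n = 1, 2, 3.
print(candidate_target_fps(60))
```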

예측 화질 만족도는, GOP의 FPS가 60에서 30으로 변경되었다는 가정 하에, 변경 전의 GOP의 화질 및 변경 후의 GOP의 화질 간의 화질 차이를 인지하지 못하는 사람들의 비율에 대한 예측 값일 수 있다. 또는, 예측 화질 만족도는, GOP의 FPS가 60에서 30으로 변경되었다는 가정 하에, 변경 전의 GOP의 화질에 비해 변경 후의 GOP의 화질에도 만족하는 사람들의 비율에 대한 예측 값일 수 있다.Predicted image quality satisfaction may be a predicted value for the proportion of people who do not recognize the difference in image quality between the image quality of the GOP before the change and the image quality of the GOP after the change, assuming that the FPS of the GOP has been changed from 60 to 30. Alternatively, the predicted picture quality satisfaction may be a predicted value for the ratio of people who are satisfied with the picture quality of the GOP after the change compared to the picture quality of the GOP before the change, under the assumption that the FPS of the GOP has been changed from 60 to 30.

SURP(300)는 기계 학습을 통해 예측 화질 만족도를 생성할 수 있다.SURP (300) can generate predicted image quality satisfaction through machine learning.

이하에서, FPS의 변경 전의 GOP를 소스(source) GOP로 칭할 수 있다. 또한, FPS의 변경 후의 GOP를 타겟(target) GOP로 칭할 수 있다. 또한, FPS의 변경 전의 비디오를 소스(source) 비디오로 칭할 수 있다. 또한, FPS의 변경 후의 비디오를 타겟(target) 비디오로 칭할 수 있다.
Hereinafter, the GOP before the FPS change may be referred to as the source GOP. Additionally, the GOP after changing the FPS may be referred to as a target GOP. Additionally, the video before the FPS change may be referred to as the source video. Additionally, the video after changing the FPS may be referred to as a target video.

기계 학습을 통한 예측 화질 만족도의 생성에 대한 고려 사항.Considerations for generating predicted image quality satisfaction through machine learning.

실시예에서, SURP(300)는 GOP 별로 FPS를 변경할 때, FPS의 변경에 의해 얼만큼의 화질 저하가 발생하는 가를 측정하기 위해 기계 학습(machine learning)을 사용할 수 있다.In an embodiment, the SURP 300 may use machine learning to measure how much image quality is degraded due to the change in FPS when changing the FPS for each GOP.

Machine learning requires training data; that is, ground truth for the machine learning may be required. Because one of the purposes of the embodiment is to maintain the image quality perceived by humans, data on human-perceived image quality may be obtained through a subjective image quality evaluation experiment.

여기에서, 주관적 화질 평가를 통해 인지 화질에 대한 데이터를 획득하는 것은 단지 일 실시예일 수 있다. 주관적 화질 평가를 통해 인지 화질을 직접 측정하는 대신, SURP(300)는 인지 화질을 모델링한 메져(measure)를 사용하여 인지 화질을 측정할 수 있다. 예를 들면, 메져는 구조적 유사도(Structural SIMilarity; SSIM) 또는 비디오 품질 메트릭(Video Quality Metric; VQM) 등일 수 있다.Here, acquiring data on perceived image quality through subjective image quality evaluation may be just an example. Instead of directly measuring perceived image quality through subjective image quality evaluation, SURP 300 can measure perceived image quality using a measure that models perceived image quality. For example, the measure may be Structural SIMilarity (SSIM) or Video Quality Metric (VQM).

통상적으로, 주관적 화질로서 최종적으로 획득되는 인지 화질의 값로서, 주로 평균 의견 점수(Mean Opinion Score; MOS) 값이 사용된다. MOS를 실시예의 주관적 화질 평가에 적용함에 있어서, 아래의 1) 내지 3)과 같은 문제가 발생할 수 있다.Typically, the Mean Opinion Score (MOS) value is mainly used as the value of perceived image quality that is finally obtained as subjective image quality. When applying MOS to the subjective image quality evaluation of the embodiment, problems such as 1) to 3) below may occur.

1) 첫 번째로, 종래의 사용자가 점수를 부여하는 방식에서는, 사용자에 따라서 점수의 기준이 서로 간에 상이할 수 있다. 또한, 심지어, 점수를 부여하는 평가 과정에서, 동일한 비디오에 대해 동일인이 서로 다른 점수들을 부여하는 경우도 종종 발생할 수 있다.1) First, in the conventional method in which users assign scores, the standards for scores may be different depending on the user. Additionally, in the evaluation process of assigning scores, it may sometimes occur that the same person gives different scores to the same video.

2) Second, because the MOS values of different videos differ from one another, it may be difficult to use the MOS values directly for machine learning. To solve this problem, SURP 300 may use relative perceived image quality information instead of MOS. An example of relative perceived image quality information is Difference MOS (DMOS). If the difference between the MOS value of the source video and the MOS value of the target video is used for the subjective image quality evaluation, the result of the evaluation may be used for machine learning. However, such an evaluation, as in the Double Stimulus Continuous Quality Scale (DSCQS) and the Double Stimulus Impairment Scale (DSIS), may require an image quality evaluation of the source video for every evaluation item. This requirement may increase the time needed for the image quality evaluation experiment.

3) 세 번째로, 종래의 화질 측정 방법은 각 비디오 시퀀스 별로 8초 이상의 측정이 이루어지도록 권고한다. 즉, 화질 측정의 결과는 시퀀스 레벨의 결과라고 볼 수 있다. 기본 결정 단위에 대한 판별을 위한 기계 학습이 이러한 결과를 이용하여 이루어진다면, 좋은 성능이 획득되지 못할 수 있다. 3) Third, the conventional image quality measurement method recommends that measurements be made for more than 8 seconds for each video sequence. In other words, the results of picture quality measurement can be viewed as sequence level results. If machine learning to determine the basic decision unit is performed using these results, good performance may not be obtained.

따라서, 일 실시예에서는, 기계 학습에 적합한 그라운드 트루스를 획득하기 위한 주관적 화질 평가 방볍 및 화질 평가에 사용되는 비디오 시퀀스의 조건이 제안된다.
Accordingly, in one embodiment, a subjective image quality evaluation method for obtaining ground truth suitable for machine learning and conditions of a video sequence used for image quality evaluation are proposed.

기계 학습machine learning

SURP(300)는 기본 결정 단위 별로 최적의 FPS를 결정할 수 있다. 기본 결정 단위 별로 FPS가 결정되기 위해서는, 이러한 결정을 위한 그라운드 트루스의 데이터가 획득되어야 할 수 있다.SURP (300) can determine the optimal FPS for each basic decision unit. In order to determine the FPS for each basic decision unit, ground truth data for this decision may need to be obtained.

이하의 설명에서는, 편의상 GOP가 기본 결정 단위인 것으로 간주되었으며, GOP의 크기는 8인 것으로 간주되었다. 말하자면, 하나의 GOP는 8개의 프레임들을 포함할 수 있다. 이러한 기본 결정 단위 및 GOP의 크기에 대한 가정은 단지 예시적인 것이다.In the following description, for convenience, the GOP has been considered to be the basic decision unit, and the size of the GOP has been assumed to be 8. That is, one GOP may contain 8 frames. These assumptions about the size of the basic decision unit and GOP are illustrative only.

The following description of ground truth acquisition and subjective image quality evaluation may be applied in the same or a similar manner to other types of basic decision units and to basic decision units of other sizes.

주관적 화질 평가는 시퀀스 레벨에서만 평가를 수행할 수 있다. 반면, 기계 학습을 위해서는 GOP의 단위에 대한 평가 결과가 요구된다. 이러한 차이를 극복하기 위해서, 비디오 시퀀스의 모든 GOP들이 동일하거나 유사한 내용적(content) 특성 만을 포함하도록 하기 위해 화질 평가에 사용되는 비디오 시퀀스의 길이가 조정될 수 있다. 말하자면, 비디오의 모든 GOP들의 특성들은 서로 유사할 수 있다. 실시예에서, 비디오의 모든 GOP들의 특성들이 서로 유사한 비디오를 호모지니어스 비디오라고 칭할 수 있다.Subjective image quality evaluation can only be performed at the sequence level. On the other hand, for machine learning, evaluation results for GOP units are required. To overcome these differences, the length of the video sequence used for picture quality evaluation can be adjusted to ensure that all GOPs in the video sequence contain only the same or similar content characteristics. That is to say, the characteristics of all GOPs in a video may be similar to each other. In an embodiment, a video in which the characteristics of all GOPs of a video are similar to each other may be referred to as a homogeneous video.

호모지니어스 비디오의 길이는 기정의된 값으로 제한될 수 있다. 예를 들면, 호모지니어스 비디오의 길이는 5초가 초과하지 않도록 조정될 수 있다.The length of a homogeneous video may be limited to a predefined value. For example, the length of a homogeneous video can be adjusted to not exceed 5 seconds.

이러한 유사한 특성들의 GOP들로 구성된 호모지니어스 비디오가 사용되기 때문에, 시퀀스 레벨에서 획득된 화질 평가 결과가 GOP의 단위의 화질 정보로 사용되는 것이 가능할 수 있다. 또한, SURP(300)는 이러한 화질 정보를 사용하는 기계 학습을 통해 SURP(300)의 성능을 향상시킬 수 있다.
Since homogenous video composed of GOPs with similar characteristics is used, it may be possible for the image quality evaluation results obtained at the sequence level to be used as image quality information in units of GOPs. Additionally, SURP (300) can improve its performance through machine learning using this image quality information.

도 4는 일 예에 따른 화질 평가 결과를 도시한다.Figure 4 shows image quality evaluation results according to an example.

주관적 화질 평가의 일 예로, 소스 비디오는 60 FPS의 HFR 비디오일 수 있고, 에이치디(HD) 해상도의 비디오일 수 있다. 타겟 비디오는 30 FPS의 비디오일 수 있다.As an example of subjective image quality evaluation, the source video may be an HFR video at 60 FPS and may be a video with HD resolution. The target video may be a video at 30 FPS.

SURP 300 may generate the target video by lowering the temporal resolution of the source video to 1/n. n may be 2. Alternatively, n may be 2^m, where m is an integer of 1 or more.

SURP 300 may generate the target video by applying frame drop to the source video. Frame drop may be the removal of predefined frames from the source video. For example, frame drop may remove the even-numbered frame(s) among the frames of the source video.
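A minimal sketch of frame drop follows, assuming frames are numbered from 1 so that removing the even-numbered frames keeps the 1st, 3rd, 5th frames and so on. The function name `drop_frames` and the parameter `factor` are hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def drop_frames(frames: Sequence[T], factor: int = 2) -> list[T]:
    """Keep every `factor`-th frame, starting with the first frame.

    With factor=2 and 1-based frame numbering, the even-numbered frames are
    removed, which halves the frame rate (for example, 60 FPS to 30 FPS).
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return list(frames[::factor])


gop = [1, 2, 3, 4, 5, 6, 7, 8]   # frame numbers of an 8-frame GOP
print(drop_frames(gop))          # frames kept at 1/2 temporal resolution
print(drop_frames(gop, 4))       # frames kept at 1/4 temporal resolution
```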

주관적 화질 평가 방법으로서, 페어와이즈 비교(pairwise comparison) 방식이 사용될 수 있다. As a subjective image quality evaluation method, a pairwise comparison method can be used.

페어와이즈 비교 방식은 각 평가 문항 별로, 2 개의 비디오들을 평가자에게 보여주고, 2 개의 비디오들 중 어떤 비디오의 화질이 더 좋은가를 질의하는 방법일 수 있다. 평가자는 각 평가 문항 별로 2개의 비디오들 중에 더 좋은 화질의 비디오를 선택할 수 있다. 또는, 페어와이즈 비교 방식은 각 평가 문항 별로, 2 개의 비디오들 중 제1 비디오의 화질이 더 좋은지, 제2 비디오의 화질이 더 좋은지, 아니면 2개의 비디오들의 화질들이 같은 지를 질의할 수도 있다. 말하자면, 평가자는 각 평가 문항 별로 3개의 항목들 중 하나를 선택할 수 있다. 실시예에서는, 3개의 항목들을 사용하는 방식이 예시되었다.The pairwise comparison method may be a method of showing two videos to the evaluator for each evaluation question and asking which of the two videos has better picture quality. The evaluator can select the video with better quality among the two videos for each evaluation question. Alternatively, the pairwise comparison method may query for each evaluation question whether the first video has better picture quality, the second video has better picture quality, or the picture quality of the two videos is the same. That is, the evaluator can select one of three items for each evaluation question. In the example, a method using three items was illustrated.

평가 문항 별로 3개의 항목들 중에서 하나가 선택되는 방식은, 1) 2개의 비디오들의 화질들이 유사하고, 2) 상대적으로 적은 평가자들이 실험에 참여한 경우에서, 보다 정확한 화질 평가 결과를 도출할 수 있다.The method of selecting one of three items for each evaluation question can lead to more accurate picture quality evaluation results when 1) the picture quality of the two videos is similar and 2) a relatively small number of evaluators participate in the experiment.

화질 평가에 있어서, 각 평가 문항 별로 고 FPS의 비디오 및 저 FPS의 비디오의 보여지는 순서는 랜덤(random)하게 조정될 수 있다. 이러한 랜덤한 조정은 평가자가 2개의 비디오들 중 어떤 것이 더 큰 FPS의 비디오인 가에 대한 선입견을 가지지 못하게 할 할 수 있다.In evaluating picture quality, the order in which high FPS videos and low FPS videos are shown for each evaluation question can be randomly adjusted. This random adjustment may prevent the evaluator from having a preconceived idea about which of the two videos is the video with the higher FPS.

또한, 비디오 시퀀스의 길이가 소정의 기준 값 이하인 경우, 각 평가 문항 별로 2개의 비디오들이 2 회씩 또는 2 회 이상씩 평가자에게 보여질 수 있다. 예를 들면, 비디오 A 및 비디오 B는 A, B, A, B의 순서로 평가자에게 보여질 수 있다.Additionally, if the length of the video sequence is less than a predetermined standard value, two videos for each evaluation question may be shown to the evaluator twice or more than twice. For example, video A and video B may be shown to the evaluator in the following order: A, B, A, B.

In FIG. 4, each bar on the x-axis represents a video subject to image quality evaluation, and the string below the x-axis is the name of the video. The y-axis represents the ratio at which each of the three items was selected by the evaluators. The three items are "the picture quality of the high-quality (i.e., 60 FPS) video is better", "the picture qualities of both videos are the same", and "the picture quality of the low-quality (i.e., 30 FPS) video is better". The selected ratio of each item may represent the ratio of the evaluators who selected that item among all evaluators. The unit of the ratio may be percent (%).

항목들 중 "양 비디오들의 화질들이 같다" 및 "저 화질 비디오의 화질이 더 좋다"가 선택된 것은, 비디오의 변환 후에도 화질이 유지된 것으로 간주될 수 있다. 말하자면, 항목들 중 "양 비디오들의 화질들이 같다" 및 "저 화질 비디오의 화질이 더 좋다"가 선택된 것은, 변환된 비디오의 화질 또한 평가자를 만족시킨다는 것을 의미할 수 있다. 말하자면, 평가자들 중 "고 화질 비디오의 화질이 더 좋다"의 항목을 선택한 평가자만이 2개의 비디오들의 화질들 간의 차이를 느꼈다고 간주될 수 있다.Among the items, if “the picture quality of both videos is the same” and “the picture quality of the low-quality video is better” is selected, it can be considered that the picture quality is maintained even after conversion of the video. In other words, selecting “the quality of both videos is the same” and “the quality of the low-quality video is better” among the items may mean that the quality of the converted video also satisfies the evaluator. In other words, among the evaluators, only the one who selected the item “the picture quality of the high-definition video is better” can be considered to have felt the difference between the picture qualities of the two videos.

Therefore, the sum of the ratio at which "the picture qualities of both videos are the same" was selected and the ratio at which "the picture quality of the low-quality video is better" was selected may be used as the "picture quality satisfaction", which represents the ratio of people who do not perceive a difference between the picture qualities of the two videos. The unit of "picture quality satisfaction" may be percent (%).
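The computation of picture quality satisfaction from the three-item votes can be sketched as follows. The function name and the vote counts in the example are hypothetical and are not results from the evaluation of FIG. 4.

```python
def picture_quality_satisfaction(high_better: int, same: int, low_better: int) -> float:
    """Return the picture quality satisfaction in percent.

    Evaluators who chose "same" or "low-quality video is better" are counted
    as not perceiving a degradation after the FPS change.
    """
    total = high_better + same + low_better
    if total == 0:
        raise ValueError("at least one vote is required")
    return 100.0 * (same + low_better) / total


# Hypothetical votes from 20 evaluators for one video sequence.
print(picture_quality_satisfaction(high_better=5, same=12, low_better=3))
```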

도 4에서는, 17 개의 비디오 시퀀스들이 화질 차이를 인식하지 못하는 사람의 비율이 높은 순으로 정렬되었다. 도 4에 따르면, 7 개의 비디오 시퀀스에 대해서는 참가자의 75% 이상의 사람들이 고 화질 비디오 및 저 화질 비디오의 화질들 간의 차이를 느끼지 못한다. In Figure 4, 17 video sequences are sorted in descending order of the percentage of people who do not recognize differences in picture quality. According to Figure 4, for the 7 video sequences, more than 75% of the participants did not notice the difference between the quality of the high-definition video and the low-definition video.

전술된 것과 유사하게, HFR의 소스 비디오로부터 다양한 타겟 비디오들이 생성될 수 있다. 예를 들면, 타겟 비디오들의 FPS는 소스 비디오의 FPS의 1/4 및 1/8 등일 수 있다. 이러한 타겟 비디오들의 각각 및 소스 비디오에 대해서도 화질 만족도가 측정될 수 있다.Similar to what was described above, various target videos can be generated from the HFR's source video. For example, the FPS of the target videos may be 1/4 and 1/8 of the FPS of the source video, etc. Satisfaction with picture quality can be measured for each of these target videos and also for the source video.

The results of this subjective image quality evaluation may be used as ground truth for machine learning. The quality of the ground truth data may improve as the number of evaluators increases. The quality of the ground truth data may also improve as the number of video sequences subject to evaluation increases.

When DSCQS, rather than pairwise comparison, is used for the subjective image quality evaluation, the following correspondences may hold. 1) The DSCQS case "the opinion score of the high-quality video is higher than the opinion score of the low-quality video" may correspond to the pairwise comparison item "the picture quality of the high-quality video is better". 2) The DSCQS case "the opinion score of the high-quality video is equal to the opinion score of the low-quality video" may correspond to the pairwise comparison item "the picture qualities of both videos are the same". 3) The DSCQS case "the opinion score of the high-quality video is lower than the opinion score of the low-quality video" may correspond to the pairwise comparison item "the picture quality of the low-quality video is better". Through these correspondences, even when DSCQS is used, picture quality satisfaction may be measured in the same way as when the pairwise comparison described above is used.
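The correspondence between DSCQS opinion scores and the three pairwise comparison items can be sketched as below. The function name, the item labels, and the `tolerance` parameter (a margin within which two scores are treated as equal) are assumptions added for illustration; the description itself compares the scores directly, which corresponds to `tolerance=0`.

```python
def dscqs_to_pairwise_item(score_high: float, score_low: float, tolerance: float = 0.0) -> str:
    """Map a pair of DSCQS opinion scores to one of the three pairwise items."""
    diff = score_high - score_low
    if diff > tolerance:
        return "high_better"   # the high-quality video is rated higher
    if diff < -tolerance:
        return "low_better"    # the low-quality video is rated higher
    return "same"              # the two scores are treated as equal


print(dscqs_to_pairwise_item(80, 70))
print(dscqs_to_pairwise_item(70, 70))
print(dscqs_to_pairwise_item(60, 70))
```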

특징 벡터Feature vector

SURP(300)는 기계 학습을 이용하는 예측기일 수 있다. 기계 학습의 일 예로, SURP(300)는 서포트 벡터 머신(Support Vector Machine; SVM)을 이용할 수 있다. SURP 300 may be a predictor using machine learning. As an example of machine learning, SURP 300 may use a Support Vector Machine (SVM).

To predict the degradation in video quality that occurs when the FPS of a video is changed, the features that affect video quality must be extractable as a feature vector. Various feature vector extraction methods that reflect human visual characteristics, such as the spatial masking effect, the temporal masking effect, salient regions, and contrast sensitivity, are described below.

SVM은, 입력 프레임의 전체 보다는, 예측을 위해 요구되는 입력 프레임의 정보를 보다 더 잘 표현하는 특징 벡터를 일반적으로 사용할 수 있다. 즉, SVM의 예측 성능은 어떤 정보를 특징 벡터로서 사용하는 가에 의해 좌우될 수 있다.SVM can generally use a feature vector that better represents the information of the input frame required for prediction, rather than the entire input frame. In other words, the prediction performance of SVM can be influenced by what information is used as a feature vector.

실시예에서, 특징 벡터가 인지 화질의 특성을 반영하기 위해, 특징 벡터는 비디오의 부분이 인지 화질에 적은 영향을 미치는지 아니면 인지 화질에 큰 영향을 미치는 지가 고려되도록 설계될 수 있다.In an embodiment, so that the feature vector reflects the characteristics of perceived image quality, the feature vector may be designed to take into account whether a portion of the video has a small effect on the perceived image quality or a large effect on the perceived image quality.

예를 들면, 비디오 중 인지 화질에 미치는 영향이 적은 부분은 공간적 마스킹 효과가 큰 부분 및 시간적 마스킹 효과가 큰 부분일 수 있다. 비디오 중 인지 화질에 미치는 영향이 큰 부분은 대비 민감도(Contrast Sensitivity; CS)가 높은 부분일 수 있다. 또한, 시각적 주목도(Visual Saliency; VS) 또한 인지 화질에 큰 영향을 미칠 수 있다. 따라서, 특징 벡터의 설계에 있어서 VS도 고려될 수 있다.
For example, parts of the video that have little impact on perceived quality may be parts with a large spatial masking effect and parts with a large temporal masking effect. The part of the video that has a large impact on perceived image quality may be the part with high contrast sensitivity (CS). Additionally, visual saliency (VS) can also have a significant impact on perceived image quality. Therefore, VS can also be considered in designing feature vectors.

도 5는 일 실시예에 따른 특징 벡터(feature vector)의 추출의 절차를 설명한다.Figure 5 explains a procedure for extracting a feature vector according to an embodiment.

단계(510)에서, SURP(300)는 각 프레임에 대해 공간적 임의 맵(Spatial Randomness Map; SRM)을 생성할 수 있다.At step 510, SURP 300 may generate a spatial randomness map (SRM) for each frame.

An SRM may be used to reflect the spatial masking effect in the feature vector. Regions that are not readily perceived visually tend to be regions of high randomness. Therefore, the spatial masking regions of a frame may be determined by calculating spatial randomness (SR).

The characteristics of a region of high SR may differ from the characteristics of its neighboring regions. This difference indicates that the region is difficult to predict from the information of the neighboring regions. Therefore, to measure SR, the central region Y may be predicted from the neighboring region X, as shown in FIG. 6.

FIG. 6 shows a prediction model for measuring spatial randomness according to an example.

도 6에서는 중앙의 Y 픽셀이 주변의 X 픽셀들로부터 예측되는 것이 도시되었다.
In Figure 6, it is shown that the central Y pixel is predicted from the surrounding X pixels.

다시 도 5를 참조한다.Refer again to Figure 5.

Y에 대한 최적 예측은 아래의 수학식 2와 같이 표현될 수 있다.The optimal prediction for Y can be expressed as Equation 2 below.

u는 공간적 위치를 나타낼 수 있다. H는 주변의 X 값들로부터 Y 값에 대한 최적의 예측을 제공하는 변환 행렬일 수 있다. 최소 평균 에러의 최적화 방법이 사용될 경우, H는 아래의 수학식 3과 같이 표현될 수 있다.u can represent a spatial location. H may be a transformation matrix that provides optimal prediction of the Y value from surrounding X values. When the optimization method of minimum average error is used, H can be expressed as Equation 3 below.

R_xy may be the cross-correlation matrix of X and Y.

R_x may be the autocorrelation matrix of X.

Using the approximated pseudoinverse matrix technique, the inverse matrix of R_x may be obtained according to Equation 4 below.

전술된 수학식 2, 수학식 3 및 수학식 4에 따라 아래의 수학식 5이 성립할 수 있다.According to Equation 2, Equation 3, and Equation 4 described above, Equation 5 below can be established.

SURP(300)는 수학식 5에 기반하여 프레임의 각 픽셀 별로 SRM을 획득할 수 있다.
SURP 300 can obtain SRM for each pixel of the frame based on Equation 5.
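The bodies of Equations 2 to 5 are not reproduced in this text. One form consistent with the definitions of u, H, R_xy, and R_x given above, assuming a linear minimum mean squared error predictor, is sketched below. The eigendecomposition used for the approximated pseudoinverse (U_m, Λ_m, and the number m of retained eigenvalues) and the absolute-error form of SR are assumptions, not statements taken from the description.

```latex
% Equation 2 (assumed form): linear prediction of Y from its neighbors X
\hat{Y}(u) = H\,X(u)

% Equation 3 (assumed form): minimum mean error solution for H
H^{*} = R_{xy}\,R_{x}^{-1},
\qquad R_{xy} = E\!\left[Y(u)\,X(u)^{T}\right],
\qquad R_{x} = E\!\left[X(u)\,X(u)^{T}\right]

% Equation 4 (assumed form): approximated pseudoinverse from the m largest
% eigenvalues \Lambda_m of R_x and their eigenvectors U_m
R_{x}^{-1} \approx U_{m}\,\Lambda_{m}^{-1}\,U_{m}^{T}

% Equation 5 (assumed form): spatial randomness as the prediction error
SR(u) = \left|\,Y(u) - R_{xy}\,R_{x}^{-1}\,X(u)\,\right|
```

Under this reading, a pixel that is poorly predicted from its neighbors has a large SR value, which matches the description of bright pixels in the SRM.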

도 7은 일 예에 따른 SRM을 도시한다.Figure 7 shows SRM according to an example.

도 7에서, 밝은 픽셀은 주변의 픽셀로부터 예측이 잘 되지 않는 픽셀을 나타낼 수 있다. 말하자면, 밝은 픽셀은 공간적 임의가 높은 영역, 즉 공간적 마스킹 효과가 큰 영역을 나타낼 수 있다.
In FIG. 7, a bright pixel may represent a pixel that is not well predicted from surrounding pixels. In other words, bright pixels may indicate areas of high spatial randomness, i.e. areas with a large spatial masking effect.

다시 도 5를 참조한다.Refer again to Figure 5.

단계(515)에서, SURP(300)는 각 프레임에 대해 에지 맵(Edge Map: EM)을 생성할 수 있다.In step 515, SURP 300 may generate an edge map (EM) for each frame.

SURP(300)는 각 프레임에 대해 소벨 에지(sobel edge) 연산을 사용하여 에지 맵을 생성할 수 있다.SURP 300 may generate an edge map using a Sobel edge operation for each frame.
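A minimal pure-Python sketch of a Sobel edge map follows. The description only states that a Sobel edge operation is used; the use of the gradient magnitude as the edge value and the replication of border pixels are assumptions, and `sobel_edge_map` is a hypothetical name.

```python
import math


def sobel_edge_map(frame: list[list[float]]) -> list[list[float]]:
    """Return the Sobel gradient magnitude for each pixel of a grayscale frame."""
    h, w = len(frame), len(frame[0])

    def px(r: int, c: int) -> float:
        # Replicate border pixels for positions outside the frame (assumption).
        return frame[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    edge = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Horizontal gradient: right column minus left column.
            gx = (px(r - 1, c + 1) + 2 * px(r, c + 1) + px(r + 1, c + 1)) \
               - (px(r - 1, c - 1) + 2 * px(r, c - 1) + px(r + 1, c - 1))
            # Vertical gradient: bottom row minus top row.
            gy = (px(r + 1, c - 1) + 2 * px(r + 1, c) + px(r + 1, c + 1)) \
               - (px(r - 1, c - 1) + 2 * px(r - 1, c) + px(r - 1, c + 1))
            edge[r][c] = math.hypot(gx, gy)
    return edge


# A frame with a vertical step edge between columns 2 and 3.
frame = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
em = sobel_edge_map(frame)
```

In this example the edge map is non-zero only in the two columns adjacent to the step and zero in the flat areas.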

단계(520)에서, SURP(300)는 각 프레임에 대해 평탄 맵(Smoothness Map; SM)을 생성할 수 있다.At step 520, SURP 300 may generate a smoothness map (SM) for each frame.

SM은 프레임의 배경이 평탄한 영역인지 여부를 판별하기 위해 사용될 수 있다.SM can be used to determine whether the background of the frame is a flat area.

SURP 300 may calculate a smoothness value for each block of the SRM. The smoothness value of each block may be calculated by Equation 6 below.

N_lc may be the number of pixels in the block whose spatial randomness is lower than a reference value. Here, spatial randomness lower than the reference value may indicate low complexity.

W_b² may represent the area of the block, that is, the number of pixels in the block. For example, W_b may be 32, which indicates that 32x32 blocks are used.

SURP 300 may generate the SM by setting the values of all pixels inside each block to the smoothness value of that block.
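The block-wise computation can be sketched as follows, assuming that Equation 6 is the ratio N_lc / W_b², that is, the fraction of low-complexity pixels in the block. The function name, the `threshold` argument (standing in for the reference value), and the handling of partial blocks at the frame border are assumptions.

```python
def smoothness_map(srm: list[list[float]], block_size: int = 32,
                   threshold: float = 0.1) -> list[list[float]]:
    """Build a smoothness map (SM) from a spatial randomness map (SRM).

    Each block is assigned N_lc / (number of pixels in the block), where N_lc
    counts the pixels whose spatial randomness is below `threshold`.
    """
    h, w = len(srm), len(srm[0])
    sm = [[0.0] * w for _ in range(h)]
    for r0 in range(0, h, block_size):
        for c0 in range(0, w, block_size):
            rows = range(r0, min(r0 + block_size, h))
            cols = range(c0, min(c0 + block_size, w))
            n_lc = sum(1 for r in rows for c in cols if srm[r][c] < threshold)
            # Partial border blocks are normalized by their own pixel count.
            value = n_lc / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    sm[r][c] = value  # every pixel of the block shares one value
    return sm


srm = [[0, 0, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]
sm = smoothness_map(srm, block_size=2, threshold=0.5)
```

Here the top-left 2x2 block is entirely low-complexity and receives the value 1.0, while the top-right block has one low-complexity pixel out of four and receives 0.25.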

도 8은 일 예에 따른 SM을 도시한다.Figure 8 shows SM according to one example.

SM에서 블록 별로 동일한 픽셀 값이 설정될 수 있다.In SM, the same pixel value can be set for each block.

SM에서, 블록이 더 밝을수록, 배경은 더 평탄할 수 있다. 말하자면, 밝은 블록은 평탄한 배경을 나타낼 수 있다. 밝은 블록의 에지는 어두운 블록의 에지에 비해 더 잘 인지될 수 있다.
In SM, the brighter the blocks, the flatter the background can be. That is, bright blocks can represent a flat background. The edges of bright blocks can be better perceived compared to the edges of dark blocks.

다시 도 5를 참조한다.Refer again to Figure 5.

단계(525)에서, SURP(300)는 각 프레임에 대해 공간적 대비 맵(Spatial Contrast Map; SCM)을 생성할 수 있다.At step 525, SURP 300 may generate a spatial contrast map (SCM) for each frame.

자극 민감도(stimulus sensitivity)는 공간적 대비(Spatial Contrast)와 관련될 수 있다. 공간적 대비를 특징 벡터에 반영하기 위해 SCM이 사용될 수 있다.Stimulus sensitivity can be related to spatial contrast. SCM can be used to reflect spatial contrast in feature vectors.

SCM은 에지 및 평탄의 측정에 의해 획득될 수 있다. 평탄한(smooth) 배경 영역에서는 자극에 대해서 민감한 반응이 발생할 수 있다. 여기에서, 자극은 에지를 의미할 수 있다. 반면, 낮은 평탄을 갖는 배경 영역에서의 에지는 마스킹 효과로 인해 자극에 대한 낮은 민감도를 가질 수 있다. SCM은 이러한 특성을 반영하기 위한 맵일 수 있다.SCM can be obtained by measuring edges and flatness. Sensitive responses to stimuli may occur in smooth background areas. Here, the stimulus may mean an edge. On the other hand, edges in background areas with low flatness may have low sensitivity to stimulation due to the masking effect. SCM can be a map to reflect these characteristics.

The SCM may consist of the products of the pixel values of mutually corresponding pixels of the SM and the EM. SURP 300 may set the product of the pixel values of the pixels at the same location in the SM and the EM as the pixel value of the SCM.

SURP 300 may generate the SCM according to Equation 7 below.

Y는 SCM, EM 및 SM의 각각에서의 동일한 위치의 픽셀일 수 있다.
Y may be a pixel at the same location in each of SCM, EM, and SM.
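The pixel-wise product can be sketched as below, assuming that Equation 7 has the form SCM(Y) = EM(Y) · SM(Y). The function name `pixelwise_product` and the sample values are hypothetical.

```python
def pixelwise_product(map_a: list[list[float]],
                      map_b: list[list[float]]) -> list[list[float]]:
    """Multiply two maps of identical size pixel by pixel."""
    if len(map_a) != len(map_b) or any(len(ra) != len(rb) for ra, rb in zip(map_a, map_b)):
        raise ValueError("maps must have the same dimensions")
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(map_a, map_b)]


em = [[0.0, 4.0], [2.0, 4.0]]   # edge strength per pixel
sm = [[1.0, 1.0], [0.5, 0.0]]   # background smoothness per pixel
scm = pixelwise_product(em, sm)
print(scm)
```

A pixel receives a large SCM value only when it is both a strong edge and located in a smooth background; a strong edge in a non-smooth block (smoothness 0.0) is suppressed.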

도 9는 일 예에 따른 SCM을 도시한다.Figure 9 shows SCM according to an example.

In FIG. 9, a bright pixel may indicate a strong edge in a highly smooth background. That is, because the SCM is the product of the EM and the SM, a bright pixel may be a pixel that represents a strong edge located in a highly smooth background.

다시 도 5를 참조한다.Refer again to Figure 5.

단계(530)에서, SURP(300)는 각 프레임에 대해 시각적 주목도 맵(Visual Saliency Map; VSM)을 생성할 수 있다.At step 530, SURP 300 may generate a Visual Saliency Map (VSM) for each frame.

Owing to the characteristics of human vision, a stimulus in a region that attracts human attention may have a greater effect than a stimulus in a region that does not attract human attention. Visual saliency information may be used so that this visual characteristic is reflected in the feature vector.

As an example of a method for generating the VSM, the graph-based visual saliency method proposed by Harel and Perona may be used.

도 10는 일 예에 따른 VSM을 도시한다.Figure 10 shows a VSM according to an example.

도 10에서, 밝은 영역은 사람이 프레임의 영상을 볼 때 관심을 갖는 영역을 나타낼 수 있다. 말하자면, 밝은 영역은 주목도가 높은 영역일 수 있다. 어두운 영역은 사람이 프레임의 영상을 볼 때 관심을 가지지 않는 영역을 나타낼 수 있다. 말하자면, 어두운 영역은 주목도가 낮은 영역일 수 있다.
In FIG. 10, a bright area may represent an area of interest to a person when viewing an image of a frame. In other words, a bright area may be an area of high attention. A dark area may represent an area that a person is not interested in when viewing the image in the frame. In other words, a dark area may be an area of low attention.

다시 도 5를 참조한다.Refer again to Figure 5.

단계(535)에서, SURP(300)는 각 프레임에 대해 공간적 영향 맵(Spatial Influence Map; SIM)을 생성할 수 있다.At step 535, SURP 300 may generate a spatial influence map (SIM) for each frame.

The SIM may be a map that reflects not only the information surrounding an edge, which is a visual stimulus, but also whether a pixel belongs to a visually salient region.

The SIM may consist of the products of the pixel values of mutually corresponding pixels of the SCM and the VSM. SURP 300 may set the product of the pixel values of the pixels at the same location in the SCM and the VSM as the pixel value of the SIM.

SURP 300 may generate the SIM according to Equation 8 below.

Y는 SIM, SCM 및 VSM의 각각에서의 동일한 위치의 픽셀일 수 있다.
Y may be a pixel at the same location in each of SIM, SCM, and VSM.
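The body of Equation 8 is not reproduced in this text. Given that the SIM is described as the pixel-wise product of the SCM and the VSM, one consistent form, stated here as an assumption, is:

```latex
% Equation 8 (assumed form): pixel-wise product at each pixel location Y
SIM(Y) = SCM(Y) \cdot VSM(Y)
```

Under this reading, a pixel has a large SIM value only when it is a perceptible edge (large SCM value) that also lies in a visually salient region (large VSM value).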

FIG. 11 shows an SIM according to an example.

다시 도 5를 참조한다.Refer again to Figure 5.

SURP(300)는 기본 결정 단위의 프레임들의 공간적 특성 및 시간적 특성에 기반하여 특징 벡터를 획득할 수 있다. 전술된 단계들(510, 515, 520, 525, 530 및 535)에서는 프레임의 공간적 특성을 반영하여 특징 벡터를 획득하는 과정이 설명되었다. 아래의 단계들(540, 545, 550 및 555)에서는 프레임의 시간적 특성을 반영하여 특징 벡터를 획득하는 과정이 설명된다.SURP 300 may acquire a feature vector based on the spatial and temporal characteristics of the frames of the basic decision unit. In the above-described steps 510, 515, 520, 525, 530, and 535, the process of acquiring a feature vector by reflecting the spatial characteristics of the frame was explained. In the steps 540, 545, 550, and 555 below, the process of acquiring a feature vector by reflecting the temporal characteristics of the frame is explained.

시간적 마스킹 효과가 높은 영역은 불규칙한 움직임을 갖는 영역 또는 갑작스러운 움직임을 갖는 영역일 수 있다. 즉, 임의가 높은 영역은 시간적 마스킹 효과가 높은 영역일 수 있으며, 프레임에서 임의가 높은 영역을 검출함으로써 시간적 마스킹 효과가 높은 영역이 결정될 수 있다.An area with a high temporal masking effect may be an area with irregular movement or an area with sudden movement. That is, an area with high randomness may be an area with a high temporal masking effect, and an area with a high temporal masking effect can be determined by detecting an area with high randomness in the frame.

시간적 특성에 기반하여 특징 벡터를 획득함으로써, SURP(300)는 불규칙한 움직임 또는 갑작스러운 움직임에 민감하게 반응하는 인간의 시각적 특성을 특징 벡터의 획득에 대하여 반영할 수 있다.By acquiring a feature vector based on temporal characteristics, the SURP 300 can reflect human visual characteristics that react sensitively to irregular or sudden movements in the acquisition of the feature vector.

전술된 것과 같이, 공간적 임의는 주변 픽셀로부터 예측하는 모델에 의해 계산될 수 있다. 이러한 계산 방식과 유사하게, 시간적 임의는 이전의 프레임들로부터 현재의 프레임을 예측하는 모델을 통해 계산될 수 있다.As described above, spatial randomness can be computed by a model predicting from surrounding pixels. Similar to this calculation method, temporal randomness can be calculated through a model that predicts the current frame from previous frames.

시간적 임의를 계산하기 위해, 우선 GOP의 프레임들의 입력 구간은 2 개의 구간들로 분할될 수 있다.To calculate temporal randomness, first the input interval of frames of the GOP can be divided into two intervals.

구간은 아래의 수학식 9와 같이 표현될 수 있다.The section can be expressed as Equation 9 below.

Y는 구간을 나타낼 수 있다. k는 구간의 시작 프레임을 나타낼 수 있다. l은 구간의 마지막 프레임을 나타낼 수 있다. 말하자면, Y_k^l은 구간이 k 번째 프레임부터 l 번째 프레임으로 구성됨을 나타낼 수 있다.Y may represent a section. k may represent the start frame of the section. l may represent the last frame of the section. In other words, Y_k^l may indicate that the section consists of the k-th frame through the l-th frame.

수학식 9가 행렬로서 저장될 때, 행렬의 열들은 프레임들에 각각 대응할 수 있다. 말하자면, 구간을 행렬로서 저장하는 것은 아래의 1) 및 2)의 단계로 이루어질 수 있다.When Equation 9 is stored as a matrix, the columns of the matrix may each correspond to frames. In other words, storing the section as a matrix can be accomplished through steps 1) and 2) below.

1) 구간의 프레임들은 1 차원의 행 벡터의 형태로 배열될 수 있다. 예를 들면, 프레임의 두 번째 행은 프레임의 첫 번째 행에 연쇄(concatenate)될 수 있다. 프레임의 행들은 순차적으로 연쇄될 수 있다.1) The frames in the section can be arranged in the form of a one-dimensional row vector. For example, the second row of a frame can be concatenated to the first row of a frame. Rows of a frame can be chained sequentially.

2) 행 벡터는 열 벡터의 형태로 전치(transpose)될 수 있다. 프레임들의 열 백터들로 구성된 행렬이 저장될 수 있다.2) Row vectors can be transposed in the form of column vectors. A matrix consisting of column vectors of frames may be stored.
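Steps 1) and 2) above can be sketched as follows. This is a minimal illustration, assuming each frame is a two-dimensional array of pixel values; the function name `section_to_matrix` and the use of NumPy are assumptions made for the example, not part of the described apparatus.

```python
import numpy as np

def section_to_matrix(frames):
    """Store a section of frames as a matrix with one column per frame."""
    # Step 1: concatenate the rows of each frame into a single row vector.
    row_vectors = [np.asarray(f, dtype=np.float64).reshape(1, -1) for f in frames]
    # Step 2: transpose so that every row vector becomes a column vector.
    return np.concatenate(row_vectors, axis=0).T

# Four tiny 2x3 "frames"; frame i holds the values 10*i .. 10*i + 5.
frames = [np.arange(6).reshape(2, 3) + 10 * i for i in range(4)]
Y = section_to_matrix(frames)
```

For frames of H x W pixels and a section of d frames, the resulting matrix has H*W rows and d columns.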

각 구간의 길이가 d일 때, 2 개의 구간들은 Y_{k+d}^{l} 및 Y_{k}^{l-d}로 표현될 수 있다.
When the length of each section is d, the two sections can be expressed as Y_{k+d}^{l} and Y_{k}^{l-d}.

단계(540)에서, SURP(300)는 시간적 예측 모델(Temporal Prediction Model; TPM)을 생성할 수 있다.At step 540, SURP 300 may generate a temporal prediction model (TPM).

아래의 도 12에서는, GOP의 크기가 8이면서, d의 값이 4인 경우가 예시된다.
In FIG. 12 below, a case where the GOP size is 8 and the value of d is 4 is illustrated.

도 12은 일 예에 따른 시간적 예측 모델의 생성에 사용되는 프레임 및 예측 대상 프레임 간의 관계를 도시한다.Figure 12 shows the relationship between a frame used to generate a temporal prediction model and a prediction target frame according to an example.

도 12에서, "프레임 0"은 제1 프레임을 나타낼 수 있고, "프레임 1"은 제2 프레임을 나타낼 수 있다. 말하자면 "프레임 n"은 제n+1 프레임일 수 있다.In FIG. 12, “frame 0” may represent the first frame, and “frame 1” may represent the second frame. In other words, “frame n” may be the n+1th frame.

도 12에 따르면, "프레임 0"부터 "프레임 3"까지가 첫 번째 구간일 수 있고, "프레임 4"부터 "프레임 7"까지가 두 번째 구간일 수 있다.According to FIG. 12, “Frame 0” to “Frame 3” may be the first section, and “Frame 4” to “Frame 7” may be the second section.

첫 번째 구간의 프레임들은 TPM을 위해 사용될 수 있으며, TPM에 의해 두 번째 구간의 프레임들이 예측될 수 있다.
Frames in the first section can be used for TPM, and frames in the second section can be predicted by TPM.

다시 도 5를 참조한다.Refer again to Figure 5.

TPM은 GOP의 첫 번째 구간을 이용하여 생성될 수 있다. 예를 들면, 크기가 8인 GOP에 대해, TPM은 제1 프레임, 제2 프레임, 제3 프레임 및 제4 프레임을 이용하여 생성될 수 있다.TPM can be created using the first section of GOP. For example, for a GOP of size 8, the TPM may be generated using the first frame, second frame, third frame, and fourth frame.

SURP(300)는 아래의 수학식 10에 기반하여 시간적 예측을 수행할 수 있다.SURP (300) can perform temporal prediction based on Equation 10 below.

여기에서, A는 최적 예측을 제공하는 변환 행렬일 수 있다.Here, A may be a transformation matrix that provides the optimal prediction.

T는 예측 행렬일 수 있다.T may be a prediction matrix.

A는 슈도 인버스(pseudo inverse)를 이용하여 아래의 수학식 11과 같이 계산될 수 있다.A can be calculated as Equation 11 below using pseudo inverse.

그러나, 수학식 11의 각 행렬들은 매우 크기 때문에, 수학식 11의 적용은 실질적으로 불가능할 수 있다.However, since each matrix in Equation 11 is very large, application of Equation 11 may be practically impossible.

아래의 수학식 12의 상태 시퀀스(state sequence) 표현을 사용함에 따라 행렬 A가 보다 용이하게 계산될 수 있다.Matrix A can be calculated more easily by using the state sequence expression in Equation 12 below.

여기에서, Y는 X의 상태 행렬(state matrix)일 수 있다.Here, Y may be the state matrix of X.

C는 부호화(encoding) 행렬일 수 있다.C may be an encoding matrix.

W는 바이어스 행렬일 수 있다.W may be a bias matrix.

아래의 수학식 13은 Y에 대한 특이 값 분해(singular value decomposition)를 나타낼 수 있다. 이 때, W는 영 행렬로 가정될 수 있다.Equation 13 below can represent singular value decomposition for Y. At this time, W can be assumed to be a zero matrix.

수학식 12 및 수학식 13을 비교하면, SURP(300)는 아래의 수학식 14에 따라 최적 상태 벡터(optimal state vector)를 도출할 수 있다.Comparing Equation 12 and Equation 13, SURP (300) can derive an optimal state vector according to Equation 14 below.

SURP는 아래의 수학식 15에 따라 시간적 예측 에러인 시간적 임의를 획득할 수 있다.SURP can obtain temporal randomness, which is a temporal prediction error, according to Equation 15 below.
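Equations 10 to 15 are not reproduced in this text, so the following is only a sketch of one way the described procedure could be carried out, not the exact formulation. The assumptions are: the left singular vectors of the first section act as the encoding matrix C, the state sequence is X = ΣVᵀ, the transition matrix A is fitted with a pseudo-inverse on consecutive states, and each frame of the second section is predicted from the frame immediately before it. The function name `temporal_randomness` and the `rank` parameter are hypothetical.

```python
import numpy as np

def temporal_randomness(Y1, Y2, rank=None):
    """Absolute temporal prediction error for the frames in Y2.

    Y1: pixels x d matrix of the first section (used to build the model).
    Y2: pixels x d matrix of the second section (frames to be predicted).
    """
    # Singular value decomposition of the first section (W assumed zero).
    U, s, Vt = np.linalg.svd(Y1, full_matrices=False)
    r = len(s) if rank is None else rank
    C = U[:, :r]                      # assumed encoding matrix
    X = np.diag(s[:r]) @ Vt[:r]       # assumed state sequence, r x d
    # Fit A so that X[:, 1:] is approximated by A @ X[:, :-1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    # Each frame of Y2 is predicted from the frame that precedes it.
    previous = np.hstack([Y1[:, -1:], Y2[:, :-1]])
    predicted = C @ (A @ (C.T @ previous))
    return np.abs(Y2 - predicted)

# A static 4-pixel scene; only the last frame has a sudden change at pixel 0.
v = np.array([1.0, 2.0, 3.0, 4.0])
Y1 = np.tile(v[:, None], (1, 4))
Y2 = np.tile(v[:, None], (1, 4))
Y2[0, 3] = 10.0
trm = temporal_randomness(Y1, Y2, rank=1)
```

Working in the reduced state space keeps A at size r x r instead of pixels x pixels, which is the practical motivation given above for the state sequence expression.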

도 13a 내지 13c는 연속된 일 예에 따른 3개의 프레임들을 나타낸다.13A to 13C show three consecutive frames according to an example.

도 13a는 일 예에 따른 연속된 3개의 프레임들 중 첫 번째의 프레임을 나타낸다.FIG. 13A shows the first frame of three consecutive frames according to an example.

도 13b는 일 예에 따른 연속된 3개의 프레임들 중 두 번째의 프레임을 나타낸다.FIG. 13B shows the second frame among three consecutive frames according to an example.

도 13c는 일 예에 따른 연속된 3개의 프레임들 중 세 번째 프레임을 나타낸다.
FIG. 13C shows the third frame among three consecutive frames according to an example.

다음으로, 3개의 프레임들에 대한 맵들이 도시된다.Next, maps for three frames are shown.

도 13d는 일 예에 따른 연속된 3개의 프레임들에 대한 시간적 임의 맵(Temporal Randomness Map; TRM)을 나타낸다.FIG. 13D shows a temporal randomness map (TRM) for three consecutive frames according to an example.

도 13e는 일 예에 따른 연속된 3개의 프레임들에 대한 공간시간적 영향 맵(SpatioTemporal Influence Map; STIM)을 나타낸다.FIG. 13E shows a SpatioTemporal Influence Map (STIM) for three consecutive frames according to an example.

도 13f는 일 예에 따른 연속된 3개의 프레임들에 대한 가중치가 부여된 공간시간적 영향 맵(Weighted SpatioTemporal Influence Map; WSTIM)을 나타낸다.FIG. 13F shows a Weighted SpatioTemporal Influence Map (WSTIM) for three consecutive frames according to an example.

TRM, STIM 및 WSTIM에 대해서 아래에서 상세하게 설명된다.
TRM, STIM and WSTIM are described in detail below.

다시 도 5를 참조한다.Refer again to Figure 5.

단계(545)에서, SURP(300)는 시간적 임의 맵(Temporal Randomness Map; TRM)을 생성할 수 있다.At step 545, SURP 300 may generate a Temporal Randomness Map (TRM).

TRM은 제4 프레임, 제5 프레임, 제6 프레임 및 제7 프레임에 대한 맵일 수 있다.TRM may be a map for the fourth frame, fifth frame, sixth frame, and seventh frame.

TRM에서, 밝은 픽셀은 시간적 임의가 큰 영역을 나타낼 수 있다. 시간적 임의가 큰 영역은 시각적인 인지 효과가 적은 영역일 수 있다.In TRM, bright pixels may indicate areas of large temporal randomness. An area with large temporal randomness may be an area with little visual recognition effect.

단계(550)에서, SURP(300)는 공간시간적 영향 맵(SpatioTemporal Influence Map; STIM)을 생성할 수 있다.At step 550, SURP 300 may generate a SpatioTemporal Influence Map (STIM).

STIM은 제4 프레임, 제5 프레임, 제6 프레임 및 제8 프레임에 대한 맵일 수 있다.STIM may be a map for the fourth frame, fifth frame, sixth frame, and eighth frame.

사람이 비디오를 볼 때, 사람의 시각적 특성에 따라 사람은 공간적 시각적 자극 및 시간적 시각적 자극을 함께 인지할 수 있다. 따라서, 특징 벡터에서도 공간적 특성 및 시간적 특성이 동시에 반영될 필요가 있다.When a person watches a video, the person can perceive both spatial and temporal visual stimuli depending on the person's visual characteristics. Therefore, spatial characteristics and temporal characteristics need to be simultaneously reflected in the feature vector.

이러한 공간적 특성 및 시간적 특성의 반영을 위해, 아래의 수학식 16과 같이 STIM이 정의될 수 있다.To reflect these spatial and temporal characteristics, STIM can be defined as shown in Equation 16 below.

Y는 특정한 위치의 픽셀을 나타낼 수 있다.Y can represent a pixel at a specific location.

STIM은 SIM 및 TRM의 서로 대응하는 픽셀들의 픽셀 값들의 곱들일 수 있다. SURP(300)는 SIM 및 TRM의 동일한 위치의 픽셀들의 픽셀 값의 곱을 STIM의 픽셀 값으로 설정할 수 있다.STIM may be a product of pixel values of corresponding pixels of SIM and TRM. SURP 300 may set the product of pixel values of pixels at the same location in SIM and TRM as the pixel value of STIM.
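The pixel-wise product described above can be sketched as below, assuming SIM and TRM are available as arrays of the same size. The function name `stim_from` is hypothetical.

```python
import numpy as np

def stim_from(sim, trm):
    """Multiply SIM and TRM pixel by pixel to obtain STIM."""
    sim = np.asarray(sim, dtype=np.float64)
    trm = np.asarray(trm, dtype=np.float64)
    if sim.shape != trm.shape:
        raise ValueError("SIM and TRM must have the same size")
    # Each STIM pixel is the product of the co-located SIM and TRM pixels.
    return sim * trm

stim = stim_from([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [1.0, 2.0]])
```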

단계(555)에서, SURP(300)는 가중치가 부여된 공간시간적 영향 맵(Weighted SpatioTemporal Influence Map; WSTIM)을 생성할 수 있다.At step 555, SURP 300 may generate a Weighted SpatioTemporal Influence Map (WSTIM).

WSTIM은 제4 프레임, 제5 프레임, 제6 프레임 및 제8 프레임에 대한 맵일 수 있다.WSTIM may be a map for the fourth frame, fifth frame, sixth frame, and eighth frame.

WSTIM은 전체의 자극에서 상대적으로 큰 자극을 강조할 수 있다.WSTIM can emphasize relatively large stimuli from the overall stimuli.

SURP(300)는 SIM의 픽셀의 값이 SIM의 픽셀들의 평균 값으로 나뉘어진 값을 WSTIM의 픽셀 값으로 설정할 수 있다.SURP 300 may set the pixel value of the SIM divided by the average value of the pixels of the SIM as the pixel value of the WSTIM.
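The division by the average value can be sketched as below. The function takes whichever map is passed to it; the function name `weight_by_mean` and the handling of an all-zero map are assumptions made for the example.

```python
import numpy as np

def weight_by_mean(influence_map):
    """Divide every pixel of a map by the average value of the map."""
    m = np.asarray(influence_map, dtype=np.float64)
    mean = m.mean()
    if mean == 0.0:
        # Assumption: an all-zero map has no stimulus to emphasize.
        return np.zeros_like(m)
    return m / mean

wstim = weight_by_mean([[1.0, 1.0], [1.0, 5.0]])
```

Pixels above the average end up greater than 1 and pixels below it end up less than 1, which is how relatively large stimuli are emphasized.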

WSTIM은 프레임들 내에 작은 물체의 빠른 움직임의 자극이 있는 경우에 효과적일 수 있다.WSTIM can be effective when there is stimulation of fast movement of small objects within the frames.

단계(560)에서, SURP(300)는 특성 표현자(feature representor)의 기능을 수행할 수 있다. SURP(300)는 TRM, STIM 및 WSTIM 중 적어도 하나의 결과를 출력할 수 있다.In step 560, SURP 300 may perform the function of a feature representor. SURP 300 may output at least one result among TRM, STIM, and WSTIM.

SURP(300)는 프레임의 기정의된 크기의 블록들의 각 블록 별로 TRM, STIM 및 WSTIM 중 적어도 하나의 평균을 계산할 수 있다. SURP(300)는 블록들의 평균들을 내림차순으로 정렬된 순서로 출력할 수 있다. 예를 들면, 기정의된 크기는 64x64일 수 있다.SURP 300 may calculate the average of at least one of TRM, STIM, and WSTIM for each block of a predefined size in the frame. SURP 300 may output the block averages sorted in descending order. For example, the predefined size may be 64x64.

전술된 맵들은 영상과 같은 데이터를 나타내는 행렬의 형태로 표현될 수 있다. 전술된 맵들을 특징 벡터에 적합한 벡터의 형태로 표현하는 특성 표현자의 과정이 요구된다.The maps described above may be expressed in the form of a matrix representing data such as an image. A feature representor process is required to express the above-described maps in the form of vectors suitable for feature vectors.

SURP(300)는 맵들의 각 맵을 기정의된 크기의 블록의 단위로 분할할 수 있다. 예를 들면, 비디오가 에이치디(HD) 비디오인 경우, 블록의 크기는 64x64일 수 있다. SURP(300)는 각 블록 별로, 각 블록의 픽셀들의 평균 값을 계산할 수 있다. 또한, SURP(300)는 위치에 대한 종속성을 갖지 않게 하기 위해, 맵의 블록들을 평균 값이 큰 블록부터 작은 블록으로의 순서로 정렬할 수 있다.SURP 300 may divide each map into blocks of a predefined size. For example, if the video is HD video, the block size may be 64x64. SURP 300 can calculate the average value of the pixels of each block for each block. Additionally, in order to avoid dependency on location, the SURP 300 may sort the blocks of the map in order from blocks with a large average value to blocks with a small average value.

이러한 과정을 통해 각 맵은 히스토그램과 같은 1차원 배열의 형태로 표현될 수 있다.Through this process, each map can be expressed in the form of a one-dimensional array such as a histogram.

비디오가 HD 비디오인 경우, 1차원 배열의 크기는 507일 수 있다.If the video is HD video, the size of the one-dimensional array may be 507.

1차원 배열은 블록의 평균 값의 순서로 정렬된 배열일 수 있다. 따라서, 일반적으로, 큰 값들은 1차원 배열의 앞 부분에 배치될 수 있고, 뒤로 갈수록 작은 값들이 배치될 수 있다. 따라서, 1차원 배열의 값들 중 앞의 일부 만이 특징 벡터로서 사용되더라도, 1차원 배열의 값들의 전부가 사용된 것에 비해 성능의 차이가 크지 않을 수 있다. 이러한 특징에 따라, SURP(300)는 1차원 배열의 값들의 전체를 특징 벡터로서 사용하는 대신, 큰 값들을 갖는 앞의 일부 만을 특징 벡터로서 사용할 수 있다. 여기에서, 일부분은 기정의된 길이 또는 기정의된 비율일 수 있다. 일 예로, SURP(300)는 1차원 배열의 값들 중 앞의 100개만을 사용할 수 있다.
The one-dimensional array may be an array sorted in order of the block average values. Therefore, in general, large values are placed at the front of the one-dimensional array, and progressively smaller values are placed toward the back. As a result, even if only the first part of the one-dimensional array is used as the feature vector, the difference in performance may not be large compared to using all of the values. Accordingly, instead of using all of the values of the one-dimensional array as the feature vector, SURP 300 can use only the first part, which holds the large values. Here, the part may be a predefined length or a predefined ratio. As an example, SURP 300 can use only the first 100 values of the one-dimensional array.
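The feature representor described above (block averages, descending sort, truncation) can be sketched as follows. This is a minimal illustration; the function name `map_to_feature`, its parameter names, and the choice to average partial blocks at the map border are assumptions.

```python
import numpy as np

def map_to_feature(influence_map, block=64, keep=100):
    """Turn a 2-D map into a short, position-independent feature vector."""
    m = np.asarray(influence_map, dtype=np.float64)
    height, width = m.shape
    means = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Average of one block (border blocks may be smaller: assumption).
            means.append(m[top:top + block, left:left + block].mean())
    # Sort from the largest average to the smallest to drop position dependency.
    means.sort(reverse=True)
    # Keep only the leading values, which carry the largest averages.
    return np.array(means[:keep])

demo_map = [[1, 1, 2, 2],
            [1, 1, 2, 2],
            [3, 3, 4, 4],
            [3, 3, 4, 4]]
feature = map_to_feature(demo_map, block=2, keep=3)
```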

모델의 학습Training of the model

SURP(300)는 SVM을 이용할 수 있다. SURP(300)의 기계 학습은 아래의 1) 내지 3)의 작업들일 수 있다.SURP (300) can use SVM. The machine learning of SURP (300) may be tasks 1) to 3) below.

1) SURP(300)는 GOP로부터 특징 벡터를 추출할 수 있다.1) SURP 300 can extract feature vectors from GOP.

2) SURP(300)는 추출된 특징 벡터를 SVM의 입력으로서 사용하여 출력 예측을 수행할 수 있다.2) SURP 300 can perform output prediction using the extracted feature vector as an input to SVM.

3) SURP(300)는 출력 예측에 의해 출력된 값이 그라운드 트루스에 최대한 가까운 값을 갖도록 SVM을 생성할 수 있다.3) SURP (300) can generate an SVM so that the value output by output prediction has a value as close as possible to the ground truth.

아래에서는, 60 FPS의 비디오가 30 FPS의 비디오로 변경될 때의 화질의 저하를 예측하기 위해 모델을 학습하는 일 예가 설명된다.Below, an example of learning a model to predict the degradation of picture quality when a 60 FPS video is changed to a 30 FPS video is described.

우선, 호모지니어스 비디오의 데이터 세트에 대한 주관적 화질 평가를 통해 그라운드 트루스가 이미 획득되었다.First, ground truth has already been obtained through subjective image quality evaluation of the homogeneous video dataset.

다음으로, SURP(300)는 60 FPS의 비디오에 대해 TRM, STIM 및 WSTIM의 각각을 구할 수 있다. 또한, SURP(300)는 비디오의 FPS가 30이라는 내부적인 가정 하에 TRM, STIM 및 WSTIM의 각각을 구할 수 있다. 또한, SURP(300)는 60 FPS의 비디오의 TRM, STIM 및 WSTIM을 특징 표현자를 통해 1차원 배열로 변환할 수 있고, SURP(300)는 30 FPS의 비디오의 TRM, STIM 및 WSTIM을 특징 표현자를 통해 1차원 배열로 변환할 수 있다.Next, SURP 300 can obtain each of TRM, STIM, and WSTIM for a 60 FPS video. SURP 300 can also obtain each of TRM, STIM, and WSTIM under the internal assumption that the FPS of the video is 30. In addition, SURP 300 can convert the TRM, STIM, and WSTIM of the 60 FPS video into one-dimensional arrays through the feature representor, and can likewise convert the TRM, STIM, and WSTIM of the 30 FPS video into one-dimensional arrays through the feature representor.

다음으로, SURP(300)는 통해 60 FPS의 비디오의 특징 벡터 및 30 FPS의 비디오의 특징 벡터가 그라운드 트루스에 상응하는 차이를 갖도록 SVM에 대한 트레이닝을 수행할 수 있다.Next, SURP 300 can perform training on SVM so that the feature vector of the 60 FPS video and the feature vector of the 30 FPS video have a difference corresponding to the ground truth.

다음으로, 학습이 완료되면, SURP(300)는 SVM의 내부를 더 이상 변경하지 않고, 60 FPS의 GOP를 사용하여 60 FPS의 GOP의 특징 벡터 및 30 FPS의 GOP의 특징 벡터를 각각 추출할 수 있다.Next, when learning is completed, SURP 300 can extract the feature vector of the 60 FPS GOP and the feature vector of the 30 FPS GOP, respectively, from the 60 FPS GOP, without further changing the internals of the SVM.

SURP(300)는 60 FPS의 GOP의 특징 벡터 및 30 FPS의 GOP의 특징 벡터를 사용하여 60 FPS의 GOP의 화질 및 30 FPS의 GOP의 화질 간의 화질 차이의 정도를 예측할 수 있고, 예측된 화질 차이의 정도에 기반하여 최종적으로 화질 만족도를 생성 및 출력할 수 있다.SURP 300 can predict the degree of difference between the picture quality of the 60 FPS GOP and the picture quality of the 30 FPS GOP using the feature vector of the 60 FPS GOP and the feature vector of the 30 FPS GOP, and can finally generate and output the picture quality satisfaction based on the predicted degree of difference.

여기에서, 60 FPS 및 30 FPS는 단지 예시적인 것이다. SURP(300)는, 전술된 60 FPS 및 30 FPS에서의 화질 만족도의 생성과 유사하게, 다른 FPS들 간의 변환에 대해서도 화질 만족도를 생성 및 출력할 수 있다. 예를 들면, FPS는 60에서 15로 변경될 수 있다. 또는, FPS는 120에서 60으로 변경될 수 있다.Here, 60 FPS and 30 FPS are merely examples. SURP 300 can generate and output image quality satisfaction for conversion between different FPS, similar to the generation of image quality satisfaction for 60 FPS and 30 FPS described above. For example, FPS can be changed from 60 to 15. Alternatively, FPS can be changed from 120 to 60.

SVM의 특징 벡터를 생성하기 위해서는 전술된 SIM, TRM, STIM 및 WSTIM 중 하나의 맵만이 사용될 수 있고, 2개 이상의 맵들이 동시에 사용될 수도 있다.To generate a feature vector of SVM, only one map among the above-described SIM, TRM, STIM, and WSTIM can be used, and two or more maps can be used simultaneously.

또한, 여러 개의 SVM이 캐스케이드(cascade)로 결합될 수 있다. 예를 들면, 우선 TRM을 특징 벡터로 사용하는 SVM이 적용될 수 있고, 다음으로 STIM 및 직전의 SVM의 결과를 특징 벡터로 사용하는 SVM이 적용될 수 있다. 또한, 마지막으로 WSTIM 및 직전의 SVM의 결과를 특징 벡터로 사용하는 SVM이 적용될 수 있다.Additionally, multiple SVMs can be combined in a cascade. For example, an SVM using TRM as a feature vector may be applied first, and then an SVM using STIM and the result of the immediately preceding SVM as a feature vector may be applied. Finally, an SVM using WSTIM and the result of the immediately preceding SVM as a feature vector may be applied.

최적의 프레임 율의 결정Determination of optimal frame rate

이하에서는 SURP(300)를 이용하여 기본 결정 단위 별로 최적 프레임 율을 결정하고, 결정된 최적 프레임 율을 사용하여 비디오에 대한 최적의 부호화를 수행하는 방법이 설명된다.Below, a method of determining the optimal frame rate for each basic decision unit using SURP (300) and performing optimal encoding of the video using the determined optimal frame rate will be described.

SURP(300)는 비디오의 기본 결정 단위에 대해서 주어진 최소의 화질 만족도 또는 기정의된 화질 만족도의 조건을 충족시키는 기본 결정 단위의 최소의 FPS를 예측할 수 있고, 최소의 화질 만족도로 변환된 기본 결정 단위를 제공할 수 있다. SURP(300)는 최소의 화질 만족도로 변환된 기본 결정 단위를 부호화할 수 있다.
SURP 300 can predict, for a basic decision unit of the video, the minimum FPS of the basic decision unit that satisfies a given minimum picture quality satisfaction or a predefined picture quality satisfaction condition, and can provide the basic decision unit converted to the minimum picture quality satisfaction. SURP 300 can encode the basic decision unit converted to the minimum picture quality satisfaction.

도 14는 일 예에 따른 FPS 별 화질 만족도를 나타낸다.Figure 14 shows image quality satisfaction by FPS according to an example.

전술된 것과 같이 SURP(300)는 특정한 FPS로 변환된 GOP의 화질 만족도를 예측할 수 있고, 예측된 화질 만족도를 출력할 수 있다. 전술된 기능을 응용하면, SURP(300)는 비디오의 GOP들의 각각에 대해, 복수의 FPS들의 복수의 화질 만족도들을 예측 및 출력할 수 있다.As described above, the SURP 300 can predict the image quality satisfaction of a GOP converted to a specific FPS and output the predicted image quality satisfaction. By applying the above-described function, the SURP 300 can predict and output a plurality of picture quality satisfaction values of a plurality of FPS for each of the GOPs of the video.

GOP에 대해 복수의 FPS들의 복수의 화질 만족도들을 예측함에 따라, 복수의 예측된 화질 만족도들이 부호화에 사용될 수 있다.By predicting a plurality of picture quality satisfaction levels of a plurality of FPSs for a GOP, a plurality of predicted picture quality satisfaction levels can be used for encoding.

도 14에서, 가로의 첫 줄은 비디오의 복수의 GOP들을 나타낸다. 세로의 첫 줄은 GOP의 변환된 FPS를 나타낸다. 또한, GOP 별로, 변환된 FPS에서의 화질 만족도가 %가 도시되었다.In Figure 14, the first horizontal line represents multiple GOPs of the video. The first vertical line represents the converted FPS of the GOP. Additionally, for each GOP, the % satisfaction with picture quality in the converted FPS is shown.

도 14에서 소스 비디오는 60 FPS일 수 있다.In Figure 14, the source video may be 60 FPS.

SURP(300)는 타겟 비디오의 복수의 FPS들에 대해 순차적으로 기계 학습을 수행할 수 있다. 먼저, 비디오의 FPS가 30으로 변환된 경우에 대해, SURP(300)는 기계 학습을 수행할 수 있다. 또한, 비디오의 FPS가 15로 변환된 경우에 대해, SURP(300)는 기계 학습을 수행할 수 있다. 또한, 비디오의 FPS가 7.5로 변환된 경우에 대해, SURP(300)는 기계 학습을 수행할 수 있다. 말하자면, SURP(300)는 타겟 비디오의 FPS들의 각각에 대해 기계 학습을 수행할 수 있다.SURP 300 may sequentially perform machine learning on multiple FPSs of the target video. First, when the FPS of the video is converted to 30, SURP (300) can perform machine learning. Additionally, when the FPS of the video is converted to 15, SURP (300) can perform machine learning. Additionally, when the FPS of the video is converted to 7.5, SURP (300) can perform machine learning. In other words, SURP 300 can perform machine learning on each of the FPS of the target video.

기계 학습이 된 상태에서, SURP(300)는 순차적으로 복수의 FPS들의 각 FPS에 대해서, 비디오의 GOP의 화질 만족도를 예측할 수 있다. 예를 들면, 타겟 비디오의 FPS가 30일 때, 타겟 비디오의 3 개의 GOP들의 화질 만족도는 80%, 90% 및 50%일 수 있다. 타겟 비디오의 FPS가 15일 때, 3 개의 GOP들의 화질 만족도는 70%, 60% 및 45%일 수 있다.Once machine learning has been performed, SURP 300 can sequentially predict the picture quality satisfaction of a GOP of the video for each FPS of the plurality of FPSs. For example, when the FPS of the target video is 30, the picture quality satisfaction of the three GOPs of the target video may be 80%, 90%, and 50%. When the FPS of the target video is 15, the picture quality satisfaction of the three GOPs may be 70%, 60%, and 45%.

이러한 예측을 통해, SURP(300)는 복수의 FPS들에 대해서 GOP의 화질 만족도들을 계산할 수 있다.Through this prediction, SURP 300 can calculate GOP picture quality satisfaction levels for a plurality of FPS.

또한, SURP(300)는 소스 비디오의 GOP의 화질 만족도는 100%인 것으로 간주할 수 있다.Additionally, the SURP 300 may consider that the image quality satisfaction of the GOP of the source video is 100%.

도 14에서, 마지막의 행은 요구사항을 나타낼 수 있다. 요구사항은 요구 화질(required picture quality) 또는 최소 화질 만족도를 나타낼 수 있다. 말하자면, 마지막의 행은 화질 만족도가 75%의 이상이 되어야 한다는 것을 나타낼 수 있다.
In Figure 14, the last row may represent a requirement. Requirements may indicate required picture quality or minimum picture quality satisfaction. In other words, the last row may indicate that the image quality satisfaction level should be 75% or higher.

도 15는 일 예에 따른 GOP 별로 결정된 최적 프레임 율을 나타낸다.Figure 15 shows the optimal frame rate determined for each GOP according to an example.

SURP(300)는 각 GOP에 대하여, 요구사항을 충족시키는 최적의 FPS를 결정할 수 있다. 여기에서, 최적의 FPS는 요구사항의 값의 이상이면서, 가장 낮은 화질 만족도를 갖는 GOP의 FPS일 수 있다. 말하자면, 최적의 FPS는 요구사항을 충족시키는 최저의 FPS일 수 있다. 만약, 변환에 의해 생성된 타겟 비디오의 FPS들 중, 요구사항의 이상인 화질 만족도를 충족시키는 FPS가 존재하지 않을 경우, 최적의 FPS는 소스 비디오의 FPS일 수 있다.SURP 300 can determine, for each GOP, the optimal FPS that meets the requirement. Here, the optimal FPS may be the FPS of the GOP whose picture quality satisfaction is greater than or equal to the requirement value and is the lowest among such GOPs. In other words, the optimal FPS may be the lowest FPS that meets the requirement. If, among the FPSs of the target video generated by conversion, there is no FPS whose picture quality satisfaction is greater than or equal to the requirement, the optimal FPS may be the FPS of the source video.

도 15에서는 복수의 GOP들의 각각에 대해 결정된 최적 프레임 율이 도시되었다.In Figure 15, the optimal frame rate determined for each of the plurality of GOPs is shown.

예를 들면, 도 14에 따르면, 제1 GOP의 복수의 FPS들에 대한 화질 만족도들은 80%, 70% 및 50%일 수 있다. 이 중, 요구사항의 이상이며 최저의 화질 만족도는 30 FPS에서의 80일 수 있다. 따라서, 제1 GOP의 최적 프레임 율은 30 FPS일 수 있다. 제2 GOP의 복수의 FPS들에 대한 화질 만족도들은 90%, 80% 및 50%일 수 있다. 이 중, 요구사항의 이상이며 최저의 화질 만족도는 15 FPS에서의 80일 수 있다. 따라서, 제2 GOP의 최적 프레임 율은 15 FPS일 수 있다. 제3 GOP의 복수의 FPS들에 대한 화질 만족도들은 50%, 45% 및 40%일 수 있다. 화질 만족도들 중 요구사항의 이상인 것이 존재하지 않으므로, 제3 GOP는 변환될 수 없다. 따라서, 제3 GOP의 최적 프레임 율은 소스 비디오의 FPS인 60 FPS일 수 있다.For example, according to FIG. 14, the picture quality satisfaction values for the plurality of FPSs of the first GOP may be 80%, 70%, and 50%. Among these, the lowest picture quality satisfaction that is greater than or equal to the requirement may be 80% at 30 FPS. Accordingly, the optimal frame rate of the first GOP may be 30 FPS. The picture quality satisfaction values for the plurality of FPSs of the second GOP may be 90%, 80%, and 50%. Among these, the lowest picture quality satisfaction that is greater than or equal to the requirement may be 80% at 15 FPS. Accordingly, the optimal frame rate of the second GOP may be 15 FPS. The picture quality satisfaction values for the plurality of FPSs of the third GOP may be 50%, 45%, and 40%. Since none of these values is greater than or equal to the requirement, the third GOP cannot be converted. Therefore, the optimal frame rate of the third GOP may be 60 FPS, which is the FPS of the source video.
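The selection rule in this example can be sketched as below, using the satisfaction values quoted above for the three GOPs and the 75% requirement. The function name `optimal_fps` and the dictionary layout are assumptions made for illustration.

```python
def optimal_fps(satisfaction_by_fps, required, source_fps):
    """Return the lowest FPS whose predicted satisfaction meets the requirement."""
    candidates = [fps for fps, s in satisfaction_by_fps.items() if s >= required]
    # If no converted FPS meets the requirement, keep the source FPS.
    return min(candidates) if candidates else source_fps

# Predicted satisfaction (%) per converted FPS, taken from the example above.
table = {
    "GOP 1": {30: 80, 15: 70, 7.5: 50},
    "GOP 2": {30: 90, 15: 80, 7.5: 50},
    "GOP 3": {30: 50, 15: 45, 7.5: 40},
}
result = {gop: optimal_fps(values, required=75, source_fps=60)
          for gop, values in table.items()}
```

With these inputs the sketch selects 30 FPS, 15 FPS, and 60 FPS for the first, second, and third GOP, matching the frame rates stated in the example.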

복수의 GOP들의 각 GOP는 각 GOP의 최적 프레임 율로 변환될 수 있고, 비디오의 변환된 GOP들은 부호화될 수 있다.
Each GOP of the plurality of GOPs may be converted to its own optimal frame rate, and the converted GOPs of the video may be encoded.

도 16은 일 예에 따른 최적 프레임 율을 사용하는 GOP의 부호화 방법의 흐름도이다.Figure 16 is a flowchart of a GOP encoding method using an optimal frame rate according to an example.

우선, 입력 비디오는 60 FPS인 것으로 예시된다. 입력 비디오의 GOP들이 순차적으로 단계들(1610, 1620, 1630, 1640, 1650 및 1660)에 의해 처리될 수 있다.First, the input video is illustrated to be 60 FPS. GOPs of the input video may be processed by steps 1610, 1620, 1630, 1640, 1650, and 1660 sequentially.

단계(1610)에서, SURP(300)는 60 FPS의 GOP를 30 FPS의 GOP로 변환할 수 있고, 30 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 1610, the SURP 300 can convert the 60 FPS GOP into the 30 FPS GOP and calculate the image quality satisfaction of the 30 FPS GOP.

단계(1620)에서, SURP(300)는 60 FPS의 GOP를 15 FPS의 GOP로 변환할 수 있고, 15 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 1620, SURP 300 can convert the 60 FPS GOP into a 15 FPS GOP and calculate the image quality satisfaction of the 15 FPS GOP.

단계(1630)에서, SURP(300)는 60 FPS의 GOP를 7.5 FPS의 GOP로 변환할 수 있고, 7.5 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 1630, the SURP 300 can convert the 60 FPS GOP into the 7.5 FPS GOP and calculate the image quality satisfaction of the 7.5 FPS GOP.

단계(1640)에서, SURP(300)는 최적 프레임 율을 결정할 수 있다. At step 1640, SURP 300 may determine the optimal frame rate.

SURP(300)는 요구 화질을 수신할 수 있다. 요구 화질은 전술된 최소 화질 만족도를 나타낼 수 있다.SURP 300 can receive the required image quality. The required image quality may represent the minimum image quality satisfaction described above.

SURP(300)는 기정의된 FPS로 변경된 GOP들 중 최소 화질 만족도를 충족시키는 최적의 GOP를 선택할 수 있다. 여기에서, 최적의 GOP는 가장 낮은 FPS를 갖는 GOP일 수 있다.SURP 300 can select the optimal GOP that satisfies the minimum picture quality satisfaction among GOPs changed to a predefined FPS. Here, the optimal GOP may be the GOP with the lowest FPS.

또는, SURP(300)는 변경된 GOP들의 FPS들 중 최소 화질 만족도를 충족시키는 최적 프레임 율을 결정할 수 있다. SURP(300)는 변경된 GOP들의 화질 만족도들 중 최소 화질 만족도의 이상이면서, 가장 작은 화질 만족도를 갖는 GOP 또는 상기의 GOP의 FPS를 선택할 수 있다. 최적 프레임 율은 선택된 FPS 또는 선택된 GOP의 FPS일 수 있다.Alternatively, SURP 300 may determine the optimal frame rate that satisfies the minimum picture quality satisfaction among the FPS of the changed GOPs. The SURP 300 may select the GOP with the smallest image quality satisfaction or the FPS of the GOP that is greater than or equal to the minimum image quality satisfaction among the image quality satisfactions of the changed GOPs. The optimal frame rate may be the selected FPS or the FPS of the selected GOP.

단계(1650)에서, SURP(300)는 최적 프레임 율의 GOP의 FPS를 선택할 수 있다. SURP(300)는 GOP의 FPS를 최적 프레임 율로 변환할 수 있다. 또는, SURP(300)는 입력 비디오의 GOP 및 단계들(1610, 1620 및 1630)에 의해 변경된 GOP들 중 최적 프레임 율의 GOP를 선택할 수 있다.At step 1650, SURP 300 may select the FPS of the optimal frame rate GOP. SURP (300) can convert the FPS of the GOP to the optimal frame rate. Alternatively, SURP 300 may select a GOP of the optimal frame rate among the GOP of the input video and the GOPs changed by steps 1610, 1620, and 1630.

단계(1660)에서, SURP(300)는 단계(1650)에서 선택된 GOP의 부호화를 수행할 수 있다.
In step 1660, SURP 300 may perform encoding of the GOP selected in step 1650.

도 17은 일 예에 따른 최적 프레임 율의 결정 방법의 흐름도이다.17 is a flowchart of a method for determining an optimal frame rate according to an example.

도 16을 참조하여 전술된 단계(1640)는 아래의 단계들(1710, 1720 및 1730)을 포함할 수 있다.Step 1640 described above with reference to FIG. 16 may include steps 1710, 1720, and 1730 below.

SURP(300)는 변경된 GOP들에 대해, 높은 FPS의 GOP로부터 낮은 FPS의 GOP의 순서로, 순차적으로 GOP가 요구 화질을 충족시키는지 여부를 검사할 수 있다. SURP(300)는 요구 화질을 충족시키지 못하는 GOP의 바로 이전의 GOP를 선택할 수 있다.The SURP 300 may sequentially check whether the GOP satisfies the required picture quality for the changed GOPs, in the order from the high FPS GOP to the low FPS GOP. SURP 300 may select the GOP immediately preceding the GOP that does not meet the required image quality.

변경된 GOP들 중 첫 번째의 GOP가 요구 화질을 충족시키지 못하는 경우 소스 비디오의 GOP가 선택될 수 있다. 말하자면, 변경된 GOP들 중 어떤 GOP도 요구 화질을 충족시키지 못하는 경우 소스 비디오의 GOP가 선택될 수 있다.If the first GOP among the changed GOPs does not meet the required picture quality, the GOP of the source video may be selected. In other words, if none of the changed GOPs meets the required picture quality, the GOP of the source video may be selected.

변경된 GOP들 중 마지막의 GOP가 요구 화질을 충족시키는 경우 마지막의 GOP가 선택될 수 있다. 말하자면, 변경된 GOP들 중 모든 GOP들이 요구 화질을 충족시키는 경우 가장 낮은 FPS의 GOP가 선택될 수 있다.If the last GOP among the changed GOPs meets the required picture quality, the last GOP may be selected. In other words, if all of the changed GOPs meet the required picture quality, the GOP with the lowest FPS may be selected.

단계(1710)에서, SURP(300)는 30 FPS의 GOP가 요구 화질을 충족시키는지 여부를 검사할 수 있다. 30 FPS의 GOP가 요구 화질을 충족시키지 못하는 경우 SURP(300)는 60 FPS를 최적 프레임 율로 선택할 수 있고, 60 FPS의 GOP를 제공할 수 있다. 30 FPS의 GOP가 요구 화질을 충족시키는 경우 단계(1720)가 수행될 수 있다.In step 1710, SURP 300 may check whether the GOP of 30 FPS meets the required picture quality. If the GOP of 30 FPS does not meet the required picture quality, SURP (300) can select 60 FPS as the optimal frame rate and provide the GOP of 60 FPS. Step 1720 may be performed if the GOP of 30 FPS meets the required picture quality.

단계(1720)에서, SURP(300)는 15 FPS의 GOP가 요구 화질을 충족시키는지 여부를 검사할 수 있다. 15 FPS의 GOP가 요구 화질을 충족시키지 못하는 경우 SURP(300)는 30 FPS를 최적 프레임 율로 선택할 수 있고, 30 FPS의 GOP를 제공할 수 있다. 15 FPS의 GOP가 요구 화질을 충족시키는 경우 단계(1730)가 수행될 수 있다.In step 1720, SURP 300 may check whether the GOP of 15 FPS satisfies the required picture quality. If the GOP of 15 FPS does not meet the required picture quality, the SURP 300 can select 30 FPS as the optimal frame rate and provide the GOP of 30 FPS. Step 1730 may be performed if the GOP of 15 FPS satisfies the required picture quality.

단계(1730)에서, SURP(300)는 7.5 FPS의 GOP가 요구 화질을 충족시키는지 여부를 검사할 수 있다. 7.5 FPS의 GOP가 요구 화질을 충족시키지 못하는 경우 SURP(300)는 15 FPS를 최적 프레임 율로 선택할 수 있고, 15 FPS의 GOP를 제공할 수 있다. 7.5 FPS의 GOP가 요구 화질을 충족시키는 경우 SURP(300)는 7.5 FPS를 최적 프레임 율로 선택할 수 있고, 7.5 FPS의 GOP를 제공할 수 있다.
In step 1730, SURP 300 may check whether the GOP of 7.5 FPS satisfies the required picture quality. If the GOP of 7.5 FPS does not meet the required picture quality, SURP (300) can select 15 FPS as the optimal frame rate and provide a GOP of 15 FPS. If the GOP of 7.5 FPS satisfies the required image quality, the SURP 300 can select 7.5 FPS as the optimal frame rate and provide the GOP of 7.5 FPS.
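Steps 1710, 1720, and 1730 can be sketched as a single loop that checks the converted GOPs from the highest FPS to the lowest and stops at the first one that fails. The function name `select_fps_sequential` and the callback `meets_requirement` are hypothetical; the satisfaction values are illustrative only.

```python
def select_fps_sequential(source_fps, candidate_fps, meets_requirement):
    """Check candidates from high FPS to low FPS and stop at the first failure."""
    selected = source_fps
    for fps in sorted(candidate_fps, reverse=True):
        if not meets_requirement(fps):
            # Keep the FPS that was accepted immediately before this one.
            break
        selected = fps
    return selected

# Illustrative predicted satisfaction (%) per converted FPS for one GOP.
satisfaction = {30: 80, 15: 70, 7.5: 50}
required = 75
checked = []

def meets_requirement(fps):
    checked.append(fps)
    return satisfaction[fps] >= required

chosen = select_fps_sequential(60, satisfaction.keys(), meets_requirement)
```

Because the loop stops at the first failing candidate, lower frame rates are not evaluated once one frame rate fails; in this run only 30 FPS and 15 FPS are checked.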

도 18은 일 예에 따른 시간적 식별자(temporal identifier)를 갖는 프레임들의 계층적인 구조를 나타낸다.Figure 18 shows a hierarchical structure of frames with temporal identifiers according to an example.

도 18에서는, 9개의 프레임들이 도시되었다. 9개의 프레임들은 "프레임 0" 내지 "프레임 8"일 수 있다.In Figure 18, nine frames are shown. The nine frames may be “Frame 0” to “Frame 8.”

각 프레임들은 시간적 식별자(Temporal Identifier; TI)를 가질 수 있다. 또한, 각 프레임은 I 프레임, P 프레임 또는 B 프레임일 수 있다.Each frame may have a temporal identifier (TI). Additionally, each frame may be an I frame, P frame, or B frame.

전술된 것과 같이, SURP(300)를 통해 복수의 FPS들로 GOP가 변환된 경우에, 복수의 변환된 GOP들의 화질 만족도가 예측될 수 있다. 비디오의 부호화에 의해 생성된 비트스트림이 화질 만족도와 관련된 화질 만족도 정보를 포함할 경우, 화질 만족도를 유지하면서, 선택적인 시간적 스케일러빌리티(Temporal Scalability; TS)가 지원되는 개선된 비트스트림이 제공될 수 있다.As described above, when a GOP is converted to a plurality of FPSs through SURP 300, the picture quality satisfaction of each of the converted GOPs can be predicted. If the bitstream generated by encoding the video includes picture quality satisfaction information, an improved bitstream that supports selective temporal scalability (TS) while maintaining picture quality satisfaction can be provided.

에이치이브이씨(High Efficiency Video Coding; HEVC) 규격에서는, 각 프레임의 TI는 엔에이엘(Network Abstraction Layer; NAL) 헤더를 통해 전송될 수 있다. 예를 들면, NAL 헤더의 TI 정보 "nuh_temporal_id_plus1"은 프레임의 TI를 나타낼 수 있다.In the High Efficiency Video Coding (HEVC) standard, the TI of each frame can be transmitted through a Network Abstraction Layer (NAL) header. For example, TI information “nuh_temporal_id_plus1” in the NAL header may indicate the TI of the frame.
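As an illustration of where this field sits, the two-byte HEVC NAL unit header can be unpacked as below. The bit layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) follows the HEVC specification; the function name `parse_hevc_nal_header` and the returned dictionary are assumptions made for the example.

```python
def parse_hevc_nal_header(header: bytes) -> dict:
    """Unpack the fields of a two-byte HEVC NAL unit header."""
    if len(header) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    b0, b1 = header[0], header[1]
    tid_plus1 = b1 & 0x07                      # lowest 3 bits of the second byte
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        "nuh_temporal_id_plus1": tid_plus1,
        "temporal_id": tid_plus1 - 1,          # the TI of the frame
    }

fields = parse_hevc_nal_header(bytes([0x02, 0x04]))
```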

NAL 헤더의 시간적 식별자 정보를 통해, 화질 만족도 정보를 제공하는 TS가 제공될 수 있다.A TS that provides picture quality satisfaction information can be provided through temporal identifier information in the NAL header.

GOP의 크기가 8인 경우, 도 18의 계층적인 예측 구조가 사용될 때, TI의 값이 3인 홀수 번호의 프레임들이 GOP에서 제거될 경우, GOP의 FPS가 1/2이 될 수 있다. 즉, GOP의 FPS가 60에서 30으로 변경될 수 있다.When the size of the GOP is 8 and the hierarchical prediction structure of FIG. 18 is used, if odd-numbered frames with a TI value of 3 are removed from the GOP, the FPS of the GOP may be 1/2. In other words, the FPS of GOP can be changed from 60 to 30.

추가로, TI의 값이 2인 "프레임 2" 및 "프레임 6"이 제거될 경우, GOP의 FPS가 1/4이 될 수 있다. 즉, GOP의 FPS가 60에서 15로 변경될 수 있다.Additionally, if “Frame 2” and “Frame 6” with a TI value of 2 are removed, the FPS of the GOP may be 1/4. In other words, the FPS of GOP can be changed from 60 to 15.
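The effect of removing frames by TI can be sketched as follows. The TI values for the odd-numbered frames (3) and for "frame 2" and "frame 6" (2) come from the text; the values 1 for "frame 4" and 0 for "frame 0" and "frame 8", and the choice to count frames 1 to 8 as one GOP, are assumptions. The function names are hypothetical.

```python
# TI per frame for "frame 0" .. "frame 8" (values for frames 0, 4, 8 assumed).
TI_BY_FRAME = {0: 0, 1: 3, 2: 2, 3: 3, 4: 1, 5: 3, 6: 2, 7: 3, 8: 0}

def frames_with_ti_at_most(ti_by_frame, max_ti):
    """Frames that remain when every frame with TI above max_ti is removed."""
    return [f for f, ti in sorted(ti_by_frame.items()) if ti <= max_ti]

def fps_after_dropping(source_fps, ti_by_frame, max_ti, gop_size=8):
    """Resulting FPS of one GOP after removing frames with TI above max_ti."""
    # Assumption: frames 1..gop_size form one GOP; frame 0 ends the previous one.
    kept = [f for f in frames_with_ti_at_most(ti_by_frame, max_ti)
            if 1 <= f <= gop_size]
    return source_fps * len(kept) / gop_size

half_rate = fps_after_dropping(60, TI_BY_FRAME, max_ti=2)
quarter_rate = fps_after_dropping(60, TI_BY_FRAME, max_ti=1)
```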

TS의 제공에 있어서, TI만이 사용될 경우 비디오는 고정된 FPS를 가질 수 있다. 예를 들면, 60 FPS의 비디오의 재생에 있어서, TI가 3인 프레임이 재생에서 제외될 경우, 비디오 전체에 대해 고정된 30 FPS가 적용될 수 있다. 이렇게, TI 만을 고려하는 TS가 사용될 경우, 비디오의 중간에 빠른 움직임의 객체가 존재할 경우, 빠른 움직임의 객체에 대해 시청자가 인지할 수 있는 큰 화질 저하가 발생할 수 있다.
In provision of TS, the video may have a fixed FPS if only TI is used. For example, in the playback of a 60 FPS video, if a frame with a TI of 3 is excluded from playback, a fixed 30 FPS may be applied to the entire video. In this way, when TS, which only considers TI, is used, if a fast-moving object exists in the middle of the video, a significant deterioration in image quality that the viewer can perceive for the fast-moving object may occur.

Improved temporal scalability

이하에서는, SURP(300)에 의해 생성된 화질 만족도 정보를 사용하여 개선된 시간적 스케일러빌리트를 제공하는 부호화 방법 및 복호화 방법이 설명된다.Below, an encoding method and a decoding method that provide improved temporal scalability using image quality satisfaction information generated by the SURP 300 will be described.

Using the SURP 300 described above, TS may be applied per GOP. That is, the SURP 300 can predict the picture quality satisfaction of the GOP for each FPS of a plurality of FPSs. Using the predicted picture quality satisfaction of the GOP, TS can be provided while maintaining picture quality.

For each TI, the SURP 300 may determine the picture quality satisfaction information corresponding to that TI. The picture quality satisfaction information may indicate the picture quality satisfaction. Here, the picture quality satisfaction information corresponding to a specific TI may indicate the picture quality satisfaction for the picture quality obtained when the frames whose TI is greater than or equal to the specific TI are removed from encoding, decoding, or playback.

For example, "frame x" may be included in or excluded from playback depending on the FPS at which the GOP is played. Depending on the FPS, "frame x" may be excluded from decoding or playback by the TS. In this case, if the maximum FPS among the FPSs at which "frame x" is not included in playback is y, the picture quality satisfaction at FPS y is z, and the TI of "frame x" is w, then the picture quality satisfaction corresponding to TI w may be z.

또한, GOP에서, 동일한 TI를 갖는 프레임들은 동일한 화질 만족도 정보를 공통으로 가질 수 있다. 따라서, 요구되는 화질 만족도에 따라서, 후술될 복호화 장치(2700)는 특정한 TI를 갖는 프레임들의 재생 여부를 적응적으로 선택할 수 있다.Additionally, in GOP, frames with the same TI may have the same picture quality satisfaction information in common. Therefore, depending on the required image quality satisfaction, the decoding device 2700, which will be described later, can adaptively select whether to reproduce frames with a specific TI.

예를 들면, 요구되는 화질 만족도가 Z라는 것은, 대응하는 화질 만족도가 Z 이하인 TI의 프레임들이 재생에 포함되어야 한다는 것을 나타낼 수 있다. 또한, 이러한 결정은, 요구되는 화질 만족도가 Z일 때, Z보다 큰 화질 만족도에 대응하는 TI의 프레임들은 재생에서 제외될 수 있다는 것을 나타낼 수 있다.For example, if the required picture quality satisfaction is Z, it may indicate that frames of TI whose corresponding picture quality satisfaction is Z or less should be included in playback. Additionally, this decision may indicate that when the required image quality satisfaction is Z, frames of TI corresponding to an image quality satisfaction greater than Z may be excluded from playback.
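The selection rule above can be sketched as a small function. The satisfaction values per TI are hypothetical, and treating the lowest TI (0) as always played is an assumption of this example, since removing it would leave no frames in the GOP.

```python
def select_tis(satisfaction_by_ti, required, base_ti=0):
    """Return the TIs whose frames are included in playback.

    satisfaction_by_ti[w] is the satisfaction obtained when frames with a TI
    of w or more are removed. A TI is included when that value is at most the
    required satisfaction, and may be excluded when it is greater.
    """
    selected = {base_ti}  # assumption: the base layer is always played
    for ti, satisfaction in satisfaction_by_ti.items():
        if satisfaction <= required:
            selected.add(ti)
    return sorted(selected)


# Hypothetical values: removing TI 3 keeps 90% satisfaction, and so on.
example = {3: 90, 2: 60, 1: 30}
print(select_tis(example, 75))  # TI 3 may be dropped
print(select_tis(example, 50))  # TI 3 and TI 2 may be dropped
```

With a required satisfaction of 75 in this example, only the frames with TI 3 are left out, which corresponds to playing a 60 FPS GOP at 30 FPS.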

The picture quality satisfaction corresponding to a TI may be included in the SEI message of the access unit of the frame, or may be included in the NAL unit header. In this case, the SEI message or NAL unit header may directly contain the value of the picture quality satisfaction information. Alternatively, the SEI message or NAL unit header may contain the value of an index into a picture quality satisfaction table.

In a GOP, frames having the same TI may have the same picture quality satisfaction information. Therefore, in the GOP, the picture quality satisfaction information for each TI may be transmitted from the encoding device 2400 to the decoding device 2700 through SEI only in the first frame having that TI. Alternatively, the picture quality satisfaction information for all TIs may be transmitted at once in the first frame of the GOP.

도 19는 일 예에 따른 화질 만족도를 포함하는 메시지를 나타낸다.Figure 19 shows a message including image quality satisfaction according to an example.

도 19에서, 프레임의 접근 유닛(access unit)의 에스이아이(SEI) 메시지는 화질 만족도 정보를 포함할 수 있다. 말하자면, 화질 만족도 정보는 프레임의 접근 유닛(access unit)의 에스이아이(Supplemental Enhancement Information; SEI) 메시지를 통해 제공될 수 있다.In FIG. 19, the SEI message of the access unit of the frame may include picture quality satisfaction information. In other words, picture quality satisfaction information can be provided through a Supplemental Enhancement Information (SEI) message of the access unit of the frame.

도 19에서, "SEI_Temporal_ID_SURP"는 화질 만족도 정보를 제공하기 위해 사용되는 데이터를 나타낼 수 있다. "Surp_value"는 화질 만족도 정보를 나타낼 수 있다.In FIG. 19, “SEI_Temporal_ID_SURP” may represent data used to provide image quality satisfaction information. “Surp_value” may indicate image quality satisfaction information.

For example, a "Surp_value" of 70 may indicate that the picture quality satisfaction is "70%".

도 20은 일 예에 따른 화질 만족도 테이블의 인덱스를 포함하는 메시지를 나타낸다.Figure 20 shows a message including an index of an image quality satisfaction table according to an example.

도 20에서, 프레임의 접근 유닛(access unit)의 에스이아이(SEI) 메시지는 화질 만족도 정보를 포함할 수 있다. 말하자면, 화질 만족도 정보는 프레임의 접근 유닛(access unit)의 에스이아이(Supplemental Enhancement Information; SEI) 메시지를 통해 제공될 수 있다.In FIG. 20, the SEI message of the access unit of the frame may include picture quality satisfaction information. In other words, picture quality satisfaction information can be provided through a Supplemental Enhancement Information (SEI) message of the access unit of the frame.

도 20에서, "SEI_Temporal_ID_SURP"는 화질 만족도 정보를 제공하기 위해 사용되는 데이터를 나타낼 수 있다. "Surp_value_idx"는 화질 만족도 테이블의 인덱스일 수 있다.In FIG. 20, “SEI_Temporal_ID_SURP” may represent data used to provide image quality satisfaction information. “Surp_value_idx” may be an index of the picture quality satisfaction table.

The picture quality satisfaction values used in a video or GOP may be predefined in a picture quality satisfaction table. For example, the picture quality satisfaction table may have the values {90, 60, 30}. These values may indicate that picture quality satisfaction levels of 90, 60, and 30 are used. If the value of "Surp_value_idx" is 0, it may indicate that the value at index 0 of the table, that is, 90 or 90%, is the picture quality satisfaction. If the value of "Surp_value_idx" is 1, it may indicate that the value at index 1 of the table, that is, 60 or 60%, is the picture quality satisfaction.
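The two signalling variants, a direct value as in FIG. 19 and a table index as in FIG. 20, can be resolved as sketched below. Representing a parsed SEI message as a Python dictionary, and the names SURP_TABLE and resolve_satisfaction, are assumptions made for this example; only the field names "Surp_value" and "Surp_value_idx" and the table values {90, 60, 30} come from the text.

```python
# Example table from the text; index 0 -> 90, index 1 -> 60, index 2 -> 30.
SURP_TABLE = (90, 60, 30)


def resolve_satisfaction(message, table=SURP_TABLE):
    """Return the satisfaction (in percent) carried by a parsed SEI message.

    `message` is a hypothetical dict holding either "Surp_value" (the value
    itself) or "Surp_value_idx" (an index into the predefined table).
    """
    if "Surp_value" in message:
        return message["Surp_value"]
    idx = message["Surp_value_idx"]
    if not 0 <= idx < len(table):
        raise ValueError(f"Surp_value_idx {idx} is outside the table")
    return table[idx]


print(resolve_satisfaction({"Surp_value": 70}))
print(resolve_satisfaction({"Surp_value_idx": 1}))
```

Sending an index instead of the value keeps the message small when only a few satisfaction levels are in use.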

도 21은 일 예에 따른 화질 만족도들의 전체를 포함하는 메시지를 나타낸다.Figure 21 shows a message including all picture quality satisfaction levels according to an example.

도 21에서, 하나의 SEI 메시지는 GOP의 모든 프레임들의 모든 TI들의 화질 만족도 정보를 포함할 수 있다.In FIG. 21, one SEI message may include picture quality satisfaction information of all TIs of all frames of the GOP.

도 21에서, "Max_Temporl_ID"는 TI들의 최대의 개수를 나타낼 수 있다. "Temoporal ID[i]"는 값이 i 또는 i1인 TI의 인덱스를 나타낼 수 있다. 또는, "Temoporal ID[i]"는 i 번째 TI의 화질 만족도 정보를 나타낼 수 있다.In FIG. 21, “Max_Temporl_ID” may indicate the maximum number of TIs. “Temoporal ID[i]” may indicate the index of TI whose value is i or i1. Alternatively, “Temoporal ID[i]” may indicate image quality satisfaction information of the ith TI.

The picture quality satisfaction information may represent the picture quality satisfaction value itself or an index into the picture quality satisfaction table. In FIG. 21, "Temoporal ID[i]" is illustrated as representing an index into the picture quality satisfaction table.

도 22는 일 예에 따른 화질 만족도 정보의 생성 방법을 도시한다.Figure 22 illustrates a method of generating image quality satisfaction information according to an example.

도 22는 일 예에 따른 화질 만족도 정보를 포함하는 GOP의 부호화 방법의 흐름도이다.Figure 22 is a flowchart of a GOP encoding method including picture quality satisfaction information according to an example.

In this example, the input video is assumed to be 60 FPS. The GOPs of the input video may be processed sequentially by steps 2210, 2220, 2230, 2240, and 2250.

단계(2210)에서, SURP(300)는 60 FPS의 GOP를 30 FPS의 GOP로 변환할 수 있고, 30 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 2210, the SURP 300 can convert the 60 FPS GOP to the 30 FPS GOP and calculate the image quality satisfaction of the 30 FPS GOP.

단계(2220)에서, SURP(300)는 60 FPS의 GOP를 15 FPS의 GOP로 변환할 수 있고, 15 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 2220, the SURP 300 can convert the 60 FPS GOP into the 15 FPS GOP and calculate the image quality satisfaction of the 15 FPS GOP.

단계(2230)에서, SURP(300)는 60 FPS의 GOP를 7.5 FPS의 GOP로 변환할 수 있고, 7.5 FPS의 GOP의 화질 만족도를 계산할 수 있다.In step 2230, the SURP 300 can convert the 60 FPS GOP into the 7.5 FPS GOP and calculate the image quality satisfaction of the 7.5 FPS GOP.

In step 2240, the SURP 300 may encode the picture quality satisfaction information for the frames of the GOP.

In step 2250, the SURP 300 may encode the GOP. The SURP 300 may encode the plurality of frames of the GOPs.
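Steps 2210 to 2230 can be sketched as a loop over the reduced frame rates. The predictor is passed in as a callable because the trained SURP model itself is not reproduced here; the function name, its signature, and the mapping of 30, 15, and 7.5 FPS to TI 3, 2, and 1 for a GOP of 8 frames are assumptions of this example.

```python
from typing import Callable, Dict, Sequence


def build_satisfaction_info(
    gop: Sequence,
    predict: Callable[[Sequence, float], float],
    source_fps: float = 60.0,
    levels: int = 3,
) -> Dict[int, float]:
    """Predict the satisfaction for each reduced FPS and key it by TI.

    `predict(frames, fps)` is a stand-in for the SURP predictor. With
    levels=3, halving the FPS once corresponds to dropping TI 3, twice to
    dropping TI 3 and 2, and three times to dropping TI 3, 2, and 1.
    """
    info = {}
    for k in range(1, levels + 1):
        step = 2 ** k
        fps = source_fps / step          # 30, 15, 7.5 for a 60 FPS source
        converted = gop[::step]          # temporally subsampled GOP
        ti = levels - k + 1              # highest TI still removed at this FPS
        info[ti] = predict(converted, fps)
    return info


# Dummy predictor with made-up values, used only to show the data flow.
dummy = lambda frames, fps: {30.0: 90, 15.0: 60, 7.5: 30}[fps]
print(build_satisfaction_info(list(range(8)), dummy))
```

The resulting per-TI values are what step 2240 would then write into the bitstream alongside the encoded frames.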

In the embodiment of FIG. 22, all frames of the GOPs may be encoded. Which of these frames to play back may be determined at the encoding stage according to the minimum picture quality satisfaction.

도 23a는 일 예에 따른 75% 이상의 화질 만족도를 유지하는 구성을 나타낸다.Figure 23a shows a configuration for maintaining image quality satisfaction of 75% or more according to an example.

도 23a에서, 사각형은 프레임을 나타낼 수 있다. 도 23a에서 제1 GOP는 "프레임 0" 내지 "프레임 7"을 포함할 수 있다. 제2 GOP는 "프레임 8" 내지 "프레임 15"를 포함할 수 있다. 사각형 내부의 "TI"는 프레임의 시간적 식별자의 값을 나타낼 수 있다. 사각형 내부의 "I", "P" 및 "B"는 프레임의 타입을 나타낼 수 있다. 사각형 위 또는 아래의 숫자는 프레임의 화질 만족도를 나타낼 수 있다.In Figure 23A, a square may represent a frame. In FIG. 23A, the first GOP may include “Frame 0” to “Frame 7.” The second GOP may include “Frame 8” to “Frame 15.” “TI” inside the rectangle may represent the value of the temporal identifier of the frame. “I,” “P,” and “B” inside the square may indicate the type of frame. The numbers above or below the square can indicate the level of satisfaction with the picture quality of the frame.

SURP(300)는 비디오의 GOP들의 각 GOP를 최소 화질 만족도를 충족시키기 위한 최소의 FPS로 변환할 수 있고, 변환된 각 GOP를 인코딩할 수 있다. 이 경우, 최소 화질 만족도를 충족시키기 위해 요구되는 최소한의 비트 율로 부호화된 비디오가 전송될 수 있다. 이러한 방식은 전송의 측면에서 유리할 수 있다.The SURP 300 can convert each GOP of video to the minimum FPS to satisfy minimum picture quality satisfaction, and encode each converted GOP. In this case, video encoded at the minimum bit rate required to meet minimum picture quality satisfaction can be transmitted. This method can be advantageous in terms of transmission.

도 23a에서는, 최소한 75%의 화질 만족도가 보장되도록 TS가 적용된 경우가 예시되었다.In Figure 23a, a case in which TS is applied to ensure image quality satisfaction of at least 75% is illustrated.

도 23a에서, 점선 내의 프레임들은 부호화 또는 전송되지 않는 프레임들을 나타낼 수 있다. 도 23a에서, 첫 번째 GOP에서는, 75%의 최소 화질 만족도가 충족되도록, 프레임들이 30 FPS로 전송될 수 있다. 또한, 두 번째 GOP에서는, 75%의 최소 화질 만족도가 충족되도록, 프레임들이 15 FPS로 전송될 수 있다.In FIG. 23A, frames within the dotted line may represent frames that are not encoded or transmitted. In Figure 23A, in the first GOP, frames may be transmitted at 30 FPS such that a minimum picture quality satisfaction of 75% is met. Additionally, in the second GOP, frames may be transmitted at 15 FPS so that a minimum image quality satisfaction of 75% is met.
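The per-GOP choice illustrated in FIG. 23A can be sketched as picking the lowest FPS whose predicted satisfaction still meets the minimum. The satisfaction values below are invented for illustration, and the sketch assumes that the source FPS always meets the minimum and that satisfaction does not increase as the FPS is lowered.

```python
def min_fps_meeting(satisfaction_by_fps, minimum, source_fps=60.0):
    """Return the lowest FPS whose predicted satisfaction is at least `minimum`.

    Falls back to the source FPS when no reduced FPS qualifies.
    """
    candidates = [fps for fps, s in satisfaction_by_fps.items() if s >= minimum]
    return min(candidates, default=source_fps)


# Hypothetical predictions for two consecutive GOPs.
first_gop = {30.0: 80, 15.0: 60, 7.5: 40}
second_gop = {30.0: 95, 15.0: 80, 7.5: 50}
print(min_fps_meeting(first_gop, 75))
print(min_fps_meeting(second_gop, 75))
```

Because the decision is made per GOP, a GOP with fast motion keeps more frames than a static one, which a fixed TI cutoff cannot do.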

SURP(300)는 TI들의 화질 만족도 정보를 비트스트림에 포함시킬 수 있다. 복호화 장치(2700)는 화질 만족도 정보를 참조하여 최소 화질 만족도를 충족시키기 위한 최소의 FPS를 결정할 수 있고, 결정된 FPS를 위해 요구되는 프레임들을 부호화 장치로부터 수신할 수 있다. 복호화기는 수신된 프레임들을 재생할 수 있다.SURP 300 may include image quality satisfaction information of TIs in the bitstream. The decoding device 2700 may determine the minimum FPS for satisfying the minimum image quality satisfaction by referring to the image quality satisfaction information, and may receive frames required for the determined FPS from the encoding device. The decoder can reproduce the received frames.

복호화 장치(2700)는 화질 만족도 정보를 참조하여 GOP에 대하여 최소 화질 만족도를 충족시키기 위한 최소의 FPS를 결정할 수 있다. 복호화 장치(2700)는 결정된 FPS를 위해 요구되는 프레임들의 복호화를 수행할 수 있다. 여기에서, 프레임들은 GOP의 프레임들일 수 있다. 결정된 FPS를 위해 요구되는 프레임들을 부호화 장치로부터 수신할 수 있다. 복호화기는 수신된 프레임들을 재생할 수 있다.The decoding device 2700 may determine the minimum FPS for satisfying the minimum picture quality satisfaction for the GOP by referring to the picture quality satisfaction information. The decoding device 2700 may perform decoding of frames required for the determined FPS. Here, the frames may be frames of a GOP. Frames required for the determined FPS can be received from the encoding device. The decoder can reproduce the received frames.

복호화 장치(2700)는 TS를 통해 비디오 또는 GOP의 프레임들 중 복호화를 수행하기로 결정된 프레임만을 선택적으로 부호화 장치(2400)로부터 획득할 수 있다.The decoding device 2700 may selectively obtain from the encoding device 2400 only the frames determined to be decoded among video or GOP frames through TS.

The decoding device 2700 may selectively decode, among the frames of the GOP, the frames whose picture quality satisfaction is less than or equal to the minimum picture quality satisfaction. By excluding from decoding the frames whose picture quality satisfaction is greater than the minimum picture quality satisfaction, the decoding device 2700 can transmit and play back video efficiently without degrading the picture quality perceived by the viewer.

도 23b는 일 예에 따른 75%의 화질 만족도를 기준으로 FPS의 변경의 여부를 결정하는 구성을 나타낸다.Figure 23b shows a configuration for determining whether to change FPS based on 75% image quality satisfaction according to an example.

도 23b에서, 사각형은 프레임을 나타낼 수 있다. 도 23b에서 제1 GOP는 "프레임 0" 내지 "프레임 7"을 포함할 수 있다. 제2 GOP는 "프레임 8" 내지 "프레임 15"를 포함할 수 있다. 사각형 내부의 "TI"는 프레임의 시간적 식별자의 값을 나타낼 수 있다. 사각형 내부의 "I", "P" 및 "B"는 프레임의 타입을 나타낼 수 있다. 사각형 위 또는 아래의 숫자는 프레임의 화질 만족도를 나타낼 수 있다.In Figure 23b, a square may represent a frame. In FIG. 23B, the first GOP may include “Frame 0” to “Frame 7.” The second GOP may include “Frame 8” to “Frame 15.” “TI” inside the rectangle may represent the value of the temporal identifier of the frame. “I,” “P,” and “B” inside the square may indicate the type of frame. The numbers above or below the square can indicate the level of satisfaction with the picture quality of the frame.

By default, the SURP 300 changes the FPS of the GOP to 30; however, if the resulting picture quality satisfaction is less than 75%, it may keep the FPS of the source video instead of changing the FPS of the GOP. That is, the SURP 300 may use TS adaptively based on the minimum picture quality satisfaction. This capability may be particularly needed in environments where picture quality is important.

도 23b에서, 첫 번째의 GOP는 30 FPS로 변환되어 부호화 및 전송될 수 있고, 두 번째의 GOP는 60 FPS를 유지한 채 부호화 및 전송될 수 있다.In Figure 23b, the first GOP can be converted to 30 FPS and encoded and transmitted, and the second GOP can be encoded and transmitted while maintaining 60 FPS.
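The decision of FIG. 23B reduces to a single threshold test per GOP, sketched below. The function name and the example satisfaction values are hypothetical; the 30 FPS target, the 60 FPS source, and the 75% threshold follow the text.

```python
def choose_gop_fps(satisfaction_at_target, minimum=75.0,
                   target_fps=30.0, source_fps=60.0):
    """Use the target FPS unless its predicted satisfaction is below the minimum."""
    if satisfaction_at_target < minimum:
        return source_fps   # keep the source FPS for this GOP
    return target_fps


# Hypothetical predictions: the first GOP tolerates 30 FPS, the second does not.
print(choose_gop_fps(80.0))
print(choose_gop_fps(60.0))
```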

도 24는 일 실시예에 따른 부호화 장치의 구조도이다.Figure 24 is a structural diagram of an encoding device according to an embodiment.

부호화 장치(2400)는 제어부(2410), 부호화부(2420) 및 통신부(2430)를 포함할 수 있다.The encoding device 2400 may include a control unit 2410, an encoding unit 2420, and a communication unit 2430.

The control unit 2410 may correspond to the SURP 300 described above. For example, the functions of the SURP 300 described above may be performed by the control unit 2410. Alternatively, the control unit 2410 may execute the SURP 300.

제어부(2410)는 동영상의 프레임에 대한 선택 정보를 생성할 수 있다.The control unit 2410 can generate selection information about frames of a video.

선택 정보는 전술된 화질 만족도 정보에 대응할 수 있다. 또는, 선택 정보는 화질 만족도 정보를 포함할 수 있다.The selection information may correspond to the above-described image quality satisfaction information. Alternatively, the selection information may include image quality satisfaction information.

선택 정보는 프레임이 비디오의 부호화에서 제외되더라도 동영상의 화질의 저하를 인지하지 못하는 사람의 비율과 관련될 수 있다.Selection information may be related to the percentage of people who do not notice a decrease in video quality even if a frame is excluded from the encoding of the video.

또는, 선택 정보는 프레임이 복호화에서 제외되도록 동영상의 재생이 설정되더라도 재생되는 동영상의 화질의 저하를 인지하지 못하는 사람의 비율과 관련될 수 있다.Alternatively, the selection information may be related to the percentage of people who do not notice a decrease in the image quality of the played video even if the playback of the video is set to exclude frames from decoding.

선택 정보는 복수일 수 있다. 복수의 선택 정보는 동영상 또는 기본 결정 단위의 재생의 FPS 별로 계산될 수 있다. 제어부(2410)는 복수의 FPS들의 선택 정보를 계산할 수 있다.The selection information may be plural. A plurality of selection information may be calculated for each FPS of video or basic decision unit playback. The control unit 2410 may calculate selection information for a plurality of FPS.

동영상의 화질의 저하를 인지하지 못하는 사람의 비율은 제어부(2410)의 기계 학습에 의해 계산될 수 있다. 전술된 것과 같이 동영상의 화질의 저하를 인지하지 못하는 사람의 비율은 프레임 또는 프레임을 포함하는 기본 결정 단위의 특징 벡터에 기반하여 결정될 수 있다.The percentage of people who do not recognize the decrease in video quality can be calculated by machine learning of the control unit 2410. As described above, the proportion of people who do not recognize the deterioration in video quality can be determined based on the frame or the feature vector of the basic decision unit including the frame.

선택 정보는 프레임을 포함하는 복수의 프레임들 중, 프레임의 TI와 동일한 TI를 갖는 다른 프레임에게 공통적으로 적용될 수 있다. 복수의 프레임들은 기본 결정 단위의 프레임들일 수 있다. 또는, 복수의 프레임들은 GOP의 프레임들일 수 있다.The selection information may be commonly applied to other frames having the same TI as the TI of the frame among a plurality of frames including the frame. The plurality of frames may be frames of the basic decision unit. Alternatively, the plurality of frames may be frames of a GOP.

제어부(2410)는 생성된 선택 정보에 기반하여 동영상의 부호화를 수행할 수 있다.The control unit 2410 may perform encoding of the video based on the generated selection information.

예를 들면, 제어부(2410)는 선택 정보의 부호화를 수행할 수 있고, 부호화된 선택 정보를 부호화된 동영상의 비트스트림에 포함시킬 수 있다.For example, the control unit 2410 can perform encoding of selection information and include the encoded selection information in the bitstream of the encoded video.

예를 들면, 제어부(2410)는 선택 정보에 기반하여 동영상의 프레임들 중 부호화할 프레임을 선택할 수 있다.For example, the control unit 2410 may select a frame to be encoded among frames of a video based on selection information.

The encoding unit 2420 may correspond to the encoding device 100 described above. For example, the functions of the encoding device 100 described above may be performed by the encoding unit 2420. Alternatively, the encoding unit 2420 may include the encoding device 100.

The encoding unit 2420 may encode the frames of the video.

For example, the encoding unit 2420 may encode all frames of the video.

For example, the encoding unit 2420 may encode the frames selected by the control unit 2410.

통신부(2430)는 생성된 비트스트림을 복호화 장치(2700)로 전송할 수 있다.The communication unit 2430 may transmit the generated bitstream to the decoding device 2700.

비트스트림은 부호화된 동영상의 정보를 포함할 수 있고, 부호화된 선택 정보를 포함할 수 있다.The bitstream may include encoded video information and may include encoded selection information.

The functions and operations of the control unit 2410, the encoding unit 2420, and the communication unit 2430 are described in detail below.

도 25는 일 실시예에 따른 부호화 방법의 흐름도이다.Figure 25 is a flowchart of an encoding method according to an embodiment.

단계(2510)에서, 제어부(2410)는 동영상의 프레임에 대한 선택 정보를 생성할 수 있다.In step 2510, the controller 2410 may generate selection information for a frame of a video.

단계(2520)에서, 제어부(2410)는 선택 정보에 기반하여 동영상의 부호화를 수행할 수 있다.In step 2520, the controller 2410 may encode the video based on the selection information.

The encoding unit 2420 may encode the frames of the video. The encoding unit 2420 may generate a bitstream by encoding the frames of the video. The bitstream generated by encoding the video may include the selection information.

In step 2530, the communication unit 2430 may transmit the bitstream including the information of the encoded video to the decoding device 2700.

도 26은 일 예에 따른 동영상의 부호화를 수행하는 방법의 흐름도이다.Figure 26 is a flowchart of a method for encoding a video according to an example.

도 25를 참조하여 전술된 단계(2520)은 아래의 단계들(2610 및 2620)을 포함할 수 있다.Step 2520 described above with reference to FIG. 25 may include steps 2610 and 2620 below.

단계(2610)에서, 제어부(2410)는 프레임에 대한 선택 정보에 기반하여 프레임의 부호화 여부의 결정을 수행할 수 있다.In step 2610, the control unit 2410 may determine whether to encode the frame based on selection information about the frame.

In step 2620, when it is determined that the frame is to be encoded, the encoding unit 2420 may encode the frame.

When it is determined that the frame is to be encoded, the control unit 2410 may request the encoding unit 2420 to encode the frame.

The determination of whether to encode the frame may be applied in common to the other frames that have the same TI as the frame, among a plurality of frames including the frame. Here, the plurality of frames may be the frames of a basic decision unit. Alternatively, the plurality of frames may be the frames of a GOP.

도 27은 일 실시예에 따른 복호화 장치의 구조도이다.Figure 27 is a structural diagram of a decoding device according to an embodiment.

복호화 장치(2700)는 제어부(2710), 복호화부(2720) 및 통신부(2730)를 포함할 수 있다.The decoding device 2700 may include a control unit 2710, a decoding unit 2720, and a communication unit 2730.

통신부(2730)는 부호화 장치(2400)로부터 비트스트림을 수신할 수 있다.The communication unit 2730 may receive a bitstream from the encoding device 2400.

비트스트림은 부호화된 동영상의 정보를 포함할 수 있다. 부호화된 동영상의 정보는 부호화된 프레임의 정보를 포함할 수 있다. 비트스트림은 프레임의 선택 정보를 포함할 수 있다.The bitstream may include information about encoded video. Information on the encoded video may include information on the encoded frame. The bitstream may include frame selection information.

제어부(2710)는 프레임의 선택 정보에 기반하여 프레임의 복호화 여부의 결정을 수행할 수 있다.The control unit 2710 may determine whether to decode a frame based on frame selection information.

제어부(2710)는 전술된 SURP(300)의 기능 중 복호화에 적용 가능한 기능을 수행할 수 있다. 또는, 제어부(2710)는 SURP(300)의 적어도 일부를 포함할 수 있다.The control unit 2710 may perform functions applicable to decoding among the functions of the SURP 300 described above. Alternatively, the control unit 2710 may include at least a portion of the SURP (300).

복호화부(2720)는 전술된 복호화 장치(200)에 대응할 수 있다. 예를 들면, 전술된 복호화 장치(200)의 기능은 복호화부(2720)에 의해 수행될 수 있다. 또는, 복호화부(2720)는 복호화 장치(200)를 포함할 수 있다.The decoding unit 2720 may correspond to the decoding device 200 described above. For example, the functions of the above-described decoding device 200 may be performed by the decoding unit 2720. Alternatively, the decoding unit 2720 may include the decoding device 200.

When it is determined that the frame is to be decoded, the decoding unit 2720 may decode the frame.

For example, the decoding unit 2720 may decode all frames of the video.

For example, the decoding unit 2720 may decode the frames selected by the control unit 2710.

The functions and operations of the control unit 2710, the decoding unit 2720, and the communication unit 2730 are described in detail below.

FIG. 28 is a flowchart of a decoding method according to an embodiment.

In step 2810, the communication unit 2730 may receive a bitstream. The communication unit 2730 may receive the information of an encoded frame.

단계(2820)에서, 제어부(2710)는 프레임의 선택 정보에 기초하여 프레임의 복호화 여부의 결정을 수행할 수 있다.In step 2820, the control unit 2710 may determine whether to decode the frame based on frame selection information.

상기의 결정은 비디오의 복수의 프레임들의 각 프레임에 대하여 수행될 수 있다. 여기에서, 복수의 프레임들은 기본 결정 단위의 프레임들일 수 있다. 또는, 복수의 프레임들은 GOP의 프레임들일 수 있다.The above determination may be performed for each frame of a plurality of frames of video. Here, the plurality of frames may be frames of the basic decision unit. Alternatively, the plurality of frames may be frames of a GOP.

The selection information may be related to the proportion of people who do not perceive a deterioration in the picture quality of the video even when playback of the video including the frame is set so that the frame is excluded from decoding.

또는, 선택 정보는 프레임이 복호화에서 제외되도록 프레임을 포함하는 동영상의 재생의 FPS가 결정되더라도 동영상의 화질의 저하를 인지하지 못하는 사람의 비율과 관련될 수 있다.Alternatively, the selection information may be related to the proportion of people who do not recognize the deterioration in the image quality of the video even if the FPS of playback of the video including the frame is determined so that the frame is excluded from decoding.

선택 정보는 프레임에 대한 SEI 내에 포함될 수 있다.Selection information may be included within the SEI for the frame.

프레임의 복호화 여부는 선택 정보의 값 및 재생 정보의 값 간의 비교에 기반하여 결정될 수 있다.Whether or not to decode a frame may be determined based on comparison between the value of selection information and the value of reproduction information.

재생 정보는 프레임을 포함하는 동영상의 재생과 관련된 정보일 수 있다. 재생 정보는 최소 화질 만족도에 대응할 수 있다. 또는, 재생 정보는 최소 화질 만족도를 포함할 수 있다.Playback information may be information related to playback of a video including frames. Playback information may correspond to minimum image quality satisfaction. Alternatively, the reproduction information may include the minimum image quality satisfaction level.

재생 정보는 프레임을 포함하는 동영상의 재생의 FPS와 관련된 정보일 수 있다.The playback information may be information related to the FPS of playback of a video including frames.

For example, if the value of the selection information is greater than the value of the reproduction information, the control unit 2710 may decide not to decode the frame. Alternatively, if the value of the selection information is greater than or equal to the value of the reproduction information, the control unit 2710 may decide to decode the frame.

결정은 프레임을 포함하는 복수의 프레임들 중 프레임의 TI와 동일한 TI를 갖는 다른 프레임에게 공통적으로 적용될 수 있다. 복수의 프레임들은 기본 예측 단위의 프레임들일 수 있다. 또는, 복수의 프레임들은 GOP의 프레임들일 수 있다.The decision may be commonly applied to other frames having the same TI as the TI of the frame among a plurality of frames including the frame. The plurality of frames may be frames of a basic prediction unit. Alternatively, the plurality of frames may be frames of a GOP.

Of all the frames of the video, the decoding device 2700 may decode only the frames determined to be decoded.

In step 2830, when it is determined that the frame is to be decoded, the decoding unit 2720 may decode that frame.
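Steps 2820 and 2830 can be sketched as a per-frame filter in which the selection information arrives only with the first frame of each TI and is then reused for later frames with the same TI. The Frame class, the field names, and the example values are hypothetical. The sketch follows the rule that a frame is skipped when its selection value is greater than the minimum picture quality satisfaction, and it assumes that frames without any signalled value (for example the base layer) are always decoded and that frames are visited in the order listed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    index: int
    ti: int
    surp_value: Optional[int] = None  # set only when the frame carries the SEI


def frames_to_decode(gop: List[Frame], minimum: int) -> List[int]:
    """Return the indices of the frames that the decoder should decode."""
    value_by_ti: Dict[int, int] = {}
    chosen = []
    for frame in gop:
        if frame.surp_value is not None:
            # Remember the value so it applies to all frames sharing this TI.
            value_by_ti.setdefault(frame.ti, frame.surp_value)
        value = value_by_ti.get(frame.ti)
        if value is None or value <= minimum:
            chosen.append(frame.index)
    return chosen


gop = [
    Frame(0, 0), Frame(1, 3, 90), Frame(2, 2, 60), Frame(3, 3),
    Frame(4, 1, 30), Frame(5, 3), Frame(6, 2), Frame(7, 3),
]
print(frames_to_decode(gop, 75))
```

Caching the value per TI mirrors the statement that the decision for one frame applies in common to the other frames with the same TI.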


In the embodiments described above, the methods are described on the basis of flowcharts as a series of steps or units, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. In addition, a person of ordinary skill in the art will understand that the steps shown in a flowchart are not exclusive, and that other steps may be included, or one or more steps of the flowchart may be deleted, without affecting the scope of the present invention.

The embodiments according to the present invention described above may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to specific matters such as particular components, limited embodiments, and drawings, these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and a person of ordinary skill in the art to which the present invention pertains may make various modifications and variations based on this description.

Therefore, the spirit of the present invention should not be limited to the embodiments described above. Not only the claims set forth below, but also all modifications equal or equivalent to those claims, fall within the scope of the spirit of the present invention.

Claims

A video decoding method comprising:
determining whether to decode a frame based on perceived image quality information about the perceived image quality of the frame; and
performing decoding of the frame when it is determined that the frame is to be decoded,
wherein the perceived image quality information is related to the proportion of people who do not perceive degradation in the image quality of a video including the frame even when playback of the video is set so that the frame is excluded from decoding, and
wherein the proportion is determined based on machine learning.

The video decoding method of claim 1,
wherein the determining is performed for each frame of a plurality of frames,
wherein the plurality of frames are frames of a group of pictures (GOP), and
wherein the machine learning uses subjective image quality evaluation based on homogeneous video composed of GOPs having similar characteristics.

A video decoding method comprising:
determining whether to decode a frame based on perceived image quality information about the perceived image quality of the frame; and
performing decoding of the frame when it is determined that the frame is to be decoded,
wherein the perceived image quality information is related to the proportion of people who do not perceive degradation in the image quality of a video including the frame even when playback of the video is set so that the frame is excluded from decoding,
wherein the proportion is determined based on a feature vector of the frame, and
wherein the feature vector is extracted by a feature vector extraction method that reflects human visual characteristics.

The video decoding method of claim 3,
wherein the human visual characteristics are a spatial masking effect, a temporal masking effect, a salient region, or contrast sensitivity.

The video decoding method of claim 1,
wherein the perceived image quality is measured using a measure that models the perceived image quality.

The video decoding method of claim 5,
wherein the measure is Structural SIMilarity (SSIM) or Video Quality Metric (VQM).

The video decoding method of claim 1,
wherein it is determined that the frame is not to be decoded when the value of the perceived image quality information is greater than the value of playback information, and
wherein the playback information is information related to playback of the video including the frame.

The video decoding method of claim 1,
wherein the perceived image quality information is related to the proportion of people who do not perceive degradation in the image quality of the video including the frame even when the frames per second (FPS) of the video is determined such that the frame is excluded from decoding, and
wherein the machine learning is used to measure the degree of image quality degradation caused by a change in the FPS.

The video decoding method of claim 1,
wherein the determination of whether to decode the frame is applied in common to other frames that, among a plurality of frames including the frame, have the same temporal identifier as the frame.
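
The decision rule recited in the decoding claims above (skip a frame when its perceived image quality value exceeds the playback value, and share that decision among frames with the same temporal identifier) can be illustrated with a minimal Python sketch. The `Frame` class, the `satisfied_ratio` field, the `required_ratio` parameter, and all numeric values are hypothetical names and data chosen for illustration; they are not taken from the specification, and a real decoder would read the corresponding values from the bitstream.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    index: int              # position of the frame within the GOP
    temporal_id: int        # temporal layer identifier of the frame
    satisfied_ratio: float  # hypothetical: share of viewers (0..1) who would
                            # not notice the frame being left out


def select_frames_to_decode(frames: List[Frame], required_ratio: float) -> List[int]:
    """Return the indices of the frames that should be decoded."""
    skip_by_tid: Dict[int, bool] = {}
    for frame in frames:
        # The first frame seen for a temporal identifier fixes the decision
        # for every other frame that shares that identifier.
        if frame.temporal_id not in skip_by_tid:
            # Assumed rule: skip when the perceived-quality value is greater
            # than the playback value (here, the required satisfaction ratio).
            skip_by_tid[frame.temporal_id] = frame.satisfied_ratio > required_ratio
    return [f.index for f in frames if not skip_by_tid[f.temporal_id]]


# Hypothetical GOP of eight frames spread over three temporal layers.
gop = [
    Frame(0, 0, 0.0), Frame(1, 2, 0.9), Frame(2, 1, 0.6), Frame(3, 2, 0.9),
    Frame(4, 0, 0.0), Frame(5, 2, 0.9), Frame(6, 1, 0.6), Frame(7, 2, 0.9),
]
print(select_frames_to_decode(gop, 0.75))  # [0, 2, 4, 6]
```

In this sketch, lowering `required_ratio` to 0.5 also drops the frames of temporal layer 1, while raising it to 0.95 keeps every frame.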

A video decoding device comprising:
a control unit that determines whether to decode a frame based on perceived image quality information about the perceived image quality of the frame; and
a decoding unit that performs decoding of the frame when it is determined that the frame is to be decoded,
wherein the perceived image quality information is related to the proportion of people who do not perceive degradation in the image quality of a video including the frame even when playback of the video is set so that the frame is excluded from decoding,
wherein the proportion is determined based on machine learning or a feature vector of the frame,
wherein the perceived image quality is measured using a measure that models the perceived image quality,
wherein the measure is Structural SIMilarity (SSIM) or Video Quality Metric (VQM), and
wherein the feature vector is extracted by a feature vector extraction method that reflects human visual characteristics.

A video encoding method comprising:
generating perceived image quality information for a frame of a video; and
performing encoding of the video based on the perceived image quality information,
wherein the perceived image quality information is related to the proportion of people who do not perceive degradation in the image quality of the video including the frame even when playback of the video is set so that the frame is excluded from decoding, and
wherein the proportion is determined based on machine learning.

The video encoding method of claim 11,
wherein performing encoding of the video comprises:
determining whether to encode the frame based on the perceived image quality information for the frame; and
performing encoding of the frame when it is determined that the frame is to be encoded.

The video encoding method of claim 12,
wherein the determination of whether to encode the frame is applied in common to other frames that, among a plurality of frames including the frame, have the same temporal identifier as the frame.

The video encoding method of claim 11,
wherein the perceived image quality is measured using a measure that models the perceived image quality.

The video encoding method of claim 14,
wherein the measure is Structural SIMilarity (SSIM) or Video Quality Metric (VQM).

A video encoding method comprising:
generating perceived image quality information for a frame of a video; and
performing encoding of the video based on the perceived image quality information,
wherein the perceived image quality information is related to the proportion of people who do not perceive degradation in the image quality of the video including the frame even when playback of the video is set so that the frame is excluded from decoding,
wherein the proportion is determined based on a feature vector of the frame, and
wherein the feature vector is extracted by a feature vector extraction method that reflects human visual characteristics.

The video encoding method of claim 11,
wherein the perceived image quality information comprises a plurality of pieces of perceived image quality information, and
wherein the plurality of pieces of perceived image quality information are calculated for respective frames per second (FPS) values of playback of the video.
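
One possible use of perceived image quality information calculated per FPS, as recited above, is to pick the lowest playback FPS that still satisfies a required share of viewers. The Python sketch below assumes a hypothetical mapping `ratio_by_fps` from each candidate FPS to the share of viewers who would not notice degradation at that FPS, and a hypothetical `required_ratio` threshold; the function name, the strict comparison, and the fallback to the highest FPS are illustrative assumptions rather than details from the specification.

```python
from typing import Dict


def choose_playback_fps(ratio_by_fps: Dict[int, float], required_ratio: float) -> int:
    """Pick the lowest FPS whose satisfied-viewer ratio exceeds the requirement."""
    if not ratio_by_fps:
        raise ValueError("ratio_by_fps must contain at least one FPS entry")
    # Candidate FPS values whose ratio is strictly greater than the requirement.
    candidates = [fps for fps, ratio in ratio_by_fps.items() if ratio > required_ratio]
    # Assumed fallback: if no FPS exceeds the requirement, use the highest FPS.
    return min(candidates) if candidates else max(ratio_by_fps)


# Hypothetical per-FPS values for one GOP.
ratio_by_fps = {15: 0.40, 30: 0.80, 60: 1.00}
print(choose_playback_fps(ratio_by_fps, 0.75))  # 30
```

With these example values, a requirement of 0.9 selects 60 FPS, and a requirement of 1.0 falls back to the highest available FPS because no entry strictly exceeds it.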

The video encoding method of claim 11,
wherein a bitstream generated by encoding the video includes the perceived image quality information.

The video encoding method of claim 11,
wherein the perceived image quality information is applied in common to other frames that, among a plurality of frames including the frame, have the same temporal identifier as the frame.

The video encoding method of claim 16,
wherein the human visual characteristics are a spatial masking effect, a temporal masking effect, a salient region, or contrast sensitivity.