KR20230140266A

KR20230140266A - Method and system for optimizing video encoding using double buffering in single encoding structure

Info

Publication number: KR20230140266A
Application number: KR1020220039197A
Authority: KR
Inventors: 박근백; 왕희돈; 김재훈; 강인철; 조성택; 김성호; 장준기
Original assignee: NAVER Corporation (네이버 주식회사)
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2023-10-06

Abstract

Disclosed are a video encoding optimization method and system using double buffering in a single encoding structure. An encoding optimization method according to an embodiment may include: storing frame images corresponding to a first segment of an input video in a first segment buffer; in response to completion of storing the frame images corresponding to the first segment in the first segment buffer, switching the storage buffer to a second segment buffer and storing frame images corresponding to a second segment in the second segment buffer; predicting an encoding option for the frame images stored in the first segment buffer; and transmitting the predicted encoding option and the frame images stored in the first segment buffer to an encoder.

Description

Video encoding optimization method and system using double buffering in a single encoding structure {METHOD AND SYSTEM FOR OPTIMIZING VIDEO ENCODING USING DOUBLE BUFFERING IN SINGLE ENCODING STRUCTURE}

The following description relates to video encoding technology.

Recently, multimedia information in which video and sound are combined with communications and computers and converged into new media has become widely available. For example, with the availability of high-speed data transmission networks, users can watch high-definition video with stereophonic sound and can talk face to face through video calls. Users can also purchase products while viewing product information in real time on a computer or TV, listen to music or watch movies on websites, and take video lectures on a computer.

Such multimedia information has developed on the basis of video compression (i.e., encoding) technology. Data carrying information can be compressed by removing redundancy from it (elements that are not strictly necessary to reconstruct the data exactly). In lossy compression, the data reconstructed by the decoder is not identical to the original, but subjective redundancy is removed to obtain high compression efficiency. In image or video compression, subjective redundancy consists of elements that can be removed without significantly affecting the image quality as intuitively perceived by the viewer.

[Prior art document]

Korean Registered Patent No. 10-1136858

Provided is an encoding optimization technique that improves compression efficiency by using artificial intelligence (AI) to find the optimal encoding parameters for each section of a video.

Provided is an encoding optimization technique that improves throughput by using double buffering in a single encoding structure.

Provided is an encoding optimization method executed on a computer device including at least one processor, the method including: storing, by the at least one processor, frame images corresponding to a first segment of an input video in a first segment buffer; in response to completion of storing the frame images corresponding to the first segment in the first segment buffer, switching, by the at least one processor, the storage buffer to a second segment buffer and storing frame images corresponding to a second segment in the second segment buffer; predicting, by the at least one processor, an encoding option for the frame images stored in the first segment buffer; and transmitting, by the at least one processor, the predicted encoding option and the frame images stored in the first segment buffer to an encoder.
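The double-buffering flow in the method above can be sketched as a producer/consumer loop. This is a minimal illustration in Python, not the patented implementation: `run_double_buffered`, `predict_option`, and `encode` are hypothetical names, and Python's `threading`/`queue` modules stand in for whatever synchronization the real system uses. One thread stores each segment's frames into a free buffer, while the main thread predicts an option for the most recently completed buffer, hands both to the encoder, and then releases the buffer for reuse.

```python
import queue
import threading
from typing import Callable, List, Optional, Sequence

Frame = bytes  # placeholder for one decoded frame image (assumption)


def run_double_buffered(
    segments: Sequence[List[Frame]],
    predict_option: Callable[[List[Frame]], int],
    encode: Callable[[int, List[Frame]], None],
    num_buffers: int = 2,
) -> List[int]:
    """Store segment k+1 in one buffer while segment k (in another buffer) is analyzed and encoded.

    Returns the buffer index that held each segment, in segment order.
    """
    buffers: List[List[Frame]] = [[] for _ in range(num_buffers)]
    free: "queue.Queue[int]" = queue.Queue()  # buffers that may be (re)filled
    ready: "queue.Queue[Optional[int]]" = queue.Queue()  # buffers whose storage is complete
    for i in range(num_buffers):
        free.put(i)
    used_order: List[int] = []

    def store_segments() -> None:
        for segment in segments:
            idx = free.get()  # blocks until this buffer's previous frames reached the encoder
            buffers[idx] = list(segment)  # store the segment's frame images
            used_order.append(idx)
            ready.put(idx)  # storage complete: switch to the next buffer
        ready.put(None)  # end-of-stream marker

    storer = threading.Thread(target=store_segments)
    storer.start()
    while True:
        idx = ready.get()
        if idx is None:
            break
        frames = buffers[idx]
        option = predict_option(frames)  # e.g. a CRF predicted by the AI model
        encode(option, frames)  # pass the option and the frames to the encoder
        buffers[idx] = []
        free.put(idx)  # buffer drained: it may now be refilled
    storer.join()
    return used_order
```

With the default two buffers, segments land in buffers 0, 1, 0, 1, and so on; storing the third segment blocks until buffer 0 has been drained to the encoder.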

According to one aspect, the encoding optimization method may further include, in response to completion of storing the frame images corresponding to the second segment in the second segment buffer and to transmission of the frame images stored in the first segment buffer to the encoder, switching, by the at least one processor, the storage buffer back to the first segment buffer and storing frame images corresponding to a third segment in the first segment buffer.

According to another aspect, storing the frame images corresponding to the third segment in the first segment buffer may include, when frame images yet to be transmitted to the encoder remain in the first segment buffer, waiting until all frame images stored in the first segment buffer have been transmitted to the encoder, and then storing the frame images corresponding to the third segment in the first segment buffer.
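The waiting condition in this aspect, where a buffer is refilled only once every frame already in it has been handed to the encoder, maps naturally onto a condition variable. The `SegmentBuffer` class below is a hypothetical illustration built on Python's `threading.Condition`; `Condition.wait_for` re-checks the "buffer is empty" predicate each time the consumer side signals that the buffer has drained.

```python
import threading
from collections import deque
from typing import Deque, Iterable


class SegmentBuffer:
    """A segment buffer that may be refilled only after every stored frame reached the encoder."""

    def __init__(self) -> None:
        self._frames: Deque[bytes] = deque()
        self._cond = threading.Condition()

    def store(self, frames: Iterable[bytes]) -> None:
        """Store a new segment, first waiting for the previous segment to be fully delivered."""
        with self._cond:
            self._cond.wait_for(lambda: not self._frames)
            self._frames.extend(frames)

    def deliver_one(self) -> bytes:
        """Hand the next pending frame to the encoder; raises IndexError if nothing is pending."""
        with self._cond:
            frame = self._frames.popleft()
            if not self._frames:
                self._cond.notify_all()  # drained: a waiting store() may now proceed
            return frame

    def pending(self) -> int:
        """Number of frames still waiting to be handed to the encoder."""
        with self._cond:
            return len(self._frames)
```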

According to yet another aspect, the encoding optimization method may further include, in response to completion of storing the frame images corresponding to the second segment in the second segment buffer and to transmission of the frame images stored in the first segment buffer to the encoder, switching, by the at least one processor, the storage buffer to a third segment buffer and storing frame images corresponding to the third segment in the third segment buffer.
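To see why an extra buffer raises throughput, the schedule can be modeled with a small timing simulation. This is an idealized model assumed for illustration: one stage stores segments, one stage predicts options and encodes, both handle segments in order, and a buffer may be refilled only after the segment previously stored in it has been fully processed. `num_buffers=1` corresponds to no double buffering, 2 to the first/second-buffer alternation, and 3 to the third-buffer variant.

```python
from typing import List, Sequence


def pipeline_makespan(fill: Sequence[float], process: Sequence[float], num_buffers: int) -> float:
    """Finish time when segment k is stored in buffer k % num_buffers while earlier ones are processed.

    fill[k]    : time to store segment k's frame images
    process[k] : time to predict the option for segment k and hand it to the encoder
    """
    n = len(fill)
    fill_end: List[float] = [0.0] * n
    proc_end: List[float] = [0.0] * n
    for k in range(n):
        start = fill_end[k - 1] if k > 0 else 0.0
        if k >= num_buffers:
            # the buffer being reused must first be drained to the encoder
            start = max(start, proc_end[k - num_buffers])
        fill_end[k] = start + fill[k]
        proc_start = max(fill_end[k], proc_end[k - 1] if k > 0 else 0.0)
        proc_end[k] = proc_start + process[k]
    return proc_end[-1] if n else 0.0
```

For example, with 2 time units to store and 3 to process each of 4 segments, one buffer takes 20 units, while two buffers take 14; a third buffer gives no further gain in that case because processing is the bottleneck.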

According to yet another aspect, the encoding optimization method may further include: predicting, by the at least one processor, an encoding option for the frame images stored in the second segment buffer; and transmitting, by the at least one processor, the predicted encoding option and the frame images stored in the second segment buffer to the encoder.

According to yet another aspect, predicting the encoding option may include predicting, using an artificial intelligence model, an encoding option corresponding to features of the input video.

According to yet another aspect, the artificial intelligence model may be an encoding option prediction model that includes a convolutional neural network (CNN) model for extracting a frame feature from each frame image, a recurrent neural network (RNN) model for extracting a video feature based on the relationships between the frame images, and a classifier that classifies the encoding option corresponding to the video feature, wherein the CNN model, the RNN model, and the classifier are trained end to end (E2E) on a single loss function.
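The data flow of this three-part model (per-frame features, a recurrent summary across frames, and a classifier over candidate CRFs) can be illustrated with a deliberately tiny, dependency-free Python sketch. Everything here is an assumption for illustration: the "CNN" is a single fixed 1x2 convolution with an absolute-value activation and global average pooling, the "RNN" is a scalar Elman cell, the classifier is a linear layer with argmax over three hypothetical CRF classes, and all weights are hand-set. It shows only the forward pass; in the described model the three parts would instead be trained jointly, end to end, against one loss.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # grayscale frame, pixel values in [0, 1] (assumption)
CRF_CLASSES = [18, 23, 28]  # hypothetical candidate CRFs, one per classifier output


def frame_feature(img: Image) -> float:
    """Stand-in for the CNN: fixed [-1, 1] convolution, |.| activation, global average pooling."""
    responses = [abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1)]
    return sum(responses) / len(responses)


def video_feature(frame_feats: Sequence[float], w_h: float = 0.5, w_x: float = 2.0) -> float:
    """Stand-in for the RNN: scalar Elman cell h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    for x in frame_feats:
        h = math.tanh(w_h * h + w_x * x)
    return h


def classify_crf(h: float) -> int:
    """Stand-in for the classifier: linear logits over the CRF classes, then argmax."""
    weights, biases = [4.0, 0.0, -4.0], [-2.0, 0.5, 2.0]
    logits = [w * h + b for w, b in zip(weights, biases)]
    return CRF_CLASSES[max(range(len(logits)), key=logits.__getitem__)]


def predict_crf(frames: Sequence[Image]) -> int:
    """Frame features -> video feature -> CRF class."""
    return classify_crf(video_feature([frame_feature(f) for f in frames]))
```

With these hand-set weights, flat frames map to CRF 28 (more compression), a gentle gradient to CRF 23, and a high-detail checkerboard to CRF 18.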

According to yet another aspect, predicting the encoding option may include predicting, as the encoding option, a constant rate factor (CRF) that satisfies a target Video Multi-method Assessment Fusion (VMAF) score.
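The model learns to output such a CRF. One plausible way to obtain a training label for a segment, assumed here rather than taken from the text, is to encode it at several CRFs, measure VMAF for each, and keep the largest CRF (the most compressive setting) that still reaches the target. The function name and the VMAF numbers in the example are made up for illustration.

```python
from typing import Mapping, Optional


def crf_for_target_vmaf(vmaf_by_crf: Mapping[int, float], target_vmaf: float) -> Optional[int]:
    """Largest CRF whose measured VMAF still meets the target, or None if none does."""
    ok = [crf for crf, vmaf in vmaf_by_crf.items() if vmaf >= target_vmaf]
    return max(ok) if ok else None
```

For a hypothetical table `{18: 97.1, 23: 94.0, 28: 89.5, 33: 82.0}` and a target of 93, this selects CRF 23.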

According to yet another aspect, predicting the encoding option may include: predicting, as the encoding option, a first CRF that satisfies a target VMAF score; predicting, as the encoding option, a second CRF that satisfies a target bitrate; and determining, using the first CRF and the second CRF, a third CRF to be actually applied when encoding the input video.
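The text does not say how the first and second CRFs are combined into the third. One simple rule, assumed here purely for illustration, follows from the fact that a larger CRF lowers both bitrate and quality: take the smallest CRF that fits the bitrate cap as the second CRF, then apply the larger of the two. The result always respects the bitrate cap and equals the VMAF-based CRF whenever that one already fits under it. Both function names are hypothetical.

```python
from typing import Mapping, Optional


def crf_for_target_bitrate(kbps_by_crf: Mapping[int, float], max_kbps: float) -> Optional[int]:
    """Smallest (highest-quality) CRF whose bitrate stays within the cap, or None if none does."""
    ok = [crf for crf, kbps in kbps_by_crf.items() if kbps <= max_kbps]
    return min(ok) if ok else None


def combine_crf(crf_quality: int, crf_bitrate: int) -> int:
    """Hypothetical policy: a larger CRF means a lower bitrate, so max() always honours the cap."""
    return max(crf_quality, crf_bitrate)
```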

Provided is a computer program stored on a computer-readable recording medium and combined with a computer device to cause the computer device to execute the method.

Provided is a computer-readable recording medium on which a program for causing a computer device to execute the method is recorded.

Provided is a computer device including at least one processor configured to execute computer-readable instructions, wherein the at least one processor stores frame images corresponding to a first segment of an input video in a first segment buffer; in response to completion of storing the frame images corresponding to the first segment in the first segment buffer, switches the storage buffer to a second segment buffer and stores frame images corresponding to a second segment in the second segment buffer; predicts an encoding option for the frame images stored in the first segment buffer; and transmits the predicted encoding option and the frame images stored in the first segment buffer to an encoder.

Compression efficiency can be improved by using artificial intelligence (AI) to find the optimal encoding parameters for each section of a video.

Throughput can be improved by using double buffering in a single encoding structure.

Figure 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
Figure 2 is a block diagram showing an example of a computer device according to an embodiment of the present invention.
Figure 3 is a diagram showing an example of components of a distributed encoding system in one embodiment of the present invention.
Figure 4 is a diagram for explaining the basic concept of an encoding option prediction model for encoding optimization in an embodiment of the present invention.
Figure 5 shows a video encoding optimization method in one embodiment of the present invention.
Figures 6 and 7 are example diagrams for explaining image quality measurement indices for each resolution in one embodiment of the present invention.
Figure 8 is an example diagram for explaining the process of generating label data for model learning in one embodiment of the present invention.
Figure 9 is an example diagram illustrating a process for predicting additional encoding options considering service constraints in one embodiment of the present invention.
Figure 10 is a diagram illustrating an example of an image analysis process on a segment basis, according to an embodiment of the present invention.
Figure 11 is a diagram illustrating an example of a segment-level image analysis process using two buffers, according to an embodiment of the present invention.
Figure 12 is a flowchart showing an example of an encoding optimization method according to an embodiment of the present invention.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

The video encoding optimization system according to embodiments of the present invention may be implemented by at least one computer device. A computer program according to an embodiment of the present invention may be installed and run on the computer device, and the computer device may perform the video encoding optimization method according to embodiments of the present invention under the control of the running computer program. The computer program may be stored on a computer-readable recording medium and combined with the computer device to cause the computer to execute the video encoding optimization method.

Figure 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention. The network environment of Figure 1 includes a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170. Figure 1 is merely an example for describing the invention, and the numbers of electronic devices and servers are not limited to those shown in Figure 1.

The plurality of electronic devices 110, 120, 130, and 140 may be fixed or mobile terminals implemented as computer systems. Examples include smartphones, mobile phones, navigation devices, computers, laptops, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), tablet PCs, game consoles, wearable devices, Internet of Things (IoT) devices, virtual reality (VR) devices, and augmented reality (AR) devices. Although Figure 1 shows a smartphone as an example of the electronic device 110, in embodiments of the present invention the electronic device 110 may be any of various physical computer systems capable of communicating with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 over the network 170 using a wireless or wired communication method.

The communication method is not limited and may include not only communication using the communication networks that the network 170 may include (for example, a mobile communication network, the wired Internet, the wireless Internet, a broadcasting network, or a satellite network) but also short-range wireless communication between devices. For example, the network 170 may include one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet. The network 170 may also have any one or more network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited thereto.

Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicate with the electronic devices 110, 120, 130, and 140 over the network 170 to provide instructions, code, files, content, services, and the like. For example, the server 150 may be a system that provides a first service to the electronic devices 110, 120, 130, and 140 connected over the network 170, and the server 160 may be a system that provides a second service to the same devices. More specifically, the server 150 may provide, as the first service, the service targeted by an application (for example, a video service) through that application, which is a computer program installed and running on the electronic devices 110, 120, 130, and 140. As another example, the server 160 may provide, as the second service, a service that distributes files for installing and running that application to the electronic devices 110, 120, 130, and 140.

Figure 2 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention. Each of the electronic devices 110, 120, 130, and 140 and each of the servers 150 and 160 described above may be implemented by the computer device 200 shown in Figure 2.

As shown in Figure 2, the computer device 200 may include a memory 210, a processor 220, a communication interface 230, and an input/output interface 240. The memory 210 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as ROM or a disk drive may also be included in the computer device 200 as a separate permanent storage device distinct from the memory 210. The memory 210 may store an operating system and at least one program code. These software components may be loaded into the memory 210 from a computer-readable recording medium separate from the memory 210, such as a floppy drive, disk, tape, DVD/CD-ROM drive, or memory card. In other embodiments, the software components may be loaded into the memory 210 through the communication interface 230 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 210 of the computer device 200 based on a computer program installed from files received over the network 170.

The processor 220 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor 220 by the memory 210 or the communication interface 230. For example, the processor 220 may be configured to execute received instructions according to program code stored in a recording device such as the memory 210.

The communication interface 230 may provide a function that allows the computer device 200 to communicate with other devices (for example, the storage devices described above) over the network 170. For example, requests, instructions, data, files, and the like generated by the processor 220 according to program code stored in a recording device such as the memory 210 may be transmitted to other devices over the network 170 under the control of the communication interface 230. Conversely, signals, instructions, data, files, and the like from other devices may be received by the computer device 200 via the network 170 through its communication interface 230. Signals, instructions, data, and the like received through the communication interface 230 may be passed to the processor 220 or the memory 210, and files and the like may be stored in a storage medium (the permanent storage device described above) that the computer device 200 may further include.

The input/output interface 240 may be a means for interfacing with an input/output device 250. For example, input devices may include a microphone, keyboard, or mouse, and output devices may include a display or speaker. As another example, the input/output interface 240 may be a means for interfacing with a device in which input and output functions are integrated, such as a touchscreen. The input/output device 250 may also be integrated with the computer device 200 into a single device.

In other embodiments, the computer device 200 may include fewer or more components than those shown in Figure 2. However, most conventional components need not be explicitly illustrated. For example, the computer device 200 may be implemented to include at least some of the input/output devices 250 described above, or may further include other components such as a transceiver and a database.

One family of video compression algorithms exploits the characteristics of human vision, increasing the compression ratio by introducing into the original video losses that are difficult for people to perceive. The compression ratio can be controlled by adjusting the degree of loss.

Commercial video compressors (codecs) provide a bitrate control function that, through lossy compression, keeps the bitrate (video data size per second) at a level that does not cause problems for streaming, without significantly degrading video quality.

Commonly used video codecs provide bitrate control functions that either maintain a fixed target bitrate or maintain a constant quality. Here, maintaining a constant quality means that the quality of the encoded output is uniform; the actual quality level cannot be known before encoding.

However, because each video differs in complexity and characteristics, compressing every video to the same target bitrate or quality results in some videos being encoded at a higher quality than people can perceive, and thus at a larger size than necessary.

In addition, because a video often consists of several scenes with different complexity and characteristics, the same problem can arise between sections of a single video.

The present embodiments aim to optimize the compression ratio of a video by using AI technology to determine an appropriate bitrate or quality option according to the characteristics of each section of the video.

Figure 3 is a diagram illustrating an example of the components of a distributed encoding system according to an embodiment of the present invention.

Referring to Figure 3, the distributed encoding system 300 may include, as optimized encoder components to be applied to a video platform 330, a distributed encoder 310 and an AI serving module 320.

Each of the distributed encoder 310 and the AI serving module 320 may be implemented by the computer device 200 described with reference to Figure 2.

In a video service, a video to be served may be uploaded to the video platform 330 and encoded by the distributed encoding system 300. Whether to optimize the encoding options is decided by the video platform 330, and when optimization is applied, parameters such as model information may be added to the encoding request.

The distributed encoder 310 may include a distributed processing unit (distributor) 311 and workers 312.

The distributed processing unit 311 splits the original video into multiple segments and assigns them to a plurality of workers 312. As encoding proceeds in the workers 312, the distributed processing unit 311 receives the compressed bitstreams, which are either temporarily stored in local storage or kept loaded in memory. The distributed processing unit 311 may generate the final video file by merging the encoding results of the workers 312 for the individual segments.
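The split, encode-in-parallel, and merge flow of the distributed processing unit can be sketched as follows. The function names are hypothetical, a thread pool stands in for remote workers, segments are cut at fixed frame counts (real systems typically cut at keyframes), and merging is shown as byte concatenation in segment order, whereas a real merge would normally remux the segment bitstreams into a container.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_segments(frames: Sequence[bytes], segment_len: int) -> List[List[bytes]]:
    """Cut the source into consecutive segments of at most segment_len frames."""
    return [list(frames[i:i + segment_len]) for i in range(0, len(frames), segment_len)]


def distribute_and_merge(
    frames: Sequence[bytes],
    segment_len: int,
    encode_segment: Callable[[List[bytes]], bytes],
    num_workers: int = 4,
) -> bytes:
    """Encode segments in parallel and merge the resulting bitstreams in segment order."""
    segments = split_into_segments(frames, segment_len)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in submission order, so the merge preserves segment order
        bitstreams = list(pool.map(encode_segment, segments))
    return b"".join(bitstreams)
```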

Each worker 312 is a unit transcoder that performs encoding tasks and encodes the video segment assigned to it by the distributed processing unit 311. The worker 312 may operate in direct cooperation with the AI serving module 320 to optimize the encoding options. If encoding option optimization is enabled, the worker 312 may prepare a subset of frame images needed for encoding option prediction and send them, together with model information, to the AI serving module 320 to request the optimal encoding option. The worker 312 encodes using the prediction result of the AI serving module 320 and checks the quality and bitrate of the encoded result. If the quality or bitrate falls outside the acceptable range, the worker 312 corrects the prediction error by interpolation based on experimental values and encodes again. While encoding is in progress, the worker 312 sends the compressed bitstream to the distributed processing unit 311 and stores the model's prediction result and the encoding result data in the video platform 330.
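The worker's check-and-re-encode step might look like the loop below. It is a sketch under stated assumptions: `encode` is a hypothetical wrapper returning `(vmaf, kbps)` for a given CRF, only the VMAF range is checked (a bitrate check would be analogous), and the "experimental value" used to correct the prediction error is modeled as a fixed, assumed VMAF-per-CRF slope.

```python
from typing import Callable, Tuple

EncodeFn = Callable[[int], Tuple[float, float]]  # crf -> (vmaf, kbps); hypothetical wrapper


def encode_with_check(
    encode: EncodeFn,
    predicted_crf: int,
    vmaf_range: Tuple[float, float],
    vmaf_per_crf: float = 1.5,  # assumed empirical slope: VMAF lost per +1 CRF
    max_attempts: int = 3,
) -> Tuple[int, float]:
    """Encode with the predicted CRF; if VMAF is out of range, correct the CRF and re-encode.

    Returns the CRF actually used for the last encode and the VMAF it produced.
    """
    lo, hi = vmaf_range
    target = (lo + hi) / 2
    crf = predicted_crf
    for attempt in range(max_attempts):
        vmaf, _kbps = encode(crf)
        if lo <= vmaf <= hi or attempt == max_attempts - 1:
            return crf, vmaf
        # VMAF falls as CRF rises: quality too low -> lower CRF, too high -> raise CRF
        crf = round(crf + (vmaf - target) / vmaf_per_crf)
    raise ValueError("max_attempts must be at least 1")
```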

The AI serving module 320 operates in conjunction with the distributed encoder 310 and includes the encoding option prediction model 321. When the AI serving module 320 receives a request from a worker 312 for the encoding option of a video segment, it predicts the optimal encoding option using the encoding option prediction model 321 specified in the request and returns the prediction result to the worker 312.

The video platform 330 may collect logs on the prediction accuracy of the encoding option prediction model 321 using a cloud search stack such as ELK (Elasticsearch, Logstash, Kibana). In other words, the video platform 330 logs the encoding results of the workers 312 to monitor the accuracy of the encoding option prediction model 321 and, based on the monitoring results and input from a user 340 (e.g., an administrator), supports additional training of the model on new videos to improve prediction accuracy.

In this embodiment, the encoding option prediction model 321 is an AI-based prediction model that determines the characteristics of each video segment and predicts the optimal encoding option for those characteristics.

An encoding option is a parameter that controls the encoding rate, i.e., the video compression ratio. For example, CRF (Constant Rate Factor) may be used as the encoding option to optimize.

Compared with CQP (constant quantization parameter), CRF yields visually more uniform image quality and can reflect characteristics of human perception.

The basic concept of the encoding option prediction model 321 is shown in FIG. 4.

Referring to FIG. 4, the encoding option prediction model 321 receives the frame images 401 that make up a video and extracts video features using a deep learning model. The deep learning model may include a convolutional neural network (CNN) model and a recurrent neural network (RNN) model: the CNN model extracts features from each frame image, and the RNN model learns the relationships across the sequence of those frame features.

The encoding option prediction model 321 is trained by supervised learning to classify the features extracted by the deep learning model into the optimal CRF class.

Once trained this way, the encoding option prediction model 321 can predict the optimal CRF category for the characteristics of a new video when given that video's frame images.

FIG. 5 shows a video encoding optimization method according to an embodiment of the present invention.

Referring to FIG. 5, the distributed encoding system 300 includes a training process (S510) and an inference process (S520) for the encoding option prediction model 321.

The encoding option prediction model 321 includes a CNN model, an RNN model, and a classifier.

First, the distributed encoding system 300 performs pre-processing, which includes extracting frame images from each video segment. To avoid memory problems during training (S510) and to improve accuracy, the distributed encoding system 300 may tune the frame image size, the number of frame images, and similar settings.

The training data set for the encoding option prediction model 321 may be a video segment data set built from videos uploaded to a video platform.

For example, CRF may be used as the encoding option to optimize, and the CRF range to be learned may be chosen by considering the distribution of the data set and the range of values that are effective in actual encoding.

The frame image size may be increased from an initial 224×224 to 336×336, 448×448, 560×560, or larger. At sizes of 336×336 or smaller, the original frame may be center-cropped to 80% before resizing. The frame is resized so as to preserve as much of the original information as possible, although the original aspect ratio may change.
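The crop-then-resize rule can be sketched as below. This is an illustrative computation only, assuming the 80% crop applies per dimension and is centered; the function names are hypothetical, and the actual pixel resampling is left to an imaging library.

```python
def center_crop_box(width: int, height: int, ratio: float = 0.8) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered crop covering `ratio` of each dimension."""
    crop_w = round(width * ratio)
    crop_h = round(height * ratio)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def crop_box_for_target(width: int, height: int, target_side: int) -> tuple[int, int, int, int]:
    """Center-crop to 80% only for small targets (<= 336); otherwise keep the full frame."""
    if target_side <= 336:
        return center_crop_box(width, height)
    return 0, 0, width, height


def resize_scale(box: tuple[int, int, int, int], target_side: int) -> tuple[float, float]:
    """Horizontal and vertical scale factors for resizing the box to a square target."""
    left, top, right, bottom = box
    return target_side / (right - left), target_side / (bottom - top)
```

For a 1920×1080 frame and a 336×336 target, the crop box is (192, 108, 1728, 972), and the two scale factors differ, which is exactly the aspect-ratio change the text allows. With Pillow, the box can be passed to `Image.crop` and followed by `Image.resize((336, 336))`.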

Resizing a 1920×1080 frame from the original video to any of these sizes inevitably loses information; making the input image as large as practical reduces this loss and helps improve accuracy.

As for the number of frame images, 15 images sampled at 250 ms intervals from a 4-second segment may be used as model input to improve accuracy. Alternatively, the distributed encoding system 300 may select 10 images at 500 ms intervals from a 5-second segment and resize them to 336×336, a size the CNN can process. In that case, each segment contributes 10 × (336 × 336 × 3) values as input to the encoding option prediction model 321.
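The sampling arithmetic can be checked with a small sketch. It assumes sampling starts at 0 ms and that a timestamp maps to the frame shown at that instant; both choices, and the helper names, are illustrative rather than taken from the source.

```python
def sample_timestamps_ms(segment_ms: int, interval_ms: int, count: int) -> list[int]:
    """Evenly spaced sampling times starting at 0 ms; fails if they run past the segment."""
    stamps = [i * interval_ms for i in range(count)]
    if stamps and stamps[-1] >= segment_ms:
        raise ValueError("sampling window exceeds the segment length")
    return stamps


def to_frame_indices(stamps_ms: list[int], fps: float) -> list[int]:
    """Map each timestamp to the index of the frame displayed at that time."""
    return [int(t * fps // 1000) for t in stamps_ms]


def model_input_values(count: int, side: int = 336, channels: int = 3) -> int:
    """Number of values fed to the model per segment: count x (side x side x channels)."""
    return count * side * side * channels
```

For the 5-second configuration, `sample_timestamps_ms(5000, 500, 10)` gives 0, 500, ..., 4500 ms, and `model_input_values(10)` evaluates to 3,386,880 values per segment.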

These values are only examples; the frame selection used as model input can be changed freely. For example, 20 images may be selected at 200 ms intervals from a 4-second segment.

By encoding each video repeatedly, the CRF option that meets the target image quality under the VMAF (Video Multi-method Assessment Fusion) metric is found; the resulting 4-second segments and their ground-truth (GT) labels are then used in the training process (S510) of the encoding option prediction model 321.

Next, the distributed encoding system 300 performs video feature extraction. The CNN extracts a frame feature from each frame image given as input to the encoding option prediction model 321, and these frame features are fed in frame order into an RNN (an LSTM, Long Short-Term Memory) that extracts a video feature from the relationships within the feature sequence.

Finally, the distributed encoding system 300 performs classification; for example, a softmax classifier may classify the video feature extracted by the RNN.
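The final classification step can be illustrated with a plain-Python softmax classifier. This sketch assumes the CNN and LSTM stages have already produced a fixed-length video feature vector, and the weights, bias, and CRF class list below are hypothetical stand-ins for learned parameters.

```python
import math


def linear(features: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """One logit per CRF class: dot(weight_row, features) + bias."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_crf(video_feature, weights, bias, crf_classes):
    """Return the most probable CRF class and its probability."""
    probs = softmax(linear(video_feature, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return crf_classes[best], probs[best]
```

In the end-to-end setup described next, the cross-entropy loss on these softmax outputs would be the single loss function through which the classifier, LSTM, and CNN are trained together.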

Accordingly, the distributed encoding system 300 can fine-tune the CNN model as well and train the entire model, including the CNN, end to end (E2E) against a single loss function.

For classification performance, the CNN weights learned in the training process (S510) can be optimized to better discriminate video characteristics. Through supervised learning, the model learns the relationship between video characteristics (video features) and the encoding option (CRF), yielding a model that classifies each video into the optimal option for its characteristics.

In the inference process (S520), when frame images of a new video are input to the encoding option prediction model 321 trained in S510, the model predicts the optimal CRF category for the characteristics of those images.

In particular, these embodiments build the encoding option prediction model 321 as a single model that predicts the parameters achieving the target image quality for videos at multiple resolutions.

A streaming service serves each video at multiple resolutions, and each resolution must be encoded to the target quality. The encoding parameters that achieve the target quality can be derived by AI-based video analysis, and the result may differ by resolution, so it is common to operate a separate model for each resolution.

For example, to encode 1080p and 720p video, one model is built for 1080p and another for 720p, and each video is run through the corresponding model. In a service that supports many resolutions, the number of models grows accordingly, which makes developing, managing, and maintaining them difficult.

VMAF is commonly used as the picture quality metric for streaming services. VMAF is a video quality metric that accounts for both compression artifacts and scaling artifacts.

FIG. 6 shows the conventional VMAF measurement method, in which every encoded resolution is resized to 1080p and compared with the 1080p original.

Taking the convex hull of the overlapping VMAF curves and selecting a point on it gives the optimal resolution for a given bitrate. However, because the VMAF range differs for each resolution, the same VMAF threshold cannot be applied to every resolution.
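Selecting the best resolution per bitrate from the convex hull can be sketched with the upper half of Andrew's monotone-chain hull. The sketch assumes rate-quality points have already been measured as (bitrate, VMAF, resolution) tuples, keeps the best resolution at each bitrate before building the hull, and uses made-up sample values.

```python
def cross(o, a, b) -> float:
    """z-component of (a - o) x (b - o); >= 0 means a does not lie above segment o-b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rd_convex_hull(points):
    """Upper convex hull of (bitrate_kbps, vmaf, resolution) points across all resolutions."""
    best = {}
    for bitrate, vmaf, res in points:  # keep the highest VMAF at each bitrate
        if bitrate not in best or vmaf > best[bitrate][1]:
            best[bitrate] = (bitrate, vmaf, res)
    hull = []
    for p in sorted(best.values()):  # left to right by bitrate
        # Drop the previous point while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

The resolution attached to each hull point is the one to serve at that bitrate; points under the hull, such as a 720p encode that is only marginally better than its neighbors, are discarded.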

FIG. 7 compares the encoding result at each resolution with the pre-encoding original at the same resolution, so that only compression artifacts, and not scaling artifacts, are taken into account.

This method cannot produce a convex hull, but it allows the same VMAF threshold to be applied to all resolutions. In other words, when deriving the labels used to train the encoding option prediction model 321, the same quality score (e.g., VMAF 93) can serve as the GT target for every resolution.

Thus, if the values of a constant-quality encoding parameter (such as CRF or QP, the quantization parameter) that meet the same criterion (VMAF score) at every resolution are used as labels, CRFs can be grouped into the same categories across resolutions.

That is, when the labels are CRF values that meet the same VMAF threshold at each resolution, videos with the same label can be grouped under that label even if their resolutions differ. For example, 1080p CRF 23 and 720p CRF 23 fall into the same category, so the training data for the CRF 23 category can mix 1080p and 720p videos.

If each label category is built from a data set mixing several resolutions, a single trained model can serve encoding at all of those resolutions; for example, one model can predict the CRF for both 1080p and 720p encoding.
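Building the mixed-resolution categories amounts to grouping labeled segments by CRF label alone, ignoring resolution. A minimal sketch, assuming each labeled sample is a hypothetical (segment_id, resolution, crf_label) tuple:

```python
from collections import defaultdict


def build_label_groups(samples):
    """Group (segment_id, resolution, crf_label) samples by label only, mixing resolutions."""
    groups = defaultdict(list)
    for segment_id, resolution, crf_label in samples:
        groups[crf_label].append((segment_id, resolution))
    return dict(groups)
```

Because resolution is not part of the grouping key, a CRF 23 category can hold both 1080p and 720p segments, which is what lets one classifier cover both resolutions.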

Label data is generated as follows.

The optimal CRF value of each segment is used as its label.

Referring to FIG. 8, the distributed encoding system 300 repeatedly encodes the segment 801 with different CRF values (S81), measures the image quality of each encoding result (S82), and determines whether the result meets a given quality level (S83).

Through S81 to S83, the distributed encoding system 300 finds the CRF value that meets the given quality level and uses it as the label 802.

VMAF, which best reflects human perception, is used as the quality criterion; for example, VMAF 93, a level suitable for streaming services, may serve as the threshold. Each segment 801 is encoded with changing CRF values until its VMAF reaches 93, and the CRF value at which VMAF reaches 93 is selected as the label 802.
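The repeated-encoding label search can be sketched as an exhaustive scan over a CRF grid. `measure_vmaf` is a hypothetical callable that would encode the segment at a given CRF and return its VMAF against the reference; the CRF range and the linear VMAF curve used in the check are invented for illustration.

```python
from typing import Callable


def find_label_crf(
    measure_vmaf: Callable[[float], float],
    target_vmaf: float = 93.0,
    crf_min: float = 18.0,
    crf_max: float = 30.0,
    step: float = 0.5,
) -> float:
    """Encode at every CRF on the grid and keep the one whose VMAF is closest to the target."""
    steps = int(round((crf_max - crf_min) / step))
    best_crf, best_gap = crf_min, float("inf")
    for i in range(steps + 1):
        crf = crf_min + i * step  # integer index avoids accumulating float error
        gap = abs(measure_vmaf(crf) - target_vmaf)
        if gap < best_gap:
            best_crf, best_gap = crf, gap
    return best_crf
```

With the toy curve `VMAF = 110 - 0.7 × CRF`, the exact crossing of 93 is near CRF 24.29; a 1.0 grid returns 24.0 while a 0.5 grid returns 24.5, which illustrates why a finer step gives more accurate labels. For 720p labels, `measure_vmaf` would compare against the original resized to 720p rather than the 1080p original.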

For example, labels for 1080p and 720p encoding are generated as follows.

The 1080p encoding label is the CRF that reaches VMAF 93, measured against the 1080p original.

For the 720p encoding label, the 1080p original is first resized to 720p, and the label CRF is found by repeatedly encoding this resized 720p video with different CRF values. The quality threshold is VMAF 93, the same as for 1080p. When computing VMAF, the reference is the 1080p original resized to 720p; that is, VMAF is measured considering only compression artifacts, not resizing artifacts, and the matching CRF is found. Because the CRF that meets a given compression-only quality level (VMAF) is found separately for each resolution, and CRF categories can span resolutions, the 1080p and 720p data can be combined into one data set per category.

A single model supporting both 1080p and 720p can therefore be trained on a mixed 1080p/720p data set, and it yields the optimal CRF as the inference result for both 1080p and 720p segments.

When label data is generated by varying the CRF in steps of 1, for example, the chosen label can only be the integer CRF whose VMAF is closest to 93, so label accuracy suffers.

To address this, the CRF step size can be tested and the label classes defined at a step finer than 1, e.g., 0.5, which improves label accuracy.
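With a 0.5 step, each label class corresponds to one point on a fixed CRF grid. A minimal mapping sketch, assuming a hypothetical CRF range starting at 18.0:

```python
def crf_to_class(crf: float, crf_min: float = 18.0, step: float = 0.5) -> int:
    """Index of the nearest grid point; this index is the classifier's target class."""
    return int(round((crf - crf_min) / step))


def class_to_crf(index: int, crf_min: float = 18.0, step: float = 0.5) -> float:
    """Inverse mapping used at inference to turn a predicted class back into a CRF."""
    return crf_min + index * step
```

Halving the step doubles the number of classes over the same range (a CRF range of 18 to 30 yields 25 classes at 0.5 versus 13 at 1.0), trading more fine-grained labels for fewer training samples per class.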

Using the encoding option prediction model 321 built for CRF prediction therefore optimizes encoding while minimizing repeated trial encodes.

Furthermore, these embodiments can predict parameters that efficiently achieve the target quality under service constraints.

When compressing video for a streaming service, quality degradation must be minimized, and a bitrate constraint is needed to ensure uninterrupted playback. The output video must therefore stay within the bitrate constraint, and both the constraint itself and how strictly it must be enforced differ by service. It is also important to achieve the highest possible quality at a given bitrate.

FIG. 9 shows an example of a process for deriving the quality parameters applied to actual encoding under service constraints, according to an embodiment of the present invention.

Referring to FIG. 9, the distributed encoding system 300 extracts features of the input video 901 (S900) and then predicts encoding options optimized for those features (S901 to S904).

Of these prediction steps, all except S903 (namely S901, S902, and S904) use the AI-based encoding option prediction model 321.

Specifically, the distributed encoding system 300 may predict a first CRF (CRF1), the quality encoding parameter that meets the target quality (e.g., VMAF 93), as an encoding option for the features of the input video 901 (S901). CRF1 is predicted as described above, so a detailed description is omitted.

The distributed encoding system 300 may also predict a second CRF (CRF2), the quality encoding parameter that meets the target bitrate, as an encoding option for the features of the input video 901 (S902). CRF is generally used to control video quality because it yields even quality across sections, but when encoding with CRF the bitrate of the output cannot be known in advance. Finding a CRF that meets a target bitrate would require adjusting the CRF, re-encoding repeatedly, and checking the resulting bitrate each time, which is very expensive. To meet both the target quality and the service bitrate limit at low encoding cost, CRF2 must therefore be predicted. Predicting CRF2 together with CRF1 both reduces cost and improves service quality.

To predict CRF2, the CRF value at which each segment meets a specific bitrate is used as the label. Each segment is encoded with different CRF values, and the CRF that meets the target bitrate is selected as its label. The target bitrate may differ by application; label data may be generated, for example, as CRF2 23 for category 1, CRF2 23.5 for category 2, CRF2 24 for category 3, and so on.
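The text labels CRF2 by repeated encoding. One way to cut the number of trial encodes when building these labels, swapped in here as an assumption rather than taken from the source, is a bisection over the CRF grid, which is valid only if bitrate never increases as CRF increases. `measure_bitrate` is a hypothetical callable that encodes at a given CRF and returns the bitrate.

```python
from typing import Callable, Optional


def find_bitrate_crf(
    measure_bitrate: Callable[[float], float],
    target_kbps: float,
    crf_min: float = 18.0,
    crf_max: float = 40.0,
    step: float = 0.5,
) -> Optional[float]:
    """Smallest grid CRF whose bitrate is <= target, assuming bitrate is non-increasing in CRF."""
    lo, hi = 0, int(round((crf_max - crf_min) / step))
    if measure_bitrate(crf_min + hi * step) > target_kbps:
        return None  # even the highest CRF on the grid exceeds the limit
    while lo < hi:  # invariant: grid index hi satisfies the limit
        mid = (lo + hi) // 2
        if measure_bitrate(crf_min + mid * step) <= target_kbps:
            hi = mid
        else:
            lo = mid + 1
    return crf_min + lo * step
```

Choosing the smallest qualifying CRF keeps the highest quality that still fits the bitrate; over a 45-point grid this needs roughly 7 encodes instead of up to 45.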

In some embodiments, CRF1 and CRF2 may be combined into a single label. For example, label data may be generated as CRF1 23 and CRF2 24 for category 1, CRF1 23 and CRF2 25 for category 2, CRF1 23 and CRF2 26 for category 3, and so on. When such combined labels are used, steps S902 and S903 in FIG. 9 are omitted, and the CRF predicted in S901 is used as the final CRF for encoding.

The encoding option prediction model 321 takes the frame images of the input video 901, extracts video features with deep learning (CNN and RNN), and is trained by supervised learning to classify those features into the class corresponding to each CRF. Here, the ground-truth label of each CRF class is the CRF2 generated as label data above; if CRF1 and CRF2 were combined into the label data, the ground-truth label is the CRF1/CRF2 combination. When frames of a video to be analyzed are input to a model trained on this label data set, it outputs the class corresponding to the CRF that meets the bitrate.

The distributed encoding system 300 may determine a third CRF (CRF3), the final CRF applied in actual encoding, by considering CRF1, which meets the target quality, and CRF2, which meets the target bitrate (S903). For example, if CRF1 is greater than or equal to CRF2, CRF1 is chosen as CRF3; since a higher CRF produces fewer bits, CRF1 then already satisfies the bitrate target. If CRF1 is smaller than CRF2, CRF3 may be determined by taking into account a bitrate-limit compliance weight, a user input value related to model training. In applications where the bitrate constraint must be strict, CRF2 may be chosen as CRF3 when CRF1 is smaller than CRF2; in applications where the constraint is looser, a value closer to CRF1 may be chosen to gain quality. In this case, the distributed encoding system 300 corrects the CRF value using the user-supplied bitrate-limit compliance weight w to determine CRF3 (Equation 1).

[Equation 1]

CRF3 = CRF1×(1-w) + CRF2×w

The bitrate-limit compliance weight w takes a value from 0 to 1.0. With w = 0, the bitrate limit is ignored and CRF1 becomes CRF3; with w = 1, the limit is strictly enforced and CRF2 becomes CRF3.
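The CRF3 decision, including Equation 1, can be written as a small function. The function name is illustrative; the logic follows the rule in the text: keep CRF1 when it is already at least CRF2, otherwise blend the two with the weight w.

```python
def final_crf(crf1: float, crf2: float, w: float) -> float:
    """CRF3 per Equation 1, applied only when CRF1 alone would exceed the bitrate limit."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if crf1 >= crf2:
        # A higher CRF spends fewer bits, so CRF1 already satisfies the bitrate target.
        return crf1
    return crf1 * (1 - w) + crf2 * w
```

For example, with CRF1 = 22 and CRF2 = 24, w = 0 keeps 22, w = 1 gives 24, and w = 0.5 gives 23.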

The distributed encoding system 300 may also predict a QP limit parameter as an additional encoding option for the features of the input video (S904).

CRF, one of the quality encoding parameters, determines the frame QP. Encoders include various block-level QP decision algorithms, such as AQ (adaptive quantization) and MB-tree (macroblock tree), to improve perceived quality, and CRF is commonly used in combination with these options. These algorithms determine each block's QP by applying a per-block offset to the frame QP derived from the CRF, which can leave some blocks with an excessively low or high QP. Encoders provide options such as a minimum QP (minqp) and a maximum QP (maxqp) to limit QP; using them to keep block QPs within an appropriate range can reduce the bitrate further while maintaining perceived quality.
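The effect of a QP limit on block QPs can be modeled conceptually as a clamp. This is a simplified sketch, not any encoder's actual rate control: the frame QP and the per-block offsets (standing in for AQ or MB-tree adjustments) are hypothetical inputs.

```python
def clamp_block_qps(frame_qp: int, block_offsets: list[int], minqp: int, maxqp: int) -> list[int]:
    """Block QP = frame QP + per-block offset, clamped to [minqp, maxqp]."""
    if minqp > maxqp:
        raise ValueError("minqp must not exceed maxqp")
    return [min(max(frame_qp + offset, minqp), maxqp) for offset in block_offsets]
```

With frame QP 26, offsets of -5, -1, 0, +3, +6, and limits [24, 28], the block QPs become 24, 25, 26, 28, 28: the block that would have received QP 21 no longer spends excess bits, and the blocks pushed to 29 and 32 are kept from degrading too far.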

The distributed encoding system 300 can derive the minimum and maximum QP values that, applied together with CRF3, give the best quality for the bitrate. The minimum QP prevents excessively high quality (and the bits it would consume), while the maximum QP prevents excessively low quality; applied per block, they produce visually more even quality. For example, the distributed encoding system 300 may derive the minimum and maximum QP by applying an offset to CRF3 (minqp = CRF3 - offset, maxqp = CRF3 + offset). With a CRF offset of 2, for instance, the minimum QP is CRF3 minus 2 and the maximum QP is CRF3 plus 2.

The distributed encoding system 300 may apply the minimum and maximum QP values as encoding options together with CRF3. In some embodiments, to reduce inference time, the QP limit prediction step (S904) may be skipped and only CRF3 applied in actual encoding.

To predict the QP limit parameter, the CRF offset that gives the lowest cost when combined with the encoding CRF (CRF3) determined for each segment is used as the label. Each segment is encoded with different CRF values and, for each CRF value, with different minimum and maximum QPs (or offsets). The bitrate and quality (VMAF) of each encoding result are measured, a cost is computed from them, and the offset of the lowest-cost encoding is selected as the label.

The cost may be defined as in Equation 2:

[Equation 2]

Cost = rate + λ×d

Here, rate is the bitrate of the encoding result and d is its distortion (the quality loss, derived from the measured VMAF). The value of λ (lambda) is a user input set during model training, depending on whether bitrate reduction or quality preservation takes priority.
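Label selection with Equation 2 can be sketched as a minimum over trial encodes. The source does not define how d is derived from VMAF, so this sketch assumes d = 100 - VMAF; the trial values and λ settings in the check are made up.

```python
def rd_cost(bitrate_kbps: float, vmaf: float, lam: float) -> float:
    """Equation 2: Cost = rate + lambda * d, with d assumed to be 100 - VMAF."""
    distortion = 100.0 - vmaf  # assumption: distortion as VMAF shortfall from 100
    return bitrate_kbps + lam * distortion


def select_offset_label(trials: list[tuple[int, float, float]], lam: float) -> int:
    """trials: (offset, bitrate_kbps, vmaf) for one segment at its CRF3; return the cheapest offset."""
    return min(trials, key=lambda t: rd_cost(t[1], t[2], lam))[0]
```

With trials (2, 4000 kbps, VMAF 95), (4, 3600, 94), and (6, 3300, 90), λ = 10 favors bitrate savings and picks offset 6, λ = 100 picks offset 4, and λ = 500 favors quality and picks offset 2.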

The inputs are the video features and the CRF3 determined in the previous step, and label data may be generated, for example, as CRF offset 2 for category 1, CRF offset 4 for category 2, CRF offset 6 for category 3, and so on. The final encoding parameters are then CRF3, the minimum QP (CRF3 - offset), and the maximum QP (CRF3 + offset).

In general, the lower the QP, the smaller the quality gain relative to the added bitrate, so the offset may be applied asymmetrically. That is, the minimum QP may be set to (CRF3 - offset + 1) and the maximum QP to (CRF3 + offset). For example, with a third CRF of 26 and an offset of 4, the minimum QP is 23 and the maximum QP is 30.
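The arithmetic for turning a predicted category into the final parameters can be sketched as below, using the example category-to-offset labels and the asymmetric rule from the text. The function names are hypothetical, and clamping to the 8-bit H.264/HEVC QP range 0..51 is an added assumption; how these values are passed to a particular encoder (for instance x264's `--qpmin`/`--qpmax`) is outside what the text specifies.

```python
from typing import Dict, Tuple

# Example labels from the text: category 1 -> offset 2, 2 -> 4, 3 -> 6.
OFFSET_BY_CATEGORY: Dict[int, int] = {1: 2, 2: 4, 3: 6}


def qp_limits(crf3: int, offset: int, asymmetric: bool = True) -> Tuple[int, int]:
    """Return (min_qp, max_qp) around the third CRF.

    asymmetric=True:  min QP = CRF3 - offset + 1, max QP = CRF3 + offset
    asymmetric=False: min QP = CRF3 - offset,     max QP = CRF3 + offset
    """
    min_qp = crf3 - offset + (1 if asymmetric else 0)
    max_qp = crf3 + offset
    # Assumption: clamp to the 8-bit H.264/HEVC QP range 0..51.
    return max(0, min_qp), min(51, max_qp)


def encoding_params(crf3: int, category: int) -> Dict[str, int]:
    """Bundle the final per-segment parameters: third CRF, min QP, max QP."""
    min_qp, max_qp = qp_limits(crf3, OFFSET_BY_CATEGORY[category])
    return {"crf": crf3, "qp_min": min_qp, "qp_max": max_qp}
```

For the worked example in the text, `qp_limits(26, 4)` gives `(23, 30)`, and `encoding_params(26, 2)` gives the same bounds because category 2 maps to offset 4.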

The encoding option prediction model 321 receives the frame images constituting the input video, extracts video features through deep learning (a CNN and an RNN), and is trained by supervised learning to classify those features, together with the encoding CRF, into the cost-optimal class. The ground-truth labels of the classes are the CRF offsets generated as label data above. When the frame images of a video to be analyzed and the third CRF value are input to the encoding option prediction model 321 trained on this labeled data set, the CRF offset corresponding to the predicted category is obtained as the QP limit parameter.
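The last step of inference, mapping the classifier output to a CRF offset, can be sketched as follows. This assumes the CNN/RNN feature extractor and classifier (not shown) have already produced one raw score (logit) per class; the 0-based class ordering and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the classifier's raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_crf_offset(logits: Sequence[float], offset_by_class: Sequence[int]) -> int:
    """Pick the most probable class and return its CRF offset label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return offset_by_class[best]
```

For example, `predict_crf_offset([0.1, 2.3, -1.0], [2, 4, 6])` selects the second class and returns offset 4. The softmax does not change which class wins; it is included only because class probabilities are what a trained classifier typically exposes.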

Thus, in this embodiment, when the AI-based encoding option prediction model 321 is used to predict the optimal encoding options to apply, it can predict not only a quality encoding parameter that satisfies the target quality, but additionally a quality encoding parameter that satisfies the target bitrate under service constraints and/or QP limit parameters that satisfy a quality balance condition.

Meanwhile, the above embodiments explained that, to improve accuracy, 15 images taken at 250 ms intervals from each analysis segment (for example, a 4-second segment) may be used as the model input.
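Selecting which decoded frames to store for analysis can be sketched as below. The text fixes only the interval (250 ms), the count (15) and the example segment length (4 s); starting the samples at the segment boundary and flooring timestamps to frame indices are assumptions, and the function name is hypothetical.

```python
from typing import List


def sample_frame_indices(fps: float, segment_number: int, segment_ms: int = 4000,
                         interval_ms: int = 250, count: int = 15) -> List[int]:
    """Frame indices to feed the model for one segment (0-based segment_number).

    Assumption: samples start at the segment boundary and each timestamp is
    floored to the frame that is being displayed at that moment.
    """
    start_ms = segment_number * segment_ms
    return [int((start_ms + k * interval_ms) * fps / 1000) for k in range(count)]
```

At 30 fps the first segment yields indices 0, 7, 15, 22, ... up to 105 (the 3500 ms mark), and the second segment starts at frame 120.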

FIG. 10 is a diagram illustrating an example of a segment-level video analysis process according to an embodiment of the present invention. The segment analyzer 1010 may correspond to the AI serving module 320 described above, and the segment analyzing module 1012 included in the segment analyzer 1010 may correspond to the encoding option prediction model 321.

The segment analyzer 1010 may receive decoded video frames 1020 and store the frame images for the current segment in a segment buffer 1011. The segment analyzing module 1012 may then take the frame images stored in the segment buffer 1011 and predict the optimal encoding options for that segment. In FIG. 10, the encoding option is shown as a quality parameter (CRF). The segment analyzing module 1012 may pass the predicted encoding option back to the segment buffer 1011, which may then forward the buffered frame images, together with the predicted encoding option, to the encoder 1030.

However, when this segment-level analysis is applied to single encoding, system utilization may drop because the encoder must wait during the analysis (inference) time. Encoding is a very compute-intensive process, so reduced encoder utilization can significantly increase the total encoding time.

Therefore, to prevent utilization from dropping during inference, two or more buffers may be used: while one buffer is under inference, decoded video frames continue to be stored in another buffer, so that the encoding pipeline runs without interruption.

FIG. 11 is a diagram illustrating an example of a segment-level video analysis process using two buffers according to an embodiment of the present invention. The segment analyzer 1110 may correspond to the AI serving module 320 described above, and the segment analyzing module 1113 included in the segment analyzer 1110 may correspond to the encoding option prediction model 321. A single segment analyzing module 1113 may process the frame images stored in each of the two segment buffers 1111 and 1112.

First, frame images of the decoded video frames begin to be stored in segment buffer 1 (1111). Once the first segment's worth of frame images has been stored in segment buffer 1 (1111), the segment analyzer 1110 switches the storage buffer to segment buffer 2 (1112) and performs the analysis (inference). For example, the segment analyzing module 1113 may predict the encoding options using the frame images stored in segment buffer 1 (1111). During this analysis, the frame images of the second segment continue to be stored in segment buffer 2 (1112). When prediction of the encoding options for the frame images in segment buffer 1 (1111) is complete, the segment analyzing module 1113 may pass the predicted encoding options to segment buffer 1 (1111), which may then forward the predicted encoding options and the stored frame images to the encoder 1130.

Next, once the second segment's frame images have been stored in segment buffer 2 (1112), the segment analyzing module 1113 may perform inference on them. By this time the frame images in segment buffer 1 (1111) have already been transferred to the encoder 1130, so storage of the third segment's frame images can begin in segment buffer 1 (1111). If, because of an inference delay, frame images of the first segment still remain in segment buffer 1 (1111), the analysis process may wait until all of them have been delivered to the encoder 1130; in practice, this rarely happens. By alternately storing frame images in the two segment buffers 1111 and 1112 in this way, the transcoding pipeline is not interrupted during inference.
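The alternation described above can be sketched with two threads and two queues of buffer indices: a decoder thread fills one buffer while a single analyzer thread runs inference on the other, and a buffer is handed back for refilling only after its frames and predicted option have gone to the encoder. This is a simplified sketch, not the patent's implementation: the function and variable names are hypothetical, the encoder is modeled as a list of `(crf, frames)` pairs, `predict` is a stand-in for the encoding option prediction model, and error handling is omitted (if `predict` raises, the decoder thread can block).

```python
import queue
import threading
from typing import Any, Callable, List, Sequence, Tuple

_DONE = None  # sentinel: no more filled buffers


def run_segment_pipeline(frames: Sequence[Any], segment_len: int,
                         predict: Callable[[List[Any]], int],
                         num_buffers: int = 2) -> List[Tuple[int, List[Any]]]:
    """Return (crf, segment_frames) pairs in the order the encoder receives them."""
    if num_buffers < 2:
        raise ValueError("need at least two segment buffers")
    buffers: List[List[Any]] = [[] for _ in range(num_buffers)]
    free = queue.Queue()  # indices of drained buffers the decoder may fill
    full = queue.Queue()  # indices of filled buffers awaiting inference
    for i in range(num_buffers):
        free.put(i)
    encoder_input: List[Tuple[int, List[Any]]] = []

    def decode_and_store() -> None:
        buf = free.get()
        for frame in frames:
            buffers[buf].append(frame)
            if len(buffers[buf]) == segment_len:
                full.put(buf)     # segment complete: hand it to the analyzer
                buf = free.get()  # switch storage buffer; waits if none is drained yet
        if buffers[buf]:
            full.put(buf)         # trailing partial segment
        full.put(_DONE)

    def analyze_and_forward() -> None:
        while True:
            buf = full.get()
            if buf is _DONE:
                return
            crf = predict(buffers[buf])                       # inference on one segment
            encoder_input.append((crf, list(buffers[buf])))  # option + frames to encoder
            buffers[buf].clear()
            free.put(buf)                                     # buffer may be refilled

    workers = [threading.Thread(target=decode_and_store),
               threading.Thread(target=analyze_and_forward)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return encoder_input
```

For example, `run_segment_pipeline(list(range(10)), 4, lambda seg: 23 + seg[0] // 4)` delivers `[0, 1, 2, 3]` with CRF 23, `[4, 5, 6, 7]` with CRF 24, and the partial segment `[8, 9]` with CRF 25. Because the blocking `free.get()` is what makes the decoder wait for a drained buffer, the same code also covers three or more buffers via `num_buffers`.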

From the above description, it will be readily understood that three or more segment buffers may also be used, depending on the situation. The operation is the same as with two segment buffers: the three or more segment buffers store each segment's frame images in turn. Analysis using two or more segment buffers is suited to non-real-time encoding, where high throughput must be maintained. It also applies to single encoding, in which the entire video is encoded by one encoding process; during that single encoding, the encoder adjusts its encoding according to the encoding option (CRF value) delivered by the segment analyzer 1110.

FIG. 12 is a flowchart illustrating an example of an encoding optimization method according to an embodiment of the present invention. The encoding optimization method according to this embodiment may be performed by the computer device 200. The processor 220 of the computer device 200 may be implemented to execute control instructions according to the code of an operating system or of at least one computer program included in the memory 210. The processor 220 may control the computer device 200 so that it performs steps 1210 to 1270 of the method of FIG. 12 according to the control instructions provided by the code stored in the computer device 200.

In step 1210, the computer device 200 may store frame images corresponding to the first segment of the input video in the first segment buffer. For example, assuming a segment length of 4 seconds and an inference interval of 250 ms, the computer device 200 may store in the first segment buffer 15 frame images extracted from the first segment at 250 ms intervals.

In step 1220, in response to completing the storage of the first segment's frame images in the first segment buffer, the computer device 200 may switch the storage buffer to the second segment buffer and store the frame images corresponding to the second segment there. For example, the computer device 200 may store in the second segment buffer 15 frame images extracted from the second segment at 250 ms intervals. This storage into the second segment buffer may continue while the computer device 200 is predicting the encoding options in the subsequent step 1230.

In step 1230, the computer device 200 may predict encoding options for the frame images stored in the first segment buffer. For example, the computer device 200 may use an artificial intelligence model to predict encoding options corresponding to the features of the input video. This artificial intelligence model may correspond to the encoding option prediction model 321 described above. Since the encoding option prediction model 321 and the method of predicting encoding options with it have been described in detail above, the description is not repeated here.

In step 1240, the computer device 200 may transfer the predicted encoding options and the frame images stored in the first segment buffer to the encoder. The encoder may then encode by applying the predicted encoding options.

In step 1250, in response to the second segment's frame images having been stored in the second segment buffer and the frame images in the first segment buffer having been transferred to the encoder, the computer device 200 may switch the storage buffer back to the first segment buffer and store the frame images corresponding to the third segment there. If frame images to be transferred to the encoder still remain in the first segment buffer, the computer device 200 may wait until all frame images stored in the first segment buffer have been transferred to the encoder, and then store the third segment's frame images in the first segment buffer.

As described above, three or more segment buffers may also be used. In that case, in step 1250 the computer device 200 may switch the storage buffer from the second segment buffer to a third segment buffer, storing the frame images corresponding to the third segment in the third segment buffer instead of the first segment buffer. With three segment buffers, the storage buffer then switches from the third segment buffer back to the first; with four, it switches from the third to the fourth segment buffer and then from the fourth back to the first, and so on in sequence.
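The switching order described above is a round-robin, which can be written as a one-line mapping from segment number to buffer number. This is a minimal sketch using the 1-based numbering of the text; the function name is hypothetical.

```python
def storage_buffer_for_segment(segment_number: int, num_buffers: int) -> int:
    """Return the (1-based) segment buffer that stores a (1-based) segment."""
    if num_buffers < 2:
        raise ValueError("need at least two segment buffers")
    return (segment_number - 1) % num_buffers + 1
```

For segments 1 to 5, two buffers give the order 1, 2, 1, 2, 1; three buffers give 1, 2, 3, 1, 2; four buffers give 1, 2, 3, 4, 1, matching the sequences described above.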

In step 1260, the computer device 200 may predict encoding options for the frame images stored in the second segment buffer. Because the next segment's frame images are stored in the second segment buffer while the encoding options for the first segment buffer are being predicted, the encoding option prediction model 321 can analyze the next segment's frame images immediately, without waiting for a segment buffer to be filled again.

In step 1270, the computer device 200 may transfer the predicted encoding options and the frame images stored in the second segment buffer to the encoder. Likewise, the encoder may encode these frame images according to the delivered encoding options.

As described above, according to embodiments of the present invention, compression efficiency can be improved by using artificial intelligence (AI) to find the optimal encoding parameters for each section of a video. In addition, throughput can be improved by using double buffering in a single encoding structure.

The systems or devices described above may be implemented with hardware components, or with a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described; however, those skilled in the art will recognize that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may store a computer-executable program persistently, or store it temporarily for execution or download. The medium may also be any of various recording or storage means in the form of a single piece of hardware or a combination of several; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and media configured to store program instructions, such as ROM, RAM, and flash memory. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

An encoding optimization method executed on a computer device comprising at least one processor, the method comprising:
storing, by the at least one processor, frame images corresponding to a first segment of an input video in a first segment buffer;
in response to completion of storing the frame images corresponding to the first segment in the first segment buffer, changing, by the at least one processor, a storage buffer to a second segment buffer and storing frame images corresponding to a second segment in the second segment buffer;
predicting, by the at least one processor, encoding options for the frame images stored in the first segment buffer; and
transferring, by the at least one processor, the predicted encoding options and the frame images stored in the first segment buffer to an encoder.

The encoding optimization method of claim 1, further comprising:
in response to completion of storing the frame images corresponding to the second segment in the second segment buffer and to the frame images stored in the first segment buffer having been transferred to the encoder, changing, by the at least one processor, the storage buffer to the first segment buffer and storing frame images corresponding to a third segment in the first segment buffer.

The encoding optimization method of claim 2, wherein storing the frame images corresponding to the third segment in the first segment buffer comprises:
if frame images to be transferred to the encoder remain in the first segment buffer, waiting until all frame images stored in the first segment buffer have been transferred to the encoder, and then storing the frame images corresponding to the third segment in the first segment buffer.

The encoding optimization method of claim 1, further comprising:
in response to completion of storing the frame images corresponding to the second segment in the second segment buffer and to the frame images stored in the first segment buffer having been transferred to the encoder, changing, by the at least one processor, the storage buffer to a third segment buffer and storing frame images corresponding to a third segment in the third segment buffer.

The encoding optimization method of claim 1, further comprising:
predicting, by the at least one processor, encoding options for the frame images stored in the second segment buffer; and
transferring, by the at least one processor, the predicted encoding options and the frame images stored in the second segment buffer to the encoder.

The encoding optimization method of claim 1, wherein predicting the encoding options comprises:
predicting encoding options corresponding to features of the input video using an artificial intelligence model.

The encoding optimization method of claim 6, wherein the artificial intelligence model is an encoding option prediction model comprising a convolutional neural network (CNN) model that extracts frame features from each frame image, a recurrent neural network (RNN) model that extracts video features based on relationships between the frame images, and a classifier that classifies the encoding options corresponding to the video features, and wherein the CNN model, the RNN model, and the classifier are trained end-to-end (E2E) with a single loss function.

The encoding optimization method of claim 1, wherein predicting the encoding options comprises:
predicting, as the encoding option, a CRF (Constant Rate Factor) that satisfies a target VMAF (Video Multimethod Assessment Fusion) score.

The encoding optimization method of claim 1, wherein predicting the encoding options comprises:
predicting, as the encoding option, a first CRF that satisfies a target VMAF score;
predicting, as the encoding option, a second CRF that satisfies a target bitrate; and
determining, using the first CRF and the second CRF, a third CRF to be actually applied to encoding the input video.

A computer program stored on a computer-readable recording medium for causing a computer device to execute the encoding optimization method of any one of claims 1 to 9.

A computer device comprising:
at least one processor implemented to execute computer-readable instructions,
wherein the at least one processor is configured to:
store frame images corresponding to a first segment of an input video in a first segment buffer;
in response to completion of storing the frame images corresponding to the first segment in the first segment buffer, change a storage buffer to a second segment buffer and store frame images corresponding to a second segment in the second segment buffer;
predict encoding options for the frame images stored in the first segment buffer; and
transfer the predicted encoding options and the frame images stored in the first segment buffer to an encoder.

The computer device of claim 11, wherein the at least one processor is further configured to:
in response to completion of storing the frame images corresponding to the second segment in the second segment buffer and to the frame images stored in the first segment buffer having been transferred to the encoder, change the storage buffer to the first segment buffer and store frame images corresponding to a third segment in the first segment buffer.

The computer device of claim 11, wherein the at least one processor is further configured to:
in response to completion of storing the frame images corresponding to the second segment in the second segment buffer and to the frame images stored in the first segment buffer having been transferred to the encoder, change the storage buffer to a third segment buffer and store frame images corresponding to a third segment in the third segment buffer.

The computer device of claim 11, wherein the at least one processor is further configured to:
predict encoding options for the frame images stored in the second segment buffer; and
transfer the predicted encoding options and the frame images stored in the second segment buffer to the encoder.

The computer device of claim 11, wherein, to predict the encoding options, the at least one processor is configured to predict encoding options corresponding to features of the input video using an artificial intelligence model.