KR20240015021A

KR20240015021A - Anchor scheduler with super-resolution acceleration function and anchor scheduling method using the same

Info

Publication number: KR20240015021A
Application number: KR1020230095284A
Authority: KR
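Inventors: Dongsu Han (한동수); Hyunho Yeo (여현호); Hwijoon Lim (임휘준); Jaehong Kim (김재홍); Youngmok Jung (정영목); Juncheol Ye (예준철)
Original assignee: Korea Advanced Institute of Science and Technology (한국과학기술원)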
Priority date: 2022-07-25
Filing date: 2023-07-21
Publication date: 2024-02-02

Abstract

Disclosed are an anchor scheduler with a super-resolution acceleration function and an anchor scheduling method using the same. Motivated by the limitations of neural-enhanced live streaming, they reduce the cost of end-to-end neural enhancement and maximize the overall quality improvement achieved in a computing cluster. The anchor scheduler includes a decoder that decodes externally provided video streams, and a resource management module that distributes load while optimally allocating computing resources to each video stream decoded by the decoder.

Description

Anchor scheduler with super-resolution acceleration function and anchor scheduling method using the same {ANCHOR SCHEDULER WITH SUPER-RESOLUTION ACCELERATION FUNCTION AND ANCHOR SCHEDULING METHOD USING THE SAME}

The present invention relates to an anchor scheduler with a super-resolution acceleration function and an anchor scheduling method using the same. More specifically, it relates to such a scheduler and method that reduce the cost of end-to-end neural enhancement and maximize the overall quality improvement in a computing cluster.

Demand for live streaming has grown rapidly, and live video traffic was projected to account for 17% of Internet traffic by 2022. Current live streaming infrastructure relies on two key elements. On the ingest side, video is transmitted to the media server using a low-latency streaming protocol. On the distribution side, the client runs an adaptive bitrate (ABR) algorithm to select the highest-quality video that can be streamed in real time.

Unfortunately, ingest video quality depends heavily on the streamer's uplink bandwidth, so traditional live streaming falls short of consistently delivering high-quality video (e.g., 4K/8K). When the ingest path becomes congested, the quality of all downstream video degrades directly. Yet video quality is the most important factor affecting user engagement in live streaming. For example, more than 50% of live viewers abandon a stream if its quality degrades for more than 90 seconds, and such viewer loss can significantly hurt the revenue of live streaming providers.

Recent advances in neural-enhanced streaming show great promise in using computation at the media server to improve ingest video quality. When ingest quality deteriorates, the media server recovers high-quality video by running end-to-end neural enhancement. This consists of decoding the low-quality stream, applying a super-resolution deep neural network (DNN), and encoding the super-resolved output. The result is a dramatic quality improvement in the downstream video.

However, neural enhancement is too expensive to support live streaming at commercial scale. For example, Twitch serves more than 100,000 concurrent live streams. Applying end-to-end neural enhancement at this scale would require tens of thousands of graphics processing units (GPUs) and cost more than $169,000 per hour in the public cloud. A cost breakdown shows that both video super-resolution and encoding are expensive. Neural super-resolution requires 100 to 1000 times more computation than DNNs used for discriminative tasks, and video encoding is up to 3.3 times slower than super-resolution.

Korean Registered Patent No. 10-2313136 (October 15, 2021)

The present invention has accordingly been conceived in view of these limitations of neural-enhanced live streaming. An object of the present invention is to provide an anchor scheduler with a super-resolution acceleration function that takes pairs of ingested video streams and deep neural networks (DNNs) and outputs high-resolution video streams. The scheduler reduces the cost of end-to-end neural enhancement and maximizes the overall quality improvement in a computing cluster.

Another object of the present invention is to provide an anchor scheduling method using the anchor scheduler with the super-resolution acceleration function described above.

To achieve the above object, according to one embodiment, an anchor scheduler with a super-resolution acceleration function includes: a decoder that decodes externally provided video streams; and a resource management module that distributes load while optimally allocating computing resources to each video stream decoded by the decoder.

In one embodiment, the resource management module may include two components. The first is a zero-inference anchor frame selector that runs on a server and selects the most beneficial anchor frames across all video streams. The second is an anchor-level load balancer that, after the anchor frames are selected, dynamically balances load across compute instances at anchor-frame granularity.

In one embodiment, the zero-inference anchor frame selector may include the following units:
- a frame division unit that divides the frames of each video stream, according to frame type, into a key frame group, an alternative reference frame group, and an other-frame group;
- an anchor gain estimation unit that, for the frames in the alternative reference frame group and the other-frame group, iteratively selects the most beneficial frame and estimates its anchor gain;
- a group merging and sorting unit that merges the key frame groups, alternative reference frame groups, and other-frame groups of all streams into a global key frame group, a global alternative reference frame group, and a global other-frame group, respectively, and sorts each; and
- an anchor frame selection unit that iteratively selects anchor frames from the sorted global groups, proceeding from the global key frame group to the global alternative reference frame group and then to the global other-frame group.

In one embodiment, the anchor gain estimation unit may include the following modules:
- a residual calculation module that calculates the residuals accumulated across frames;
- a residual-reduction calculation module that calculates, in each iteration, the amount by which selecting each frame would reduce the residual;
- an anchor gain setting module that selects the frame that reduces the residual the most and sets its anchor gain to the amount of residual reduced; and
- an update module that updates the accumulated residual of each frame.

In one embodiment, the residual-reduction calculation module may calculate the residual reduction using an equation in which Res(F[j]) is the accumulated residual of the j-th frame and k is the index of the nearest frame at which the residual is reset.
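The greedy anchor gain estimation described above can be sketched as follows. The sketch rests on an assumed residual model, not the equation of this specification. Under that model, per-frame residual magnitudes `res` accumulate from the last reset frame, and an anchor resets the accumulated residual to zero at its own position. The reduction from choosing frame i is therefore taken to be Res(F[i]) · (k − i). The function names and the `budget` parameter are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def accumulated_residuals(res: Sequence[float], resets: Set[int]) -> List[float]:
    """Accumulated residual Res(F[j]) per frame; accumulation restarts at 0 on reset frames."""
    acc: List[float] = []
    total = 0.0
    for j, r in enumerate(res):
        total = 0.0 if j in resets else total + r
        acc.append(total)
    return acc


def next_reset(i: int, resets: Set[int], n: int) -> int:
    """Index k of the nearest reset frame after frame i (n if there is none)."""
    return min((k for k in resets if k > i), default=n)


def estimate_anchor_gains(res: Sequence[float], resets: Set[int],
                          budget: int) -> List[Tuple[int, float]]:
    """Greedily pick the frames that reduce the total accumulated residual the most.

    Assumed model (hypothetical): anchoring frame i removes Res(F[i]) from
    frames i..k-1, so the reduction is Res(F[i]) * (k - i).
    """
    resets = set(resets)  # copy so the caller's set is not mutated
    n = len(res)
    gains: List[Tuple[int, float]] = []
    for _ in range(budget):
        acc = accumulated_residuals(res, resets)  # residual calculation
        best, best_gain = None, 0.0
        for i in range(n):  # residual reduction for every candidate frame
            if i in resets:
                continue
            gain = acc[i] * (next_reset(i, resets, n) - i)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining frame reduces the residual
            break
        gains.append((best, best_gain))  # anchor gain = residual reduced
        resets.add(best)                 # update: residual now resets at this frame
    return gains
```

For example, `estimate_anchor_gains([0, 1, 1, 1, 1], {0}, 2)` treats frame 0 as a key frame and returns `[(2, 6.0), (3, 2.0)]`. The total accumulated residual drops from 10 to 4 after the first pick and to 2 after the second.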

In one embodiment, the zero-inference anchor frame selector may periodically run the zero-inference algorithm, selecting the maximum number of anchor frames that the computing cluster can process in real time.

In one embodiment, the zero-inference anchor frame selector may determine this number using four quantities: AF, the anchor frames sorted by anchor gain; T_DNN, the DNN latency; T_intv, the anchor frame selection interval, which is a configurable parameter; and M, the number of compute instances in the cluster.

To achieve the other object of the present invention, an anchor scheduling method according to one embodiment includes: decoding externally provided video streams; and distributing load while optimally allocating computing resources to each of the decoded video streams.

In one embodiment, distributing the load may include: (i) selecting, on a server, the most beneficial anchor frames across all video streams; and (ii) after the anchor frames are selected, dynamically balancing load across compute instances at anchor-frame granularity.

In one embodiment, selecting the anchor frames may include:
(i-1) dividing the frames of each video stream, according to frame type, into a key frame group, an alternative reference frame group, and an other-frame group;
(i-2) estimating anchor gains for the frames in the alternative reference frame group and the other-frame group;
(i-3) merging the key frame groups, alternative reference frame groups, and other-frame groups of all streams into a global key frame group, a global alternative reference frame group, and a global other-frame group, respectively, and sorting each; and
(i-4) iteratively selecting anchor frames from the sorted global groups, proceeding from the global key frame group to the global alternative reference frame group and then to the global other-frame group.

In one embodiment, anchor frames are selected with priority in the following order: the key frame group, then the alternative reference frame group, then the other-frame group.
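The grouping, merging, and priority-ordered selection described above can be sketched as follows. This is an illustrative sketch only. The frame-type labels `key`, `altref`, and `other`, the `Candidate` record, and the `budget` argument are hypothetical names. Key frames are given a gain of 0 here because gains are estimated only for the alternative reference and other-frame groups.

```python
from dataclasses import dataclass
from typing import Dict, List

PRIORITY = ("key", "altref", "other")  # hypothetical frame-type labels, highest priority first


@dataclass(frozen=True)
class Candidate:
    stream: str
    frame: int
    kind: str    # one of PRIORITY
    gain: float  # estimated anchor gain (0.0 for key frames)


def select_anchors(per_stream: Dict[str, List[Candidate]], budget: int) -> List[Candidate]:
    # Merge the per-stream groups into one global group per frame type.
    global_groups: Dict[str, List[Candidate]] = {kind: [] for kind in PRIORITY}
    for candidates in per_stream.values():
        for c in candidates:
            global_groups[c.kind].append(c)
    # Walk the global groups in priority order, each sorted by descending gain
    # (sorted() is stable, so equal gains keep their merge order).
    selected: List[Candidate] = []
    for kind in PRIORITY:
        for c in sorted(global_groups[kind], key=lambda cand: cand.gain, reverse=True):
            if len(selected) == budget:
                return selected
            selected.append(c)
    return selected
```

Because selection is global, a high-gain frame from one stream can displace a lower-gain frame from another. With a budget of 4 in the test below, stream `b`'s other-frame with gain 7.0 is chosen over stream `a`'s with gain 3.0.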

In one embodiment, the anchor gain may be calculated based on accumulated residuals.

In one embodiment, the anchor gain may be estimated using residuals.

In one embodiment, estimating the anchor gain may include: calculating the residuals accumulated across frames; iteratively selecting the most beneficial frame and estimating its anchor gain; selecting the frame that reduces the residual the most and setting the anchor gain to the amount of residual reduced; and updating the accumulated residual of each frame to reflect the effect of the selected frame.

With the anchor scheduler having such a super-resolution acceleration function and the anchor scheduling method using the same, a zero-inference anchor frame selection algorithm selects anchor frames without neural inference. This accelerates the encoding of selectively super-resolved video while providing quality comparable to algorithms that run per-frame inference. In addition, the resource management module runs global anchor frame selection across all video streams to choose an optimal set of anchor frames. It then distributes load across instances in units of anchor frames, whose overhead can be accurately predicted.

FIG. 1 is a conceptual diagram schematically illustrating conventional streaming and neural-enhanced streaming (in the adaptive streaming case).
FIG. 2 is a conceptual diagram schematically illustrating selective super-resolution.
FIG. 3 is a conceptual diagram schematically illustrating NeuroScaler according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram schematically illustrating the deployment model of NeuroScaler for adaptive streaming.
FIG. 5 is a conceptual diagram schematically illustrating the anchor scheduler shown in FIG. 3.
FIG. 6 is a conceptual diagram schematically illustrating the anchor frame selector shown in FIG. 3.

Hereinafter, an anchor scheduler with a super-resolution acceleration function according to the present invention and an anchor scheduling method using the same are described in more detail with reference to the accompanying drawings. Because the present invention may be modified in various ways and take various forms, specific embodiments are illustrated in the drawings and described in detail herein. However, this is not intended to limit the present invention to the specific forms disclosed. The invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

Throughout the drawings, similar reference numerals denote similar components. In the accompanying drawings, the dimensions of structures are exaggerated relative to their actual size for clarity of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be referred to as a second component without departing from the scope of the present invention, and similarly, a second component may be referred to as a first component. Singular expressions include plural expressions unless the context clearly dictates otherwise.

In this application, terms such as "comprise" or "have" indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification. They should not be understood to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Additionally, unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meaning in the context of the related art. Unless explicitly defined in this application, they should not be interpreted in an idealized or overly formal sense.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

As shown in FIG. 1, live streaming consists of an ingest side and a distribution side.

FIG. 1 is a conceptual diagram schematically illustrating conventional streaming and neural-enhanced streaming (in the adaptive streaming case).

Referring to FIG. 1, on the ingest side, the streamer captures live video and uploads it to a media server. In adaptive streaming, the streamer uploads a single video stream, and the media server transcodes it into multiple quality versions.

In multi-party video conferencing, broadcasters upload video streams at multiple qualities using simulcast or a scalable video codec (SVC), and the media server relays the streams to viewers. On the distribution side, the client runs an adaptive bitrate (ABR) algorithm to select and download the highest-quality video that the available network bandwidth allows. ABR algorithms can be divided into heuristics-based and machine-learning-based methods. Heuristics-based methods fall into three classes: those based on network bandwidth, those based on client buffer occupancy, and hybrid methods that combine the two.
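The sketch below illustrates the bandwidth-based heuristic class mentioned above. It picks the highest bitrate that fits under a discounted throughput estimate. The harmonic-mean estimator, the 0.9 safety factor, and the bitrate ladder are assumptions chosen for illustration; this is not an algorithm from this specification.

```python
from statistics import harmonic_mean
from typing import Sequence


def select_bitrate(ladder_kbps: Sequence[int], throughput_kbps: Sequence[float],
                   safety: float = 0.9) -> int:
    """Throughput-based ABR: highest ladder rung not exceeding safety * estimated bandwidth."""
    # Assumed estimator: the harmonic mean damps the influence of short throughput spikes.
    estimate = harmonic_mean(throughput_kbps)
    feasible = [b for b in sorted(ladder_kbps) if b <= safety * estimate]
    # Fall back to the lowest rung when even it does not fit.
    return feasible[-1] if feasible else min(ladder_kbps)
```

With a ladder of 400/1000/2500/5000 kbps and samples of 9000 and 3000 kbps, the harmonic mean is 4500 kbps, so 2500 kbps is chosen. An arithmetic mean (6000 kbps) would have chosen 5000 kbps and risked stalls.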

A common limitation of traditional live streaming is that video quality is capped by the streamer's uplink bandwidth. As a result, even viewers with ample network bandwidth may be deprived of the opportunity to enjoy high-quality video.

Super-resolution is a class of techniques that produce high-resolution images from low-resolution ones. Recent work uses deep neural networks (DNNs) to learn the mapping from low to high resolution and shows dramatic quality improvements.

Neural-enhanced live streaming uses super-resolution DNNs to enhance the ingested video streams, as shown in FIG. 1. Unlike in traditional live streaming, the media server runs end-to-end neural enhancement. When video arrives, it is first decoded into raw frames. The DNN is then applied to all frames or to a subset of them, depending on the super-resolution scheme. Finally, the output is re-encoded into video before being delivered to clients.

To provide reliable quality improvement, a content-aware DNN is typically used for each video stream, and it may be updated during the live stream. Context switching between content-aware DNNs incurs two types of overhead.

First, modern DNN compilers deliver inference optimized for the target accelerator, but at the upfront cost of model optimization, which can take up to several minutes. These compilers apply both graph-level and operator-level optimizations to generate a DNN tailored to the specific target.

Second, before inference can run, the DNN and the frames must be loaded into the device memory of the target accelerator. Neural accelerators typically have limited memory (e.g., 16 GB on an NVIDIA T4), while each content-aware DNN may require several GB. Multiple DNNs must therefore be swapped in and out of device memory frequently.

Selective super-resolution (SR) reduces computing overhead by exploiting temporal redundancy across video frames. Neural inference is applied only to selected frames, called anchor frames.

FIG. 2 is a conceptual diagram schematically illustrating selective super-resolution.

As shown in FIG. 2, non-anchor frames are reconstructed by reusing previously super-resolved frames, guided by codec-level information (e.g., reference indices, motion vectors, and residuals). This involves only lightweight bilinear interpolation and decoding, so it can run in real time even on mobile devices.
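The reconstruction of a non-anchor frame can be sketched in pure Python as follows. This is a deliberately simplified model, not a codec implementation. It assumes a single whole-frame integer motion vector in low-resolution pixels, whereas real codecs use per-block, sub-pixel motion vectors and multiple references. The function names are hypothetical. The reference super-resolved frame is motion-compensated at high resolution, and the decoded low-resolution residual is upscaled by bilinear interpolation and added on top.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]


def bilinear_upsample(img: Frame, scale: int) -> Frame:
    """Bilinear upscaling with half-pixel centers and edge clamping."""
    h, w = len(img), len(img[0])

    def coords(o: int, n: int) -> Tuple[int, int, float]:
        src = max((o + 0.5) / scale - 0.5, 0.0)
        i0 = min(int(math.floor(src)), n - 1)
        i1 = min(i0 + 1, n - 1)
        return i0, i1, src - i0

    out: Frame = []
    for y in range(h * scale):
        y0, y1, wy = coords(y, h)
        row = []
        for x in range(w * scale):
            x0, x1, wx = coords(x, w)
            top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
            bot = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
            row.append((1 - wy) * top + wy * bot)
        out.append(row)
    return out


def reconstruct_non_anchor(ref_sr: Frame, residual_lr: Frame,
                           mv: Tuple[int, int], scale: int) -> Frame:
    """Motion-compensate the super-resolved reference and add the upscaled residual.

    Simplifying assumption: one integer motion vector (dx, dy) for the whole frame.
    """
    dx, dy = mv
    h, w = len(ref_sr), len(ref_sr[0])
    up_res = bilinear_upsample(residual_lr, scale)
    out: Frame = []
    for y in range(h):
        ry = min(max(y + dy * scale, 0), h - 1)  # motion vector scaled to high resolution
        row = []
        for x in range(w):
            rx = min(max(x + dx * scale, 0), w - 1)
            row.append(ref_sr[ry][rx] + up_res[y][x])
        out.append(row)
    return out
```

Because no DNN runs on these frames, their cost is a few arithmetic operations per pixel. Any error in the reused reference, however, carries forward until the next anchor frame.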

Upscaling non-anchor frames by reusing previously super-resolved frames inevitably incurs quality loss relative to per-frame inference, because of the temporal differences between frames. This loss accumulates over consecutive non-anchor frames and is reset at each anchor frame. Choosing a beneficial set of anchor frames is therefore crucial for maximizing quality. The NEMO algorithm does this by relying on expensive per-frame inference. That makes it unusable for live streaming, where anchor frames must be selected online in real time.

FIG. 3 is a conceptual diagram schematically illustrating NeuroScaler according to an embodiment of the present invention.

Referring to FIG. 3, NeuroScaler according to an embodiment of the present invention includes an anchor scheduler 100 and an anchor enhancer 200. It takes pairs of ingested video streams and deep neural networks (DNNs) and outputs high-resolution video streams. The DNN may also be updated dynamically through online learning, as in LiveNAS.

The anchor scheduler 100, which has the super-resolution acceleration function, decodes the ingested video streams and selects the most beneficial anchor frames from them. Anchor frames are the frames to which super-resolution is applied. The scheduler extends selective super-resolution to accelerate video super-resolution by choosing these frames in real time using codec-level information. For each video stream, the anchor scheduler 100 then delivers the DNN and the selected anchor frames to the anchor enhancer 200, which is equipped with neural accelerators.

The anchor enhancer 200 preprocesses the DNN, applies the preprocessed DNN to the anchor frames, and re-encodes the super-resolved output.

In this embodiment, all processes of the anchor scheduler 100 and the anchor enhancer 200 are pipelined and parallelized to maximize overall throughput.

NeuroScaler according to the present invention addresses the limitations of neural-enhanced live streaming. It reduces the cost of end-to-end neural enhancement and maximizes the overall quality improvement in a computing cluster.

FIG. 4 is a conceptual diagram schematically illustrating the deployment model of NeuroScaler for adaptive streaming.

Referring to FIG. 4, when a low-quality video stream (e.g., 360p or 720p) is uploaded to the media server, NeuroScaler generates a high-resolution video stream using real-time super-resolution.

Lower-resolution versions (e.g., 240p to 720p) are also generated from the ingested video stream using the existing transcoding pipeline. With NeuroScaler, clients can watch high-resolution video even when the ingest path is congested. Deploying NeuroScaler for video conferencing is similar to the process above but does not require multi-resolution transcoding.

FIG. 5 is a conceptual diagram schematically illustrating the anchor scheduler 100 shown in FIG. 3.

Referring to FIGS. 3 and 5, the anchor scheduler 100 includes a decoder 110 and a resource management module 120.

The decoder 110 decodes externally provided video streams and provides them to the resource management module 120, either directly or via memory.

The resource management module 120 includes an anchor selector 122 and an anchor-level load balancer 124. It distributes load while optimally allocating computing resources to each video stream, so as to maximize quality improvement when processing a large number of video streams. In particular, the resource management module 120 runs global anchor frame selection across all video streams to choose an optimal set of anchor frames. It then distributes load across instances in units of anchor frames, whose overhead can be accurately predicted. The anchor scheduler 100 is described in this embodiment as including the anchor selector 122 and the anchor-level load balancer 124. This division is logical, made for convenience of explanation, and does not imply separate hardware.

The anchor selector 122 runs on a centralized server and selects the most beneficial anchor frames across all video streams. Specifically, the anchor selector 122 periodically runs the zero-inference algorithm. In each run it selects the maximum number of anchor frames that the computing cluster can process in real time, as given by Equation (1) below.

[Equation 1]

$$\text{Anchors} = AF\left[0 : \left\lfloor \frac{M \cdot T_{intv}}{T_{DNN}} \right\rfloor\right]$$

Here, AF is the list of anchor frames sorted by anchor gain, T_DNN is the DNN latency, T_intv is the anchor frame selection interval (a configurable parameter), and M is the number of compute instances in the cluster.

Unless otherwise stated, this specification uses T_intv = 666 ms (40 frames for 60 fps video). This global anchor frame selection alleviates the imbalance in the number of anchor frames per video stream.
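The real-time anchor budget can be illustrated with a short calculation. The sketch assumes that each compute instance processes one anchor frame per T_DNN. Under that assumption, M instances can handle ⌊M · T_intv / T_DNN⌋ anchor frames per selection interval. The candidate tuple layout and the 20 ms latency in the example are hypothetical values, not measurements from this specification.

```python
import math
from typing import List, Tuple

Anchor = Tuple[str, int, float]  # (stream id, frame index, anchor gain); hypothetical layout


def anchor_budget(num_instances: int, interval_ms: float, dnn_latency_ms: float) -> int:
    """Maximum anchor frames per interval, assuming one anchor per T_DNN per instance."""
    return math.floor(num_instances * interval_ms / dnn_latency_ms)


def select_global_anchors(candidates: List[Anchor], num_instances: int,
                          interval_ms: float, dnn_latency_ms: float) -> List[Anchor]:
    """Keep the highest-gain anchors across all streams, up to the real-time budget."""
    ranked = sorted(candidates, key=lambda a: a[2], reverse=True)
    return ranked[:anchor_budget(num_instances, interval_ms, dnn_latency_ms)]
```

T_intv = 666 ms corresponds to 40 frames at 60 fps, since 40 / 60 s ≈ 666.7 ms. With M = 4 instances and an assumed T_DNN of 20 ms, `anchor_budget(4, 666, 20)` evaluates to 133.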

In general, a longer scheduling interval can improve quality, because it allows more influential anchor frames to be selected. The interval should therefore be tuned to the requirements of the application.

After the anchor frames are selected, the anchor-level load balancer 124 dynamically balances the load across compute instances at anchor-frame granularity. To this end, the anchor scheduler 100 divides the selected anchor frames into per-instance groups. The frames of each group are delivered to the corresponding compute instance for processing.

To meet the real-time constraint, the anchor scheduler 100 assigns each group the maximum number of anchor frames whose total latency is less than the anchor frame selection interval (T_intv). This anchor-level load balancing alleviates load imbalance across instances.
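The per-instance grouping can be sketched as follows. The sketch assumes every anchor frame costs the same T_DNN, since the DNN latency is measured once. Under that assumption, dealing gain-sorted anchors round-robin keeps the groups equally loaded, and no group's total latency reaches T_intv. The function names are hypothetical.

```python
def frames_per_instance(t_intv_ms: float, t_dnn_ms: float) -> int:
    """Largest n such that n * T_DNN stays strictly below T_intv."""
    n = int(t_intv_ms // t_dnn_ms)
    if n * t_dnn_ms >= t_intv_ms:
        n -= 1
    return max(n, 0)


def balance_anchors(anchors, num_instances, t_intv_ms, t_dnn_ms):
    """Split gain-sorted anchors into one group per compute instance.

    With a uniform per-frame cost, round-robin assignment is equivalent to
    always picking the least-loaded instance. Frames that would push a group
    past its latency cap are returned separately as leftovers.
    """
    cap = frames_per_instance(t_intv_ms, t_dnn_ms)
    groups = [[] for _ in range(num_instances)]
    leftover = []
    for i, frame in enumerate(anchors):
        group = groups[i % num_instances]
        if len(group) < cap:
            group.append(frame)
        else:
            leftover.append(frame)
    return groups, leftover
```

A real scheduler would presumably size the anchor set to the budget beforehand, in which case `leftover` stays empty.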

FIG. 6 is a conceptual diagram schematically illustrating the anchor selector 122 shown in FIG. 5.

Referring to FIGS. 5 and 6, the anchor selector 122 includes:

- a frame dividing unit 122a;
- an anchor gain estimation unit 122b;
- a group merging and sorting unit 122c; and
- an anchor frame selection unit 122d.

It uses codec-level information (e.g., frame type and residuals) to select beneficial anchor frames. This division into units is logical, made for convenience of explanation, and is not a hardware division.

The frame dividing unit 122a divides the frames of each video stream into per-stream groups according to frame type. The groups are a key frame group, an alternative reference frame group, and an other-frame group. Anchor frames are selected with priority in that order: key frame group first, then alternative reference frame group, then other-frame group.
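Below is a sketch of the per-stream split. It assumes each frame arrives as a `(stream_id, frame_index, frame_type)` tuple, with the type already parsed from the bitstream. The `FrameType` enum and `split_by_type` are hypothetical names. The enum's numeric order encodes the selection priority described above.

```python
from collections import defaultdict
from enum import IntEnum


class FrameType(IntEnum):
    KEY = 0     # key frame: highest selection priority
    ALTREF = 1  # alternative reference frame
    OTHER = 2   # all remaining frames: lowest priority


def split_by_type(frames):
    """Group frames per stream and per type.

    frames: iterable of (stream_id, frame_index, FrameType).
    Returns {stream_id: {FrameType: [frame_index, ...]}}.
    """
    groups = defaultdict(lambda: {t: [] for t in FrameType})
    for stream_id, idx, ftype in frames:
        groups[stream_id][ftype].append(idx)
    return dict(groups)
```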

The anchor gain estimation unit 122b handles frames in the alternative reference frame group and the other-frame group. For each such frame, it estimates the benefit of using the frame as an anchor frame, called the anchor gain. The anchor gain is computed from accumulated residuals using Algorithm 1, described later. Key frames are few and are usually all selected as anchor frames. Frames in the key frame group are therefore assumed to have an anchor gain equal to the overall performance.

The anchor gain estimation unit 122b includes the following modules (none shown):

- a residual calculation module, which computes the residuals accumulated across frames;
- a residual-reduction calculation module, which computes, in each iteration, the amount by which each frame would reduce the residual;
- an anchor gain setting module, which selects the frame that reduces the residual the most and sets its anchor gain to the amount of that reduction; and
- an update module, which updates the accumulated residual of each frame.

This division into modules is logical, made for convenience of explanation, and is not a hardware division.

The group merging and sorting unit 122c merges the per-stream groups into global groups:

- the key frame global group, which contains the key frames of all video streams;
- the alternative reference frame global group, which contains their alternative reference frames; and
- the other-frame global group, which contains their remaining (normal) frames.

Frames within the same global group are sorted by anchor gain.
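A minimal sketch of the merge-and-sort step follows. It assumes the following inputs:

- per-stream groups keyed by a type name (`"key"`, `"altref"`, `"other"`);
- a dictionary of already-estimated anchor gains keyed by `(stream_id, frame_index)`.

The sort is descending so that the highest-gain frames come first. The function name and data layout are hypothetical.

```python
def merge_and_sort(per_stream, gains):
    """Merge per-stream type groups into global groups sorted by gain.

    per_stream: {stream_id: {type_name: [frame_index, ...]}}
    gains: {(stream_id, frame_index): anchor_gain}
    Returns {type_name: [(stream_id, frame_index), ...]}, highest gain first.
    """
    merged = {}
    for stream_id, by_type in per_stream.items():
        for ftype, indices in by_type.items():
            merged.setdefault(ftype, []).extend((stream_id, i) for i in indices)
    for frames in merged.values():
        frames.sort(key=lambda f: gains[f], reverse=True)
    return merged
```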

The anchor frame selection unit 122d iteratively selects anchor frames from the sorted global groups. It proceeds from the key frame global group to the alternative reference frame global group, and then to the other-frame global group.
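The priority-ordered selection can be sketched as below. It assumes the real-time budget is already known as a maximum anchor count, for example computed from T_intv, T_DNN and M as above. The function name and the group keys are hypothetical.

```python
def select_by_priority(global_groups, budget, order=("key", "altref", "other")):
    """Take anchors from each gain-sorted global group in priority order
    until the budget (maximum number of anchor frames) is used up."""
    selected = []
    for ftype in order:
        for frame in global_groups.get(ftype, []):
            if len(selected) >= budget:
                return selected
            selected.append(frame)
    return selected
```

A group is drained completely before the next one is touched. As a result, a low-gain key frame still outranks a high-gain normal frame, which matches the stated group priority.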

In this embodiment, NeuroScaler meets the real-time constraint as follows. It measures the DNN latency once, then selects the maximum number of anchor frames whose total latency is less than the available computing time.

The following describes how NeuroScaler estimates anchor gains from residuals using Algorithm 1.

Algorithm 1 shows how NeuroScaler estimates the gain of an anchor frame.

Algorithm 1 first computes the residuals accumulated across frames (Algorithm 1, line #2).

It then iteratively selects the most beneficial frame and estimates its anchor gain. In each iteration, it computes the amount by which each frame would reduce the residual, as given by Equation (2) (lines #6-8).

[Equation 2]

Here, Res(F[j]) is the accumulated residual of the j-th frame, and k is the index of the nearest following frame at which the residual is reset. The residual is reset at key frames and at frames whose anchor gain was estimated in a previous iteration. If no such frame exists among the given frames, the algorithm assumes that the residual is reset at the key frame of the next video chunk.

Next, it selects the frame that reduces the residual the most and sets that frame's anchor gain to the amount of the reduction (lines #12-13).

Finally, it updates the accumulated residual of each frame to reflect the effect of the selected frame (line #14). To estimate anchor gains quickly, the total residual pixel value is approximated by the size of the encoded residual frame. The two values are highly correlated, so this approximation has minimal impact on quality (a PSNR difference of 0.05 dB).
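Below is a sketch of the greedy anchor-gain estimation for a single stream. Equation 2 is not reproduced in this text, so the sketch relies on an assumed residual model rather than the patent's exact formula:

- Accumulated residuals add up frame by frame and drop to zero at every reset frame.
- Selecting frame i therefore lowers Res(F[j]) by Res(F[i]) for every j from i to k − 1.
- The resulting reduction is (k − i) · Res(F[i]).
- The encoded residual sizes stand in for residual pixel values, as described above.
- When no reset frame follows, the end of the window plays the role of the next chunk's key frame.

The function name `estimate_anchor_gains` and its parameters are hypothetical.

```python
def estimate_anchor_gains(residual_sizes, key_frames, num_iters):
    """Greedy anchor-gain estimation for one stream (sketch of Algorithm 1).

    residual_sizes[j]: encoded residual size of frame j, used as a proxy for
    its total residual pixel value. key_frames: indices where residuals reset.
    Returns {frame_index: anchor_gain}, inserted in selection order.
    """
    n = len(residual_sizes)
    resets = set(key_frames)
    gains = {}

    def accumulate():
        # Accumulated residual: running sum since the most recent reset frame.
        acc, res = 0.0, []
        for j in range(n):
            acc = 0.0 if j in resets else acc + residual_sizes[j]
            res.append(acc)
        return res

    res = accumulate()  # cf. line #2
    for _ in range(num_iters):
        best, best_gain = None, 0.0
        for i in range(n):
            if i in resets:
                continue
            # k: next reset frame; n stands in for the next chunk's key frame.
            k = next((j for j in range(i + 1, n) if j in resets), n)
            reduction = (k - i) * res[i]  # assumed form of Equation 2 (lines #6-8)
            if reduction > best_gain:
                best, best_gain = i, reduction
        if best is None:
            break  # no frame reduces the residual any further
        gains[best] = best_gain  # cf. lines #12-13
        resets.add(best)
        res = accumulate()  # reflect the selected frame, cf. line #14
    return gains
```

For residual sizes [0, 1, 1, 1, 1] with a key frame at index 0, the first iteration picks frame 2, with a gain of 6.0. Frame 3 ties at 6.0 but is not strictly larger, so frame 2 wins. The second iteration then picks frame 3, with a gain of 2.0.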

As described above, the present invention uses a zero-inference anchor frame selection algorithm to select anchor frames without neural inference. This accelerates the encoding of selective super-resolution video while providing quality similar to that of an algorithm that runs per-frame inference. In addition, the resource management module performs global anchor frame selection across all video streams to select an optimal set of anchor frames. It then distributes the load to instances in units of anchor frames, a granularity at which the overhead of each anchor frame can be accurately predicted.

Components according to embodiments of the present invention may be implemented as software or as hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), and may perform certain roles. However, 'components' are not limited to software or hardware. Each component may be configured to reside on an addressable storage medium, or to execute on one or more processors.

Thus, as an example, components include:

- software components, object-oriented software components, class components, and task components;
- processes, functions, attributes, procedures, subroutines, and segments of program code;
- drivers, firmware, microcode, and circuits; and
- data, databases, data structures, tables, arrays, and variables.

Components and the functionality provided within them may be combined into fewer components or further separated into additional components. It will also be understood that each block in the drawings may be executed by computer program instructions:

- These instructions may be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment. The instructions executed by that processor then create means for performing the functions described in the block(s).
- These instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to function in a particular manner. The stored instructions then produce an article of manufacture that includes instruction means for performing the functions described in the block(s).
- The instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on that equipment then provide steps for executing the functions described in the block(s).

Each block may also represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of order. For example, two blocks shown in succession may in fact be executed substantially concurrently. They may also sometimes be executed in reverse order, depending on the functionality involved.

The term 'unit' used in this embodiment refers to software or to a hardware component such as an FPGA or an ASIC, and a 'unit' performs certain roles. However, a 'unit' is not limited to software or hardware. A 'unit' may be configured to reside on an addressable storage medium, or to execute on one or more processors. Thus, as an example, a 'unit' includes:

- software components, object-oriented software components, class components, and task components;
- processes, functions, attributes, procedures, subroutines, and segments of program code;
- drivers, firmware, microcode, and circuitry; and
- data, databases, data structures, tables, arrays, and variables.

The functionality provided within components and 'units' may be combined into fewer components and 'units', or further separated into additional components and 'units'. Components and 'units' may also be implemented to execute on one or more CPUs within a device or a secure multimedia card.

The present invention has been described with reference to embodiments. Those skilled in the art will understand that it may be variously modified and changed without departing from the spirit and scope of the invention set forth in the claims below.

Reference signs:

- 120: resource management module
- 122: anchor selector
- 122a: frame dividing unit
- 122b: anchor gain estimation unit
- 122c: group merging and sorting unit
- 122d: anchor frame selection unit
- 124: anchor-level load balancer
- 210: pre-processor
- 220: inference engine
- 230: encoder

Claims

1. An anchor scheduler with a super-resolution acceleration function, comprising:
a decoder that decodes an externally provided video stream; and
a resource management module that distributes the load while optimally allocating computing resources to each video stream decoded by the decoder.

2. The anchor scheduler of claim 1, wherein the resource management module comprises:
a zero-inference anchor frame selector that runs on a server and selects the most advantageous anchor frames from all video streams; and
an anchor-level load balancer that, after the anchor frames are selected, dynamically balances the load across compute instances at anchor-frame granularity.

3. The anchor scheduler of claim 2, wherein the zero-inference anchor frame selector comprises:
a frame dividing unit that divides frames into a key frame group, an alternative reference frame group, and an other-frame group according to frame type within each video stream;
an anchor gain estimation unit that iteratively selects the most advantageous frames among the frames of the alternative reference frame group and the other-frame group and estimates their anchor gains;
a group merging and sorting unit that merges and sorts the key frame groups, alternative reference frame groups, and other-frame groups into a key frame global group, an alternative reference frame global group, and an other-frame global group, respectively; and
an anchor frame selection unit that iteratively selects anchor frames from the sorted global groups, in order from the key frame global group to the alternative reference frame global group and then the other-frame global group.

4. The anchor scheduler of claim 3, wherein the anchor gain estimation unit comprises:
a residual calculation module that calculates residuals accumulated across frames;
a residual-reduction calculation module that calculates, in each iteration, the amount by which each frame reduces the residual;
an anchor gain setting module that selects the frame that reduces the residual the most and sets its anchor gain to the amount of the reduction; and
an update module that updates the accumulated residual of each frame.

5. The anchor scheduler of claim 4, wherein the residual-reduction calculation module calculates the amount of residual reduction using Equation 2, where Res(F[j]) is the accumulated residual of the j-th frame and k is the index of the nearest frame at which the residual is reset.

6. The anchor scheduler of claim 2, wherein the zero-inference anchor frame selector periodically executes the zero-inference algorithm and selects the maximum number of anchor frames that can be processed in real time in the computing cluster.

7. The anchor scheduler of claim 6, wherein the zero-inference anchor frame selector selects anchor frames using Equation 1, where:
AF denotes the anchor frames sorted by anchor gain;
T_DNN is the DNN latency;
T_intv is the anchor frame selection interval, a configurable parameter; and
M is the number of compute instances in the cluster.

8. An anchor scheduling method comprising:
decoding an externally provided video stream; and
distributing the load while optimally allocating computing resources to each of the decoded video streams.

9. The anchor scheduling method of claim 8, wherein distributing the load comprises:
(i) selecting, on a server, the most advantageous anchor frames from all video streams; and
(ii) after the anchor frames are selected, dynamically balancing the load among compute instances at anchor-frame granularity.

10. The anchor scheduling method of claim 9, wherein selecting the anchor frames comprises:
(i-1) dividing frames into a key frame group, an alternative reference frame group, and an other-frame group according to frame type within each video stream;
(i-2) estimating anchor gains for frames of the alternative reference frame group and the other-frame group;
(i-3) merging and sorting the key frame groups, alternative reference frame groups, and other-frame groups into a key frame global group, an alternative reference frame global group, and an other-frame global group, respectively; and
(i-4) iteratively selecting anchor frames from the sorted global groups, in order from the key frame global group to the alternative reference frame global group and then the other-frame global group.

11. The anchor scheduling method of claim 9, wherein the anchor frames are selected in priority order from the key frame group, the alternative reference frame group, and the other-frame group.

12. The anchor scheduling method of claim 9, wherein the anchor gain is calculated based on accumulated residuals.

13. The anchor scheduling method of claim 9, wherein the anchor gain is estimated using residuals.

14. The anchor scheduling method of claim 9, wherein estimating the anchor gain comprises:
calculating residuals accumulated across frames;
iteratively selecting the most advantageous frame and estimating its anchor gain;
selecting the frame that reduces the residual the most and setting its anchor gain to the amount of the reduction; and
updating the accumulated residual of each frame to reflect the effect of the selected frame.