KR20200145669A

KR20200145669A - Motion based adaptive rendering

Info

Publication number: KR20200145669A
Application number: KR1020200057637A
Authority: KR
Inventors: 아브히나브 골라스; 니콜라스 소레
Original assignee: 삼성전자주식회사
Priority date: 2019-06-19
Filing date: 2020-05-14
Publication date: 2020-12-30

Abstract

A method of performing adaptive shading of image frames by a Graphics Processing Unit (GPU) includes: determining, by the GPU, a first shading rate based on determining that a change in a plurality of underlying assets between a first image frame and a second image frame is above a first threshold; determining, by the GPU, a second shading rate based on determining that one or more viewports in the second image frame are similar to one or more viewports in the first image frame; determining, by the GPU, a third shading rate based on determining that a quality reduction filter is used; and selecting, by the GPU, a shading rate from among the first shading rate, the second shading rate, and the third shading rate for the first image frame. The present invention provides motion-based adaptive rendering in which the number of samples rendered in a block of pixels is reduced.

Description

Motion-based adaptive rendering {MOTION BASED ADAPTIVE RENDERING}

Embodiments of the present invention relate to techniques for performing graphics processing in which the number of samples rendered in a block of pixels is reduced. More specifically, embodiments of the present invention relate to automatically analyzing motion and other factors in individual display screen tiles (blocks of pixels) and making sampling decisions on a tile-by-tile basis.

FIG. 1 shows the main parts of a graphics pipeline 100 based on the OpenGL® 3.0 standard. An exemplary set of stages includes a vertex shader operation stage 105, a primitive assembly and rasterization stage 110, a fragment pixel shader operation stage 115, a frame buffer stage 120, and a texture memory 125. The pipeline operates to receive vertex data, shade vertices, assemble and rasterize primitives, and perform shading operations on fragments/pixels.

In one aspect of the graphics pipeline 100, all regions of the image are rendered at the same minimum resolution. In particular, in a conventional graphics pipeline, the sampling rate (the average number of samples per pixel) is typically at least one sample for every pixel in the image.

One aspect of the conventional graphics pipeline is that it is wasteful and requires more pixel shading operations than desired. In particular, there is no automation by which the graphics pipeline makes strategic choices to reduce the sampling rate below one sample per pixel (sub-sampling/de-sampling) in local regions of the image. In the context of mobile devices, this means that the amount of energy consumed is higher than desired.

The above information in this background section is intended only to aid understanding of the background of the technology, and should not be construed as an admission of the existence or relevance of prior art.

An object of the present invention is to provide motion-based adaptive rendering in which the number of samples rendered in a block of pixels is reduced.

A graphics system adaptively renders individual portions of a frame based on the motion of the underlying objects being rendered with respect to the camera's frame of reference. In one embodiment, the adaptive rendering is based at least in part on the speed of objects rendered on the screen across at least two frames. Measuring motion in screen space (in pixels) incorporates the different sources of motion, including object motion and camera motion. If the speed of the underlying motion is below a quasi-static limit, a decision may be made whether to reuse a fraction of pixels from the previous frame. In an intermediate speed regime, a full sampling rate is used. In at least one high-speed regime, a decision is made whether to select a reduced sampling rate. The decisions may be made on a tile-by-tile basis, where a tile is a contiguous set of pixels in an image, typically in a block having a square or rectangular shape.

One embodiment of a method includes determining, on a tile-by-tile basis, the speed of pixels of objects in a current frame relative to a previous frame. Each tile is classified into one of at least three speed categories, the at least three speed categories including a quasi-static speed category, a medium speed category, and a high speed category. A sampling decision is made for each tile based at least in part on the speed category associated with that tile. The sampling decision may include deciding whether the tile is to be sampled at a full resolution sampling rate of at least one sample per pixel in the current frame, or sampled at a lower rate in the current frame. The tiles are then rendered. In one embodiment, the sampling decision may be further based on whether the tile is detected as being likely to include an edge in color or depth. In one embodiment, for a tile classified in the quasi-static speed category, the method includes reusing pixel data from the previous frame by copying pixel data for at least one pixel of the previous frame to the tile. In one embodiment, for a tile classified in the medium speed regime, every pixel is sampled at least once. In one embodiment, for a tile classified in at least one high-speed regime, a sampling pattern is selected in which the number of samples is less than the number of pixels associated with the tile, and interpolation is performed to determine the color at pixel locations that are not sampled.
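The per-tile classification and sampling decision described above can be sketched as follows. This is a minimal illustration rather than the claimed method: the two speed limits, the function names, and the rule that a tile likely to contain an edge is always fully sampled are assumptions introduced here; the text gives no numeric thresholds.

```python
from enum import Enum


class SpeedCategory(Enum):
    QUASI_STATIC = "quasi-static"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical limits in pixels per frame; the text does not give values.
QUASI_STATIC_LIMIT = 0.5
HIGH_SPEED_LIMIT = 8.0


def classify_tile(speed):
    """Map a tile's screen-space speed to one of the three categories."""
    if speed < QUASI_STATIC_LIMIT:
        return SpeedCategory.QUASI_STATIC
    if speed < HIGH_SPEED_LIMIT:
        return SpeedCategory.MEDIUM
    return SpeedCategory.HIGH


def sampling_decision(speed, likely_edge):
    """Return (sampling, fill_method) for one tile.

    Assumption: a tile that likely contains an edge is always fully sampled.
    """
    category = classify_tile(speed)
    if likely_edge or category is SpeedCategory.MEDIUM:
        return ("full", None)
    if category is SpeedCategory.QUASI_STATIC:
        # Missing pixels are copied from the previous frame.
        return ("reduced", "reuse_previous_frame")
    # High speed: missing pixels are interpolated from rendered samples.
    return ("reduced", "interpolate")
```

For example, `sampling_decision(20.0, False)` yields a reduced sampling pattern filled by interpolation, while the same speed with `likely_edge=True` falls back to full sampling under the assumed edge rule.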

One embodiment of a graphics system includes a graphics processor and a graphics pipeline including an adaptive sampling generator and a pixel shader. The adaptive sampling generator determines a required sample rate for each tile based at least in part on the speed of pixels of objects in each tile, and selects a sample pattern based on the required sample rate. In one embodiment, the adaptive sampling generator determines a sample pattern and sample rate for an individual tile based on a combination of the speed of objects in the individual tile and whether the individual tile includes an edge. In one embodiment, the graphics system includes an advection unit for performing advection; for a tile having a speed below a quasi-static speed limit, a sample pattern having a reduced sampling rate is selected, and the advection unit fills in the missing pixel data by reusing pixel data from the previous frame via advection. In one embodiment, the graphics system includes a reconstruction unit; for a tile having a speed above a threshold speed, a reduced sampling rate is selected, and the missing pixel data is interpolated by the reconstruction unit.

In some embodiments, a method of performing adaptive shading of image frames by a Graphics Processing Unit (GPU) includes: determining, by the GPU, a first shading rate based on determining that a change in a plurality of underlying assets between a first image frame and a second image frame is above a first threshold; determining, by the GPU, a second shading rate based on determining that one or more viewports in the second image frame are similar to one or more viewports in the first image frame; determining, by the GPU, a third shading rate based on determining that a quality reduction filter is used; and selecting, by the GPU, a shading rate from among the first shading rate, the second shading rate, and the third shading rate for the first image frame.
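As a rough illustration of selecting one shading rate from the three candidates, the sketch below models a shading rate as the fraction of pixels that are shaded. The constants, the function name, and in particular the policy of taking the coarsest candidate are assumptions made for this example; the text only states that one rate is selected from among the three.

```python
# Shading rates are modeled as the fraction of pixels shaded (1.0 = every pixel).
# All constants below are hypothetical; the text does not specify them.
FULL_RATE = 1.0
MOTION_RATE = 0.5     # first shading rate (large change in underlying assets)
VIEWPORT_RATE = 0.25  # second shading rate (similar viewports)
FILTER_RATE = 0.5     # third shading rate (quality reduction filter in use)


def select_shading_rate(asset_change, first_threshold, viewports_similar, filter_used):
    """Collect the candidate rates whose conditions hold and pick one."""
    candidates = []
    if asset_change > first_threshold:
        candidates.append(MOTION_RATE)
    if viewports_similar:
        candidates.append(VIEWPORT_RATE)
    if filter_used:
        candidates.append(FILTER_RATE)
    # Assumed policy: take the coarsest candidate; fall back to the full rate
    # when no condition is met.
    return min(candidates, default=FULL_RATE)
```

A different policy (for example, taking the finest candidate to favor quality) would fit the same structure; only the final `min` would change.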

In some embodiments, the method further includes reusing, by the GPU, pixel data from the first image frame to render the second image frame based on determining that the change in the plurality of underlying assets between the first image frame and the second image frame is below the first threshold. In some embodiments, the method further includes reusing, by the GPU, the pixel data from the first image frame to render the second image frame by reprojecting the pixel data from the first image frame to the second image frame. In some embodiments, the change in the plurality of underlying assets between the first image frame and the second image frame is associated with a camera and a viewport.

In some embodiments, the method further includes reusing, by the GPU, pixel data from the first image frame to render the second image frame based on determining that a change between the one or more viewports in the second image frame and the one or more viewports in the first image frame is below a second threshold, and the method further includes reusing, by the GPU, the pixel data from the first image frame to render the second image frame by reprojecting the pixel data from the first image frame to the second image frame.

In some embodiments, determining that the quality reduction filter is used includes: determining, by the GPU, a reduction in quality of the second image frame; determining, by the GPU, whether an output color of the second image frame corresponds to a weighted sum of pixel data from the first image frame; and determining, by the GPU, the third shading rate for the first image frame based on determining that the output color of the second image frame corresponds to the weighted sum of the pixel data from the first image frame.

In some embodiments, determining that the output color of the second image frame corresponds to the weighted sum of the pixel data from the first image frame includes determining, by the GPU, a functional (higher-order function) of the second image frame. In some embodiments, a proportional relationship between a number of shaded samples in the first image frame and a number of input values to the functional is based on a workload of the GPU. In some embodiments, the functional is configured to receive the quality reduction filter as an input and to determine the third shading rate for the first image frame.

In some embodiments, the weighted sum includes the weighted sum of the pixel data of the first image frame across a plurality of input image frames, of which the first image frame is an example. In some embodiments, the quality reduction filter includes one or more of a depth-of-field filter, a motion blur filter, and a smoothing filter. In some embodiments, the quality reduction filter is configured to average over a number of samples or taps from input textures rendered on the GPU.
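To make the "weighted sum over taps" idea concrete, here is a small sketch of a one-dimensional smoothing filter in which each output value is a weighted sum of neighboring input taps. The equal-weight box kernel, the border handling, and the function names are assumptions for illustration; the text does not prescribe a particular kernel.

```python
def weighted_sum(taps, weights):
    """Combine input taps into one output value."""
    return sum(t * w for t, w in zip(taps, weights))


def box_smooth(row, radius=1):
    """Smooth a row of values; each output averages the taps within `radius`.

    Assumption: taps that would fall outside the row are simply dropped,
    and the remaining taps share equal weights.
    """
    out = []
    for i in range(len(row)):
        lo = max(0, i - radius)
        hi = min(len(row), i + radius + 1)
        taps = row[lo:hi]
        weights = [1.0 / len(taps)] * len(taps)
        out.append(weighted_sum(taps, weights))
    return out
```

Because every output averages several inputs, fine detail in the input contributes less to the result, which is the intuition behind shading the input at a reduced rate when such a filter is detected.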

In some embodiments, a system for performing adaptive shading of image frames includes a memory and a Graphics Processing Unit (GPU) connected to the memory, wherein the GPU is configured to: determine a first shading rate based on determining that a change in a plurality of underlying assets between a first image frame and a second image frame is above a first threshold; determine a second shading rate based on determining that one or more viewports in the second image frame are similar to one or more viewports in the first image frame; determine a third shading rate based on determining that a quality reduction filter is used; and select a shading rate from among the first shading rate, the second shading rate, and the third shading rate for the first image frame.

In some embodiments, the GPU is further configured to reuse pixel data from the first image frame to render the second image frame based on determining that the change in the plurality of underlying assets between the first image frame and the second image frame is below the first threshold. In some embodiments, the GPU is further configured to reuse the pixel data from the first image frame to render the second image frame by reprojecting the pixel data from the first image frame to the second image frame.

In some embodiments, the change in the plurality of underlying assets between the first image frame and the second image frame is associated with a camera and a viewport, and the GPU is further configured to reuse the pixel data from the first image frame to render the second image frame by reprojecting the pixel data from the first image frame to the second image frame, based on determining that a change between the one or more viewports in the second image frame and the one or more viewports in the first image frame is below a second threshold.

In some embodiments, the GPU is further configured to: determine a reduction in quality of the second image frame; determine whether an output color of the second image frame corresponds to a weighted sum of pixel data from the first image frame; and determine the third shading rate for the first image frame based on determining that the output color of the second image frame corresponds to the weighted sum of the pixel data from the first image frame, wherein determining that the output color of the second image frame corresponds to the weighted sum of the pixel data from the first image frame includes determining a functional (higher-order function) of the second image frame.

In some embodiments, a proportional relationship between a number of shaded samples in the first image frame and a number of input values to the functional is based on a workload of the GPU, and the functional is configured to receive the quality reduction filter as an input and to determine the third shading rate for the first image frame.

In some embodiments, a method of performing adaptive shading of image frames by a Graphics Processing Unit (GPU) includes: determining, by the GPU, whether a quality reduction filter is used by applying one or more heuristics to a pixel shader of the GPU; determining, by the GPU, a functional (higher-order function) of a function in the pixel shader based on determining that the quality reduction filter is used in the GPU; determining, by the GPU, a first shading rate of a first image frame based on the functional; and rendering, by the GPU, the first image frame based on the first shading rate. In some embodiments, rendering the first image frame includes rendering, by the GPU, at least a first portion of the first image frame based on the first shading rate, wherein the first portion of the first image frame is a rendered subset of the pixels in the first image frame.

According to an embodiment of the present invention, motion-based adaptive rendering is provided in which the number of samples rendered in a block of pixels is reduced.

In addition, according to an embodiment of the present invention, a method and system for performing adaptive shading are provided in which motion and other factors are automatically analyzed in individual display screen tiles (blocks of pixels) and sampling decisions are made on a tile-by-tile basis.

FIG. 1 shows a conventional graphics pipeline.
FIG. 2 shows a graphics pipeline according to an embodiment of the present invention.
FIG. 3 shows an adaptive desampling generator according to an embodiment of the present invention.
FIG. 4 shows an example of pixel speed considerations when performing adaptive rendering according to an embodiment of the present invention.
FIG. 5 is a flowchart showing rendering and reconstruction options according to an embodiment of the present invention.
FIG. 6A shows an example of dithering sampling patterns to reduce visual artifacts according to an embodiment of the present invention.
FIG. 6B shows a general method of dithering sampling patterns according to an embodiment of the present invention.
FIG. 7A shows an example of advection according to an embodiment of the present invention.
FIG. 7B shows a general method of performing advection in a graphics system according to an embodiment of the present invention.
FIG. 8 shows an example of using pre-computed weights to perform cubic spline interpolation according to an embodiment of the present invention.
FIG. 9 shows an example of a sampling pattern related to considerations for determining pre-computed weights according to an embodiment of the present invention.
FIG. 10 shows an example of a sampling pattern related to considerations for determining pre-computed weights according to an embodiment of the present invention.
FIG. 11 shows a general method of adaptive desampling according to an embodiment of the present invention.
FIG. 12 shows a general method of performing cubic spline interpolation in a graphics system according to an embodiment of the present invention.
FIG. 13 shows a general method of performing cubic spline interpolation in a graphics system according to an embodiment of the present invention.
FIG. 14 shows an example of the differences between advection and spline reconstruction.
FIGS. 15A and 15B show an example in which different regions of a frame are adaptively rendered using different approaches based on the magnitude of the per-pixel speed.
FIG. 16 shows an example of using advection for stereoscopic rendering according to an embodiment of the present invention.
FIG. 17 shows adaptive rendering applied to foveated rendering according to an embodiment of the present invention.
FIG. 18 shows an exemplary GPU pipeline.
FIGS. 19A and 19B show an example of reusing (or reprojecting) data from another viewport.
FIG. 20 is an exemplary flowchart showing a shader analysis method that detects a quality reduction in a rendered image and computes a shading rate image.
FIG. 21 shows an exemplary method of performing adaptive shading of image frames by a Graphics Processing Unit (GPU).
FIG. 22 shows an exemplary arrangement of pixels.
FIG. 23 shows an exemplary pixel mapping.

Overview of an exemplary graphics pipeline system

FIG. 2 shows a graphics pipeline 200 according to an embodiment of the present invention. The graphics pipeline 200 may be implemented using a Graphics Processing Unit (GPU) including graphics hardware. The graphics pipeline 200 includes several new stages and functions to support automatically determining regions of the frame that do not require all pixels in individual tiles (blocks of pixels) to be sampled and rendered in order to achieve an acceptable viewing experience for a human user. As used herein, a tile is a contiguous set of pixels in an image, typically in a block having a square shape. The term frame is generally used to describe a set of operations performed to render an image that is read by a display at a preset frequency. However, the term frame is also used to refer to the rendered image resulting from the set of operations used to render the image.

In one embodiment, an adaptive desampling (AD) sample generator stage 205 is provided to support adjusting the sampling pattern in local regions of the image, where a local region is a tile corresponding to a block of pixels (e.g., a 4 x 4 block of pixels, 16 x 16, or another size). Desampling is the reduction in the number of samples per tile that are sampled and rendered in the current frame. For example, desampling may include sampling and rendering on average less than one sample per pixel in a tile, and thus may also be described as sub-sampling. To maintain full image resolution, two different approaches may be used to obtain values for the missing pixel data. A reconstruction and advection stage 210 supports two different options for reducing the number of pixels that need to be sampled and rendered in a tile while maintaining visual quality for the user. The reconstruction and advection stage 210 includes a reconstruction submodule 211 and an advection submodule 212. In one embodiment, the first option for reducing the number of pixels rendered in a tile is reconstruction via higher-order polynomial interpolation and filtering within the tile to generate the missing pixel data for that tile. The second option for reducing the number of pixels rendered in a tile is advection, which includes identifying the location of one or more pixels in the previous frame and reusing pixels from the previous frame for a selected fraction of the pixels in the tile.
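The two fill-in options can be contrasted with a small one-dimensional sketch. Note the simplifications, all of which are assumptions for illustration: missing pixels are marked with `None`, linear interpolation stands in for the higher-order polynomial interpolation named in the text, and motion is a whole number of pixels along one axis.

```python
def reconstruct(samples):
    """Fill None entries by interpolating between rendered neighbors.

    Linear interpolation is a stand-in here for higher-order interpolation.
    """
    known = [i for i, v in enumerate(samples) if v is not None]
    if not known:
        raise ValueError("at least one rendered sample is required")
    out = list(samples)
    for i, v in enumerate(samples):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = samples[right]  # no neighbor on the left: copy nearest
        elif right is None:
            out[i] = samples[left]   # no neighbor on the right: copy nearest
        else:
            t = (i - left) / (right - left)
            out[i] = (1 - t) * samples[left] + t * samples[right]
    return out


def advect(samples, prev_row, tile_start, motion):
    """Fill None entries by copying from the previous frame.

    A pixel now at position x is assumed to have been at x - motion in the
    previous frame; source positions are clamped to the row.
    """
    out = list(samples)
    for i, v in enumerate(samples):
        if v is None:
            src = min(max(tile_start + i - motion, 0), len(prev_row) - 1)
            out[i] = prev_row[src]
    return out
```

Reconstruction uses only the current frame's rendered samples, whereas advection depends on stored data from the previous frame and a motion estimate, which is why the text pairs advection with slow-moving content.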

In one embodiment, pixel data 215 of frame n for objects 220 from frame “n” is stored for possible reuse of the pixel data in the next frame “n+1”. In addition, vertex coordinate data is stored for use in determining frame-to-frame motion vectors of pixels. In one embodiment, the pixel data and vertex coordinates from frame n are stored in a buffer memory for use in the next frame n+1.

FIG. 3 illustrates the AD sample stage 205 according to an embodiment of the present invention. In one embodiment, desampling decisions are made in local tile regions based on speed and edge detection (for example, edge detection in depth/Z). A velocity buffer 310 receives per-vertex coordinates from the current frame and from the previous frame. The velocity of an individual pixel may be determined by comparing the vertex coordinates of the pixel in the current frame with the vertex coordinates of the pixel in the previous frame. In one embodiment, a forward splatting approach is used: a “velocity image” is rendered with the primitives in the scene, using the per-vertex velocity as a vertex attribute. Many graphics applications render a Z-buffer first as a technique to reduce the number of pixel shader instances during rendering passes. The velocity buffer/image may be rendered along with the Z-buffer. During the Z-pass, in which the Z/depth buffer is generated, the velocity is also updated at each pixel in addition to splatting and updating the depth. Rendering the velocity buffer produces per-pixel velocity values in screen space, the magnitude of which corresponds to speed. A tile, such as a 4 x 4 tile, thus has a pixel speed associated with each pixel, and therefore has a maximum pixel speed, a mean pixel speed, a median pixel speed, and a minimum pixel speed. In one embodiment, the mean pixel speed is used to make desampling decisions, but more generally another statistic, such as the maximum pixel speed, may also be used.
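As an illustration of the per-tile speed statistics described above, the following Python sketch reduces a per-pixel velocity buffer to the maximum, mean, median, and minimum speed of one tile. The function name `tile_speed_stats` and the list-of-lists layout of the `motion` buffer are assumptions made for this example; the source does not specify a data layout.

```python
from math import hypot
from statistics import mean, median

TILE = 4  # tile edge in pixels (4 x 4 tiles, as in the text)


def tile_speed_stats(motion, tx, ty, tile=TILE):
    """Return (max, mean, median, min) pixel speed for tile (tx, ty).

    `motion` is a hypothetical velocity buffer: a 2D list indexed [y][x]
    holding screen-space motion vectors (dx, dy) in pixels per frame.
    """
    speeds = [
        hypot(*motion[y][x])  # speed = magnitude of the motion vector
        for y in range(ty * tile, (ty + 1) * tile)
        for x in range(tx * tile, (tx + 1) * tile)
    ]
    return max(speeds), mean(speeds), median(speeds), min(speeds)


# One 4 x 4 tile whose right half moves 3 px/frame horizontally.
motion = [[(3.0, 0.0) if x >= 2 else (0.0, 0.0) for x in range(4)]
          for _ in range(4)]
stats = tile_speed_stats(motion, 0, 0)
```

Any one of the four statistics could then be compared against the speed thresholds; which one is used is an implementation choice.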

Visual artifacts in moving objects are less perceptible to the human eye. Thus, one factor in whether the sampling rate in a tile can be reduced is whether the speed exceeds a threshold speed.

However, certain types of visual artifacts tend to be more noticeable at color edges. Strictly speaking, detecting color edges in the final image is impossible without first rendering the image. Before rendering, however, it may be possible to detect a high likelihood of a color edge. In one embodiment, the edge detection module 305 therefore detects the likelihood of color edges in local blocks of pixels: regions with a high likelihood of a color edge are detected by assuming that color is likely to vary across objects. In one embodiment, Z values from the rasterization of the current frame are analyzed to perform edge detection. A Laplace edge detector may be defined as a stencil centered on the current pixel. Any pixel in the tile is marked as having an edge if the Laplacian of the Z-buffer at the pixel is greater than a threshold value multiplied by the Z value at the pixel. This defines a one-bit value per tile. More generally, any type of edge detection may be used.
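The depth-based edge test can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the common 5-point Laplacian stencil, clamps at the buffer borders, and compares the absolute value of the Laplacian with the scaled depth. None of these details are fixed by the text, and the names `laplacian` and `tile_has_edge` are hypothetical.

```python
def laplacian(z, x, y):
    """5-point Laplacian of depth buffer z (indexed [y][x]) at (x, y).

    Border pixels reuse the nearest valid depth (clamping), an assumption
    made here only to keep the example self-contained.
    """
    h, w = len(z), len(z[0])

    def at(px, py):
        return z[min(max(py, 0), h - 1)][min(max(px, 0), w - 1)]

    return (at(x - 1, y) + at(x + 1, y) + at(x, y - 1) + at(x, y + 1)
            - 4.0 * at(x, y))


def tile_has_edge(z, tx, ty, threshold, tile=4):
    """One-bit edge state: True if any pixel has |Laplacian(Z)| > threshold * Z."""
    return any(
        abs(laplacian(z, x, y)) > threshold * z[y][x]
        for y in range(ty * tile, (ty + 1) * tile)
        for x in range(tx * tile, (tx + 1) * tile)
    )


# 8 x 4 depth buffer: depth 1.0 everywhere except a step to 5.0 at x >= 6,
# which falls inside the right-hand tile.
z = [[1.0 if x < 6 else 5.0 for x in range(8)] for _ in range(4)]
left_edge = tile_has_edge(z, 0, 0, threshold=0.1)
right_edge = tile_has_edge(z, 1, 0, threshold=0.1)
```

Scaling the threshold by the local Z value makes the test relative, so the same depth step is treated as less significant for distant geometry than for nearby geometry.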

In one embodiment, an edge mask is generated for each individual tile, and an edge state bit may be generated to indicate whether the tile includes at least one edge. In one implementation, an edge mask is generated for each 4 x 4 block of pixels, although more generally other tile sizes may be used. The information on speed and the presence of an edge is used by the sample generator 315 to determine the sample pattern for the tile. In one embodiment, the full sampling resolution is used if an edge is detected. If no edge is detected and the tile has a speed greater than a first threshold speed, a first reduced sampling rate is used. If no edge is detected and the tile has a speed greater than a second threshold speed, a second reduced sampling rate is used. Other optional factors may also be considered in making the sampling rate decision. In one embodiment, the sample pattern options include full sample resolution (at least one sample per pixel), one-half resolution (half of the pixels in each tile sampled), and one-quarter resolution (one in four pixels in each tile sampled). More generally, a plurality of sampling rates may be provided, each controlled by threshold parameters for that sampling rate. The selected sampling rates may also be optimized for the selected block/tile size. Thus, while the illustrative embodiment includes three sampling rates of 4, 8, and 16 samples for 4 x 4 blocks, the approach may be varied based on block size or other considerations to use a set of sampling rates, each controlled by threshold parameters for that sampling rate. The number of sampling rates, N, may therefore be greater than 3, depending on implementation details such as block/tile size and other factors.
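The selection rule in the preceding paragraph can be sketched as a small Python function for the 4 x 4 case with three sampling rates. The function name `samples_for_tile` and the threshold values in the usage lines are hypothetical; the text does not give numeric thresholds.

```python
def samples_for_tile(has_edge, speed, k_fast1, k_fast2):
    """Return the number of samples to render in a 4 x 4 tile.

    has_edge : one-bit edge state of the tile
    speed    : tile speed in pixels per frame (e.g. the mean pixel speed)
    k_fast1  : first threshold speed (half resolution above it)
    k_fast2  : second, higher threshold speed (quarter resolution above it)
    """
    if has_edge:
        return 16          # an edge forces full sampling resolution
    if speed > k_fast2:
        return 4           # one-quarter resolution
    if speed > k_fast1:
        return 8           # one-half resolution
    return 16              # too slow to hide artifacts: full resolution


# Hypothetical thresholds of 2 and 6 pixels per frame.
choices = [samples_for_tile(e, s, 2.0, 6.0)
           for e, s in [(True, 10.0), (False, 1.0), (False, 3.0), (False, 8.0)]]
```

Extending this to N sampling rates would amount to replacing the two comparisons with a sorted list of (threshold, sample count) pairs.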

In one embodiment, a dithering module 320 is provided to vary the sampling pattern among a selection of sampling patterns that have the same effective sampling rate. The dithering may follow a repeating sequence (for example, sample pattern 1, sample pattern 2, sample pattern 3, sample pattern 4) or may include aspects of randomization.

Dithering of the sampling pattern by the dithering module 320 reduces the visual perception of sampling artifacts by human users. The human eye and brain begin to blend images into a video sequence when the rate is faster than a biological threshold. That is, when images change at a rate faster than the biological threshold, the human eye blends the images over time and perceives them as a continuously varying sequence, akin to video. There is some debate about the exact value of the biological threshold. At a frame rate of about 12 frames per second, the human eye and brain begin to see a sequence of moving images rather than individual images. However, a somewhat higher frame rate of about 15 frames per second is required to experience the beginnings of relatively fluid (non-jerky) motion. The nature of the underlying images is an additional factor in whether a human observer perceives fluid motion at a given frame rate. Thus, the human eye tends to average out visual artifacts that are dithered at frame rates of about 12 frames per second and higher. In one embodiment, the dithering is performed such that every pixel is rendered at least 15 times per second, which is faster than the human eye can discern individual images. At 60 frames per second, dithering the sample pattern in a tile every four frames corresponds to rendering each pixel at least 15 times per second.

Example motion speed regimes

FIG. 4 illustrates examples of speed regimes according to an embodiment of the present invention. The motion is the combination of object motion and camera motion. The speed corresponds to the magnitude of the motion vector in a tile. In this example, the speed is an indicator of the number of samples needed in a block of pixels to have acceptable visual quality. If the motion exceeds a certain threshold speed K_fast1 for a block of pixels, this indicates that the number of samples may be reduced (for example, to 8 samples in a 4 x 4 tile), because the human eye cannot perceive high frequencies in a moving object. If the speed exceeds a higher threshold speed K_fast2, this indicates that the number of samples in the tile may be reduced further (for example, to 4 samples in a 4 x 4 tile). On the other hand, if the motion in the tile is very slow, below a speed K_stat (or there is no motion), there may be an opportunity to reuse pixel data from the previous frame through advection, that is, by carrying pixel data over from the previous frame along the motion vector (for example, rendering 8 samples in a 4 x 4 tile and reusing 8 color values from the previous frame). Reuse of pixel data from the previous frame also requires that the graphics state does not change from the previous frame to the current frame, where the graphics state includes the shaders used, the constants provided to the shaders, and the geometry provided to the frames. There may be a speed regime in which full sampling resolution is required. For example, there may be an intermediate speed regime between K_stat and K_fast1 in which full sampling resolution is required to achieve high visual quality. There may also be scenarios in which super-sampling is applied to individual tiles. As an illustrative example, an option may be provided to support super-sampling of Z-edge cases.

In one embodiment, desampling (changing the sample pattern to reduce the sampling rate below one sample per pixel) is permitted if the speed exceeds the first threshold speed K_fast1. In one embodiment, the sampling rate may be reduced further if the speed exceeds the second threshold speed K_fast2. The decision whether to perform desampling may also depend on other conditions, such as whether an edge is detected.

In one embodiment, motion in camera screen space is obtained by differencing the vertex position data from the current frame and the previous frame. The speed regime of a tile is classified on a tile-by-tile basis by calculating the magnitude of the motion vector, based on how far a pixel of an object has moved from one frame to the next. As described above, in one embodiment, splatting is used in the Z-pass to determine per-pixel motion vectors. In one embodiment, speed thresholds are defined and used as inputs for deciding whether adaptive desampling or advection is to be used in the current frame. One speed regime is a quasi-static regime, in which an object moves slowly enough that its pixels do not differ significantly from their counterparts in the previous image. If the speed is within the quasi-static speed limit, a decision may be made whether to use advection to reuse pixels from the previous frame. In one embodiment, the upper bound on the quasi-static speed, K_stat, is that a pixel in a given tile (tile m) in frame n remains in the same tile in frame n+1. In one embodiment, if the speed is below K_stat, additional checks are performed to determine whether pixels from the previous frame can be used in the current frame. These may include a check that advection produced an acceptable result in the previous frame. A check may also be performed that the pixel values for the tile in the current frame are consistent with a small movement relative to the previous frame, which may be described as a discrepancy check. An advection discrepancy state bit may be associated with a tile to indicate that the tile has passed one or more discrepancy checks confirming that it is suitable for advection of at least part of its pixel data.
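One possible reading of the quasi-static bound is a geometric test: a tile is quasi-static if every pixel, displaced by its motion vector, lands in the same tile. The Python sketch below implements that reading and combines it with the other conditions named in the text. The names `tile_is_quasi_static` and `advection_candidate`, the two boolean flags, and the forward-displacement convention are assumptions for illustration, not details confirmed by the source.

```python
def tile_of(x, y, tile=4):
    """Tile coordinates containing the (possibly fractional) position (x, y)."""
    return (int(x // tile), int(y // tile))


def tile_is_quasi_static(motion, tx, ty, tile=4):
    """True if every pixel of tile (tx, ty) stays in that tile after moving.

    `motion[y][x]` is the per-pixel motion vector (dx, dy) in pixels/frame.
    """
    for y in range(ty * tile, (ty + 1) * tile):
        for x in range(tx * tile, (tx + 1) * tile):
            dx, dy = motion[y][x]
            if tile_of(x + dx, y + dy, tile) != (tx, ty):
                return False
    return True


def advection_candidate(motion, tx, ty, passed_discrepancy, state_unchanged):
    """Quasi-static speed plus the extra checks named in the text:
    the tile's discrepancy bit and an unchanged graphics state."""
    return (tile_is_quasi_static(motion, tx, ty)
            and passed_discrepancy and state_unchanged)


slow = [[(0.5, 0.25)] * 4 for _ in range(4)]   # sub-pixel drift
fast = [[(1.0, 0.0)] * 4 for _ in range(4)]    # right column leaves the tile
```

In practice an implementation might instead compare the tile speed with a scalar K_stat; the geometric test above is only one way to express the same bound.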

FIG. 5 is a flowchart illustrating an example of adaptive rendering choices based on speed, edge detection, dithering, spline reconstruction, and advection. Some conventional graphics pipeline features are omitted for clarity. FIG. 5 illustrates a specific example in which 4 x 4 tiles are used, according to an embodiment of the present invention. An initial pre-pass may be performed, followed by a color pass to render the pixel data. The scene geometry 505 of the image is provided by the application. The Z-buffer is computed 510 and edge detection is performed 515. Motion vectors are computed 520 for the scene geometry. The per-pixel motion vector is computed 525. The range of motion in a tile is computed 530. Based on this information, a decision 535 is made whether to 1) render 4, 8, or 16 samples in a 4 x 4 block and perform interpolation, or 2) render 8 samples and advect 8. Dithering 540 is performed for the sampling patterns. Spline reconstruction 545 is used to reconstruct pixel data. If advection is used, advection 550 is used to obtain 8 of the pixel values and the remainder are obtained by rendering.

Example sampling patterns and dithering

FIG. 6A illustrates an example of sampling patterns and dithering. In this example, the tile size is a 4 x 4 block of pixels. Full resolution corresponds to 16 samples. Half resolution (8 samples) and one-quarter resolution (4 samples) permit variation in the pattern of samples. That is, in the case of 8 samples, the arrangement of the samples may have a first sample pattern, a second sample pattern, a third sample pattern, and so on. Having predefined sampling patterns supports dithering of the sample pattern for temporal color averaging. The predefined sampling patterns are chosen to rotate the sampling so that every pixel location is rendered once every few frames. Dithering of the sample pattern may be achieved by different techniques. In one embodiment, the sample pattern for an individual frame is selected by the dithering module 320 in sequence, using a modulo-k counter. Dithering the sample positions over time across multiple frames makes rendering errors harder for a human observer to see. In one embodiment, the sample patterns are chosen to guarantee that each pixel is rendered at least once every k frames, where (n*n)/k is the minimum number of samples per n x n tile. In another embodiment, temporal dithering is implemented using a stochastic approach to selecting the sample pattern.
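The modulo-k selection and the coverage guarantee can be checked with a short Python sketch. The diagonal pattern family produced by `make_patterns` is only one illustrative choice and is not taken from the figures; it assumes that k divides n so that every pattern holds exactly (n*n)/k samples.

```python
def make_patterns(n, k):
    """Split the n*n pixel positions of a tile into k disjoint patterns.

    Pixel (x, y) goes to pattern (x + y) % k. When k divides n, each row
    contributes n/k pixels to every pattern, so each pattern holds exactly
    (n*n)/k samples.
    """
    patterns = [[] for _ in range(k)]
    for y in range(n):
        for x in range(n):
            patterns[(x + y) % k].append((x, y))
    return patterns


def pattern_for_frame(patterns, frame_index):
    """Modulo-k counter: cycle through the predefined patterns in sequence."""
    return patterns[frame_index % len(patterns)]


quarter = make_patterns(4, 4)   # 4 samples per 4 x 4 tile
half = make_patterns(4, 2)      # 8 samples per 4 x 4 tile (checkerboard)

# Every pixel of the tile is rendered once in any k consecutive frames.
covered = set()
for frame in range(4):
    covered.update(pattern_for_frame(quarter, frame))
```

With k = 4 at 60 frames per second, each pixel is therefore rendered 60 / 4 = 15 times per second, matching the figure given earlier in the description. A stochastic variant would replace the counter with a random choice among the same patterns.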

FIG. 6B illustrates a method of dithering according to an embodiment of the present invention. Tiles in the current frame are selected 605 for sub-sampling at a reduced average sampling rate. For each tile, a sampling pattern is selected 610 so that it varies from the previous frame. Rendering and reconstruction are performed 615. If additional frames are to be rendered, the process continues.

Advection example

FIG. 7A illustrates an example of advection. In a tile region, such as a 4 x 4 tile 700, advection involves copying pixel data from the pixel at a given location in the previous frame to the corresponding location in the current frame. For example, an individual object (for example, a ball moving slowly across the ground) may move across the screen such that every pixel of the ball moves with some velocity. In this example there is a high degree of temporal coherence between the pixels of the slowly moving ball from one frame to the next. In this case, the changes are primarily motion. By determining the motion of the individual pixels of the ball across frames, pixel data can be copied across frames. In this example, the motion is slow enough that pixel data can be mapped from the current pixel location to a pixel in the same tile in the previous frame. The position of the pixel in the previous frame may be computed as x(n-1) = x - mv(x), where x is the current pixel location and mv(x) is the motion vector. This allows pixel data to be copied from x(n-1) to x(n). That is, if the movement of a pixel between frames is small, the pixel location in the current frame is projected back to a pixel in the previous frame, and the pixel data from the previous frame is copied. If x(n-1) has fractional components, bilinear or any higher-order interpolation may be used.
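The back-projection x(n-1) = x - mv(x), with bilinear interpolation for fractional positions, can be sketched in Python as follows. The helper names `bilinear` and `advect_pixel`, the single-channel image layout, and the clamping at the image border are assumptions made for this example.

```python
from math import floor


def bilinear(img, x, y):
    """Bilinearly sample a single-channel image img[y][x] at fractional (x, y).

    Coordinates are clamped to the image, an assumption for this sketch.
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = floor(x), floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def advect_pixel(prev_frame, x, y, mv):
    """Value for current pixel (x, y), fetched from x(n-1) = x - mv(x)."""
    return bilinear(prev_frame, x - mv[0], y - mv[1])


# Previous frame: a horizontal ramp 0, 10, 20, 30 in every row.
prev = [[0.0, 10.0, 20.0, 30.0] for _ in range(4)]
whole = advect_pixel(prev, 2, 1, (1.0, 0.0))    # lands exactly on x = 1
frac = advect_pixel(prev, 2, 1, (0.5, 0.0))     # lands between x = 1 and 2
```

A higher-order interpolant could be substituted for `bilinear` without changing `advect_pixel`.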

In the example of FIG. 7A, advection is mixed with rendering. In one embodiment, advection is used for half of the pixels 705 in a tile and the other half of the pixels 710 are rendered. Mixing advection and rendering within a single frame reduces the visual artifacts associated with performing advection alone. That is, the likelihood of visual errors due to advection that are detectable by a typical human observer is minimized. In combination with temporal dithering, it ensures that errors do not accumulate over time, which further reduces the likelihood of visual errors being noticed by a typical human observer. A 1:1 ratio of rendered pixels to advected pixels is one option, but more generally other ratios may be used.

As described above, in one embodiment a maximum speed is used as a condition for whether advection is permitted. In one embodiment, the criterion is that the threshold speed is low enough that the local deformation of a small neighborhood of pixel locations can be classified as a rigid transformation, in which the change in the positions of the pixels can be represented to a desired accuracy using a single translation and rotation for the entire set of pixels. For example, the maximum speed for advection may be that the magnitude of the pixel motion is below a threshold of k pixels. While rigid transformations can occur at any speed, their likelihood decreases with increasing speed, so a speed threshold can be used as a criterion for when advection is likely to be beneficial. A discrepancy check may be performed for individual tiles to determine whether advection produces acceptable results. This discrepancy check may be performed in the current frame and recorded as a 1-bit value for each tile, so that a decision can be made in the next frame whether to disable advection in the neighborhood of a tile that failed the check. That is, in this implementation, advection is performed for a tile in frame n, and the discrepancy check is performed in frame n and consumed by frame n+1. Frame n+1 then uses the discrepancy check (computed in frame n) to decide whether to perform advection in the neighborhood of the tile in frame n+1. If the discrepancy check in frame n indicates that the advection result was acceptable, advection is allowed in frame n+1. Otherwise, advection is turned off for a selected number of frames. The discrepancy check is based on whether there is a significant change in the pixel values of a tile that is inconsistent with the underlying assumptions of valid advection. If the pixels of an object move slowly, the tile is not expected to change significantly between two frames; thus, if the state of the tile changes significantly, the discrepancy check fails. A tile state discrepancy bit (for example, 0 or 1) may be used to indicate whether the discrepancy check passed. The degree of tile state change that is permitted may be determined heuristically or empirically, for example based on the trade-off between the computational benefits of advection and minimizing the occurrence of visual artifacts.
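The frame-n to frame-n+1 hand-off of the discrepancy bit can be sketched as a small per-tile state object in Python. Everything named here is hypothetical: the mean-absolute-difference metric in `discrepancy_passed`, the `TileAdvectionState` class, and the `cooldown_frames` parameter stand in for details the text leaves to the implementation.

```python
def discrepancy_passed(prev_tile, cur_tile, tolerance):
    """Hypothetical check: the mean absolute change of the tile's pixel
    values between two frames must not exceed `tolerance`."""
    diffs = [abs(c - p) for p, c in zip(prev_tile, cur_tile)]
    return sum(diffs) / len(diffs) <= tolerance


class TileAdvectionState:
    """Tracks whether advection is currently allowed for one tile."""

    def __init__(self, cooldown_frames=2):
        self.cooldown_frames = cooldown_frames  # frames to keep advection off
        self.frames_disabled = 0

    def advection_allowed(self):
        return self.frames_disabled == 0

    def end_frame(self, check_passed=True):
        """Record this frame's result; it takes effect from the next frame."""
        if self.frames_disabled > 0:
            self.frames_disabled -= 1            # serving the cooldown
        elif not check_passed:
            self.frames_disabled = self.cooldown_frames


# Frame 0 fails its check, so advection is off for frames 1 and 2.
state = TileAdvectionState(cooldown_frames=2)
history = []
for passed in [False, True, True, True]:
    history.append(state.advection_allowed())
    state.end_frame(passed)
```

Disabling advection in the neighborhood of a failed tile, as the text describes, would apply the same update to the adjacent tiles' state objects.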

Other ways of performing discrepancy checks may be used. There are computational advantages to performing advection in a tile in the current frame n, performing the discrepancy check, and then using the discrepancy check to decide whether to perform advection in frame n+1. However, it will be understood that an alternative implementation may be used in which the discrepancy check is performed in frame n and used to decide whether advection is used in frame n to reuse pixels from the previous frame.

If desired, accuracy may be improved using various enhancements. In one embodiment, back and forth error correction and compensation (BFECC) is used. BFECC takes the position determined by semi-Lagrangian advection and adds the velocity at that coordinate to obtain a new position in the current frame. If there is no error, this coordinate should equal the original position (x, y). Otherwise, half of this error is subtracted from (x - vx, y - vy) to obtain a second-order accurate estimate of the position, accurate to within half a pixel, assuming the pixel velocity is accurate.
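The BFECC correction described above can be written out in a few lines of Python. This is a sketch under the assumption that a callable `velocity(px, py)` returns the motion vector (vx, vy) at any position; the function name `bfecc_source_position` and that interface are hypothetical.

```python
def bfecc_source_position(x, y, velocity):
    """Estimate where pixel (x, y) of the current frame came from.

    velocity(px, py) -> (vx, vy) in pixels per frame (assumed interface).
    """
    vx, vy = velocity(x, y)
    bx, by = x - vx, y - vy            # semi-Lagrangian backward step
    bvx, bvy = velocity(bx, by)
    fx, fy = bx + bvx, by + bvy        # forward step from that position
    ex, ey = fx - x, fy - y            # round-trip error (zero if exact)
    return bx - 0.5 * ex, by - 0.5 * ey  # subtract half of the error


def uniform(px, py):
    """Uniform motion: the round trip is exact, so nothing is corrected."""
    return (1.0, 0.5)


def stretching(px, py):
    """Velocity that grows with x: the backward step alone overshoots."""
    return (0.1 * px, 0.0)


exact = bfecc_source_position(4.0, 4.0, uniform)
corrected = bfecc_source_position(10.0, 0.0, stretching)
```

For the `stretching` field the plain backward step gives x = 9.0, while the corrected estimate is about 9.05, closer to the value 10·e^(-0.1) ≈ 9.048 obtained by integrating that field exactly over one frame.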

FIG. 7B illustrates a general method of performing advection according to an embodiment of the present invention. A determination 1405 is made whether the tile is suitable for advection. Suitability is based on whether the speed is within the quasi-static range, augmented by passing any additional discrepancy checks. If the tile is suitable for advection, the corresponding block of pixel locations in the previous frame is determined 1410. A selected fraction of the pixels is reused 1420 from the tile of the previous frame. The remaining pixels are rendered 1425.

Image interpolation and reconstruction examples

FIG. 8 illustrates an example of image interpolation and reconstruction of pixel color values for a desampling situation. In one embodiment, a weighted sum of color values is used to reconstruct the pixels that were not rendered. Given a choice of weight function w, a normalized set of weights can be precomputed for each configuration of pixels that arises from a particular sampling pattern. For example, if 4 pixels are rendered in a 4 x 4 block, the remaining 12 pixels can be expressed as a weighted sum of the rendered pixels within the same block as well as in neighboring blocks. Moreover, because the set of possible pixel configurations in the neighboring blocks is restricted by the set of sampling patterns, all possible weight sets can be precomputed in this case.

Traditionally, GPUs use bilinear interpolation. However, bilinear interpolation has various drawbacks. In one embodiment, higher-order polynomials of at least third order, such as piecewise cubic polynomials (also known as cubic splines), are used for efficient reconstruction of sparse samples.

Higher-order polynomials, such as cubic splines, can map a higher-frequency spectrum than bilinear interpolation and provide greater fidelity of the data reconstructed from sub-sampled blocks. In addition, with bilinear interpolation, samples are preferred on both sides of a pixel, because linear interpolation from one side only may be inaccurate and may exceed the color spectrum range. In contrast, higher-order polynomials using a wider support (more than one pixel away) are more likely to accurately approximate the functional form of the rendered image data. Although various higher-order polynomials may be used, cubic splines have better continuity properties than quadratic polynomials. Higher-order polynomial reconstruction can perform poorly at sharp discontinuities, but because of the edge detection step performed before desampling, a tile undergoing reconstruction is unlikely to contain sharp discontinuities.

One aspect of performing sub-sampling is that there is sparse sample data at runtime. In an individual block region, such as a k x k pixel region, desampling may result in a subset of the pixels being rendered, such as 4 or 8 pixels from a 4 x 4 block of pixels. The missing pixel data needs to be reconstructed. A consequence of having predetermined sample patterns is that there is a finite set of possible sample positions. This allows a fixed set of local stencils to be generated and stored before runtime and then used to reconstruct pixel data with cubic splines or other higher-order polynomials. Conventional approaches to evaluating higher-order polynomials in hardware are computationally expensive. In contrast, in embodiments of the present invention, using a fixed set of precomputed stencils eliminates the runtime computational overhead of performing a general higher-order polynomial evaluation. Using a static set of samples allows the possible configurations of pixels that may need to be reconstructed to be determined, so the required stencils can be precomputed.

In one embodiment, higher-order polynomial interpolation is implemented as static stencil operations using pre-computed weights. In one embodiment, a table of stencils is stored and made available for spatial reconstruction to the reconstruction submodule 211 of the reconstruction and advection stage 210. The table of stencils provides weights based on known sample locations. In one embodiment, the table of stencils holds all of the pre-computed stencil weights for each pixel location within a defined sample pattern. The pre-computed weights allow higher-order polynomial reconstruction to be performed using static stencil operations.

In one embodiment, a set of 5 x 5 stencils is determined for all possible pixel locations in a tile (for example, a 4 x 4 tile) that may need to be interpolated at runtime. Each 5 x 5 stencil is computed for a given pixel location and neighbor configuration. Each stencil provides a list of weight values and the corresponding locations of the sample points. The stencils are stored in a constant memory table that is available to the reconstruction submodule 211 of the reconstruction and advection stage 210 for reconstruction purposes. In one embodiment, at runtime, for each pixel that must be interpolated, an index into this table is computed using the pixel coordinates and the sampling mask. In one implementation, each stencil is addressed using (a) the location of the pixel within the tile and (b) the sampling mask used for rendering. Thus, if dithering is used, the stencil that is selected depends on which sample pattern was selected for a given degree of sub-sampling.
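The addressing scheme described above can be sketched as follows. This is only an illustration: the flat `[mask][row][column]` layout, the function name `stencil_index`, and the idea of numbering sampling masks with a small integer `mask_id` are assumptions, not details given in the description.

```python
TILE = 4  # 4 x 4 tile, as in the example in the text


def stencil_index(px, py, mask_id, tile=TILE):
    """Flat index into a hypothetical stencil table.

    Assumed layout: [mask_id][row][column], row-major within the tile.
    (px, py) is the pixel location inside the tile.
    """
    if not (0 <= px < tile and 0 <= py < tile):
        raise ValueError("pixel lies outside the tile")
    return (mask_id * tile + py) * tile + px


# Toy table with one placeholder entry per (mask, pixel) pair for two masks.
table = [
    f"stencil(mask={m}, x={x}, y={y})"
    for m in range(2)
    for y in range(TILE)
    for x in range(TILE)
]
```

With dithering, a different `mask_id` would be passed from frame to frame, so the same pixel location would resolve to a different stencil.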

In one embodiment, higher-order polynomial interpolation is performed using a multiplier/adder to accumulate the products of the weights and the sample color values. The accumulated value is then normalized by division, which in many cases can be performed by a bit shift in integer format or by a subtraction in floating-point format. Thus, the use of stencils with pre-computed weights allows higher-order polynomial interpolation to be computed at runtime with relatively little computational effort.
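A minimal sketch of the integer case, assuming the pre-computed weights have been quantized so that they sum to a power of two; in that case the normalizing division becomes a right shift. The function name, the rounding step, and the quantization scheme are illustrative assumptions.

```python
def reconstruct_fixed_point(weights, colors, shift):
    """Multiply-accumulate integer weights with sample colors, then
    normalize with a right shift instead of a division.

    Assumption: the weights were quantized so that sum(weights) == 2**shift.
    """
    if sum(weights) != (1 << shift):
        raise ValueError("weights must sum to 2**shift")
    acc = 0
    for w, c in zip(weights, colors):
        acc += w * c  # one multiply-add per sample
    half = (1 << shift) >> 1  # rounding term (assumed; truncation also works)
    return (acc + half) >> shift
```

For example, four equal weights of 64 with `shift=8` average four samples using only multiply-adds and one shift.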

An example of a cubic spline function used to compute and reconstruct pixel colors as a weighted sum of known pixel color values follows.

In one embodiment, the formula expressing the weighted sum that determines a pixel color value is based on weights w().

Here, c(i, j) is the color value at pixel location (i, j), w() is the two-dimensional spline function, and "Filled" is the set of rendered pixels. The two-dimensional spline function is the product of two one-dimensional spline functions, w(i, j) = k(i)k(j), where the one-dimensional spline function k() is based on the cubic filter formula given in the paper by Don P. Mitchell and Arun N. Netravali, "Reconstruction Filters in Computer Graphics," Computer Graphics, Volume 22, Number 4, August 1988, pp. 221-228.
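One way to write the weighted sum so that it agrees with these definitions is shown below, together with the cubic filter family from the cited Mitchell and Netravali paper. The summation indices (a, b) and the normalization by the sum of weights are assumptions chosen to match the 1/w normalization factor used in the FIG. 9 example; B and C are the two filter parameters from the paper.

```latex
c(i,j) \;=\;
\frac{\displaystyle\sum_{(a,b)\,\in\,\mathrm{Filled}} w(i-a,\;j-b)\,c(a,b)}
     {\displaystyle\sum_{(a,b)\,\in\,\mathrm{Filled}} w(i-a,\;j-b)},
\qquad w(i,j) = k(i)\,k(j)

k(x) \;=\; \frac{1}{6}
\begin{cases}
(12 - 9B - 6C)\,|x|^{3} + (-18 + 12B + 6C)\,|x|^{2} + (6 - 2B), & |x| < 1 \\[4pt]
(-B - 6C)\,|x|^{3} + (6B + 30C)\,|x|^{2} + (-12B - 48C)\,|x| + (8B + 24C), & 1 \le |x| < 2 \\[4pt]
0, & \text{otherwise}
\end{cases}
```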

Distances in the Mitchell and Netravali paper are defined in a scaled pixel space.

By restricting the relative positions of the sample points, the weights and denominators can be pre-computed into stencils. Because the spline function is defined over a bounded range, scaling the magnitude of x can be used to extend the function to a desired support radius, such as a support radius of two pixels.

For a tile of size n x n, a k x k square can be arranged in (n/k)*(n/k) possible configurations. A sampling rate of 4*s requires s squares, which gives (n*n)/(k*k*s) sampling patterns.
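As a worked instance of these two counts, the sketch below evaluates them for a 4 x 4 tile. The value k = 2 is an assumption made only for illustration; the function names are hypothetical.

```python
def square_placements(n, k):
    """Number of ways to arrange a k x k square in an n x n tile: (n/k)*(n/k)."""
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    return (n // k) * (n // k)


def sampling_patterns(n, k, s):
    """Number of sampling patterns when s squares are used: (n*n)/(k*k*s)."""
    total = n * n
    if total % (k * k * s) != 0:
        raise ValueError("n*n must be divisible by k*k*s")
    return total // (k * k * s)
```

Under these assumptions, a rate of 4 samples per tile (s = 1) gives 4 patterns, and a rate of 8 samples per tile (s = 2) gives 2 patterns.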

FIG. 9 shows an example of a sampling pattern in a 4 x 4 tile, in which the Xs mark rendered samples and the O marks the interpolation location. A 5 x 5 stencil centered on O is used. Assuming that any access outside this 4 x 4 tile is invalid, the stencil has weights of 0 for any locations outside the 4 x 4 tile, and these are removed from the stencil table. Assuming the top-left pixel is (0, 0), the table entry then gives the required locations as (0, 0), (2, 0), (0, 2), (2, 2), along with the appropriate weights (w0, w1, w2, w3) and the normalization factor (w). The weighted sum can then be computed for each color component as 1/w (w0*c(0, 0) + w1*c(2, 0) + w2*c(0, 2) + w3*c(2, 2)) using a multiply-and-accumulate operation. More generally, however, reconstruction is not limited to a single tile, and the region of influence of a stencil may extend into neighboring 4 x 4 blocks as well.
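The FIG. 9 computation can be sketched as below. Several things here are assumptions for illustration only: the interpolated location O is taken to be (1, 1), the filter parameters are set to B = C = 1/3 (the values recommended in the Mitchell and Netravali paper), and the helper names `build_stencil` and `reconstruct` are hypothetical.

```python
def mitchell_netravali(x, B=1 / 3, C=1 / 3):
    """One-dimensional cubic filter k(x) from Mitchell and Netravali (1988)."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * B - 6 * C) * x**3
                + (-18 + 12 * B + 6 * C) * x**2
                + (6 - 2 * B)) / 6
    if x < 2:
        return ((-B - 6 * C) * x**3
                + (6 * B + 30 * C) * x**2
                + (-12 * B - 48 * C) * x
                + (8 * B + 24 * C)) / 6
    return 0.0


def build_stencil(target, samples, tile=4):
    """Pre-compute (location, weight) pairs and the normalization factor w."""
    tx, ty = target
    entries = []
    for sx, sy in samples:
        if not (0 <= sx < tile and 0 <= sy < tile):
            continue  # outside the tile: weight 0, dropped from the table
        w = mitchell_netravali(sx - tx) * mitchell_netravali(sy - ty)
        if w != 0.0:
            entries.append(((sx, sy), w))
    norm = sum(w for _, w in entries)
    return entries, norm


def reconstruct(stencil, color_at):
    """Runtime step: multiply-accumulate, then normalize by 1/w."""
    entries, norm = stencil
    return sum(w * color_at[pos] for pos, w in entries) / norm


samples = [(0, 0), (2, 0), (0, 2), (2, 2)]  # the X locations from the text
stencil = build_stencil((1, 1), samples)     # O assumed at (1, 1)
colors = {(0, 0): 0.0, (2, 0): 1.0, (0, 2): 1.0, (2, 2): 0.0}
value = reconstruct(stencil, colors)
```

Only `build_stencil` evaluates the polynomial; at runtime `reconstruct` touches nothing but the stored weights, which is the point of pre-computing the table.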

Assuming a 5 x 5 stencil, there are 24 values in all that may need to be pre-computed (the center is always 0, since the pixel itself has no color value). Of these, at most half can be rendered if 8 samples are used per 4 x 4 tile, leaving 12 values. In one embodiment, each stencil contains a 4-bit count of the number of non-zero weights, followed by the 8-bit weights stored in one chunk, followed by two chunks of 3-bit coordinate offsets for the x and y coordinates relative to the center.
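The stencil layout described above could be packed and unpacked along the following lines. The bit order (count in the lowest bits) and the bias of +2 used to store signed offsets in 3 bits are assumptions; the description only fixes the field widths and their sequence.

```python
OFFSET_BIAS = 2  # assumed: maps offsets -2..2 onto 0..4 so they fit in 3 bits


def pack_stencil(weights, dx, dy):
    """Pack as: 4-bit count, 8-bit weights, 3-bit x offsets, 3-bit y offsets."""
    n = len(weights)
    if not (n == len(dx) == len(dy)):
        raise ValueError("weights, dx and dy must have equal length")
    if n > 15:
        raise ValueError("count must fit in 4 bits")
    value, pos = n, 4
    for w in weights:
        if not 0 <= w <= 255:
            raise ValueError("weight must fit in 8 bits")
        value |= w << pos
        pos += 8
    for offsets in (dx, dy):
        for o in offsets:
            if not -2 <= o <= 2:
                raise ValueError("offset outside a 5 x 5 stencil")
            value |= (o + OFFSET_BIAS) << pos
            pos += 3
    return value, pos  # packed integer and its length in bits


def unpack_stencil(value):
    """Inverse of pack_stencil."""
    n = value & 0xF
    pos = 4
    weights = []
    for _ in range(n):
        weights.append((value >> pos) & 0xFF)
        pos += 8
    chunks = []
    for _axis in range(2):
        chunk = []
        for _i in range(n):
            chunk.append(((value >> pos) & 0x7) - OFFSET_BIAS)
            pos += 3
        chunks.append(chunk)
    return weights, chunks[0], chunks[1]
```

A stencil with four non-zero weights occupies 4 + 4*8 + 2*4*3 = 60 bits in this layout.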

In one embodiment, the stencils are stored in order of the sampling patterns. In one embodiment, the different sampling patterns for the same sampling rate are rotations of one another, so there are two sets of patterns. These can be stored in row-major order within a 4 x 4 tile, together with an index list that points to the data for pixel (i, j). For rotations of the sampling mask, the coordinates are transformed accordingly.

Referring to FIG. 10, consider the case of a 4 x 4 tile of pixels in which 8 of the 16 possible samples are rendered. In this example, stencils are defined for each unknown pixel, given a weight function. These stencils can be retrieved at runtime from a set of pre-defined stencils. In the exemplary case of cubic stencils with a support radius of 2 pixels, the stencils are 5 x 5 in size if super-sampling is not performed. If accesses must be restricted to within the k x k tile region, the stencils can be modified appropriately to have weights of 0 for those pixels that fall outside the tile. It is important to note that the number of samples need not be less than the number of pixels. In regions where super-sampling is needed for anti-aliasing, the number of samples may exceed the number of pixels (for example, 32 samples for a 16-pixel 4 x 4 tile). Appropriate pre-computed stencils would be added for such cases.

In one embodiment, each sampling pattern is defined as a combination of sparse square patterns (for example, four samples to be rendered in a square pattern). Choosing square patterns is useful in applications where a group of four pixels (a quad) is the default unit of processing. More generally, however, other arrangements of sampling locations may be used in the sampling patterns. In one embodiment, the sample patterns are squares of size 3 x 3 within 4 x 4 tiles. Thus, adjacent vertices are 2 pixels apart along each axis.

In one embodiment, the same sampling pattern is used for all regions of an individual frame that are sub-sampled at a given sampling rate. In this embodiment, the same sampling pattern is used in all tiles sub-sampled at a given sampling rate, because this keeps the spacing of the sample locations consistent throughout the frame and simplifies the reconstruction routine.

In one embodiment, the sampling patterns are based on quads in order to exploit single instruction multiple data (SIMD) processing units. Consistent spacing of the samples provides robust interpolation and helps achieve full pixel resolution in the final image.

FIG. 11 illustrates a general method of adaptive desampling and spline interpolation according to an embodiment of the present invention. A determination is made (1505) of whether the velocity range of a tile is within the velocity range for sub-sampling, and a check is made for the presence of edges. A determination is made (1510) of the sub-sampling rate and of the sample pattern to be used. The pixels of the tile are shaded (1515) based on the sampling pattern. Reconstruction is performed (1520) to interpolate the missing pixel values, for which spline interpolation may be used.
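The decision in steps 1505 and 1510 might be sketched as follows. The threshold values, the mapping from velocity to sample count, and the function names are all hypothetical; this passage does not state them. The sketch assumes that faster tiles tolerate fewer samples and that tiles containing edges are never sub-sampled.

```python
# Hypothetical thresholds in pixels per frame; the description gives no values.
MEDIUM_SPEED = 1.0
HIGH_SPEED = 4.0


def samples_for_tile(tile_speed, has_edge, full_rate=16):
    """Return how many of the 16 pixels in a 4 x 4 tile to shade."""
    if has_edge:
        return full_rate  # edges reconstruct poorly, so do not sub-sample
    if tile_speed >= HIGH_SPEED:
        return 4
    if tile_speed >= MEDIUM_SPEED:
        return 8
    return full_rate


def plan_frame(tiles):
    """Apply the per-tile decision to every tile of a frame."""
    return [samples_for_tile(t["speed"], t["edge"]) for t in tiles]
```

Tiles that return fewer than 16 samples would then be shaded with the matching sample pattern (1515) and completed by reconstruction (1520).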

FIG. 12 illustrates a method of performing cubic spline interpolation according to an embodiment of the present invention. A tile is selected (1705) for sparse sampling. A sample pattern is selected (1710). Pixels are rendered (1715) for the sampled locations. Reconstruction of the missing pixel data is performed (1720) via cubic spline interpolation based on pre-computed weights.

FIG. 13 illustrates a method of using stencils containing pre-computed weights according to an embodiment of the present invention. Pre-computed weights are generated (1805) for each missing pixel location in a sample pattern. A stencil containing the pre-computed weights is stored (1810). The stored stencil is accessed (1815) during runtime. The accessed stencil is used (1820) to perform cubic spline interpolation.

Example comparison of advection and reconstruction

FIG. 14 illustrates an example of aspects of advection and of reconstruction via cubic splines. The tile size is 4 x 4. The pixel pattern in the previous frame is a checkerboard pattern. The rendered pixel values are denoted by R. In the example on the left, advection is performed to reuse half of the pixel color data from the previous frame in the 4 x 4 tile. The velocity associated with the tile is very low, and half of the pixels are advected by copying them from the pixel values of the previous frame. Arrows are shown for four of the pixels to indicate reuse of pixel data from the same tile in the previous frame. In this case, the color information is copied with no bleeding of color. In the example on the right, there is a significant tile velocity, corresponding to a half-pixel displacement per frame. In this example, reconstruction is performed based on cubic spline interpolation. A velocity of 0.5 pixels along x causes each rendered pixel to be a gray that is exactly between black and white. The reconstructed pixels therefore have the same value. That is, the color values are correct, and a full-resolution rendering would produce the same values.

Example of automatic tile-by-tile adaptive rendering

FIG. 15A illustrates an example of a frame in a scene that has regions whose pixel velocity differs from that of other regions, with some regions containing color edges. As an example, the scene includes a rider on a motorcycle as well as static objects and quasi-static objects, such as plants moving slowly in the wind. Thus, there are regions that can be classified into different velocity regimes. Consequently, as indicated by the boxes in FIG. 15B, different regions of the scene have different pixel velocities, and some regions provide different opportunities for adaptive rendering. As a result, in an individual frame the system automatically analyzes individual tiles and decides, on a tile-by-tile basis, whether to desample and perform advection, to desample and perform cubic spline interpolation, or to use the normal default sampling scheme. Individual decisions can also be made on a tile basis about whether to perform super-sampling. Because the system performs this optimization automatically, no special input is required from the application developer, assuming that the relevant parameter values have been defined.

Stereoscopic rendering example

Embodiments of the present invention may be used to generate a single (non-stereoscopic) display. However, they may also be applied to stereoscopic rendering for virtual reality applications. Referring to FIG. 16, consider the case in which a separate image is generated for each eye, corresponding to a left-eye image and a right-eye image. Advection can be used to improve the efficiency of stereoscopic rendering. In one embodiment, the left image is generated. A translation motion, motiontrans, is defined as the translation that transforms a portion of the left-eye image into the right-eye image. In one embodiment, the sample generator's decision-making is augmented so that the sampling decision for the right image attempts to advect pixel values from the left image. In one embodiment, the sampling is Z-based, and a test is performed as to whether the minimum Z of the left image and the right image is greater than a threshold Z. If min(Zleft, Zright) > Zthresh, pixels are advected from the left frame to the right frame using the translation motion motiontrans. Otherwise, rendering is based on the motion-based sampling rate. As shown in FIG. 16, this results in a right-eye image that is a combination of pixels advected from the left-eye image and rendered pixels.
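The depth test for the right-eye image can be sketched as below. The function names and the string labels are illustrative assumptions; only the comparison min(Zleft, Zright) > Zthresh comes from the description.

```python
def right_eye_mode(z_left, z_right, z_thresh):
    """Decide how a region of the right-eye image is produced.

    Regions farther away than z_thresh in both views are advected from the
    left image; everything else falls back to motion-based sampling.
    """
    if min(z_left, z_right) > z_thresh:
        return "advect_from_left"
    return "render_motion_based"


def plan_right_eye(regions, z_thresh):
    """regions: iterable of (z_left, z_right) minimum depths per region."""
    return [right_eye_mode(zl, zr, z_thresh) for zl, zr in regions]
```

Distant regions show little parallax between the two eyes, which is why a pure translation of the left image can stand in for rendering there.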

Foveated rendering using adaptive rendering

FIG. 17 illustrates an embodiment in which adaptive rendering is applied to foveated rendering. The structure of the human retina includes the fovea, the portion that provides the highest visual acuity in a healthy human eye. The highest visual acuity of a healthy human eye lies within a small cone of angles and falls off with increasing angular distance. Foveated rendering renders high detail near where the user is looking and lowers the detail farther from the focus point. FIG. 17 shows a focal point (x, y) 1725. The sampling rate decreases with increasing radial distance from the focal point (for example, as 1/(distance from the focal point)). The decrease may be performed in a stepwise manner at specific radial distances. For example, a specific number of samples may be rendered in a circular region 1720 out to a radial distance r0 1715. Fewer samples are rendered in the annular region 1710 from r0 to r1 1705. Even fewer samples are rendered in the region with a radial distance greater than r1. As an illustrative example, 16 samples may be rendered in the region between (x, y) and r0, 8 samples in the region between r0 and r1, and 4 samples in the region beyond r1. More generally, other radially varying sampling functions may be used.
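The stepwise example (16, 8, and 4 samples) can be sketched as a function of distance from the focal point. Whether the boundaries r0 and r1 are inclusive is an assumption, as is the function name.

```python
import math


def foveated_samples(px, py, fx, fy, r0, r1):
    """Samples per 4 x 4 tile as a step function of radial distance.

    (px, py) is the tile position, (fx, fy) the focal point, and r0 < r1 the
    two radii from the text. Boundary handling (<=) is assumed.
    """
    d = math.hypot(px - fx, py - fy)
    if d <= r0:
        return 16
    if d <= r1:
        return 8
    return 4
```

A smoother falloff, such as one proportional to 1/(distance from the focal point), could replace the step function without changing the callers.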

While the present invention has been described in connection with specific embodiments, it will be understood that the invention is not limited to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit of the invention as defined by the claims below. The present invention may be practiced without some or all of these specific details. In addition, well-known features may not be described in detail so as not to obscure the invention unnecessarily. According to embodiments of the present invention, the components, process steps, and/or data structures may be implemented using various types of operating systems, programming languages, computing platforms, computer programs, and/or computing machines. In addition, those of ordinary skill in the art will recognize that devices such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like may also be used without departing from the scope of the technical ideas disclosed herein. The present invention may also be tangibly embodied as a set of computer instructions stored on a computer-readable medium, such as a memory device.

Example of adaptive rendering using intra-frame and inter-frame information

Pixel shading (PS) may be defined as computing the color of each pixel of an image on a display. For example, in computer graphics, a pixel shader (or fragment shader) may be a computer program that helps determine the color, brightness, contrast, and/or other characteristics of a single pixel. In exemplary embodiments of the present disclosure, "pixel sampling" and "pixel shading" may refer to the same operation and may be used interchangeably.

Graphics processing units (GPUs) that perform pixel shading operations are increasingly constrained by their energy consumption. As discussed with respect to FIG. 1, pixel shading can be an expensive operation in terms of the energy consumed by the computations it performs. Because bounding the shading cost in real-time rendering applications can be difficult, it may be desirable to reduce the number of pixel shading (PS) invocations during rendering. Some GPUs perform at least one PS invocation per pixel, which may be wasteful in terms of the GPU's energy consumption and performance. For example, for each pixel on the screen, a pixel shader may be invoked to compute the color of that pixel.

The shading rate may refer to the resolution at which pixel shaders are invoked. In some cases, the shading rate may differ from the full screen resolution. A higher shading rate can provide higher visual fidelity but increases the cost associated with the GPU (for example, processing power and energy consumption). Conversely, a lower shading rate provides lower visual fidelity at lower GPU cost. During rendering, once a shading rate is set, that shading rate is applied to all pixels in the frame. However, not all pixels in a frame require the same level of visual fidelity (for example, pixels that have not changed significantly in the current frame compared to the previous frame).

However, some GPUs may have additional functionality that allows applications to reduce the shading rate (for example, the sampling or shading rate reduction described with respect to FIG. 4). In some cases, related-art GPUs may also allow applications running on the GPU (for example, rendering applications) either to supply the shading rate or to define the methods by which the GPU (for example, a software (SW) layer of the GPU) computes the shading rate used for pixel shading. For example, some related-art methods can selectively reduce the shading rate in regions of the frame where doing so does not affect visual quality, so that additional performance gains can be achieved and GPU cost reduced as a result. However, these related-art methods, in which the application supplies the shading rate or defines how it is computed, may not be automatic or transparent, and may require the application running on the GPU (for example, a rendering application) to opt into them explicitly.

The ideal number of pixel shading invocations can be determined by Nyquist sampling of the image signal. For example, according to Nyquist sampling, the sampling rate needed to reconstruct the image accurately must be at least twice the maximum frequency in the image signal.

Some embodiments of the present disclosure may provide hardware and software additions to a GPU that may be required to approach Nyquist optimality in rendering. In addition, some embodiments of the present disclosure may provide not only methods of controlling the shading rate of an image, but also heuristics and mechanisms that automatically control the shading rate to minimize shading cost while maintaining final image quality.

Some embodiments of the present disclosure may use sets of heuristics, executable in software (SW) and/or hardware (HW), that use backward and/or forward error analysis within a frame to determine the desired shading rate for various shading operations. For example, a backward error analysis approach may be used to identify scenarios in which not all samples need to be shaded, where the reduction in shading requirements comes from depth of field (DoF), motion blur, camera blur, or other similar quality-reducing stages in the rendering pipeline. For instance, because depth of field reduces image quality, fewer shaded samples may be needed to achieve similar quality. In such cases, a combination of driver and compiler interaction may be used to reduce the shading rate.

For example, as discussed with respect to FIG. 7A, in one embodiment back and forth error compensation and correction (BFECC) may take the position determined by semi-Lagrangian advection and add the velocity at that coordinate to obtain a new position in the current frame. If there is no error, this coordinate should be the same as the original position (x, y). Otherwise, by subtracting half of the error from (x-vx, y-vy), a second-order accurate estimate of the position is obtained that is accurate to half a pixel, assuming the velocity is accurate (for example, pixel-accurate).
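The BFECC steps above can be sketched as follows. The `velocity` callable, which returns the per-frame velocity (vx, vy) at a position, and the function name are assumptions made for illustration.

```python
def bfecc_source_position(x, y, velocity):
    """Estimate where pixel (x, y) came from in the previous frame.

    velocity(px, py) -> (vx, vy) is a hypothetical lookup of the per-frame
    velocity at a position.
    """
    # Semi-Lagrangian step: trace backward with the velocity at (x, y).
    vx, vy = velocity(x, y)
    bx, by = x - vx, y - vy
    # Forward step: add the velocity found at the backward position.
    fvx, fvy = velocity(bx, by)
    fx, fy = bx + fvx, by + fvy
    # With no error the forward position equals (x, y).
    ex, ey = fx - x, fy - y
    # Subtract half of the error from the backward position.
    return bx - 0.5 * ex, by - 0.5 * ey
```

For a uniform velocity field the error is zero and the result equals the plain semi-Lagrangian position; the correction only matters where velocity varies across the frame.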

For example, some embodiments of the present disclosure may use a set of heuristics to reuse (or reproject) some or all of the pixel (or rendered) data from any image in the current or a previous frame (for example, similar to the embodiments discussed with respect to FIGS. 2, 4, 7A, 7B, and 14). For example, some embodiments of the present disclosure may detect that, within a frame, an image to be rendered uses one or more viewports that are substantially (or sufficiently) similar to one or more viewports of another image, may detect that one or more viewports within an image are similar to one another, or may establish that beyond a certain distance from the camera the rendered results will be sufficiently identical across such similar viewports (for example, similar embodiments are discussed with respect to FIGS. 16 and 17). In addition, some embodiments of the present disclosure may reuse (or reproject) some or all of the pixel (or rendered) data from any image in the current or a previous frame based on the determination of viewport similarity.

Some embodiments of the present disclosure may combine backward and/or forward error analysis within a frame to determine a desired shading rate for various shading operations and to reuse some of the pixel data from any image in the current or previous frames, so as to collectively reduce the number of shading samples through hardware and software analysis that uses hardware for API features. For example, some embodiments of the present disclosure may track objects across frames, using application assistance together with driver resource tracking, in order to reuse intermediate rendered results from one or more previous frames that help render the current frame. Moreover, some embodiments of the present disclosure may detect that, across frames, certain regions within certain images that contribute to the final displayed image (e.g., including the certain images themselves) do not change across frames and can be reused (or reprojected) from old rendered copies.

Fine-grained shading rate control has been discussed, for example, with respect to the example embodiments of FIGS. 3, 4, and 5 of the present disclosure. The example embodiment of FIG. 18 exercises the concepts and functionality of the example methods of selecting a reduced sampling rate described above (e.g., with respect to FIGS. 3, 4, 5, 11, and 17, selecting a reduced sampling rate based on, for example, velocity and edge detection and, in some cases (e.g., FIG. 17), increasing radial distance from the focal point), together with additional heuristics to further reduce rendering cost. In addition, the example embodiment of FIG. 18 (e.g., an example method of generating a shading rate image) may be codified as API extensions that may be attached to any image to control the number of pixels shaded for any block of rendered pixels, e.g., a block of 4 x 4 pixels.

FIG. 18 illustrates an augmented GPU pipeline that uses example sets of heuristics to reduce the rendering work performed in both the shading and fixed-function stages of the GPU pipeline by reducing the shading rate. In the example embodiment of FIG. 18, the detection and computation of the shading rate image may be performed by any combination of hardware and/or software components of the GPU.

The GPU pipeline of FIG. 18 may include an input stage 1802, a GPU pipeline stage 1804 with support for a shading rate image and fine-grained control, a shading reduction pipeline stage 1806, a shading rate image stage 1808, and an output stage 1810.

The input stage 1802 may provide input rendering calls or input image geometry (e.g., three-dimensional coordinates, color, texture, etc. of the input image) to the GPU pipeline stage 1804. The GPU pipeline stage 1804 may be functionally similar to the graphics pipeline 200 of FIG. 2. The GPU pipeline stage 1804 may include a fixed-function geometry pipeline stage 1812, a geometry-related shaders stage 1814, a fixed-function rasterization or interpolation pipeline stage 1816, and a stage 1818 of shaders related to pixel or sample shading.

The input image geometry from the input stage 1802 may be received at the fixed-function geometry pipeline stage 1812 of the GPU pipeline stage 1804. The output from the fixed-function geometry pipeline stage 1812 may serve as the input to the geometry-related shaders (e.g., vertex, hull, domain, geometry, etc.) stage 1814. Stage 1814 (e.g., a vertex shader (e.g., the vertex shader 105 of FIG. 2)) may process vertices from the received input and may perform per-vertex operations such as transformation, skinning, morphing, and per-vertex lighting. Stage 1814 may also convert low-detail subdivision surfaces into high-detail primitives on the GPU, and may tessellate tiles (or break them up) to convert higher-order surfaces into structures suitable for rendering, and so on. The output from stage 1814 may serve as the input to the fixed-function rasterization or interpolation pipeline stage 1816.

The rasterization stage 1816 may convert vector information (e.g., composed of shapes or primitives) received from the geometry-related shaders stage 1814 into a raster image (e.g., composed of pixels) for the purpose of displaying real-time 3D graphics. The rasterization stage 1816 may be the same as the rasterization stage 110 of FIG. 2. The output from the rasterization stage 1816 may serve as the input to the stage 1818 of shaders related to pixel or sample shading. Stage 1818 (e.g., a pixel shader) may be the same as the fragment or pixel shader operation stage 115 of FIG. 2 and may enable shading techniques (e.g., rich shading techniques) such as per-pixel lighting and post-processing. In some embodiments, a pixel shader is a program that combines constant variables, texture data, interpolated per-vertex values, and other data to produce per-pixel outputs. The rasterizer stage (e.g., 1816) invokes the pixel shader once for each pixel covered by a primitive.

The output from stage 1818 may serve as the input to the output stage 1810. The output stage 1810 may generate the final rendered pixel color using a combination of the pipeline stages (e.g., 1812, 1814, 1816), the pixel data generated by the pixel shaders (e.g., 1818), the contents of the render targets, and the contents of the depth or stencil buffers. The output stage 1810 may be the final stage, which determines the visible pixels (e.g., with depth-stencil testing) and blends the final pixel colors.

In some embodiments, the rasterization stage 1816 generates its output (e.g., a raster image) based not only on the vector information received from the geometry shader stage 1814 but also on the shading rate image generated by the shading rate image stage 1808. The shading rate image stage 1808 may generate the shading rate image based on the output from the shading reduction pipeline stage 1806. In some embodiments, the shading reduction pipeline stage 1806 may be a set of software or hardware that automatically generates the shading rate image, without the application's knowledge, to feed the extension on any GPU that supports the API.

The shading reduction pipeline stage 1806 includes a sampling rate stage (or heuristic) 1820 (e.g., the sampling rate required as a function of a quality reduction filter such as depth-of-field (DoF), motion blur, etc.), a reuse vs. redraw analysis stage (or heuristic) 1822 for previous frames, a reuse vs. redraw analysis stage (or heuristic) 1824 for other viewports, and a shading rate higher-order function determination stage 1826.

The sampling rate stage 1820 relies on detecting quality reduction filters (e.g., DoF, motion blur, smoothing filters, or the like), which are characterized by averaging multiple samples (or taps) from input textures rendered on the GPU. When such a pattern is detected, it may be used to determine the desired (or necessary) shading rate to achieve acceptable (or sufficient) quality, as discussed with respect to FIG. 20 later in the present disclosure.

In some embodiments, shading rate reduction may also be achieved by analyzing cross-frame changes of the underlying assets in the rendered image (e.g., changes in the underlying assets, such as the camera and viewport, between the input image frame and the output image frame).

In one embodiment, the GPU may check for changes in the underlying assets, e.g., the camera and viewport (e.g., anything that changes how the scene is observed by the viewer), and may use reprojection to reuse data from the previous image. If any other assets change, including geometry, shaders, states, or transforms, it is not possible to reuse or reproject the previous image data into the current image.

For example, in one embodiment, at stage 1822, if the underlying assets in a region of the rendered image do not change (or the change is below a threshold), or if the change can be expressed as a linear transformation (e.g., an affine transformation), the image data from the previous frame(s) may be reused instead of being rendered again (or redrawn). In such regions of the rendered image, the shading rate may be reduced to zero, and the pixel data may be copied from the previous frame(s), with a small transformation (or warp) if necessary (e.g., reusing a fraction of the pixels from a previous frame is discussed with respect to FIGS. 2, 4, 7A, and 14). In some embodiments, reprojection may be used instead of reuse of pixel data.

In some embodiments, reprojection may be an instance of reuse (e.g., a specialization or use case). In some embodiments, reuse may imply that data is copied from the previous image to the current image. Reprojection may cover other example use cases: for example, when the camera moves, pixels may be warped, so in order to reuse pixel data from the previous image, the pixel data from the previous image may be reprojected into the current image.

In some embodiments, reuse of pixel data from a previous frame may require that the graphics state not change from the previous frame to the current frame, where the graphics state includes the shaders used, the constants provided to the shaders, and the geometry provided to the frames.
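The previous-frame reuse conditions described above can be summarized in a minimal sketch. The `GraphicsState` record, the function name, and the convention that a rate of 0.0 means "copy from the previous frame" while `full_rate` means "redraw" are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphicsState:
    """Hypothetical record of the state that must match for reuse."""
    shaders: tuple
    shader_constants: tuple
    geometry: tuple


def tile_rate_from_previous_frame(prev_state, cur_state, asset_change,
                                  threshold, change_is_affine, full_rate=1.0):
    """Return a shading rate for one region of the rendered image."""
    # Reuse requires that the graphics state did not change between frames.
    if prev_state != cur_state:
        return full_rate
    # Unchanged (or sub-threshold) assets, or a change expressible as an
    # affine transformation: shade nothing and copy (or reproject) instead.
    if asset_change < threshold or change_is_affine:
        return 0.0
    return full_rate
```

A region that returns 0.0 would then be filled by copying pixel data from the previous frame, with a small transformation applied if necessary.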

In some embodiments, shading rate reduction may be achieved by detecting multiple similar viewports within the same image or the same frame (e.g., the change between a first viewport of the current image frame and a second viewport of the current image frame is below a threshold). For example, at stage 1824, the GPU may detect that, within a frame, an image to be rendered uses one or more viewports that are substantially (or sufficiently) similar to one or more viewports of another image; may detect that, within an image, one or more viewports are similar to each other; or may establish that, beyond a certain distance from the camera, the rendered results will be sufficiently similar or identical across such similar viewports (e.g., a similar embodiment is discussed with respect to FIG. 16 of the present disclosure). Some embodiments of the present disclosure may reuse (or reproject) all or some of the pixel data from any image in the current or previous frames based on the determination of viewport similarity.

For example, FIGS. 19A and 19B illustrate an example of reusing (or reprojecting) data from another viewport in stereo rendering cases, as well as an example of reusing (or reprojecting) pixel data across cameras. In the example embodiment of FIGS. 19A and 19B, a transformation matrix may be defined to map any pixel (x, y) in viewport 2 (1902) to another pixel coordinate (x', y') in viewport 1 (1904). Thus, if (x', y') is within bounds and has the same occlusion characteristics, the corresponding data (e.g., pixel data) may be copied from viewport 1 (1904) to viewport 2 (1902). In some embodiments, the occlusion characteristics may imply that the data copied across viewports comes from the same object or the same class of objects, where a class may be defined by the depth range of the objects. In some embodiments, if (x', y') is out of range in viewport 1 (1904), there is no reference data to copy, so the pixel (x, y) will be rendered in viewport 2 (1902). In some embodiments, the reuse vs. redraw analysis stage (or heuristic) 1824 for other viewports may identify objects close to the camera 1906 and handle them separately, because lighting may vary significantly (or by a relatively large amount) across viewports.
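A minimal sketch of the per-pixel decision described above follows. It assumes a 3 x 3 homogeneous transformation matrix, viewports stored as nested lists indexed `[y][x]`, a per-pixel depth class standing in for the "occlusion characteristics", and a `render` callback for pixels that cannot be copied; all of these names and representations are illustrative assumptions.

```python
def map_pixel(matrix, x, y):
    """Map pixel (x, y) through a 3x3 homogeneous transform (nested lists)."""
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return xh / w, yh / w


def viewport2_pixel(x, y, matrix, vp1_color, vp1_class, vp2_class, render):
    """Return the color of pixel (x, y) in viewport 2, copying when possible."""
    xs, ys = map_pixel(matrix, x, y)
    xi, yi = round(xs), round(ys)  # nearest pixel (x', y') in viewport 1
    in_bounds = 0 <= yi < len(vp1_color) and 0 <= xi < len(vp1_color[0])
    # Copy only if (x', y') exists and belongs to the same depth class.
    if in_bounds and vp1_class[yi][xi] == vp2_class[y][x]:
        return vp1_color[yi][xi]
    # No usable reference data: render the pixel in viewport 2.
    return render(x, y)
```

Nearest-pixel lookup is used here for brevity; a real reprojection would more likely filter between neighboring pixels.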

For objects far from the camera, or for the background, the difference between viewports may be considered small enough not to cause a relatively large or significant difference in lighting, which allows the pixel data to be reused (or reprojected). In some embodiments of the present disclosure, the shading rate for regions with near objects (or foreground) may be the maximum required, while the shading rate for regions with far objects (or background) may be the minimum supported, with the remaining unrendered pixels/samples filled by copying (or reusing or reprojecting) data across viewports.

For example, in FIG. 19B, in viewport 1 (1904), both the foreground 1908 and the background 1910 have been fully rendered. In viewport 2 (1902), however, only the foreground region 1912 and the region 1916 have been rendered. The foreground region 1912 in viewport 2 (1902) has been rendered because the difference between the foregrounds 1908 and 1912 of the viewports 1904 and 1902 may be considered significant (or sufficiently high). The region 1916 of viewport 2 (1902) has been rendered because that region was never sampled in viewport 1 (1904). The rendered data from the background 1910 of viewport 1 (1904) may be reused (or, in some cases, reprojected) in the background 1914 of viewport 2 (1902), because the backgrounds 1910 and 1914 of viewport 1 (1904) and viewport 2 (1902) are far from the camera, and the difference between them may therefore be considered small enough not to cause a relatively large or significant difference in lighting. In FIGS. 19A and 19B, viewport 1 (1904) and viewport 2 (1902) may correspond to a left-eye image and a right-eye image (e.g., as discussed with respect to FIG. 16).

Returning to FIG. 18, each of the heuristics 1820, 1822, and 1824 may determine a proposed shading rate based on its respective analysis. For example, based on their respective analyses, the heuristic 1820 may propose or determine a first shading rate for the input image, the heuristic 1822 may propose or determine a second shading rate for the input image, and the heuristic 1824 may propose or determine a third shading rate for the input image.

In some embodiments, each of the heuristics (e.g., 1820, 1822, and 1824) may analyze a specific region of the rendered image and provide a shading rate accordingly, so the GPU may apply several of these heuristics (e.g., 1820, 1822, and 1824) in the shading reduction pipeline stage 1806 as needed or desired. For example, in one embodiment, the heuristic 1820 may analyze a first region of the rendered image or output image and provide a first shading rate of the input image for the first region. Similarly, in some embodiments, the heuristic 1822 may analyze a second region of the rendered image or output image and provide a second shading rate of the input image for the second region, and the heuristic 1824 may analyze a third region of the rendered image or output image and provide a third shading rate of the input image for the third region.

In some embodiments, the first region of the rendered image or output image is a rendered subset (e.g., a first subset) of the pixels in the rendered image or output image, the second region is another rendered subset (e.g., a second subset) of the pixels in the rendered image or output image, and the third region is yet another rendered subset (e.g., a third subset) of the pixels in the rendered image or output image.

Once the shading rates are determined according to the heuristics (e.g., 1820, 1822, and 1824), at 1825, the GPU may use a resolving filter (e.g., a max filter, which results in the maximum shading rate across all of the heuristics (e.g., 1820, 1822, and 1824)) to determine the shading rate image (e.g., at 1808). For example, the GPU may use the max filter to select the maximum shading rate from among the first shading rate, the second shading rate, and the third shading rate for the input image.
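A max filter of this kind can be sketched as a per-tile maximum over the shading rate images proposed by the heuristics. The representation below (nested lists of per-tile rates, where a larger number means more shading) and the function name are assumptions for illustration.

```python
def resolve_shading_rate_image(*rate_images):
    """Combine per-heuristic shading rate images with a per-tile max filter.

    Each argument is a hypothetical shading rate image: a list of rows,
    each row a list of per-tile rates, all with the same dimensions.
    """
    return [
        [max(tile_rates) for tile_rates in zip(*rows)]
        for rows in zip(*rate_images)
    ]
```

For example, if the three heuristics propose `[[0.25, 1.0]]`, `[[0.5, 0.0]]`, and `[[0.0, 0.5]]` for a two-tile image, the resolved image is `[[0.5, 1.0]]`, so no tile is shaded at a lower rate than any heuristic requires.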

In some embodiments, based on the shading rates according to the heuristics (e.g., 1820, 1822, and 1824), the GPU may determine a shading rate higher-order function (functional) at stage 1826 in order to compute or generate the shading rate image at stage 1808. In some embodiments, the maximum (or minimum) of the shading rates proposed by the respective heuristics (e.g., 1820, 1822, and 1824) may be selected to generate the shading rate image at stage 1808, which may be input to stage 1816 to generate the output rendered image at stage 1810.

In some embodiments, two or more of the shading rates proposed by the respective heuristics (e.g., 1820, 1822, and 1824) may be selected to render different portions of the image. However, the decision to select the maximum or the minimum shading rate may be fine-grained and situation dependent. For example, if the quality reduction in the rendered image is due to DoF or motion blur, the minimum of the shading rates proposed by the respective heuristics (e.g., 1820, 1822, and 1824) may be selected to generate the shading rate image at stage 1808. A method of determining the shading rate higher-order function is discussed with respect to FIG. 20.

FIG. 20 is an example flowchart illustrating a shader analysis method that detects quality reduction in a rendered image and computes a shading rate image (e.g., computes a shading rate for an input image). The shader analysis method 2000 may be performed on the GPU at stage 1820 of the shading reduction pipeline 1806.

As described above with respect to FIG. 18, the sampling rate stage 1820 relies on detecting quality reduction filters, e.g., DoF, motion blur, smoothing filters, or the like, which are characterized by averaging multiple samples (or taps) from input textures rendered on the GPU. When such a pattern is detected, as described below with respect to FIG. 20, it may be used to determine the desired (or necessary) shading rate (e.g., the shading rate for the input image) to achieve acceptable (or sufficient) quality (e.g., quality of the output image).

For example, FIG. 20 provides a detailed description of how the heuristic 1820 operates. For example, in the example embodiment of FIG. 20, the processor or GPU may identify rendering that involves averaging input taps from an image or texture. Based on this determination, the processor or GPU may determine that the larger the number of taps, the fewer pixels the processor or GPU needs to render in the input image (e.g., the "inputcolor" discussed with respect to 2006 of FIG. 20).

For example, at 2002, the processor or GPU may analyze the shading operations used when rendering the output image (e.g., outputcolor). In one embodiment, at 2002, pixel or fragment shaders may be analyzed to determine the presence of quality reduction in the output image (e.g., outputcolor).

If it is determined at 2002 that there is no quality reduction in the output image, then at 2004 the GPU does not modify the current shading rate image (e.g., the shading rate for the input image). If, however, quality reduction in the output image is determined at 2002, then at 2006 the GPU may determine whether the shading operation of the output image (e.g., outputcolor) corresponds to a weighted sum of pixel data from one or more input images (e.g., outputcolor (x,y) = sum_i (weight_i (x_i, y_i) * inputcolor_i (x_i, y_i))). For example, in one embodiment, at 2006, the GPU may determine whether the output color of the output image corresponds to a weighted sum of pixel data from one or more input images.

For example, in one embodiment, at 2006, the GPU may determine a higher-order function for the final or output image (e.g., outputcolor); the formula expressing the weighted sum of pixel data that determines the final or output image (e.g., outputcolor), or a pixel color value of the final or output image (e.g., outputcolor), is as follows.

outputcolor (x,y) = sum_i (weight_i (x_i, y_i) * inputcolor_i (x_i, y_i))

Here, i is in [0, … (N-1)], N is the number of input values (e.g., the number of input pixels), and (x, y) is a pixel coordinate in the final or output image. In some embodiments of the present disclosure, N may be 4. In one embodiment, it may be concluded from the higher-order function of the final or output image that the "outputcolor" at a particular pixel is a weighted sum over other pixel coordinates.
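The weighted sum above can be evaluated directly, as in the following minimal sketch. The tap representation (a list of `(weight, image, x_i, y_i)` tuples) and scalar pixel values are assumptions chosen to keep the illustration short.

```python
def output_color(taps):
    """Evaluate outputcolor(x, y) as a weighted sum over N input taps.

    `taps` is a hypothetical list of N tuples (weight_i, inputcolor_i, x_i, y_i),
    where inputcolor_i is an image stored as nested lists indexed [y][x].
    """
    return sum(weight * image[yi][xi] for weight, image, xi, yi in taps)


# Example with N = 4: a box filter over a 2 x 2 neighborhood of one input image.
image = [[0.0, 4.0],
         [8.0, 4.0]]
box_taps = [(0.25, image, xi, yi) for yi in (0, 1) for xi in (0, 1)]
average = output_color(box_taps)  # 0.25 * (0 + 4 + 8 + 4) = 4.0
```

Because each tap contributes only a fraction of the result, an error in any single input sample is attenuated in the output, which is the property the heuristic exploits when it lowers the shading rate of the input image.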

In example embodiments of the present disclosure, a higher-order function (functional) may be defined as a function that receives another function as input. For example, in the case of a depth-of-field (DoF) filter, different applications may program or define their DoF filters in different ways. The higher-order function may receive, as input, the blur function as defined by the application, and may then compute a shading rate (e.g., the shading rate for the input image) as output. In particular, the higher-order function may be logic that determines the shading rate (e.g., the shading rate for the input image) based on a particular filter (e.g., DoF, motion blur, etc.), and it is agnostic to how the filter is programmed or defined by the application.

If, at 2006, the processor or GPU determines that the shading operation of the output image (e.g., outputcolor) corresponds to a weighted sum of data from one or more input images (e.g., outputcolor(x, y) = sum_i (weight_i(x_i, y_i) * inputcolor_i(x_i, y_i))), then at 2008 the processor or GPU may reduce the shading of the input image (e.g., the shading rate for the input image), or generate a new shading rate image, because shading a particular subset of the samples in the input image (e.g., inputcolor) may be sufficient to accurately reproduce the output image (e.g., outputcolor).

In one embodiment, at 2008, the processor or GPU may generate a shading rate image (e.g., a shading rate for the input image) for the input image or images (e.g., inputcolor). For example, at 2008, the GPU generates a shading rate image for the input image or images in which the number of shaded samples is inversely proportional to the number of input values “N” in the higher-order function. However, the proportional relationship (or exact proportion) between the number of shaded samples and the number of input values “N” may be adjusted based on the workload of the GPU and the quality reduction determined at 2002. In some embodiments, the weighted sum may include a weighted sum of the pixel data of input images across N input image frames, where the input image (e.g., inputcolor) is a particular instance.
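The inverse relationship between shaded samples and N can be sketched as below. This is only an illustration under stated assumptions: the function name `shaded_fraction` and the single `scale` factor (standing in for the adjustment based on GPU workload and the quality reduction from 2002) are hypothetical and do not appear in the source.

```python
def shaded_fraction(n_inputs, scale=1.0):
    """Fraction of input samples to shade, inversely proportional to N.

    `scale` is a hypothetical tuning factor that models the adjustment
    for GPU workload and allowed quality reduction; the result is
    clamped so it never exceeds full-rate shading (1.0).
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be at least 1")
    if scale <= 0:
        raise ValueError("scale must be positive")
    return min(1.0, scale / n_inputs)


quarter_rate = shaded_fraction(4)           # N = 4, no adjustment
half_rate = shaded_fraction(4, scale=2.0)   # same N, more conservative setting
```

A larger `scale` shades more samples for the same N, which corresponds to spending more work to protect quality.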

At 2010, the GPU combines the shading rate image (e.g., from 2008, e.g., the shading rate for the input image) with any others that may exist (e.g., from 1822 and 1824), using steps 1826 or 1808.

In one embodiment, the output of the method of FIG. 20 may be used as an input to the max() function at 1825 in 1806.
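One way to picture the max() combination is an element-wise maximum over per-tile shading rate images. The sketch below assumes that each shading rate image is a 2D grid with one number per tile and that a larger number means denser (more conservative) shading; both the representation and the name `combine_shading_rates` are assumptions for illustration, not details taken from the source.

```python
def combine_shading_rates(*rate_images):
    """Combine per-tile shading rate images with an element-wise max.

    Each image is a list of rows of numbers; all images must share
    the same dimensions (a hypothetical representation).
    """
    if not rate_images:
        raise ValueError("at least one shading rate image is required")
    rows, cols = len(rate_images[0]), len(rate_images[0][0])
    for image in rate_images:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all shading rate images must have the same size")
    return [
        [max(image[r][c] for image in rate_images) for c in range(cols)]
        for r in range(rows)
    ]


filter_based = [[1, 2], [4, 1]]   # e.g., a rate image produced from the filter analysis
motion_based = [[2, 1], [1, 1]]   # e.g., a rate image from another heuristic
combined = combine_shading_rates(filter_based, motion_based)
```

Taking the maximum keeps, for every tile, the densest rate requested by any contributing heuristic under the stated convention.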

In one embodiment, the reduction in shading (e.g., the shading rate for the input image) may not be for the same image. In some embodiments, information from one image (e.g., the output image) may be used to reduce the shading rate in a previous image (e.g., the input image), where “previous image” means an image that the current image depends on, not an image that is earlier in time.

FIG. 21 illustrates an exemplary method of performing adaptive shading of image frames by a GPU, as discussed with respect to FIG. 18.

At 2102, the GPU determines whether a quality reduction filter is used by applying one or more heuristics to a pixel shader in the GPU (e.g., as discussed with respect to FIG. 18).

At 2104, the GPU determines a higher-order function of a function in the pixel shader based on determining that the quality reduction filter is used in the GPU (as discussed with respect to FIG. 20).

At 2106, the GPU determines a first shading rate for a previous image frame (e.g., the input image frame), or for at least a portion of the previous image frame, based on the higher-order function (as discussed with respect to FIG. 20). In some embodiments, the portion of the previous image frame (e.g., the input image frame) is a rendered subset (e.g., a first subset) of the pixels in the previous image frame.

At 2108, the GPU renders the previous image frame (e.g., the portion of the previous image frame) based on the first shading rate (as discussed with respect to FIGS. 18 and 20).

In some embodiments, at 2110, the GPU checks for changes in the underlying assets, e.g., the camera and viewport (e.g., anything that changes the scene as observed by the viewer), and reuses data from the previous image or previous image frame using reprojection. If any other assets change, including geometry, shaders, states, or transformations, it may not be possible to reuse or reproject the previous image onto the current image. In one embodiment, at 2110, the GPU determines a change in the underlying assets between the current image frame and the previous image frame (e.g., a change in the underlying assets, such as the camera and viewport, between the input image frame and the output image frame, as discussed with respect to 1822 of FIG. 18).

In some embodiments, at 2112, based on determining that the change in the underlying assets between the current image frame and the previous image frame is less than a first threshold, the GPU renders the current image frame (or a portion of it) either based on a second shading rate or by reusing or reprojecting image data from the previous image frame (as discussed with respect to FIGS. 18, 19, and 20). In some embodiments, the portion of the current image frame is a rendered subset of the pixels in the current image frame.

An exemplary embodiment of quality reduction based on Depth-of-Field (DoF)

In some exemplary embodiments, a Depth-of-Field (DoF) experiment may serve as an example of the embodiment of FIG. 20 in which a particular function and its higher-order function can be used, and it may be determined that the use of shading rate reduction does not cause a significant reduction in quality (e.g., as measured by the industry-standard metric peak signal-to-noise ratio (PSNR)).

In one embodiment, the shader may use the per-pixel depth and compare it with the focal length (e.g., the distance at which the camera is focused; the human visual system, for example, approximates a lens with a focal length of 50-55 mm). The difference between the per-pixel depth and the focal length may determine how blurry the result is, and the result may be copied from the input image using a level of detail (LOD). If the per-pixel depth is close to the focal length, the selected LOD may be close to 0 (e.g., the most detailed). If the per-pixel depth is far from the focal length, higher LOD levels may be selected.
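The LOD selection described above can be sketched as a mapping from the depth difference to an LOD value. The source does not give the mapping; the linear ramp, the parameter names `lod_per_unit` and `max_lod`, and the function name `select_lod` are assumptions made for this example only.

```python
def select_lod(pixel_depth, focus_depth, lod_per_unit=0.5, max_lod=4.0):
    """Pick an LOD that grows with the distance from the focus depth.

    Uses a hypothetical linear ramp clamped to `max_lod`; LOD 0 is the
    most detailed level, higher values are blurrier.
    """
    if lod_per_unit < 0 or max_lod < 0:
        raise ValueError("lod_per_unit and max_lod must be non-negative")
    return min(max_lod, abs(pixel_depth - focus_depth) * lod_per_unit)


in_focus = select_lod(10.0, 10.0)     # depth equals focus depth
slightly_off = select_lod(12.0, 10.0)
far_away = select_lod(100.0, 10.0)    # clamped to max_lod
```

Pixels that resolve to a high LOD are the ones whose input samples are averaged most heavily, so they are the natural candidates for a reduced shading rate.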

Higher LOD levels may be averages of lower LOD levels (e.g., level 1 is an M x N averaged version of the 2M x 2N level 0). Accordingly, the processor or GPU may derive a function in which outputcolor(x, y) is a weighted sum of multiple pixels of inputcolor level 0 up to a certain distance from (x, y), with weights that decrease radially based on the distance from (x, y).
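The relationship between level 0 and level 1 can be sketched as a 2 x 2 box average, which is a common way to build such levels. The source only says that level 1 is an averaged version of level 0, so the specific box filter, the list-of-rows layout, and the name `next_lod_level` are assumptions.

```python
def next_lod_level(level):
    """Build the next (coarser) LOD by averaging each 2 x 2 block.

    `level` is a list of rows of scalar values with even dimensions
    2M x 2N; the result has dimensions M x N.
    """
    rows, cols = len(level), len(level[0])
    if rows % 2 or cols % 2:
        raise ValueError("level dimensions must be even")
    return [
        [
            (level[2 * r][2 * c] + level[2 * r][2 * c + 1]
             + level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols // 2)
        ]
        for r in range(rows // 2)
    ]


level0 = [[1, 3, 2, 2],
          [5, 7, 2, 2]]
level1 = next_lod_level(level0)
```

Each level-1 value is itself a weighted sum of four level-0 pixels with weight 1/4, so sampling a high LOD is a weighted sum over many level-0 pixels.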

Therefore, in one embodiment, if the per-pixel depth at (x, y) is far from the focal length, the processor or GPU may reduce the shading rate of the inputcolor image level 0, because any error in shading can be averaged over several pixels. Because inputcolor level 0 may be the most expensive image to render in that particular workload, the savings may be relatively high or significant compared with methods in the related art. Using this logic, shading (and the amount of work required for shading) can therefore be reduced while keeping quality (as measured by PSNR) within acceptable levels (e.g., with differences below visible levels).

In example embodiments of the present disclosure, PSNR may be the log of the mean-square error: the per-pixel difference in color values is computed between (1) a full-resolution rendering and (2) a rendering that uses the exemplary approach described above; the squares of these per-pixel differences are then summed, and log(sqrt(sum)) is computed to determine the PSNR. It may be desirable for the PSNR to be as high as possible.
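The description above is informal. As a point of comparison, the sketch below uses the conventional definition, PSNR = 10 * log10(MAX^2 / MSE), which grows as the error shrinks. The 8-bit peak value of 255 and the flat list of pixel values are assumptions for the example and are not specified in the source.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Conventional PSNR in dB between two equally sized pixel lists.

    `reference` would be the full-resolution rendering and `test` the
    reduced-shading-rate rendering; returns infinity for identical inputs.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


identical = psnr([0, 255], [0, 255])
noisy = psnr([100, 100], [110, 90])   # mean-square error of 100
```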

In one embodiment, if the PSNR is greater than or equal to 40, the result may be visually indistinguishable from a full-resolution rendering for most people, and if the PSNR is greater than or equal to 50 or 60, it may be visually indistinguishable for everyone. With a PSNR greater than or equal to 40, a saving of about 25% of the shading work on the input image (e.g., the inputcolor level 0 image) may be achieved.

As discussed with respect to FIG. 2, adaptive desampling (AD) may be a lossy quad-culling technique in which information about the spatial profile of final image detail is back-propagated to earlier stages of the graphics pipeline to identify regions where rendering fewer quads and interpolating between them would cause minimal visual artifacts. In some cases, by making use of level-of-detail (LOD) parameter settings (e.g., in texture samplers), regions that need more rendering detail than others can be identified and used by AD to reduce rendering work.

In some cases, AD can be used to save rendering work, for example in regions of the image where rendering may not be necessary. For example, regions of a render target (e.g., an image) that do not require fine-scale detail can often be rendered at a lower resolution without noticeable visual artifacts. In some cases, AD may sparsely render a full-resolution image and reconstruct the pixels in between. To achieve these results, AD uses a rendering scheme in which quads (e.g., blocks of 2 x 2 pixels, the basic unit of work in rendering) are remapped to non-contiguous sets of four pixels within a 3 x 3 region (e.g., a 3 x 3 quad) inside a 4 x 4 block (e.g., a MultiQuad).

In some cases, as shown in FIG. 22, arranging pixels in this manner allows any 3 x 3 quad that is not rendered to be reconstructed from rendered neighboring pixels. In some cases, as long as at least one of the quads is rendered, every unrendered pixel has at least one rendered neighbor and is reconstructed from it. In some cases, AD also identifies the order in which the 3 x 3 quads are progressively rendered as the rendering density increases. In some embodiments, this order may define three available rendering densities for a MultiQuad, given by the following sets of rendered quads: {0}, {0, 3}, and {0, 1, 2, 3}. In some cases, each of these densities can be represented as a one-hot encoded 4-bit value in which each bit indicates whether the corresponding quad is rendered. In some cases, this code is called an adaptive desampling sample pattern (ADSP) code. FIG. 23 shows the correspondence between these codes (e.g., ADSP codes) and the rendered pixels within a MultiQuad. In some embodiments, different reconstruction schemes may be used depending on the rendering density.
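The 4-bit codes for the three densities can be sketched as bitmasks. The source states only that each bit indicates whether the corresponding quad is rendered; mapping quad index q to bit q (least significant bit first), and the helper names `adsp_code` and `rendered_quads`, are assumptions for illustration.

```python
def adsp_code(quads):
    """Encode a set of rendered quad indices (0-3) as a 4-bit value.

    Assumes quad q maps to bit q, least significant bit first.
    """
    code = 0
    for q in quads:
        if not 0 <= q < 4:
            raise ValueError("quad index must be in 0..3")
        code |= 1 << q
    return code


def rendered_quads(code):
    """Decode a 4-bit code back into the sorted list of rendered quads."""
    return [q for q in range(4) if code & (1 << q)]


# The three rendering densities listed in the text.
densities = [{0}, {0, 3}, {0, 1, 2, 3}]
codes = [adsp_code(d) for d in densities]
```

Under this bit order the three densities encode to 0b0001, 0b1001, and 0b1111, and each denser pattern is a superset of the sparser one, which matches the progressive rendering order described above.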

In some embodiments, AD can be used to decide which rendering density to use and where, so as to maximize rendering work savings while minimizing information loss. To find a desirable point in the trade-off between maximizing rendering work savings and minimizing information loss, in some embodiments AD may render sparsely in regions of the image where information loss is low (e.g., where there is little detail in the final image) and render densely in regions where fine detail may be desirable. In some embodiments, information loss may be estimated using a PSNR metric between the AD-rendered image and a fully rendered version of the image. In some embodiments of the present disclosure, AD may identify which regions of the image require more or less detail when the final image itself is not yet known, which may involve two passes of AD. For example, in the first pass, depth edges may be identified using edge detection operations on the depth buffer, and the rendering density may be set to a relatively high value where edges occur. The image may then be rendered at the corresponding density. The second pass may identify color edges (e.g., using the available quads rendered in the first pass) and increase the density where necessary. In some embodiments, AD may infer the regions that need high or low density by examining user-specified parameters that specify how the target image is sampled downstream.
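The first pass of the two-pass approach can be sketched as a simple neighbor-difference edge detector on the depth buffer. The source does not name a specific edge detection operator; the threshold test against right and bottom neighbors, the per-pixel density grid, and the `low` and `high` density values are all assumptions made for this example.

```python
def depth_edge_density(depth, threshold, low=1, high=4):
    """Assign a high rendering density wherever a depth edge is found.

    `depth` is a list of rows of depth values. A pixel pair is treated
    as an edge when the absolute depth difference between horizontal or
    vertical neighbors exceeds `threshold` (a hypothetical criterion).
    """
    rows, cols = len(depth), len(depth[0])
    density = [[low] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Compare with the right and bottom neighbors only, so each
            # neighboring pair is checked exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    if abs(depth[r][c] - depth[nr][nc]) > threshold:
                        density[r][c] = high
                        density[nr][nc] = high
    return density


depth_buffer = [[1.0, 1.0, 5.0],
                [1.0, 1.0, 5.0]]
density_map = depth_edge_density(depth_buffer, threshold=1.0)
```

A second pass could apply the same kind of test to the colors rendered in the first pass and raise the density further where color edges appear.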

As an alternative to the two-pass approach, in another embodiment of AD, higher-order function analysis (as discussed with respect to FIG. 20) may be used to infer the sampling function of an image in a shader program as a function of external dependencies. For example, in some embodiments, a texture method in GLSL (e.g., the OpenGL Shading Language) may accept an LOD parameter that specifies how the target texture is sampled. For higher values, the texture may be sampled at a lower level of detail and a blurred version of the texture at the requested coordinates may be returned, and vice versa.

Although the terms “first”, “second”, “third”, etc. are used herein to describe various elements, components, regions, layers, and/or sections, it will be understood that these elements, components, regions, layers, and/or sections are not limited by these terms. These terms are used to distinguish one element, component, region, layer, or section from another element, component, region, layer, or section. Thus, a first element, component, region, layer, or section described herein could be termed a second element, component, region, layer, or section without departing from the scope of the technical idea of the present invention.

Spatially relative terms such as “beneath”, “below”, “lower”, “above”, “upper”, and the like may be used herein for ease of description to describe the relationship of one element or feature to another element(s) or feature(s) as illustrated in the drawings. The spatially relative terms are intended to encompass different orientations of the device in use or in operation, in addition to the orientation depicted in the drawings. For example, if the device in the drawings is turned over, elements described as “below” or “beneath” other elements or features would then be oriented “above” the other elements or features. Thus, the exemplary term “below” can encompass both an orientation of above and below. The device may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein should be interpreted accordingly. It will also be understood that when a layer is referred to as being “between” two layers, it can be the only layer between the two layers, or one or more intervening layers may also be present.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the terms “substantially”, “about”, and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent variations in measured or calculated values that would be recognized by those of ordinary skill in the art.

As used herein, singular forms are intended to include plural forms as well, unless the context clearly indicates otherwise. It will be understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of”, when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of “may” when describing embodiments of the present disclosure refers to “one or more embodiments of the present disclosure”. Also, the term “exemplary” is intended to refer to an example or illustration. As used herein, the terms “use”, “using”, and “used” may be considered synonymous with the terms “utilize”, “utilizing”, and “utilized”, respectively.

When an element or layer is referred to as being “on”, “connected to”, “coupled to”, or “adjacent to” another element or layer, it may be directly on, connected to, coupled to, or adjacent to the other element or layer, or one or more intervening elements or layers may be present. In contrast, when an element or layer is referred to as being “directly on”, “directly connected to”, “directly coupled to”, or “immediately adjacent to” another element or layer, it should be understood that no intervening elements or layers are present.

Any numerical range recited herein is intended to include all sub-ranges of the same numerical precision subsumed within the recited range. For example, a range of “1.0 to 10.0” is intended to include all sub-ranges between (and including) the recited minimum value of 1.0 and the recited maximum value of 10.0, that is, having a minimum value equal to or greater than 1.0 and a maximum value equal to or less than 10.0, such as, for example, 2.4 to 7.6. Any maximum numerical limitation recited herein is intended to include all lower numerical limitations subsumed therein, and any minimum numerical limitation recited herein is intended to include all higher numerical limitations subsumed therein.

In some embodiments, one or more outputs of the different embodiments of the methods and systems of the present disclosure may be transmitted to an electronic device coupled to or including a display device for displaying the one or more outputs, or information regarding the one or more outputs, of the different embodiments of the methods and systems of the present disclosure.

The electronic or electric devices and/or any other relevant devices or components according to the embodiments of the present disclosure described herein may be implemented using any suitable hardware, firmware (e.g., an application-specific integrated circuit (ASIC)), software, or a combination of software, firmware, and hardware. For example, the various components of these devices may be formed on one integrated circuit (IC) chip or on separate IC chips. Further, the various components of these devices may be implemented on a flexible printed circuit film, a tape carrier package (TCP), or a printed circuit board (PCB), or formed on one substrate. Further, the various components of these devices may be a process or thread, running on one or more processors in one or more computing devices, that executes computer program instructions and interacts with other system components to perform the various functions described herein. The computer program instructions are stored in a memory, which may be implemented in a computing device using a standard memory device such as, for example, a random access memory (RAM). The computer program instructions may also be stored in other non-transitory computer-readable media such as, for example, a CD-ROM, a flash drive, or the like. Also, a person of ordinary skill in the art should recognize that the functionality of various computing devices may be combined or integrated into a single computing device, or that the functionality of a particular computing device may be distributed across one or more other computing devices, without departing from the spirit of the exemplary embodiments of the present disclosure.

Claims

A method of performing adaptive shading of image frames by a graphics processing unit (GPU), the method comprising:
determining, by the GPU, a first shading rate based on determining that a change in a plurality of underlying assets between a first image frame and a second image frame exceeds a first threshold;
determining, by the GPU, a second shading rate based on determining that one or more viewports in the second image frame are similar to one or more viewports in the first image frame;
determining, by the GPU, a third shading rate based on determining that a quality reduction filter is used; and
selecting, by the GPU, a shading rate for the first image frame from among the first shading rate, the second shading rate, and the third shading rate.

The method of claim 1, further comprising:
reusing, by the GPU, pixel data from the first image frame to render the second image frame, based on determining that the change in the plurality of underlying assets between the first image frame and the second image frame is less than the first threshold.

The method of claim 2, further comprising:
reprojecting, by the GPU, the pixel data from the first image frame onto the second image frame, thereby reusing the pixel data from the first image frame to render the second image frame.

The method of claim 1,
wherein the change in the plurality of underlying assets between the first image frame and the second image frame is associated with a camera and a viewport.

The method of claim 1, further comprising:
reusing, by the GPU, pixel data from the first image frame to render the second image frame, based on determining that a change between the one or more viewports in the second image frame and the one or more viewports in the first image frame is less than a second threshold; and
reprojecting, by the GPU, the pixel data from the first image frame onto the second image frame, thereby reusing the pixel data from the first image frame to render the second image frame.

The method of claim 1,
wherein determining that the quality reduction filter is used comprises:
determining, by the GPU, a quality reduction in the second image frame;
determining, by the GPU, whether an output color of the second image frame corresponds to a weighted sum of pixel data from the first image frame; and
determining, by the GPU, the third shading rate for the first image frame based on determining that the output color of the second image frame corresponds to the weighted sum of the pixel data from the first image frame.

The method of claim 6,
wherein determining that the output color of the second image frame corresponds to the weighted sum of the pixel data from the first image frame comprises:
determining, by the GPU, a higher-order function of the second image frame.

The method of claim 7,
wherein a proportional relationship between a plurality of shaded samples in the first image frame and a plurality of input values of the higher-order function is based on a workload of the GPU.

The method of claim 7,
wherein the higher-order function is configured to:
receive the quality reduction filter as an input; and
determine the third shading rate for the first image frame.

The method of claim 6,
wherein the weighted sum comprises a weighted sum of the pixel data of the first image frame across a plurality of input image frames, the first image frame being a particular instance of the input image frames.