WO2022106016A1 - High-order texture filtering - Google Patents

High-order texture filtering

Info

Publication number
WO2022106016A1
Authority
WO
WIPO (PCT)
Prior art keywords
texture
texels
order
filtering
interpolation
Prior art date
Application number
PCT/EP2020/082790
Other languages
French (fr)
Inventor
Baoquan Liu
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2020/082790 priority Critical patent/WO2022106016A1/en
Priority to CN202080102689.2A priority patent/CN115917606A/en
Publication of WO2022106016A1 publication Critical patent/WO2022106016A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping

Definitions

  • a graphics processing device for generating an image signal representing an image having a plurality of pixels, wherein the graphics processing device is configured to generate the image signal by performing the following operations for each of the plurality of pixels: determine a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space; and apply a texture filtering function to the sub-pixel data points of the plurality of texels in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
  • the device may allow for rendering of a high-quality image with texture at higher performance with fewer texture fetches and weighted filtering calculations than conventional methods. This may allow the application of a filtering function having at least third-order approximation accuracy.
  • Applying the texture filtering function may comprise taking the difference between (i) an interpolation among the texels for the location x, y and (ii) a weighted sum of second-order derivative approximations among the texels in the two orthogonal directions and a weighted sum of third-order derivative approximations among the texels in the two orthogonal directions. This may allow the application of a filtering function having fourth-order approximation accuracy.
  • the second-order and/or third-order derivative approximations among the texels in the two orthogonal directions may be weighted in dependence on the fractional position of the location x, y between two consecutive integer locations.
  • the derivative approximations may be weighted in dependence on parameters α and β, as described herein.
  • the second-order and/or third-order derivatives may be calculated using the dedicated hardware logic.
  • the second-order and/or third-order derivatives may be calculated in dependence on the interpolation among the texels.
  • the interpolation may be determined using a bilinear interpolation function.
  • the bilinear interpolation function f_bilin(x,y) is supported by GPU hardware, and so may be easily determined.
  • the device may comprise a texture cache configured to store the bilinear interpolation function at the location x, y. This may further reduce the processing required.
  • the two orthogonal directions may be directions in a texture space. This may allow rendering to a texture.
  • the filtering function may be determined according to a third-order or fourth-order approximation. This may produce high-quality images.
  • the device may be configured to implement the texture filtering function in one of the GLSL, HLSL and Spir-V languages.
  • the device is therefore compatible with filtering algorithms and languages used in many modern image processing systems and video games.
  • the device may be configured to implement the texture filtering function in a single instruction, or fixed function hardware unit. This may reduce the processing cost required to render the image.
  • For each pixel, the device may be configured to perform fewer than sixteen texture fetches. Thus, the device may perform fewer texture fetches than for traditional bicubic rendering.
  • the device may be configured to perform five texture fetches or nine texture fetches.
  • Five texture fetches may allow for third-order approximation accuracy (bicubic filtering) and nine texture fetches may allow for fourth-order approximation accuracy.
  • Fewer texture fetches for the GPU can result in longer battery life, reduced latency and improved frame-rate for complex and demanding game rendering.
  • At least some of the pixels may represent a shadow of an object in said image and the texture filtering function may be a shadow filtering function.
  • the filtered value may be a filtered shadow value. This may allow for faster shadow filtering in 2D image applications to produce a smooth image quality result for soft shadows.
  • the device may be implemented by a mobile graphics processing unit. This may allow a mobile device to efficiently perform texture rendering.
  • a method for generating an image signal representing an image having a plurality of pixels comprises, for each of the plurality of pixels: determining a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space; and applying a texture filtering function to the sub-pixel data points in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
  • the method may allow for rendering of a high-quality image with texture at higher performance with fewer texture fetches and weighted filtering calculations than conventional methods. This may allow the application of a filtering function having at least third-order approximation accuracy.
  • a computer program which, when executed by a computer, causes the computer to perform the method described above.
  • the computer program may be provided on a non-transitory computer readable storage medium.
  • Figure 1 illustrates the rendering result of nearest-neighbor filtering for a shadow texture.
  • Figure 2a shows the rendering result when 2 x 2 bilinear filtering is used to produce a soft shadow.
  • Figure 2b illustrates sampling a 2 x 2 grid of texels surrounding the target UV coordinate.
  • Figure 3 shows a typical cubic filter kernel function.
  • Figure 4 illustrates the weighted sum based filtering algorithm for cubic interpolation.
  • Figures 5(a)-5(c) illustrate the rendering result of the traditional filtering methods.
  • Figure 5(a) shows the result for nearest-neighbor
  • Figure 5(b) for bilinear filtering
  • Figure 5(c) for bicubic filtering.
  • Figures 6(a)-6(b) schematically illustrate 1D linear and cubic filtering at position x by calculating the weighted sum of the neighbouring texel-values (at integer locations) where the weights are polynomials based on the sub-texel offsets.
  • Figure 7 illustrates an example of a method for generating an image signal representing an image having a plurality of pixels.
  • Figure 8 shows an example of a device configured to implement the method described herein.
  • the graphics processing device may be a graphics processor, such as a mobile GPU.
  • the graphics processing device is configured to generate the image signal by performing the operations described herein for each of a plurality of pixels of the image.
  • the plurality of pixels may comprise a subset of all pixels of the image, or the method may be performed for every pixel in the image.
  • the image signal may be used, for example, for rendering high-quality soft shadows or other textures.
  • rendering refers to any form of generating a visible image, for example displaying the image on a computer screen, printing, or projecting.
  • the device and method can implement high-order texture filtering algorithms (for example, third-order and fourth-order) on a GPU to produce a high-quality rendered image, while requiring a much lower computation cost and texture-memory bandwidth than other prior art implementations.
  • the solution described herein uses fewer weighted sum calculations on the GPU than conventional methods.
  • the solution involves fewer texture sampling instructions and fewer ALU calculations than previous methods. This may allow for faster high-order texture filtering with third-order and fourth-order approximation accuracy.
  • the device applies a texture filtering operation at a sampling location x, y in a 2D image (where x and y are the fractional positions of a texture coordinate) comprising taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
  • High-order texture filtering algorithms usually need to perform texture sampling in an N x N neighboring area of texture data in DDR memory to get N x N texel data samples at integer texel locations, and then perform a filtering operation on these N x N data samples to get a high-quality filtered value to shade the final pixel in the pixel shader.
  • In 1D, the cubic filtering algorithm can be expressed as the following weighted sum equation:

    f(x) = \sum_{i=-1}^{2} w_i(x) \, f_i

    where f_i denotes the indexed neighboring texel values at four taps of integer sampling locations, which are multiplied by the corresponding cubic polynomial weights w_i(x) from the convolution kernel.
  • the weighted sum is the final result of the filtering.
  • the third-order filtering equation used by the device and method described herein will first be discussed.
  • the third-order filtering method is derived first in 1D, and then extended to 2D.
  • the directions x and y are directions in a texture space.
  • the values f_i represent the samples of the original function f(t) at the integer location i.
  • the continuous function f(t) is not known except for the texel values f_i at the discrete integer grid locations.
  • f_i and f_{i+1} are the two discrete function values at integer grid locations i and i+1.
  • the 1D linear and cubic filtering function at position x is therefore determined by calculating the weighted sum of the neighbouring texel values (at integer locations), where the weights are polynomials based on the sub-texel offsets.
  • the third-order cubic interpolation/filtering function may be expressed as:

    f_{cubic}(x) = f_{lin}(x) - \frac{x(1-x)}{2} \left[ (1-x)\, f''(i) + x\, f''(i+1) \right]

    where f_{lin}(x) = (1-x) f_i + x f_{i+1} is the linear interpolation between the two enclosing texels, and the second-order central difference (see https://en.wikipedia.org/wiki/Finite_difference) can be used to calculate f''(x) as below:

    f''(i) \approx f_{i-1} - 2 f_i + f_{i+1}
  • f_bilin(x, y) is the hardware bilinear interpolation function at an arbitrary 2D sampling point (x, y), which is already supported by GPU hardware.
  • this fourth-order convolution kernel is composed of piecewise polynomials defined on the unit subintervals between [-3, 3], which means that it needs to fetch six taps of texels to perform the weighted-sum calculation of all six taps of texels in 1D.
  • This contrasts with the third-order approximation kernel defined on the unit subintervals between [-2, 2], which uses only four taps of texels, as in the third-order function approximation described above.
  • the fourth-order filtering algorithm can be expressed as the following weighted sum equation:

    f(x) = \sum_{i=-2}^{3} w_i(x) \, f_i

    where f_i denotes the indexed neighboring texel values at six taps of integer sampling locations, which are multiplied by the corresponding polynomial weights w_i(x) from the convolution kernel. The weighted sum is the final result of the filtering.
  • The derivation below introduces a simplified equation for texture filtering with fourth-order approximation accuracy, which is much cheaper than the original fourth-order equation introduced by Keys (i.e., Equation (16)).
  • f_bilin(x,y) is the hardware bilinear interpolation function at an arbitrary 2D sampling point (x,y), which is supported by GPU hardware.
  • Figure 7 summarises a method for generating an image signal representing an image having a plurality of pixels.
  • the method comprises the following steps for each of at least some of the plurality of pixels.
  • the method comprises determining a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space.
  • the method comprises applying a texture filtering function to the sub-pixel data points of the plurality of texels in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
  • applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y and (ii) a weighted sum of second-order derivative approximations among the texels in the two orthogonal directions and a weighted sum of third-order derivative approximations among the texels in the two orthogonal directions.
  • the second-order and/or third-order derivatives are calculated using the dedicated hardware logic.
  • the second-order and/or third-order derivatives are calculated in dependence on the interpolation among the texels, which is determined using a bilinear interpolation function f bilin (x,y).
  • f bilin (x,y) is supported by GPU hardware, and so may be easily determined.
  • the device may comprise a texture cache configured to store the bilinear interpolation function at the location x, y. As this value is re-used in the equations, this may reduce the computation required further.
  • the texture cache may be exploited to store processed sampling results to be re-used by neighbouring pixels, in order to further accelerate the computation.
  • the texture filtering function may be a shadow filtering function.
  • a data sample’s value from a shadow map texture is either 1.0 or 0.0.
  • the rendered shadow in a final image shows strong aliasing. This is because if a data sample’s value is equal to 1.0, it means that the pixel is completely outside of the shadow, while if it is 0.0, it means that the pixel is completely in shadow.
  • a data sample’s value could be a floating point value, which lies between 0.0 and 1.0.
  • the device can be configured to perform fewer texture fetches than the original equations found in Keys (Keys, 1981). For example, for each pixel, the device can be configured to perform five texture fetches for the third-order (bicubic) approximation and nine texture fetches for the fourth-order approximation.
  • the solution may be implemented by the GPU software using shader code.
  • the simplified functions can be implemented in a few lines of GPU shader code using shading languages such as GLSL, HLSL, or Spir-V.
  • the filtering functions can be implemented using fixed function hardware via a single GPU instruction, instead of multiple lines of shader code.
  • one ISA intrinsics call can be used to complete 2D high-order texture filtering in a pixel shader.
  • the method described herein may therefore allow for faster and cheaper high-order texture filtering.
  • Figure 8 is a schematic representation of a device 800 configured to perform the methods described herein.
  • the device 800 may be implemented on a device, such as a laptop, tablet, smart phone, TV or any other device in which graphics data is to be processed.
  • the device 800 comprises a graphics processor 801 configured to process data.
  • the processor 801 may be a GPU.
  • the processor 801 may be implemented as a computer program running on a programmable device such as a GPU or a Central Processing Unit (CPU).
  • the device 800 comprises a memory 802 which is arranged to communicate with the graphics processor 801.
  • Memory 802 may be a non-volatile memory.
  • the graphics processor 801 may also comprise a cache (not shown in Figure 8), which may be used to temporarily store data from memory 802.
  • the device may comprise more than one processor and more than one memory.
  • the memory may store data that is executable by the processor.
  • the processor may be configured to operate in accordance with a computer program stored in non-transitory form on a machine readable storage medium.
  • the computer program may store instructions for causing the processor to perform its methods in the manner described herein.
  • the device may allow for rendering a high-quality image with texture at higher performance with fewer texture fetches and weighted filtering calculations on a mobile GPU than conventional methods.
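The third-order scheme described above can be illustrated in 1D with a short numerical sketch. The decomposition below (linear interpolation minus a position-weighted blend of second central differences, with weight x(1 - x)/2) is one standard way to rewrite the Keys a = -1/2 cubic, used here as an illustrative assumption; the exact weighting parameters used by the patent may differ:

```python
def keys_cubic(f, i, x):
    """Reference: direct four-tap Keys (a = -1/2) cubic filtering."""
    w = ((-x**3 + 2*x**2 - x) / 2,
         (3*x**3 - 5*x**2 + 2) / 2,
         (-3*x**3 + 4*x**2 + x) / 2,
         (x**3 - x**2) / 2)
    return sum(wk * fk for wk, fk in zip(w, f[i - 1:i + 3]))

def cubic_via_difference(f, i, x):
    """Difference form: hardware-style linear interpolation minus a
    weighted sum of second-order central-difference approximations."""
    f_lin = (1 - x) * f[i] + x * f[i + 1]    # what the GPU lerp unit returns
    d2_i = f[i - 1] - 2 * f[i] + f[i + 1]    # f''(i), central difference
    d2_i1 = f[i] - 2 * f[i + 1] + f[i + 2]   # f''(i+1), central difference
    return f_lin - 0.5 * x * (1 - x) * ((1 - x) * d2_i + x * d2_i1)
```

The two forms agree to machine precision on the same four taps. In 2D, the linear term becomes the hardware bilinear fetch and the difference terms are taken along both orthogonal directions, which is how the fetch count can drop below the sixteen taps of direct bicubic filtering.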


Abstract

A graphics processing device (800) for generating an image signal representing an image having a plurality of pixels, wherein the graphics processing device is configured to generate the image signal by performing the following operations for each of the plurality of pixels: determine (701) a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space; and apply (702) a texture filtering function to the sub-pixel data points of the plurality of texels in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions. The device may allow for rendering of a high-quality image with texture at higher performance with fewer texture fetches and weighted filtering calculations on a mobile GPU than conventional methods.

Description

HIGH-ORDER TEXTURE FILTERING
FIELD OF THE INVENTION
This invention relates to texture filtering in images, for example in video games.
BACKGROUND
The goal of image filtering is, given an input image A, to create a new image B. The transformation operation from source A to target B is via an image filter.
Rendering 3D models with texture is very important for video games. Texture can provide more details than geometry and can therefore increase the realism of a rendered image. Almost all 3D games need to access textures and perform texture filtering on the Graphics Processing Unit (GPU), for example via the texture unit hardware module or shader code, to determine the color for a texture mapped pixel, by filtering the colors of nearby texels.
GPU hardware can provide fast, filtered access to textures, but generally only for a few restrictive types of texture filtering methods. Both OpenGL and Direct3D provide two very simple types of texture filtering: nearest-neighbor sampling and linear filtering, corresponding to zeroth and first-order filter schemes respectively. Both types are natively supported by all GPUs. Other commonly used filtering methods on modern GPUs include bilinear filtering and bicubic filtering (supported by the Vulkan API: VK_IMG_filter_cubic, GL_IMG_texture_filter_cubic).
Nearest-neighbor filtering is the simplest filtering algorithm, which fetches the single nearest texel sample from a texture image for any target sampling location. This requires one texture fetch and no weighted calculations. However, as shown at 101 in Figure 1, the resulting image quality is not good, with jagged, unrealistic features which are not life-like.
Bilinear filtering can achieve a smoother transition across different texels and is a simple texture filter method which needs four taps of texel samples at integer texel locations. A 2 x 2 grid of neighboring texels has to be fetched, and a weighted sum is calculated using the sub-texel coordinate offsets as weights to generate a smooth transition. Figure 2(a) shows an example of the result of bilinear filtering. Compared to nearest-neighbor filtering, the image quality is improved. However, images may still look unrealistic with jagged edges and zigzag aliasing. Figure 2(b) shows the 2 x 2 sampling pattern (sampling a 2 x 2 grid of texels surrounding the target UV coordinate).
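As an illustrative sketch of this 2 x 2 scheme, a minimal bilinear filter in plain Python (the function name and list-of-lists texture layout are assumptions for illustration, not taken from the patent):

```python
import math

def bilinear(tex, u, v):
    """Bilinear filter: fetch the 2 x 2 texel grid around (u, v) and
    blend using the sub-texel coordinate offsets as weights."""
    x0, y0 = math.floor(u), math.floor(v)   # integer texel location
    fx, fy = u - x0, v - y0                 # fractional (sub-texel) offsets
    # Four taps at integer texel locations (edge clamping omitted).
    f00, f10 = tex[y0][x0], tex[y0][x0 + 1]
    f01, f11 = tex[y0 + 1][x0], tex[y0 + 1][x0 + 1]
    # Weighted sum: lerp along x, then along y.
    top = (1 - fx) * f00 + fx * f10
    bottom = (1 - fx) * f01 + fx * f11
    return (1 - fy) * top + fy * bottom
```

For example, sampling a 2 x 2 texture [[0, 1], [2, 3]] at (0.5, 0.5) averages the four texels, giving 1.5.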
A filter familiar to most users of imaging programs with "high-quality" resizing is commonly called the cubic filter. When applied in both x and y directions, the result is referred to as bicubic texture filtering. A typical cubic filter kernel function, which is used to calculate the polynomial weights, is shown in Figure 3. The y value of this function shows the relative weight that should be assigned to the texels that are distant from the center of a given texture sampling coordinate x. Texels more than two texels from that center are ignored due to their zero weight value, while texels at the center are given the largest weight. Note that for this particular filter, some weights may be negative.
In 1D, the cubic filtering algorithm can be expressed as:

f(x) = \sum_{i=-1}^{2} w_i(x) \, f_i

where f_i denotes the indexed neighboring texel values at four taps of integer sampling locations, which are multiplied by the corresponding cubic polynomial weights w_i(x) from the convolution kernel. The weighted sum is the final result of the filtering. This weighted-sum-based filtering algorithm is illustrated in Figure 4 for cubic interpolation.
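This four-tap weighted sum can be sketched in plain Python (a hypothetical `cubic1d` helper using the Keys a = -1/2 cubic polynomial weights; not code from the patent):

```python
def cubic_weights(x):
    """Keys (a = -1/2) cubic polynomial weights for the four taps at
    offsets -1, 0, +1, +2; x is the sub-texel fraction in [0, 1).
    Note some weights are negative, and they always sum to 1."""
    return ((-x**3 + 2*x**2 - x) / 2,
            (3*x**3 - 5*x**2 + 2) / 2,
            (-3*x**3 + 4*x**2 + x) / 2,
            (x**3 - x**2) / 2)

def cubic1d(f, i, x):
    """1D cubic filtering at position i + x: weighted sum of the four
    neighbouring texel values f[i-1], f[i], f[i+1], f[i+2]."""
    return sum(w * t for w, t in zip(cubic_weights(x), f[i - 1:i + 3]))
```

On linear data the filter reproduces the ramp exactly; for example, `cubic1d([0, 1, 2, 3], 1, 0.5)` returns 1.5.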
Bicubic filtering is a 2D extension of the 1D cubic filtering for interpolating data points on a two-dimensional regular grid. The 2D interpolation function is a separable extension of the 1D interpolation function, so the 2D interpolation can be accomplished by two 1D interpolations, one with respect to each coordinate direction. The bicubic interpolated result is smoother than the corresponding results obtained by bilinear interpolation or nearest-neighbor interpolation.
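The separability can be sketched as four 1D cubic passes along the rows followed by one pass along the resulting column (illustrative Python reusing the Keys a = -1/2 weights; the function names are assumptions):

```python
def keys_weights(x):
    # Keys (a = -1/2) cubic weights for taps at offsets -1, 0, +1, +2.
    return ((-x**3 + 2*x**2 - x) / 2,
            (3*x**3 - 5*x**2 + 2) / 2,
            (-3*x**3 + 4*x**2 + x) / 2,
            (x**3 - x**2) / 2)

def cubic1d_taps(taps, x):
    # Weighted sum of four consecutive samples at fraction x.
    return sum(w * t for w, t in zip(keys_weights(x), taps))

def bicubic(tex, j, i, x, y):
    """Bicubic filtering at (i + x, j + y) on tex[row][col]: one 1D
    cubic pass per row of the 4 x 4 neighbourhood, then one along y."""
    column = [cubic1d_taps(tex[j + dj][i - 1:i + 3], x) for dj in (-1, 0, 1, 2)]
    return cubic1d_taps(column, y)
```

Note that this direct separable form touches all 4 x 4 = 16 texels, which is exactly the fetch cost the simplified filtering described later aims to reduce.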
Keys, R.G., ‘Cubic convolution interpolation for digital image processing’, IEEE Trans. Acoust. Speech Signal Process., 1981, ASSP-29, (6), pp. 1153-1160 describes the cubic convolution filtering equation with third-order approximation accuracy. The equation involves a convolution kernel with cubic polynomials which has spatial support of four taps of texels. The third-order convolution kernel is as below:

w(s) = \begin{cases} \tfrac{3}{2}|s|^3 - \tfrac{5}{2}|s|^2 + 1, & 0 \le |s| < 1 \\ -\tfrac{1}{2}|s|^3 + \tfrac{5}{2}|s|^2 - 4|s| + 2, & 1 \le |s| < 2 \\ 0, & \text{otherwise} \end{cases}
Regarding the quality of the final rendering images, Keys’ cubic convolution interpolation method performs much better than linear interpolation and has become a standard in the image interpolation field. However, regarding rendering speed, it is complex and expensive to calculate at runtime for each rendering pixel.
Figures 5(a)-5(c) illustrate the rendering result of the above-described filtering methods. Figure 5(a) shows the result for nearest-neighbor filtering, Figure 5(b) for bilinear filtering and Figure 5(c) for bicubic filtering.
Higher-order filtering modes (such as third-order or fourth-order functions) often lead to superior image quality. Moreover, higher-order schemes are necessary to compute continuous derivatives of texture data. In some 3D applications, high-quality texture filtering is crucial. Images resampled with bicubic interpolation are smoother and have fewer interpolation artifacts, while bilinear filtering often produces diamond artifacts or aliasing, because the first-order derivative of the bilinear function is not continuous. Resampled images obtained by simple methods such as nearest-neighbor sampling and bilinear filtering typically incur common artifacts, such as blurring, blocking and ringing, especially in the edge regions. The bicubic algorithm is frequently used for scaling images or videos for display. It preserves fine detail better than the common bilinear algorithm, due to its high-order continuous derivatives.
However, third-order bicubic filtering is very demanding for mobile GPUs, due to its very high bandwidth and computing cost. In particular, it requires 16 texture fetch instructions, computation of the dynamic cubic weights via a third-order polynomial, and 4 x 4 weighted sum calculations along x and y. Altogether, it involves 32 multiplications (16+4+4+8) and many more arithmetic operations.
The bicubic filtering method described in US 10102181 B2 requires 4 x 4 data sampling and complex weighted calculations. The method described in US 7106326 B2 can perform filtering operations (such as linear, bilinear, trilinear, cubic or bicubic filtering) on the data values of a neighbourhood (Np x Np), where Np x Np is the size of the neighbourhood in texels. However, the method requires Np x Np data sampling taps and very complex weighted filtering calculations based on the multiple samples and dynamic weights. Therefore, such higher-order filtering methods generally require a much greater computational cost and texture-memory bandwidth than simpler filtering methods.
Thus, rendering high-quality texture-mapped 3D objects using high-order texture filtering algorithms is very demanding, especially for modern mobile GPUs, which have limited compute power and memory bandwidth. At the same time, mobile devices require real-time rendering performance (higher frame-rates, lower latency), long battery life (low power consumption) and low heat dissipation.
In 2005, Sigg and Hadwiger (see the book chapter 'Fast Third-Order Texture Filtering' in GPU Gems 2, https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-20-fast-third-order-texture-filtering) developed an efficient evaluation of bicubic B-spline filtering on the GPU, which reduces the number of texture fetches from sixteen to only four. As this approach drastically reduces the number of texture fetches, which is generally the bottleneck in GPU implementations, a significant speed-up can be achieved. However, this method can only support filter kernels with positive weights. It is not trivial to adapt this method to filter kernels that also take negative weight values, which is common for cubic filtering kernels, as defined by Khronos in the Vulkan API (VK_FILTER_CUBIC_EXT and VK_IMG_filter_cubic), https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkFilter.html. Moreover, this method requires a pre-processing pass at runtime, which is not suitable when filtering dynamic textures that are very common in game rendering, such as shadow textures or other FBO textures.
In summary, the key technical problems of high-order texture filtering algorithms for high-quality rendering are as follows. High-order algorithms require many texture data fetches, and the corresponding texture fetching instructions can be a memory bandwidth burden for the GPU. High-order filtering calculations also require many arithmetic operations (with corresponding ALU instructions) for each single texture filtering operation. This presents a dilemma for high-quality texture filtering on a mobile GPU.
It is desirable to develop a method with lower computational cost to produce high-quality texture filtering results with third-order or fourth-order approximation accuracy.

SUMMARY OF THE INVENTION
According to a first aspect there is provided a graphics processing device for generating an image signal representing an image having a plurality of pixels, wherein the graphics processing device is configured to generate the image signal by performing the following operations for each of the plurality of pixels: determine a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space; and apply a texture filtering function to the sub-pixel data points of the plurality of texels in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
The device may allow for rendering of a high-quality image with texture at higher performance with fewer texture fetches and weighted filtering calculations than conventional methods. This may allow the application of a filtering function having at least third-order approximation accuracy.
Applying the texture filtering function may comprise taking the difference between (i) an interpolation among the texels for the location x, y and (ii) a weighted sum of second-order derivative approximations among the texels in the two orthogonal directions and a weighted sum of third-order derivative approximations among the texels in the two orthogonal directions. This may allow the application of a filtering function having fourth-order approximation accuracy.
The second-order and/or third-order derivative approximations among the texels in the two orthogonal directions may be weighted in dependence on the fractional position of the location x, y between two consecutive integer locations. For example, the derivative approximations may be weighted in dependence on parameters α and β, as described herein.
The second-order and/or third-order derivatives may be calculated using the dedicated hardware logic. The second-order and/or third-order derivatives may be calculated in dependence on the interpolation among the texels. The interpolation may be determined using a bilinear interpolation function. The bilinear interpolation function fbilin(x,y) is supported by GPU hardware, and so may be easily determined.
The device may comprise a texture cache configured to store the bilinear interpolation function at the location x, y. This may further reduce the processing required.
The two orthogonal directions may be directions in a texture space. This may allow rendering to a texture.
The filtering function may be determined according to a third-order or fourth-order approximation. This may produce high-quality images.
The device may be configured to implement the texture filtering function in one of the GLSL, HLSL and Spir-V languages. The device is therefore compatible with filtering algorithms and languages used in many modern image processing systems and video games.
The device may be configured to implement the texture filtering function in a single instruction, or fixed function hardware unit. This may reduce the processing cost required to render the image.
For each pixel, the device may be configured to perform fewer than sixteen texture fetches. Thus, the device may perform fewer texture fetches than for traditional bicubic rendering.
For each pixel, the device may be configured to perform five texture fetches or nine texture fetches. Five texture fetches may allow for third-order approximation accuracy (bicubic filtering) and nine texture fetches may allow for fourth-order approximation accuracy. Fewer texture fetches for the GPU can result in longer battery life, reduced latency and improved frame-rate for complex and demanding game rendering.
At least some of the pixels may represent a shadow of an object in said image and the texture filtering function may be a shadow filtering function. The filtered value may be a filtered shadow value. This may allow for faster shadow filtering in 2D image applications to produce a smooth image quality result for soft shadows. The device may be implemented by a mobile graphics processing unit. This may allow a mobile device to efficiently perform texture rendering.
According to a second aspect there is provided a method for generating an image signal representing an image having a plurality of pixels, wherein the method comprises, for each of the plurality of pixels: determining a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space; and applying a texture filtering function to the sub-pixel data points in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
The method may allow for rendering of a high-quality image with texture at higher performance with fewer texture fetches and weighted filtering calculations than conventional methods. This may allow the application of a filtering function having at least third-order approximation accuracy.
According to a third aspect there is provided a computer program which, when executed by a computer, causes the computer to perform the method described above. The computer program may be provided on a non-transitory computer readable storage medium.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will now be described by way of example with reference to the accompanying drawings.
In the drawings:
Figure 1 illustrates the rendering result of nearest-neighbor filtering for a shadow texture.
Figure 2a shows the rendering result when 2 x 2 bilinear filtering is used to produce a soft shadow.
Figure 2b illustrates sampling a 2 x 2 grid of texels surrounding the target UV coordinate.

Figure 3 shows a typical cubic filter kernel function.
Figure 4 illustrates the weighted sum based filtering algorithm for cubic interpolation.
Figures 5(a)-5(c) illustrate the rendering result of the traditional filtering methods. Figure 5(a) shows the result for nearest-neighbor, Figure 5(b) for bilinear filtering, and Figure 5(c) for bicubic filtering.
Figures 6(a)-6(b) schematically illustrate 1D linear and cubic filtering at position x by calculating the weighted sum of the neighbouring texel-values (at integer locations), where the weights are polynomials based on the sub-texel offsets.
Figure 7 illustrates an example of a method for generating an image signal representing an image having a plurality of pixels.
Figure 8 shows an example of a device configured to implement the method described herein.
DETAILED DESCRIPTION OF THE INVENTION
Described herein is a graphics processing device for generating an image signal representing an image having a plurality of pixels. The graphics processing device may be a graphics processor, such as a mobile GPU. The graphics processing device is configured to generate the image signal by performing the operations described herein for each of a plurality of pixels of the image. The plurality of pixels may comprise a subset of all pixels of the image, or the method may be performed for every pixel in the image.
The image signal may be used, for example, for rendering high-quality soft shadows or other textures. Herein, rendering refers to any form of generating a visible image, for example displaying the image on a computer screen, printing, or projecting.
The device and method can implement high-order texture filtering algorithms (for example, third-order and fourth-order) on a GPU to produce a high-quality rendered image, while requiring a much lower cost of computation and texture-memory bandwidth than prior art implementations. The solution described herein uses fewer weighted sum calculations on the GPU than conventional methods. The solution involves fewer texture sampling instructions and fewer ALU calculations than previous methods. This may allow for faster high-order texture filtering with third-order and fourth-order approximation accuracy.
As will be described in more detail below, the device applies a texture filtering operation at a sampling location x, y in a 2D image (where x and y are the fractional positions of a texture coordinate) comprising taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
High-order texture filtering algorithms usually need to perform texture sampling in an N x N neighbouring area of texture data in DDR memory to get N x N texel data samples at integer texel locations, and then perform a filtering operation on these N x N data samples to get a high-quality filtered value to shade the final pixel in the pixel shader.
In 1D, the cubic filtering algorithm can be expressed as the following weighted sum equation:

f(x) = Σ_i w_i(x) · f_i

where f_i are the indexed neighbouring texel values at four taps of integer sampling locations, which are multiplied by the corresponding cubic polynomial weights w_i(x) from the convolution kernel. The weighted sum is the final result of the filtering.
The third-order filtering equation used by the device and method described herein will first be discussed. For the sake of clarity, the third-order filtering method is derived first in 1D, and then extended to 2D. In these implementations, the directions x and y are directions in a texture space.
Without a loss of generality, assume that the samples of a continuous function f(t) are known only at integer texel locations, and that at any other arbitrary sampling location x the function value f(x) needs to be approximately reconstructed from these discrete texel-values by calculating a weighted sum of the discrete samples. The analysis below is based on the Taylor series expansion of the continuous function f(t). f(t) denotes a continuous function (the signal) which is sampled into the discrete texels f(i) = f_i, where i is an integer. The values f_i represent the samples of the original function f(t) at the integer locations i. In computer imaging, the continuous function f(t) is not known except for the texel values f_i at the discrete integer grid locations.
The general 1D linear interpolation/filtering is a method for estimating a specific function value f(x) at an arbitrary continuous sampling point x by calculating the weighted sum of two taps of known function values f(i) and f(i+1) at two integer grid locations (i and i+1), where x = i + α, with i ∈ Z and α ∈ [0, 1) being the integer and fractional parts of x, respectively. Here f_i and f_{i+1} are the two discrete function values at the integer grid locations i and i+1.
The general 1D cubic interpolation/filtering is a method for estimating a specific function value f(x) at an arbitrary continuous sampling point x by calculating the weighted sum of four taps of known function values f(i−1), f(i), f(i+1) and f(i+2) at four integer grid locations (from i−1 to i+2), where x = i + α, with i ∈ Z and α ∈ [0, 1) being the integer and fractional parts of x, respectively.
Piecewise linear and cubic function reconstructions are shown in Figure 6(a) and Figure 6(b) respectively.
The 1D linear and cubic filtering function at position x is therefore determined by calculating the weighted sum of the neighbouring texel-values (at integer locations), where the weights are polynomials based on the sub-texel offsets.
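These two weighted-sum formulations can be sketched as follows. This is a hypothetical Python illustration of the 1D case only; the kernel parameter a = −0.5 (a common choice for the Keys kernel) and the function names are assumptions, not part of the original text.

```python
import math

def lin_filter(f, x):
    """1D linear filtering: weighted sum of the 2 neighbouring texels,
    with weights (1 - a) and a derived from the sub-texel offset."""
    i = math.floor(x)
    a = x - i                                  # a in [0, 1)
    return (1.0 - a) * f[i] + a * f[i + 1]

def cubic_filter(f, x, coef=-0.5):
    """1D cubic filtering: weighted sum of 4 neighbouring texels, with
    weights given by a piecewise-cubic polynomial of the offset."""
    def w(s, a=coef):                          # Keys convolution kernel
        s = abs(s)
        if s <= 1.0:
            return (a + 2.0) * s**3 - (a + 3.0) * s**2 + 1.0
        if s < 2.0:
            return a * s**3 - 5.0 * a * s**2 + 8.0 * a * s - 4.0 * a
        return 0.0
    i = math.floor(x)
    t = x - i
    return sum(f[i + m] * w(m - t) for m in range(-1, 3))
```

Both filters reproduce the texel values exactly at integer positions; the cubic filter additionally has a continuous first derivative across texel boundaries.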
In mathematics, the Taylor series of a function is an infinite sum of terms that are expressed in terms of the function's derivatives at a single point. Assuming that the first three derivatives of f(t) exist at t=x, f(t) can be expanded as a Taylor series at t=x. According to Keys (Keys, 1981), the cubic interpolation function agrees with the Taylor series expansion of the image function being interpolated, and all conditions can be satisfied for image filtering.
If f(t) has at least three continuous derivatives at t=x in the interval [i, i+1] then according to Taylor’s theorem (see Keys, 1981), the third-order Taylor series approximation of the real- valued function f(t) at a specific real-valued sampling location x is given by:
f(t) = f(x) + f′(x)·(t − x) + (f″(x)/2)·(t − x)² + O((t − x)³)

where O((t − x)³) is the error term (remainder, or residual), which goes to zero at a rate proportional to

(t − x)³

This is because, by Taylor's theorem, the remainder can be written as (f‴(ξ)/6)·(t − x)³ for some ξ between x and t.
So, this Taylor series expansion is a third-order approximation for function f(t).
Therefore, at the integer sampling location t = i (where i − x = −α), the third-order approximation for f_i is as below:

f_i = f(x) − α·f′(x) + (α²/2)·f″(x) + O(α³)
Similarly, at the next integer sampling location t = i + 1 (where i + 1 − x = 1 − α), the third-order approximation for f_{i+1} is as below:

f_{i+1} = f(x) + (1 − α)·f′(x) + ((1 − α)²/2)·f″(x) + O((1 − α)³)
f_lin(x) is defined as the GPU's hardware linear interpolation function at sampling point x in the interval [i, i+1], where x = i + α, α ∈ [0, 1), given by:

f_lin(x) = (1 − α)·f_i + α·f_{i+1}

where

(1 − α)·f_i = (1 − α)·f(x) − α(1 − α)·f′(x) + ((1 − α)·α²/2)·f″(x) + O(h³)

and where

α·f_{i+1} = α·f(x) + α(1 − α)·f′(x) + (α·(1 − α)²/2)·f″(x) + O(h³)
Adding these two terms together (and noting that α²(1 − α) + α(1 − α)² = α(1 − α)) gives:

f_lin(x) = f(x) + (α(1 − α)/2)·f″(x) + O(h³)
Therefore, in 1D, the third-order cubic interpolation/filtering function may be expressed as:

f_cubic(x) = f_lin(x) − (α(1 − α)/2)·f″(x)
where the second-order central difference (see https://en.wikipedia.org/wiki/Finite_difference) can be used to calculate f″(x) as below:

f″(x) ≈ (f(x + h) − 2·f(x) + f(x − h)) / h²
For h = 1, the second-order derivative of f(x) at sampling location x is given by:

f″(x) ≈ f(x + 1) − 2·f(x) + f(x − 1)

where the values at the non-integer locations x − 1, x and x + 1 are themselves obtained with the hardware linear interpolation function.
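The resulting 1D third-order filter can be sketched as follows. This is hypothetical Python, assuming the non-integer samples in the central difference are taken with linear fetches; `f_lin` stands in for the GPU's hardware linear fetch with clamp-to-edge addressing.

```python
import math

def f_lin(f, x):
    """Stand-in for the GPU hardware linear fetch (clamp-to-edge)."""
    x = min(max(x, 0.0), len(f) - 1.0)
    i = min(int(math.floor(x)), len(f) - 2)
    a = x - i
    return (1.0 - a) * f[i] + a * f[i + 1]

def cubic_third_order(f, x):
    """f(x) ~ f_lin(x) - (a(1-a)/2) * f''(x), with f'' estimated by a
    central difference of three linear fetches."""
    a = x - math.floor(x)                      # fractional part of x
    centre = f_lin(f, x)
    d2 = f_lin(f, x + 1.0) - 2.0 * centre + f_lin(f, x - 1.0)
    return centre - 0.5 * a * (1.0 - a) * d2
```

On quadratic data the scheme is exact in the interior: for samples of f(t) = t², the value at x = 2.5 is recovered as 6.25, as a third-order method should.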
Extending the above equations to 2D, the bicubic interpolation/filtering function (i.e., a third-order approximation of the original function) may be approximated as below:

f_bicubic(x, y) ≈ f_bilin(x, y) − (α(1 − α)/2)·f_xx(x, y) − (β(1 − β)/2)·f_yy(x, y)      (13)
where the second-order derivatives can be calculated using second-order central differences as below:

f_xx(x, y) ≈ f_bilin(x − 1, y) − 2·f_bilin(x, y) + f_bilin(x + 1, y)

f_yy(x, y) ≈ f_bilin(x, y − 1) − 2·f_bilin(x, y) + f_bilin(x, y + 1)
where x = i + α, y = j + β, for i, j ∈ Z being the integer parts, and α, β ∈ [0, 1) being the fractional parts of x, y between the two consecutive integer locations. f_bilin(x, y) is the hardware bilinear interpolation function at an arbitrary 2D sampling point (x, y), which is already supported by GPU hardware.
From this derivation, it can be seen that the derived 2D third-order filtering equation is much simpler than the original bicubic filtering equation (Keys, 1981), which involves cubic polynomial weighted sum calculations and 16 texture fetches, while Equation (13) above involves only five texture fetches and much simpler arithmetic, without any cubic polynomial weighting calculations. Only seven multiplications are needed, instead of the 32 required by the original method. This may result in increased rendering performance and reduced power consumption for mobile GPUs.
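A software sketch of Equation (13) may help make the fetch count concrete. This is hypothetical Python: `bilin` emulates the hardware bilinear fetch with clamp-to-edge addressing, and in a real shader the five calls would be texture instructions.

```python
import math

def bilin(tex, x, y):
    """Stand-in for the GPU hardware bilinear fetch (clamp-to-edge)."""
    h, w = len(tex), len(tex[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    i = min(int(math.floor(x)), w - 2)
    j = min(int(math.floor(y)), h - 2)
    a, b = x - i, y - j
    return ((1 - a) * (1 - b) * tex[j][i]     + a * (1 - b) * tex[j][i + 1]
          + (1 - a) * b       * tex[j + 1][i] + a * b       * tex[j + 1][i + 1])

def bicubic_5tap(tex, x, y):
    """Third-order filtering per Equation (13): one central bilinear
    fetch plus four offset fetches for the two second derivatives."""
    a = x - math.floor(x)
    b = y - math.floor(y)
    c = bilin(tex, x, y)
    fxx = bilin(tex, x - 1, y) - 2.0 * c + bilin(tex, x + 1, y)
    fyy = bilin(tex, x, y - 1) - 2.0 * c + bilin(tex, x, y + 1)
    return c - 0.5 * a * (1.0 - a) * fxx - 0.5 * b * (1.0 - b) * fyy
```

Only five bilinear fetches and a handful of multiplications are needed per output value, in line with the cost analysis above.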
Experimental results have shown that the method can produce the same good-quality texture filtering results as the original bicubic filtering equation (Keys, 1981), but with a rendering frame-rate approximately 91% higher when tested on exactly the same platform.
The filtering equation for fourth-order approximation accuracy of texture filtering will now be described.
The convergence rate of the cubic convolution interpolation function (see Keys, 1981) is O(h³), which yields a third-order approximation of the original function f(t). Therefore, any interpolation function whose interpolation kernel satisfies the conditions outlined by Keys (first part of Keys, R.G., 'Cubic convolution interpolation for digital image processing', IEEE Trans. Acoust. Speech Signal Process., 1981, ASSP-29, (6), pp. 1153-1160) will have at most a third-order convergence rate. Nevertheless, interpolation functions with fourth-order convergence rates are possible and can be derived. Keys also derived a fourth-order approximation function (with remainder term O(h⁴)), at the cost of a much larger spatial filter kernel support: this fourth-order convolution kernel is composed of piecewise polynomials defined on the unit subintervals between [−3, 3], which means that it needs to fetch six taps of texels and perform a weighted-sum calculation over all six taps in 1D. This is in contrast to the third-order approximation kernel, defined on the unit subintervals between [−2, 2] with only four taps of texels, as in the third-order function approximation described above.
In 1D, the fourth-order filtering algorithm can be expressed as the following weighted sum equation:
f(x) = Σ_i w_i(x) · f_i      (16)

where f_i are the indexed neighbouring texel values at six taps of integer sampling locations, which are multiplied by the corresponding polynomial weights w_i(x) from the convolution kernel. The weighted sum is the final result of the filtering.
In 2D, this convolution kernel will involve 6x6=36 texture fetches, and also involve the complex weighted-sum calculation using polynomials on all of the 36 taps of texels, which will introduce a very high cost of memory bandwidth and many weighted sum calculations for a mobile GPU.
The derivation below introduces a simplified equation for texture filtering with fourth-order approximation accuracy, which is much cheaper than the original fourth-order equation introduced by Keys (i.e., Equation (16)).
If the function f(t) has at least four continuous derivatives at location x, then according to Taylor's theorem, the fourth-order approximation of the real-valued function f(t) at a real-valued sampling location x in the interval [i, i+1] is given by:

f(t) = f(x) + f′(x)·(t − x) + (f″(x)/2)·(t − x)² + (f‴(x)/6)·(t − x)³ + O((t − x)⁴)
where O((t − x)⁴) is the fourth-order error term (remainder, or residual), which goes to zero at a rate proportional to (t − x)⁴.
Now, the fourth-order approximation will be derived at the two integer sampling locations t = i and t = i + 1.
Firstly, at the integer sampling location t = i, the fourth-order approximation for f_i is as below. Note that since x = i + α, we have i − x = −α:

f_i = f(x) − α·f′(x) + (α²/2)·f″(x) − (α³/6)·f‴(x) + O(α⁴)
Secondly, at the next integer sampling location t = i + 1, the fourth-order approximation for f_{i+1} is given as below. Note that (a − b)³ = a³ − 3a²b + 3ab² − b³, and that since x = i + α with 0 ≤ α < 1, we have i + 1 − x = 1 − α:

f_{i+1} = f(x) + (1 − α)·f′(x) + ((1 − α)²/2)·f″(x) + ((1 − α)³/6)·f‴(x) + O((1 − α)⁴)
Since f_lin(x) is the hardware linear interpolation function at sampling point x between f_i and f_{i+1}, we have:

f_lin(x) = (1 − α)·f_i + α·f_{i+1}

where

(1 − α)·f_i = (1 − α)·f(x) − α(1 − α)·f′(x) + ((1 − α)·α²/2)·f″(x) − ((1 − α)·α³/6)·f‴(x) + O(h⁴)

α·f_{i+1} = α·f(x) + α(1 − α)·f′(x) + (α·(1 − α)²/2)·f″(x) + (α·(1 − α)³/6)·f‴(x) + O(h⁴)
Adding these two terms together (and noting that α(1 − α)³ − (1 − α)·α³ = α(1 − α)(1 − 2α)) gives:

f_lin(x) = f(x) + (α(1 − α)/2)·f″(x) + (α(1 − α)(1 − 2α)/6)·f‴(x) + O(h⁴)
Therefore, the fourth-order interpolation equation in 1D is given by:

f(x) ≈ f_lin(x) − (α(1 − α)/2)·f″(x) − (α(1 − α)(1 − 2α)/6)·f‴(x)
where the second-order central difference (see https://en.wikipedia.org/wiki/Finite_difference) can be used to calculate f″(x) as:

f″(x) ≈ (f(x + h) − 2·f(x) + f(x − h)) / h²
Using h = 1, the second-order derivative of f(x) at sampling location x is given by:

f″(x) ≈ f(x + 1) − 2·f(x) + f(x − 1)
Similarly, the third-order derivative (see https://en.wikipedia.org/wiki/Finite_difference_coefficient) can be calculated as below:

f‴(x) ≈ (f(x + 2h) − 2·f(x + h) + 2·f(x − h) − f(x − 2h)) / (2h³)

where h represents a uniform grid spacing between each integer finite difference interval. Using h = 1, the third-order derivative of f(t) at sampling location x is given by:

f‴(x) ≈ (f(x + 2) − 2·f(x + 1) + 2·f(x − 1) − f(x − 2)) / 2
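The 1D fourth-order scheme can be sketched as follows. This is hypothetical Python, again assuming the non-integer samples in the differences are taken with linear fetches (five linear fetches per output value); `f_lin` stands in for the hardware linear fetch.

```python
import math

def f_lin(f, x):
    """Stand-in for the GPU hardware linear fetch (clamp-to-edge)."""
    x = min(max(x, 0.0), len(f) - 1.0)
    i = min(int(math.floor(x)), len(f) - 2)
    a = x - i
    return (1.0 - a) * f[i] + a * f[i + 1]

def fourth_order_1d(f, x):
    """f(x) ~ f_lin(x) - (a(1-a)/2) f''(x) - (a(1-a)(1-2a)/6) f'''(x)."""
    a = x - math.floor(x)                       # fractional part of x
    c  = f_lin(f, x)
    p1, m1 = f_lin(f, x + 1.0), f_lin(f, x - 1.0)
    p2, m2 = f_lin(f, x + 2.0), f_lin(f, x - 2.0)
    d2 = p1 - 2.0 * c + m1                      # second-order central difference
    d3 = (p2 - 2.0 * p1 + 2.0 * m1 - m2) / 2.0  # third-order central difference
    return (c - a * (1.0 - a) / 2.0 * d2
              - a * (1.0 - a) * (1.0 - 2.0 * a) / 6.0 * d3)
```

For cubic data the scheme is exact in the interior: with samples of f(t) = t³, the value at x = 3.25 is recovered as 3.25³ = 34.328125.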
Extending the above 1D equation to 2D, the fourth-order interpolation equation is given by:

f(x, y) ≈ f_bilin(x, y) − (α(1 − α)/2)·f_xx(x, y) − (β(1 − β)/2)·f_yy(x, y) − (α(1 − α)(1 − 2α)/6)·f_xxx(x, y) − (β(1 − β)(1 − 2β)/6)·f_yyy(x, y)      (29)
where

f_xx(x, y) ≈ f_bilin(x − 1, y) − 2·f_bilin(x, y) + f_bilin(x + 1, y)

f_yy(x, y) ≈ f_bilin(x, y − 1) − 2·f_bilin(x, y) + f_bilin(x, y + 1)

f_xxx(x, y) ≈ (f_bilin(x + 2, y) − 2·f_bilin(x + 1, y) + 2·f_bilin(x − 1, y) − f_bilin(x − 2, y)) / 2

f_yyy(x, y) ≈ (f_bilin(x, y + 2) − 2·f_bilin(x, y + 1) + 2·f_bilin(x, y − 1) − f_bilin(x, y − 2)) / 2
and x = i + α, y = j + β, for i, j ∈ Z being the integer parts, and α, β ∈ [0, 1) being the fractional parts of x, y. f_bilin(x, y) is the hardware bilinear interpolation function at an arbitrary 2D sampling point (x, y), which is supported by GPU hardware.
From this derivation, it can be seen that this 2D fourth-order filtering equation is much simpler than the original fourth-order filtering equation (Keys, 1981), which involves a complex polynomial weighted sum calculation with 36 texture fetches, while Equation (29) above involves only nine texture fetches and much simpler arithmetic. This can result in increased rendering performance and reduced power consumption for a mobile GPU.
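Equation (29) can be sketched in the same style. This is hypothetical Python with nine distinct bilinear fetch positions, the ±1 fetches being shared between the second- and third-derivative estimates; `bilin` emulates the hardware bilinear fetch with clamp-to-edge addressing.

```python
import math

def bilin(tex, x, y):
    """Stand-in for the GPU hardware bilinear fetch (clamp-to-edge)."""
    h, w = len(tex), len(tex[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    i = min(int(math.floor(x)), w - 2)
    j = min(int(math.floor(y)), h - 2)
    a, b = x - i, y - j
    return ((1 - a) * (1 - b) * tex[j][i]     + a * (1 - b) * tex[j][i + 1]
          + (1 - a) * b       * tex[j + 1][i] + a * b       * tex[j + 1][i + 1])

def fourth_order_9tap(tex, x, y):
    """Fourth-order filtering per Equation (29): nine bilinear fetches."""
    a = x - math.floor(x)
    b = y - math.floor(y)
    c = bilin(tex, x, y)
    xp1, xm1 = bilin(tex, x + 1, y), bilin(tex, x - 1, y)
    yp1, ym1 = bilin(tex, x, y + 1), bilin(tex, x, y - 1)
    fxx = xm1 - 2.0 * c + xp1
    fyy = ym1 - 2.0 * c + yp1
    fxxx = (bilin(tex, x + 2, y) - 2.0 * xp1 + 2.0 * xm1 - bilin(tex, x - 2, y)) / 2.0
    fyyy = (bilin(tex, x, y + 2) - 2.0 * yp1 + 2.0 * ym1 - bilin(tex, x, y - 2)) / 2.0
    return (c - a * (1 - a) / 2.0 * fxx - b * (1 - b) / 2.0 * fyy
              - a * (1 - a) * (1 - 2 * a) / 6.0 * fxxx
              - b * (1 - b) * (1 - 2 * b) / 6.0 * fyyy)
```

The ±1 fetch results are reused for both derivative orders, so the total stays at nine bilinear fetches per filtered value.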
Experimental results have shown that the fourth-order filtering function above can produce the same good-quality texture filtering results as the original fourth-order filtering equation (Keys, 1981), but with a much faster rendering frame-rate when tested on exactly the same platform.
Figure 7 summarises a method for generating an image having a plurality of pixels. The method comprises the following steps for each of at least some of the plurality of pixels. At step 701, the method comprises determining a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space. At step 702, the method comprises applying a texture filtering function to the sub-pixel data points of the plurality of texels in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
For fourth-order approximation accuracy, applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y and (ii) a weighted sum of second-order derivative approximations among the texels in the two orthogonal directions and a weighted sum of third-order derivative approximations among the texels in the two orthogonal directions.
As described above, the second-order and/or third-order derivative approximations among the texels in two orthogonal directions are weighted in dependence on the fractional position of the location x, y between two consecutive integer locations (i.e., in dependence on α and β, where x = i + α, y = j + β). The second-order and/or third-order derivatives are calculated using the dedicated hardware logic. The second-order and/or third-order derivatives are calculated in dependence on the interpolation among the texels, which is determined using a bilinear interpolation function f_bilin(x, y). f_bilin(x, y) is supported by GPU hardware, and so may be easily determined.
In some embodiments, the device may comprise a texture cache configured to store the bilinear interpolation function at the location x, y. As this value is re-used in the equations, this may reduce the computation required further. In some implementations, the texture cache may be exploited to store processed sampling results to be re-used by neighbouring pixels, in order to further accelerate the computation.
For shadow filtering applications, at least some of the pixels in the image can represent a shadow of an object in said image. In these cases, the texture filtering function may be a shadow filtering function. Without any filtering, a data sample's value from a shadow map texture is either 1.0 or 0.0. As a result, the rendered shadow in the final image shows strong aliasing. This is because if a data sample's value is equal to 1.0, the pixel is completely outside of the shadow, while if it is 0.0, the pixel is completely in shadow. After filtering, a data sample's value can be a floating-point value between 0.0 and 1.0. This achieves continuous transitions, so that the pixel can be rendered as a soft shadow in the final image. The functions for high-order texture filtering (both third- and fourth-order) described herein (with preferred 2D examples as defined in Equations (13) and (29)) involve fewer texture fetches and a much simpler weighted sum calculation, especially for 2D texture filtering. These functions require fewer GPU instructions, in terms of both texture fetching instructions and ALU instructions, than conventional methods.
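As a sketch of the shadow use case, the five-fetch filter of Equation (13) can be applied to a binary shadow mask to obtain values strictly between 0.0 and 1.0 near the shadow edge. This is hypothetical Python: `bilin` again stands in for the hardware bilinear fetch, and the mask layout is an illustrative assumption.

```python
import math

def bilin(tex, x, y):
    """Stand-in for the GPU hardware bilinear fetch (clamp-to-edge)."""
    h, w = len(tex), len(tex[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    i = min(int(math.floor(x)), w - 2)
    j = min(int(math.floor(y)), h - 2)
    a, b = x - i, y - j
    return ((1 - a) * (1 - b) * tex[j][i]     + a * (1 - b) * tex[j][i + 1]
          + (1 - a) * b       * tex[j + 1][i] + a * b       * tex[j + 1][i + 1])

def bicubic_5tap(tex, x, y):
    """Third-order filtering per Equation (13), five bilinear fetches."""
    a = x - math.floor(x)
    b = y - math.floor(y)
    c = bilin(tex, x, y)
    fxx = bilin(tex, x - 1, y) - 2.0 * c + bilin(tex, x + 1, y)
    fyy = bilin(tex, x, y - 1) - 2.0 * c + bilin(tex, x, y + 1)
    return c - 0.5 * a * (1.0 - a) * fxx - 0.5 * b * (1.0 - b) * fyy

# A binary shadow mask: columns 0-2 lit (1.0), columns 3-5 in shadow (0.0).
shadow = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0] for _ in range(6)]

# Near the lit/shadow boundary the filtered value is a smooth fraction.
soft = bicubic_5tap(shadow, 2.25, 2.5)
```

Away from the boundary the filter returns the unfiltered 0.0 or 1.0, so only the transition region is softened.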
Using the method described herein, the device can be configured to perform fewer texture fetches than the original equations found in Keys (Keys, 1981). For example, for each pixel, the device can be configured to perform five texture fetches for the third-order (bicubic) approximation and nine texture fetches for the fourth-order approximation.
Fewer GPU instructions can result in longer mobile battery life, reduced latency and improved frame-rate for complex and demanding game rendering.
The solution may be implemented in GPU software using shader code. The simplified functions can be implemented in a few lines of GPU shader code using shading languages such as GLSL, HLSL or Spir-V.
Alternatively, by modifying the texture unit module of the GPU hardware, the filtering functions can be implemented using fixed function hardware via a single GPU instruction, instead of multiple lines of shader code. For example, one ISA intrinsics call can be used to complete 2D high-order texture filtering in a pixel shader.
The method described herein may therefore allow for faster and cheaper high-order texture filtering.
Figure 8 is a schematic representation of a device 800 configured to perform the methods described herein. The device 800 may be incorporated in, for example, a laptop, tablet, smart phone, TV or any other device in which graphics data is to be processed.
The device 800 comprises a graphics processor 801 configured to process data. For example, the processor 801 may be a GPU. Alternatively, the processor 801 may be implemented as a computer program running on a programmable device such as a GPU or a Central Processing Unit (CPU). The device 800 comprises a memory 802 which is arranged to communicate with the graphics processor 801. Memory 802 may be a non-volatile memory. The graphics processor 801 may also comprise a cache (not shown in Figure 8), which may be used to temporarily store data from memory 802. The device may comprise more than one processor and more than one memory. The memory may store data that is executable by the processor. The processor may be configured to operate in accordance with a computer program stored in non-transitory form on a machine-readable storage medium. The computer program may store instructions for causing the processor to perform its methods in the manner described herein.
The device may allow for rendering a high-quality image with texture at higher performance with fewer texture fetches and weighted filtering calculations on a mobile GPU than conventional methods.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. A graphics processing device (800) for generating an image signal representing an image having a plurality of pixels, wherein the graphics processing device is configured to generate the image signal by performing the following operations for each of the plurality of pixels: determine (701) a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space; and apply (702) a texture filtering function to the sub-pixel data points of the plurality of texels in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and at least (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
2. The device (800) of claim 1 , wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y and (ii) a weighted sum of second-order derivative approximations among the texels in the two orthogonal directions and a weighted sum of third-order derivative approximations among the texels in the two orthogonal directions.
3. The device (800) of claim 1 or claim 2, wherein the second-order and/or third-order derivative approximations among the texels in the two orthogonal directions are weighted in dependence on the fractional position of the location x, y between two consecutive integer locations.
4. The device (800) of any preceding claim, wherein the second-order and/or third-order derivatives are calculated using the dedicated hardware logic.
5. The device (800) of any preceding claim, wherein the second-order and/or third-order derivatives are calculated in dependence on the interpolation among the texels.
6. The device (800) of any preceding claim, wherein the interpolation is determined using a bilinear interpolation function.
7. The device (800) of claim 6, wherein the device comprises a texture cache configured to store the bilinear interpolation function at the location x, y.
8. The device (800) of any preceding claim, wherein the two orthogonal directions are directions in a texture space.
9. The device (800) of any preceding claim, wherein the filtering function is determined according to a third-order or fourth-order approximation.
10. The device (800) of any preceding claim, wherein the device is configured to implement the texture filtering function in one of the GLSL, HLSL and Spir-V languages.
11. The device (800) of any preceding claim, wherein the device is configured to implement the texture filtering function in a single instruction, or fixed function hardware unit.
12. The device (800) of any preceding claim, wherein, for each pixel, the device is configured to perform fewer than sixteen texture fetches.
13. The device (800) of claim 12, wherein, for each pixel, the device is configured to perform five texture fetches or nine texture fetches.
14. The device (800) of any preceding claim, wherein at least some of the pixels represent a shadow of an object in said image and wherein the texture filtering function is a shadow filtering function.
15. The device (800) of any preceding claim, wherein the device is implemented by a mobile graphics processing unit.
16. A method (700) for generating an image signal representing an image having a plurality of pixels, wherein the method comprises, for each of the plurality of pixels: determining (701) a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a texture sample value in a texture space; and applying (702) a texture filtering function to the sub-pixel data points in dependence on a selected sampling location x, y to form a filtered value, where x and y are the fractional positions of a texture coordinate; wherein applying the texture filtering function comprises taking the difference between (i) an interpolation among the texels for the location x, y performed by dedicated hardware logic and (ii) a weighted sum of second-order derivative approximations among the texels in two orthogonal directions.
17. A computer program which, when executed by a computer, causes the computer to perform the method of claim 16.
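The structure of the claimed filtering function — a hardware interpolation minus a fractionally weighted sum of second-order difference approximations (claims 1 and 3) — is easiest to see in one dimension, where the hardware bilinear lookup reduces to linear interpolation between two texels. The sketch below is illustrative only and is not taken from the specification; the function names are invented for the example. With the particular weights shown, the "interpolation minus correction" decomposition reproduces Catmull-Rom cubic interpolation exactly, consistent with the cited Keys (1981) and Csébfalvi (2018) references.

```python
def catmull_rom_direct(f0, f1, f2, f3, t):
    """Direct Catmull-Rom cubic between texels f1 and f2, with 0 <= t <= 1."""
    return (f1
            + 0.5 * t * (f2 - f0)
            + t * t * (f0 - 2.5 * f1 + 2.0 * f2 - 0.5 * f3)
            + t * t * t * (1.5 * (f1 - f2) + 0.5 * (f3 - f0)))


def lerp_minus_second_differences(f0, f1, f2, f3, t):
    """Same value, computed in the shape of claim 1 restricted to 1D:
    an interpolation minus a weighted sum of second-order difference
    approximations, with weights depending on the fractional position t
    between two consecutive integer locations (cf. claim 3)."""
    lerp = (1.0 - t) * f1 + t * f2   # what the bilinear hardware returns in 1D
    d2_left = f0 - 2.0 * f1 + f2     # second-order difference centred on f1
    d2_right = f1 - 2.0 * f2 + f3    # second-order difference centred on f2
    correction = 0.5 * t * (1.0 - t) * ((1.0 - t) * d2_left + t * d2_right)
    return lerp - correction
    # Example: for texels 0, 1, 0, 1 at t = 0.5 both functions return 0.5.
```

Subtracting the weighted second differences from the cheap linear lookup yields the cubic result without evaluating a cubic polynomial directly; in two dimensions the analogous corrections are taken along the two orthogonal texture-space directions, which is what allows the arithmetic to be folded into a small number of hardware-filtered fetches (cf. claims 12 and 13).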

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2020/082790 WO2022106016A1 (en) 2020-11-20 2020-11-20 High-order texture filtering
CN202080102689.2A CN115917606A (en) 2020-11-20 2020-11-20 High order texture filtering


Publications (1)

Publication Number Publication Date
WO2022106016A1 true WO2022106016A1 (en) 2022-05-27

Family

ID=73544159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/082790 WO2022106016A1 (en) 2020-11-20 2020-11-20 High-order texture filtering

Country Status (2)

Country Link
CN (1) CN115917606A (en)
WO (1) WO2022106016A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541463B (en) * 2023-11-29 2024-08-20 沐曦集成电路(上海)有限公司 Sky box angular grain element loss processing system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7106326B2 (en) 2003-03-03 2006-09-12 Sun Microsystems, Inc. System and method for computing filtered shadow estimates using reduced bandwidth
US10102181B2 (en) 2014-08-27 2018-10-16 Imagination Technologies Limited Efficient Catmull-Rom interpolation


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHRISTIAN SIGG ET AL: "GPU Gems - Chapter 20. Fast Third-Order Texture Filtering", April 2005 (2005-04-01), XP055220422, Retrieved from the Internet <URL:http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter20.html> [retrieved on 20151013] *
CSÉBFALVI BALÁZS: "Fast Catmull-Rom Spline Interpolation for High-Quality Texture Sampling", COMPUTER GRAPHICS FORUM : JOURNAL OF THE EUROPEAN ASSOCIATION FOR COMPUTER GRAPHICS, vol. 37, no. 2, May 2018 (2018-05-01), Oxford, pages 455 - 462, XP055830796, ISSN: 0167-7055, DOI: 10.1111/cgf.13375 *
KEYS, R.G.: "Cubic convolution interpolation for digital image processing", IEEE TRANS. ACOUST. SPEECH SIGNAL PROCESS., vol. 29, no. 6, 1981, pages 1153 - 1160, XP009012170, DOI: 10.1109/TASSP.1981.1163711

Also Published As

Publication number Publication date
CN115917606A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
US9202258B2 (en) Video retargeting using content-dependent scaling vectors
US10262454B2 (en) Image processing apparatus and method
JP5734475B2 (en) Method for fast and memory efficient implementation of conversion
CN107133914B (en) Apparatus for generating three-dimensional color image and method for generating three-dimensional color image
EP1161745B1 (en) Graphics system having a super-sampled sample buffer with generation of output pixels using selective adjustment of filtering for reduced artifacts
US20170358132A1 (en) System And Method For Tessellation In An Improved Graphics Pipeline
US10008023B2 (en) Method and device for texture filtering
JP2012515982A (en) Smoothed local histogram filter for computer graphics
EP1161744B1 (en) Graphics system having a super-sampled sample buffer with generation of output pixels using selective adjustment of filtering for implementation of display effects
JP2006526834A (en) Adaptive image interpolation for volume rendering
JP2010092478A (en) Graphics processing system
WO2013005366A1 (en) Anti-aliasing image generation device and anti-aliasing image generation method
WO2021213664A1 (en) Filtering for rendering
WO2022106016A1 (en) High-order texture filtering
JP4801088B2 (en) Pixel sampling method and apparatus
Zhao et al. Real-time edge-aware weighted median filtering on the GPU
US20230298212A1 (en) Locking mechanism for image classification
JP2023054783A (en) Method and system for visualization and simulation of flow phenomenon
Bernasconi et al. Kernel aware resampler
Ruijters et al. Accuracy of GPU-based B-spline evaluation
US20230298133A1 (en) Super resolution upscaling
Safonov et al. Image Upscaling
US20240161236A1 (en) Method and apparatus with adaptive super sampling
JP5712385B2 (en) Image display processing apparatus and image display processing method
WO2022252080A1 (en) Apparatus and method for generating a bloom effect

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20811567

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20811567

Country of ref document: EP

Kind code of ref document: A1