WO2020249026A1 - Texture mapping hardware accelerator based on a dual-Buffer architecture - Google Patents

Texture mapping hardware accelerator based on a dual-Buffer architecture

Info

Publication number
WO2020249026A1
Authority
WO
WIPO (PCT)
Prior art keywords
ratio
inte
data
mode
coordinate
Prior art date
Application number
PCT/CN2020/095464
Other languages
English (en)
French (fr)
Inventor
吴兴涛
王磊
Original Assignee
华夏芯(北京)通用处理器技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华夏芯(北京)通用处理器技术有限公司
Priority to JP2021573726A priority Critical patent/JP7227404B2/ja
Priority to KR1020227000824A priority patent/KR20220019791A/ko
Priority to US17/617,596 priority patent/US20220327759A1/en
Priority to EP20823324.7A priority patent/EP3982255A4/en
Publication of WO2020249026A1 publication Critical patent/WO2020249026A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/001 Texturing; Colouring; Generation of texture or colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/28 Indexing scheme for image data processing or generation, in general involving image processing hardware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/36 Level of detail

Definitions

  • The present disclosure relates to the technical field of GPU chip design, and in particular to a texture mapping hardware accelerator based on a dual-Buffer architecture.
  • Texture mapping operations are widely used in GPUs: the texture mapping unit serves both as a computing unit in GPGPU general-purpose computing and as the executor of texture-data fetch and sample operations in the graphics rendering pipeline. Its performance therefore directly affects the internal execution efficiency of the graphics processor and, in general-purpose computing, the speed of data lookup and transfer, so designing an efficient texture mapping unit is particularly critical in GPU design.
  • The purpose of the present disclosure is to provide a texture mapping hardware accelerator based on a dual-Buffer architecture, to solve the problems raised in the background art: poor internal execution efficiency of the graphics processor, which in general-purpose computing directly affects the speed of data lookup and transfer.
  • To achieve the above purpose, the present disclosure provides the following technical solution: a texture mapping hardware accelerator based on a dual-Buffer architecture, including:
  • Image U0 unit: stores the basic information of the image. When mipmap texture is enabled, the target and the different map layers are used as the address to store the mode, width, height, depth, border, inte_format, format, type, and base of the corresponding image;
  • when layers are enabled, the target and the different layers are used as the address to store the mode, width, height, depth, border, inte_format, format, type, and base values of the corresponding layer.
  • When cubemap is enabled, one address of the mipmap layer is subdivided into 6, representing the face information of faces 0, 1, 2, 3, 4, and 5.
  • When layers are enabled without map-layer information, the mode, width, height, depth, border, inte_format, format, and type of the different layers are the same, but the base differs; when both layers and map layers are enabled, the mode, width, height, depth, border, inte_format, format, and type are the same, but the base differs. Register configuration is supported in the 1D, 2D, 3D, rectangle, cubemap, 1D_ARRAY, 2D_ARRAY, cubemap_array, 2D_multisample, 2D_multisample, and 2D_multisample_array modes;
  • LOD U1 unit: completes the calculation of the level value for the different filtering modes and, combined with the target address being accessed, obtains the address for accessing the image unit. Before the level value is calculated, the image unit is first read with the target and with the base_level value as level0, to obtain the basic image information as the reference for the subsequent level calculation. The level calculation then considers two cases: lod enabled, and partial derivatives enabled as lod.
  • When lod is enabled and the image is in layer mode, the width and height of the different layers are equal, so whether the filter mode is mag_filter or min_filter, the lod is rounded to the single level nearest base_level, which serves as level0, the offset for reading the image information, and filter_type matches the requested filter size.
  • When lod is enabled and the image is in mipmap mode, the width, height, and depth of the different layers are not equal. For mag_filter in the near and linear modes, the lod is rounded to the level nearest the base_level value as the offset for reading the image information. For min_filter in the near, linear, near_mipmap_near, and linear_mipmap_near modes, the level least close to the base_level value is taken as level0, the offset for reading the image, and filter_type matches the requested filter mode. For min_filter in the near_mipmap_linear and linear_mipmap_linear modes, the lod is rounded to the two adjacent levels as the offsets for reading the image information; ratio_l is the fractional part of the lod value minus the level value, the integer part of lod is level0, and level0 plus 1 is level1.
  • If the lod value is min_lod, level0 and level1 are the same, so filter_type is taken as near_mipmap_near and linear_mipmap_near filtering respectively. The same layer-mode and mipmap-mode rules apply when partial derivatives are enabled as lod.
  • If level0 and level1 are both enabled, trilinear filtering is used, with the following filtering methods: trilinear isotropic (near_mipmap_linear, linear_mipmap_linear) and trilinear anisotropic; if only level0 is valid, the following filtering methods are available: point isotropic (near, near_mipmap_near), bilinear isotropic (linear, linear_mipmap_near), and bilinear anisotropic;
  • Coordinate U2 unit: completes the coordinate and address conversion of s, t, r, q in fetch and sampler modes. When cubemap_array is enabled, the Q coordinate is non-zero and denotes the layer row number, s, t, and r denote the magnitudes in the x, y, and z directions, and the s and t coordinates in the plane are obtained through the mapping relationship; when rectangle mode is enabled, the s and t coordinates do not need to be denormalized; if the s, t, or r coordinates exceed their respective ranges, the different wrap modes are used to constrain them. When level0 and level1 are enabled, the width, height, and depth values of level0 and of level1 are obtained from the image unit and multiplied by s, t, and r respectively to give the denormalized texture coordinates u0, v0, w0 and u1, v1, w1.
  • ratio_u0, ratio_v0, and ratio_w0 are the fractional parts of u0, v0, and w0 respectively, and ratio_u1, ratio_v1, and ratio_w1 are the fractional parts of u1, v1, and w1 respectively.
  • inte_u0, inte_v0, and inte_w0 are the integer parts of u0, v0, and w0 respectively, and inte_u1, inte_v1, and inte_w1 are the integer parts of u1, v1, and w1 respectively.
  • When the wrap operation is performed, if the border value in the image content is set and the address has overflowed, the texel request is disabled and the border_color value is enabled as the input of the final pixel stage;
  • Coordinate controller U3 unit: when level0 and level1 are enabled and filter_type is point mode: when the mode is 1D, the data written to coordinate bufferu0 is inte_u0 and the data written to coordinate bufferu1 is inte_u1; when the mode is 2D, the data written to coordinate bufferu0 is inte_u0, the data written to coordinate bufferv0 is inte_v0, the data written to coordinate bufferu1 is inte_u1, and the data written to coordinate bufferv1 is inte_v1; when the mode is 3D, the data written to coordinate bufferu0 is inte_u0, the data written to coordinate bufferv0 is inte_v0, the data written to coordinate bufferw0 is inte_w0, the data written to coordinate bufferu1 is inte_u1, the data written to coordinate bufferv1 is inte_v1, and the data written to coordinate bufferw1 is inte_w1; when filter_type is linear mode and the mode is 1D, the data written to coordinate bufferu1 is inte_u1, the data written to
  • Address controller U4 unit: first completes the calculation from texture coordinates to the texture offset address. When level0 is valid: when the mode is 1D, the offset when the address calculation does not overflow is size*u0; when the mode is 2D, it is size*(width0*u0+v0); when the mode is 3D, it is size*(width0*u0+v0)+w0*width0*height0. The final address for accessing the texel cache is base0+offset.
  • The number of addresses for each inte_format is obtained, and the end data is stored in offset0buffer. Because level1 is invalid, texel cache requests follow the double-Buffer operation mode: odd-numbered addresses request cache0 and even-numbered addresses request cache1, so that the addresses are accessed in parallel.
  • When level0 and level1 are both valid: when the mode is 1D, the offsets when the address calculation does not overflow are size*u0 and size*u1; when the mode is 2D, they are size*(width0*u0+v0) and size*(width1*u1+v1); when the mode is 3D, they are size*(width0*u0+v0)+w0*width0*height0 and size*(width1*u1+v1)+w1*width1*height1.
  • The final addresses for accessing the texel cache are base0 plus the level0 offset and base1 plus the level1 offset, and cache0 and cache1 are requested in parallel.
  • The texel cache U1 unit includes two directly connected caches, which complete the indexing of the cache lines where the different texels are located and the store and replace operations on cache lines. When level0 and level1 are valid at the same time, the read requests to cache0 and cache1 are completed in parallel; when only level0 is valid, the odd cache lines are stored in cache0 and the even cache lines are stored in cache1.
  • The data calculation U2 unit includes:
  • Data controller U0 unit: when level0 and level1 are valid at the same time, it combines off0 and off1 according to the inte_format to complete the data splicing from a cache line, obtains the texture data corresponding to the texture address, and writes the respective data to data buffer0 and data buffer1 simultaneously, so that data buffer0 and data buffer1 each store the data of their own level. The same holds when only level0 is valid:
  • the data of the respective cache lines is read from cache0 and cache1 and, according to the inte_format and off0, the odd data and even data are obtained and written to data buffer0 and data buffer1 in dual mode; data buffer0 and data buffer1 then store texel data of the same level;
  • Filter U1 unit: first completes the truncation operation, truncating the r, g, b, and a values to the bit widths defined by the inte_format, and then filters each channel independently; the bit-width truncation is performed according to the inte_format;
  • when level0 and level1 are both valid, the filtering methods of filter_type are NAF (non-anisotropic) (near_mipmap_linear isotropic, linear_mipmap_linear isotropic), BAF (bilinear-anisotropic) (invalid), and TAF (trilinear-anisotropic);
  • when level0 is valid and level1 is invalid, the filtering methods of filter_type are NAF (non-anisotropic) (near, near_mipmap_near, linear_mipmap_near), BAF (bilinear-anisotropic), and TAF (trilinear-anisotropic) (invalid); when level0 and level1 are valid at
  • The output results of the filter are assigned to texel_r, texel_g, texel_b, and texel_a. If the format is color and the inte_format has a value only for r, then texel_r is the filtering result, texel_g and texel_b are both 0, and texel_a is 1. If the format is depth and stencil, the result is assigned to the texel_r and texel_g components, and the texel_b and texel_a components are 0;
  • Pixel unit U2: when border_color is enabled, the border_color data is used as the input data of the pixel stage. When the swizzle operation is not enabled, pixel_r, pixel_g, pixel_b, and pixel_a are equal to border_color_r, border_color_g, border_color_b, and border_color_a of border_color. If the swizzle operation is enabled, the data of each channel is converted according to the swizzle mode, and the 4 color components pixel_r, pixel_g, pixel_b, and pixel_a are finally output in parallel.
  • The unit also supports conversion between the different integer and floating-point types in RGB/BGR format, and between the different integer and floating-point types in RGBA/BGRA format.
  • The double Buffer improves the calculation efficiency of texture index addresses: when two layers of data need to be calculated at the same time, calculation can start on both in parallel; when only one layer of data needs to be calculated, the texels are indexed in parallel by parity, which keeps the data calculation parallel, shortens the indexing time of texel data, and improves the efficiency of texel calculation;
  • The double Buffer also improves the efficiency of texel data calculation: when two layers of data need to be calculated, the texels can be read out through two separate pipelines and calculated in parallel; when only one layer of data needs to be calculated, the double-Buffer mode is used for parallel access, which improves parallel access efficiency and reduces texel calculation time;
  • Trilinear is a bilinear filtering method, which reduces computational complexity and hardware calculation power consumption;
  • The user-set border_color data is used to avoid address and data calculation, which saves texture access time and reduces the power consumed by texture map calculation.
  • FIG. 1 is a design diagram of a texture mapping hardware accelerator based on a dual Buffer architecture provided in an embodiment of the present disclosure
  • FIG. 2 is a texture coordinate value diagram of 2D texture coordinates provided in an embodiment of the present disclosure in bilinear mode
  • FIG. 3 is a mapping relationship diagram of 3D texture coordinates provided in an embodiment of the present disclosure in bilinear mode
  • FIG. 4 is a diagram of the corresponding relationship between the texture address and the cache line in the dual operation provided in the embodiment of the present disclosure
  • FIG. 5 is a diagram of a calculation model in a 1D bilinear mode provided in an embodiment of the present disclosure
  • FIG. 6 is a diagram of a calculation model in 2D bilinear mode provided in an embodiment of the present disclosure.
  • FIG. 7 is a diagram of a calculation model in 3D bilinear mode provided in an embodiment of the present disclosure.
  • FIG. 8 is a diagram of a calculation model in 1D bilinear mode provided in an embodiment of the present disclosure.
  • FIG. 9 is a diagram of a calculation model in a 2D bilinear mode provided in an embodiment of the present disclosure.
  • FIG. 10 is a diagram of a calculation model in a 3D bilinear mode provided in an embodiment of the present disclosure.
  • The present disclosure provides a texture mapping hardware accelerator design based on the dual-Buffer architecture, which can effectively reduce the time spent in texture address calculation and data calculation, and handles the filtering of texture maps in the different color, depth, and stencil modes.
  • A texture mapping hardware accelerator design based on the dual-Buffer architecture includes an address calculation U0 unit, which contains the following units. Image unit U0: stores the basic information of the image. When mipmap texture is enabled, the target and the different map layers are used as the address to store the mode, width, height, depth, border, inte_format, format, type, and base of the corresponding image; when layers are enabled, the target and the different layers are used as the address to store the mode, width, height, depth, border, inte_format, format, type, and base values of the corresponding layer; when cubemap is enabled, one address of the mipmap layer is subdivided into 6, representing the face information of faces 0, 1, 2, 3, 4, and 5.
  • When layers are enabled without map-layer information, the mode, width, height, depth, border, inte_format, format, and type of the different layers are the same, but the base differs; when both layers and map layers are enabled, the mode, width, height, depth, border, inte_format, format, and type are the same, but the base differs. Register configuration is supported in the 1D, 2D, 3D, rectangle, cubemap, 1D_ARRAY, 2D_ARRAY, cubemap_array, 2D_multisample, 2D_multisample, and 2D_multisample_array modes;
  • LOD U1 unit: completes the calculation of the level value for the different filtering modes and, combined with the target address being accessed, obtains the address for accessing the image unit. Before the level value is calculated, the image unit is first read with the target and with the base_level value as level0, to obtain the basic image information as the reference for the subsequent level calculation.
  • The calculation of the level value considers two cases. When lod is enabled and the image is in layer mode, the width and height of the different layers are equal, so whether the filter mode is mag_filter or min_filter, the lod is rounded to the single level nearest base_level, which serves as level0, the offset for reading the image information, and filter_type matches the requested filter size. When lod is enabled and the image is in mipmap mode, the width, height, and depth of the different layers are not equal, and the following applies:
  • for mag_filter in the near and linear modes, the lod is rounded to the level nearest the base_level value as the offset for reading the image information;
  • for min_filter in the near, linear, near_mipmap_near, and linear_mipmap_near modes, the level least close to the base_level value is taken as level0, the offset for reading the image, and filter_type matches the requested filter mode;
  • for min_filter in the near_mipmap_linear and linear_mipmap_linear modes, the lod is rounded to the two adjacent levels as the offsets for reading the image information; ratio_l is the fractional part of the lod value minus the level value, the integer part of lod is level0, and level0 plus 1 is level1; if the lod value is min_lod, level0 is the same as level1, so filter_type is taken as near_mipmap_near and linear_mipmap_near respectively.
  • When the partial derivative is enabled as lod, two cases, polygon/point and line, are distinguished according to the primitive type primitive and the dux, duy, dvx, dvy, dwx, dwy, delt_x, and delt_y values passed from the raster, and the lod of polygon/point and of line is calculated respectively. If the image is in layer mode, the width and height of the different layers are equal, so whether the filter mode is mag_filter or min_filter, the lod is rounded to the single level nearest base_level as level0, the offset for reading the image information, and filter_type matches the requested filter size. If the image is in mipmap mode, the width, height, and depth of the different layers are not equal: for mag_filter in the near and linear modes, the lod is rounded to the level nearest the base_level value as the offset for reading the image information; for min_filter in the near, linear, near_mipmap_near, and linear_mipmap_near modes, the level least close to the base_level value is taken as level0, the offset for reading the image, and filter_type matches the requested filter mode; for min_filter in the near_mipmap_linear and linear_mipmap_linear modes, the lod is rounded to the two adjacent levels as the offsets for reading the image information, ratio_l is the fractional part of the lod value minus the level value, the integer part of lod is level0, and level0 plus 1 is level1; if the lod value is min_lod, level0 is the same as level1, so filter_type is taken as near_mipmap_near and linear_mipmap_near filtering respectively.
  • If level0 and level1 are both enabled, trilinear filtering is used, with the following filtering methods: trilinear isotropic (near_mipmap_linear, linear_mipmap_linear) and trilinear anisotropic; if only level0 is valid, the following filtering methods are available: point isotropic (near, near_mipmap_near), bilinear isotropic (linear, linear_mipmap_near), and bilinear anisotropic.
  • Coordinate U2 unit: completes the coordinate and address conversion of s, t, r, q in fetch and sampler modes.
  • When cubemap_array is enabled, the Q coordinate is non-zero and denotes the layer row number, s, t, and r denote the magnitudes in the x, y, and z directions respectively, and the s and t coordinates in the plane are obtained through the mapping relationship;
  • when rectangle mode is enabled, the s and t coordinates do not need to be denormalized; if the s, t, or r coordinates exceed their respective ranges, the different wrap modes are used to constrain them;
  • when level0 and level1 are enabled, the width, height, and depth values of level0 and of level1 are obtained from the image unit and multiplied by s, t, and r respectively to give the denormalized texture coordinates u0, v0, w0 and u1, v1, w1;
  • when only level0 is valid, the width, height, and depth values of level0 are obtained from the image unit and multiplied by s, t, and r respectively to give the denormalized texture coordinates u0, v0, w0.
  • Coordinate controller U3 unit: when level0 and level1 are enabled and filter_type is point mode: when the mode is 1D, the data written to coordinate bufferu0 is inte_u0 and the data written to coordinate bufferu1 is inte_u1; when the mode is 2D, the data written to coordinate bufferu0 is inte_u0, the data written to coordinate bufferv0 is inte_v0, the data written to coordinate bufferu1 is inte_u1, and the data written to coordinate bufferv1 is inte_v1; when the mode is 3D, the data written to coordinate bufferu0 is inte_u0, the data written to coordinate bufferv0 is inte_v0, the data written to coordinate bufferw0 is inte_w0, the data written to coordinate bufferu1 is inte_u1, the data written to coordinate bufferv1 is inte_v1, and the data written to coordinate bufferw1 is inte_w1; when filter_type is linear mode and the mode is 1D, the data written to coordinate bufferu1 is inte_u1, the data written to
  • Address controller U4 unit: first completes the calculation from texture coordinates to the texture offset address. When level0 is valid: when the mode is 1D, the offset when the address calculation does not overflow is size*u0; when the mode is 2D, it is size*(width0*u0+v0); when the mode is 3D, it is size*(width0*u0+v0)+w0*width0*height0. The final address for accessing the texel cache is base0+offset.
  • The number of addresses for each inte_format is obtained, and the end data is stored in offset0buffer. Because level1 is invalid, texel cache requests follow the double-Buffer operation mode: odd-numbered addresses request cache0 and even-numbered addresses request cache1, so that the addresses are accessed in parallel.
  • When level0 and level1 are both valid: when the mode is 1D, the offsets when the address calculation does not overflow are size*u0 and size*u1; when the mode is 2D, they are size*(width0*u0+v0) and size*(width1*u1+v1); when the mode is 3D, they are size*(width0*u0+v0)+w0*width0*height0 and size*(width1*u1+v1)+w1*width1*height1.
  • The final addresses for accessing the texel cache are base0 plus the level0 offset and base1 plus the level1 offset, and cache0 and cache1 are requested in parallel.
  • The texel cache U1 unit includes two directly connected caches, which complete the indexing of the cache lines where the different texels are located and the store and replace operations on cache lines.
  • When level0 and level1 are valid at the same time, the read requests to cache0 and cache1 are completed in parallel.
  • When only level0 is valid, the odd cache lines are stored in cache0 and the even cache lines are stored in cache1.
  • The data calculation U2 unit includes the following units. Data controller U0: similar to the address controller unit. When level0 and level1 are valid at the same time, it combines off0 and off1 according to the inte_format to complete the data splicing from a cache line, obtains the texture data corresponding to the texture address, and writes the respective data to data buffer0 and data buffer1 simultaneously, so that data buffer0 and data buffer1 each store the data of their own level. The same holds when only level0 is valid: the data of the respective cache lines is read from cache0 and cache1 and, according to the inte_format and off0, the odd data and even data are obtained and written to data buffer0 and data buffer1 in dual mode.
  • In that case data buffer0 and data buffer1 store texel data of the same level, as shown in FIG. 4.
  • Filter unit U1: first completes the truncation operation, truncating the r, g, b, and a values to the bit widths defined by the inte_format, and then filters each channel independently; the bit-width truncation is performed according to the inte_format.
  • The inte_format values under the OGL standard supported by the filter unit for color are: r8_norm, r8_snorm, r8I, r8UI, r3_g3_b2_norm (big- and little-endian), rgba2_norm, rgba4 (big- and little-endian), rgb5_a1_norm (big- and little-endian), rgb4_norm, rgb4 rgb5_norm, rgb565_norm (big- and little-endian), r16_norm, r16_snorm, r16f, r16UI, r16I, rg8_norm, rg8_snorm, rg8UI, rg8I, srgb8 (non-linear), rgb8I, rgb_norm_norm, b8_norm_norm,
  • When level0 and level1 are both valid, the filtering methods of filter_type are NAF (non-anisotropic) (near_mipmap_linear isotropic, linear_mipmap_linear isotropic), BAF (bilinear-anisotropic) (invalid), and TAF (trilinear-anisotropic); when level0 is valid and level1 is invalid, the filtering methods of filter_type are NAF (non-anisotropic) (near, near_mipmap_near, linear_mipmap_near), BAF (bilinear-anisotropic), and TAF (trilinear-anisotropic) (invalid).
  • When filter_type is TAF (near_mipmap_linear) and the mode is 1D, 2D, or 3D, the filtering result is data0*(1.0-ratio_l)+data1*ratio_l.
  • When the filtering method is TAF (linear_mipmap_linear) and the mode is 1D, the intermediate results are data0*(1.0-ratio_u0)+data2*ratio_u0 and data1*(1.0-ratio_u1)+data3*ratio_u1, and the final filtering result is (data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_l)+(data1*(1.0-ratio_u1)+data
  • The output results of the filter are assigned to texel_r, texel_g, texel_b, and texel_a. If the format is color and the inte_format has a value only for r, then texel_r is the filtering result, texel_g and texel_b are both 0, and texel_a is 1. If the format is depth and stencil, the result is assigned to the texel_r and texel_g components, and the texel_b and texel_a components are 0.
  • Pixel unit U2: when border_color is enabled, the border_color data is used as the input data of the pixel stage. When the swizzle operation is not enabled, pixel_r, pixel_g, pixel_b, and pixel_a are equal to border_color_r, border_color_g, border_color_b, and border_color_a of border_color. If the swizzle operation is enabled, the data of each channel is converted according to the swizzle mode, and the 4 color components pixel_r, pixel_g, pixel_b, and pixel_a are finally output in parallel.
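
The level-blending formulas quoted for the filter unit can be illustrated with a short sketch. It is an illustration only, not the patented hardware: the function names are hypothetical, the data values are treated as single-channel floats, and the last step of the 1D linear_mipmap_linear case (blending the two intermediate results with ratio_l) is an assumption, since the formula is cut off in the text and is inferred here from the intermediate results and from the near_mipmap_linear formula.

```python
def lerp(a: float, b: float, ratio: float) -> float:
    """Blend two values as a*(1.0-ratio) + b*ratio, the form used in the text."""
    return a * (1.0 - ratio) + b * ratio


def filter_near_mipmap_linear(data0: float, data1: float, ratio_l: float) -> float:
    """near_mipmap_linear: one texel per level, blended across levels by ratio_l."""
    return lerp(data0, data1, ratio_l)


def filter_linear_mipmap_linear_1d(
    data0: float, data1: float, data2: float, data3: float,
    ratio_u0: float, ratio_u1: float, ratio_l: float,
) -> float:
    """1D linear_mipmap_linear.

    data0/data2 are the two level0 texels and data1/data3 the two level1
    texels, following the intermediate results given in the text.
    """
    level0_result = lerp(data0, data2, ratio_u0)  # data0*(1-ratio_u0)+data2*ratio_u0
    level1_result = lerp(data1, data3, ratio_u1)  # data1*(1-ratio_u1)+data3*ratio_u1
    # Assumption: the truncated final formula blends the two levels by ratio_l.
    return lerp(level0_result, level1_result, ratio_l)
```

In this sketch each color channel would be filtered by a separate call, which mirrors the statement that the r, g, b, and a values are filtered independently.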

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)
  • Image Processing (AREA)

Abstract

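A texture mapping hardware accelerator based on a dual-Buffer architecture, comprising: an address calculation unit, which calculates the address for accessing the texel cache according to the different texture address requests; a texel cache unit, which obtains the texels of the corresponding cache line from memory according to the different request addresses; and a data calculation unit, which performs filter processing for the different isotropic and anisotropic filtering modes and pixel processing for the border_color and swizzle operations. The dual Buffer improves the calculation efficiency of texture index addresses: when two layers of data need to be calculated at the same time, calculation can start on both in parallel. When one layer of data is enabled and needs to be calculated, texels are indexed in parallel by parity to keep the data calculation parallel, which shortens the indexing time of texel data and improves the efficiency of texel calculation.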

Description

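Texture mapping hardware accelerator based on a dual-Buffer architecture
The present disclosure claims priority to the patent document filed on June 10, 2019, with application number 201910495890.0 and entitled "Texture mapping hardware accelerator based on a dual-Buffer architecture", the entire contents of which are incorporated into the present disclosure by reference.
Technical field
The present disclosure relates to the technical field of GPU chip design, and in particular to a texture mapping hardware accelerator based on a dual-Buffer architecture.
Background art
Texture mapping operations are widely used in GPUs: the texture mapping unit serves both as a computing unit in GPGPU general-purpose computing and as the executor of texture-data fetch and sample operations in the graphics rendering pipeline. Its performance therefore directly affects the internal execution efficiency of the graphics processor and, in general-purpose computing, the speed of data lookup and transfer, so designing an efficient texture mapping unit is particularly critical in GPU design.
Summary of the invention
The purpose of the present disclosure is to provide a texture mapping hardware accelerator based on a dual-Buffer architecture, to solve the problems raised in the background art above: poor internal execution efficiency of the graphics processor, which in general-purpose computing directly affects the speed of data lookup and transfer.
To achieve the above purpose, the present disclosure provides the following technical solution: a texture mapping hardware accelerator based on a dual-Buffer architecture, including: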
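Image U0 unit: stores the basic information of the image. When mipmap texture is enabled, the target and the different map layers are used as the address to store the mode, width, height, depth, border, inte_format, format, type, and base of the corresponding image; when layers are enabled, the target and the different layers are used as the address to store the mode, width, height, depth, border, inte_format, format, type, and base values of the corresponding layer; when cubemap is enabled, one address of the mipmap layer is subdivided into 6, representing the face information of faces 0, 1, 2, 3, 4, and 5; when layers are enabled without map-layer information, the mode, width, height, depth, border, inte_format, format, and type of the different layers are the same and the base differs; when both layers and map layers are enabled, the mode, width, height, depth, border, inte_format, format, and type are the same and the base differs; register configuration is supported in the 1D, 2D, 3D, rectangle, cubemap, 1D_ARRAY, 2D_ARRAY, cubemap_array, 2D_multisample, 2D_multisample, and 2D_multisample_array modes;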
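
The addressing scheme of the Image U0 unit can be pictured as a small lookup table. The sketch below is a software analogy under stated assumptions, not the register layout of the hardware: the class names `ImageInfo` and `ImageUnit`, the method names, and the use of a `(target, level, face)` tuple as the address are all hypothetical. The field names come from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ImageInfo:
    """The per-entry fields listed in the text for the Image U0 unit."""
    mode: str
    width: int
    height: int
    depth: int
    border: int
    inte_format: str
    format: str
    type: str
    base: int


class ImageUnit:
    """Hypothetical table addressed by target and level, with 6 faces for cubemaps."""

    def __init__(self) -> None:
        self._entries: dict[tuple[int, int, int], ImageInfo] = {}

    def store(self, target: int, level: int, info: ImageInfo, face: int = 0) -> None:
        self._entries[(target, level, face)] = info

    def load(self, target: int, level: int, face: int = 0) -> ImageInfo:
        return self._entries[(target, level, face)]

    def store_cubemap(self, target: int, level: int, info: ImageInfo,
                      face_bases: list[int]) -> None:
        """Split one level address into faces 0..5; only base differs per face here."""
        if len(face_bases) != 6:
            raise ValueError("a cubemap level needs exactly 6 face base addresses")
        for face, base in enumerate(face_bases):
            self.store(target, level, replace(info, base=base), face)
```

A lookup such as `unit.load(target, level, face)` then plays the role of reading the image unit before the level and coordinate calculations.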
LOD U1 unit: computes the level value for the different filtering modes and combines it with the accessed target address to obtain the address for accessing the image unit. Before the level value is computed, the image unit is first read using the target and base_level as level0, so that the basic information of the image is available as a reference for the subsequent level computation. The level computation then considers two cases. When lod is enabled and the image is in layer mode, the width and height of the different layers are equal; whether the filtering mode is mag_filter or min_filter, lod is rounded to the level closest to base_level, which is used as level0 for the offset at which the image information is read, and filter_type matches the requested filter. When lod is enabled and the image is in mipmap mode, the width, height and depth of the different levels are not equal. For mag_filter in the near and linear modes, lod is rounded to the level closest to base_level for the offset at which the image information is read. For min_filter in the near, linear, near_mipmap_near and linear_mipmap_near modes, the level least close to base_level is taken as level0 for the offset at which the image is read, and filter_type matches the requested filtering mode. For min_filter in the near_mipmap_linear and linear_mipmap_linear modes, lod is rounded to the two adjacent levels, which are used as the offsets at which the image information is read; ratio_l is the fractional part of lod (the lod value minus the level value), the integer part of lod is level0, and level0 plus 1 is level1. If the lod value equals min_lod, level0 and level1 are the same, so filter_type becomes near_mipmap_near and linear_mipmap_near filtering, respectively. Likewise, when partial derivatives are enabled as the source of lod, the lod is computed from the primitive type (primitive) and the values dux, duy, dvx, dvy, dwx, dwy, delt_x and delt_y passed from the raster stage, separately for the polygon/point case and the line case; the layer-mode and mipmap-mode rules for choosing level0, level1, ratio_l and filter_type are then the same as in the lod-enabled case above. If both level0 and level1 are enabled, trilinear filtering is used, with the following filtering modes: trilinear isotropic (near_mipmap_linear, linear_mipmap_linear) and trilinear anisotropic. If only level0 is valid, the filtering modes are: point isotropic (near, near_mipmap_near), bilinear isotropic (linear, linear_mipmap_near) and bilinear anisotropic;
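The level selection for the near_mipmap_linear and linear_mipmap_linear modes can be sketched as follows. This is a minimal illustration, not the hardware logic: the function name `select_levels`, the clamping to `[min_lod, max_lod]`, and the choice to collapse to a single level at either clamp boundary are assumptions made here for illustration.

```python
import math

def select_levels(lod, min_lod, max_lod):
    """Pick the mip levels for a *_mipmap_linear minification filter.

    Returns (level0, level1, ratio_l). level1 is None when a single level
    is enough, in which case the filter degrades to the *_mipmap_near form.
    """
    # Assumption: lod is clamped to the configured range before use.
    lod = min(max(lod, min_lod), max_lod)
    level0 = math.floor(lod)      # integer part of lod
    ratio_l = lod - level0        # fractional part of lod
    # Assumption: at a clamp boundary level0 and level1 coincide,
    # so only one buffer pipeline needs to run.
    if lod == min_lod or lod == max_lod:
        return level0, None, 0.0
    return level0, level0 + 1, ratio_l
```

With both levels returned, the two buffer pipelines would run in parallel; with `level1` set to `None`, only one is needed.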
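CoordinateU2 unit: performs the coordinate and address conversion of s, t, r and q in the fetch and sampler modes. When cubemap_array is enabled, the Q coordinate is non-zero and denotes the layer row number, while s, t and r denote the magnitudes in the x, y and z directions; the s and t coordinates in the plane are obtained through a mapping. When rectangle mode is enabled, the s and t coordinates do not need to be denormalized. If the s, t or r coordinates exceed their respective ranges, they are constrained according to the wrap mode. When level0 and level1 are enabled, the width, height and depth of level0 and of level1 are obtained from the image unit and multiplied by s, t and r respectively, giving the denormalized texture coordinates u0, v0, w0 and u1, v1, w1. When only level0 is valid, the width, height and depth of level0 are obtained from the image unit and multiplied by s, t and r respectively, giving the denormalized texture coordinates u0, v0, w0. Here ratio_u0, ratio_v0 and ratio_w0 are the fractional parts of u0, v0 and w0; ratio_u1, ratio_v1 and ratio_w1 are the fractional parts of u1, v1 and w1; inte_u0, inte_v0 and inte_w0 are the integer parts of u0, v0 and w0; and inte_u1, inte_v1 and inte_w1 are the integer parts of u1, v1 and w1. During the wrap operation, if the border field of the image information is set and the address has overflowed, the texel request is disabled and the border_color value is enabled as the input of the final pixel stage;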
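The denormalization step can be sketched for a single axis as below. The text does not name the wrap modes, so the labels `"repeat"` and `"clamp"` and the helper names are hypothetical; the sketch only shows how a normalized coordinate is split into an integer part and a fractional part.

```python
import math

def wrap(coord, mode):
    """Constrain a normalized coordinate (hypothetical wrap-mode labels)."""
    if mode == "repeat":
        return coord - math.floor(coord)   # keep only the fractional part
    if mode == "clamp":
        return min(max(coord, 0.0), 1.0)
    raise ValueError(f"unknown wrap mode: {mode}")

def denormalize(s, size, mode="repeat"):
    """Return (inte_u, ratio_u) for one axis of one level.

    size is the width, height or depth of that level as read from the
    image unit.
    """
    u = wrap(s, mode) * size
    inte_u = math.floor(u)    # integer part, later used for addressing
    ratio_u = u - inte_u      # fractional part, later used as filter weight
    return inte_u, ratio_u
```

For example, `denormalize(0.3125, 8)` yields the integer part 2 and the fractional part 0.5. With two valid levels, the same call would be made once per level with that level's dimensions.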
Coordinate controller U3 unit: when level0 and level1 are both enabled and filter_type is point: for mode 1D, inte_u0 is written to coordinate buffer u0 and inte_u1 to coordinate buffer u1; for mode 2D, inte_u0 is written to coordinate buffer u0 and inte_v0 to coordinate buffer v0, and inte_u1 is written to coordinate buffer u1 and inte_v1 to coordinate buffer v1; for mode 3D, inte_u0, inte_v0 and inte_w0 are written to coordinate buffers u0, v0 and w0, and inte_u1, inte_v1 and inte_w1 are written to coordinate buffers u1, v1 and w1. When filter_type is linear: for mode 1D, inte_u1 and then inte_u1+1 are written to coordinate buffer u1, and inte_u0 and then inte_u0+1 are written to coordinate buffer u0; for mode 2D, the data written to coordinate buffers u0 and v0 are, in order: (inte_u0,inte_v0), (inte_u0+1,inte_v0), (inte_u0,inte_v0+1), (inte_u0+1,inte_v0+1), and the data written to coordinate buffers u1 and v1 are, in order: (inte_u1,inte_v1), (inte_u1+1,inte_v1), (inte_u1,inte_v1+1), (inte_u1+1,inte_v1+1); for mode 3D, the data written to coordinate buffers u0, v0 and w0 are, in order: (inte_u0,inte_v0,inte_w0), (inte_u0+1,inte_v0,inte_w0), (inte_u0,inte_v0+1,inte_w0), (inte_u0+1,inte_v0+1,inte_w0), (inte_u0,inte_v0,inte_w0+1), (inte_u0+1,inte_v0,inte_w0+1), (inte_u0,inte_v0+1,inte_w0+1), (inte_u0+1,inte_v0+1,inte_w0+1), and the data written to coordinate buffers u1, v1 and w1 are, in order: (inte_u1,inte_v1,inte_w1), (inte_u1+1,inte_v1,inte_w1), (inte_u1,inte_v1+1,inte_w1), (inte_u1+1,inte_v1+1,inte_w1), (inte_u1,inte_v1,inte_w1+1), (inte_u1+1,inte_v1,inte_w1+1), (inte_u1,inte_v1+1,inte_w1+1), (inte_u1+1,inte_v1+1,inte_w1+1). When only level0 is enabled and filter_type is point: for mode 1D, inte_u0 is written to coordinate buffer u0; for mode 2D, inte_u0 is written to coordinate buffer u0 and inte_v0 to coordinate buffer v0; for mode 3D, inte_u0, inte_v0 and inte_w0 are written to coordinate buffers u0, v0 and w0. When filter_type is linear: for mode 1D, inte_u0 and then inte_u0+1 are written to coordinate buffer u0; for mode 2D, the data written to coordinate buffers u0 and v0 are, in order: (inte_u0,inte_v0), (inte_u0+1,inte_v0), (inte_u0,inte_v0+1), (inte_u0+1,inte_v0+1); for mode 3D, the data written to coordinate buffers u0, v0 and w0 are, in order: (inte_u0,inte_v0,inte_w0), (inte_u0+1,inte_v0,inte_w0), (inte_u0,inte_v0+1,inte_w0), (inte_u0+1,inte_v0+1,inte_w0), (inte_u0,inte_v0,inte_w0+1), (inte_u0+1,inte_v0,inte_w0+1), (inte_u0,inte_v0+1,inte_w0+1), (inte_u0+1,inte_v0+1,inte_w0+1).
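The write order listed above for the linear filter (u varies fastest, then v, then w) can be generated with a short loop. The function name `linear_footprint` is hypothetical; the sketch only reproduces the enumeration order for one level.

```python
def linear_footprint(inte):
    """Enumerate the texel coordinates written to the coordinate buffers.

    inte is (inte_u,), (inte_u, inte_v) or (inte_u, inte_v, inte_w) for the
    1D, 2D and 3D modes. Bit `axis` of the counter i selects whether +1 is
    added on that axis, so u toggles fastest and w slowest.
    """
    n = len(inte)
    return [
        tuple(c + ((i >> axis) & 1) for axis, c in enumerate(inte))
        for i in range(2 ** n)
    ]
```

For the 2D case this gives (u,v), (u+1,v), (u,v+1), (u+1,v+1), and the 3D case repeats those four at w and then at w+1, matching the order in the text.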
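address controller U4 unit: first computes the texture offset address from the texture coordinates. When only level0 is valid: for mode 1D, the offset when the address computation does not overflow is size*u0; for mode 2D, it is size*(width0*u0+v0); for mode 3D, it is size*(width0*u0+v0)+w0*width0*height0. The final address for accessing the texel cache is base0 plus the offset. Then, according to how the tail of the offset aligns to 4 bytes, the number of addresses for the given inte_format is obtained, and the tail data is saved in the offset0 buffer. Since level1 is invalid, texel cache requests follow the dual-buffer scheme: odd address requests access cache0 and even address requests access cache1, so that the addresses are accessed in parallel. When both level0 and level1 are valid: for mode 1D, the offsets when the address computation does not overflow are size*u0 and size*u1; for mode 2D, they are size*(width0*u0+v0) and size*(width1*u1+v1); for mode 3D, they are size*(width0*u0+v0)+w0*width0*height0 and size*(width1*u1+v1)+w1*width1*height1. The final addresses for accessing the texel cache are base0 plus the level0 offset and base1 plus the level1 offset. In this case cache0 and cache1 are requested in parallel.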
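The offset formulas and the odd/even dispatch can be sketched as below. The offsets are transcribed exactly as written in the text (a conventional row-major layout would index differently). The dispatch by cache-line parity is one reading of the "odd/even" rule, and the 64-byte line size and both function names are assumptions made here for illustration.

```python
def texel_offset(mode, size, u, v=0, w=0, width=0, height=0):
    """Byte offset of a texel, using the formulas as given in the text."""
    if mode == "1D":
        return size * u
    if mode == "2D":
        return size * (width * u + v)
    if mode == "3D":
        return size * (width * u + v) + w * width * height
    raise ValueError(f"unsupported mode: {mode}")

def cache_for(address, line_bytes=64):
    """Select cache0 or cache1 when only level0 is valid.

    Assumption: parity is taken on the cache-line index, so odd lines go
    to cache0 and even lines go to cache1.
    """
    line_index = address // line_bytes
    return 0 if line_index % 2 == 1 else 1
```

The final request address would be `base0 + texel_offset(...)`. When both levels are valid, no parity test is needed because level0 requests go to cache0 and level1 requests go to cache1.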
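Optionally, the LOD U1 unit comprises two direct-mapped caches, which index the cache line in which each texel resides and perform the store and replace operations on cache lines. When level0 and level1 are both valid, the read requests to cache0 and cache1 are served in parallel; when only level0 is valid, cache0 stores the odd cache lines and cache1 stores the even cache lines.
Optionally, the CoordinateU2 unit comprises:
data controllerU0 unit: when level0 and level1 are both valid, it assembles the data from a cache line according to the respective inte_format together with off0 and off1, obtains the texture data corresponding to the texture addresses, and writes the data of the two levels to data buffer0 and data buffer1 simultaneously, so that data buffer0 and data buffer1 each hold the data of their own level. When only level0 is valid, it likewise reads the data of the respective cache lines from cache0 and cache1 and, according to the inte_format and off0, obtains the odd data and the even data, which are written to data buffer0 and data buffer1 in dual-buffer fashion; in this case data buffer0 and data buffer1 hold texel data of the same level;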
filterU1单元:首先完成截位运算,针对不同的inte_format,截位出不同位宽的r、g、b、a值,然后采用独立方式分别进行滤波计算,位宽的截取方法按照不同的inte_format执行;当level0和level1均有效,此时filter_type的滤波方式有NAF(non-anisotropic)(near_mipmap_linear isotropic、linear_mipmap_linear isotropic)、BAF(bilinear-anisotropic)(无效)、TAF(trilinear-anisotropic),当level0有效,level1无效时,此时filter_type的滤波方式有NAF(non-anisotropic)(near、near_mipmap_near、linear_mipmap_near)、BAF(bilinear anisotropic)、TAF(trilinear-anisotropic)(无效);当level0和level1同时有效,而filter_type为TAF(near_mipmap_linear)时,无论mode为1D、2D、3D,均同时从data buffer0、data buffer1中同时读取一个数据data0和data1,得到滤波结果为data0*(1.0-ratio_l)+data1*ratio_l;如果滤波方式为TAF(linear_mipmap_linear),mode为1D时,首先从data buffer0、data buffer1中同时依次读取2个数据,分别为data0、data1和data2、data3,滤波中间结果为data0*(1.0-ratio_u0)+data2*ratio_u0、data1*(1.0-ratio_u1)+data3*ratio_u1,得到滤波最终结果为(data0*(1.0-ratio_u0)+(data2*ratio_u0)*(1.0-ratio_l)+(data1*1.0-ratio_u1)+(data3*ratio_u1)*ratio_l;mode为2D,从data buffer0、data buffer1中同时依次读取4个数据data0、data1、data2、data3以及data4、data5、data6、data7,那么首先通过前4个数据data0、data2、data4、data6得到滤波的中间结果为data0*(1.0-ratio_u0)+data2*ratio_u0、data4*(1.0-ratio_u1)+data6*ratio_u1;然后再通过后4个数据data1、data3、data5、data7得到滤波的中间结果为data1*(1.0-ratio_u0)+data3*ratio_u0、data5*(1.0-ratio_u1)+data7*ratio_u1,最后分别得到level0和level1的最终结果为(data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data2*(1.0-ratio_u0)+data3*ratio_u0*ratio_v0、(data4*(1.0-ratio_u1)+data5*ratio_u1)*(1.0-ratio_v1)+(data6*(1.0-ratio_u1)+data7*ratio_u1)*ratio_v1,最后的滤波结果为((data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data2*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0)*(1.0-ratio_l)+((data4*(1.0-ratio_u1)+data5*ratio_u1)*(1.0-ratio_v1)+(data6*(1.0-ratio_u1)+data7*ratio_u1)*ratio_v1)*ratio_l;mode为3D时,从data buffer0、data buffer1同时依次读取8个数据,分别为data0、data1、data2、data3、data4、data5、data6、data7和data8、data9、data10、data11、data12、data13、data14、data15;首先通过前8个数据data0、data1、data2、data3、data8、data9、data10、data11得到滤波中间结果分别为:((data0* 
(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_v0)+((data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0、((data8*(1.0-ratio_u1)+data9*ratio_u1)*(1.0-ratio_v1)+((data10*(1.0-ratio_u1)+data11*ratio_u1)*ratio_v1,然后再通过后8个数据data4、data5、data6、data7、data12、data13、data14、data15得到滤波中间结果分别为:((data4*(1.0-ratio_u0)+data5*ratio_u0)*(1.0-ratio_v0)+((data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0、((data12*(1.0-ratio_u1)+data13*ratio_u1)*(1.0-ratio_v1)+((data14*(1.0-ratio_u1)+data15*ratio_u1)*ratio_v1;最后得到滤波的最终结果为:((((data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_v0)+((data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0)*(1.0-ratio_w0)+(((data4*(1.0-ratio_u0)+data5*ratio_u0)*(1.0-ratio_v0)+((data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0)*ratio_w0)*(1.0-ratio_l)+((((data8*(1.0-ratio_u1)+data9*ratio_u1)*(1.0-ratio_v1)+((data10*(1.0-ratio_u1)+data11*ratio_u1)*ratio_v1)*(1.0-ratio_w1)+(((data12*(1.0-ratio_u1)+data13*ratio_u1)*(1.0-ratio_v1)+((data14*(1.0-ratio_u1)+data15*ratio_u1)*ratio_v1)*ratio_w1)*ratio_l,当使能了anisotropic时,此时对data buffer0、data buffer1中的数据分别进行anisotropic计算后得到data0、data1的中间滤波结果,那么最终的滤波结果为data0*(1.0-ratio_l)+data1*ratio_l;当只有level0有效,而filter_type为near、near_mipmap_near时,无论mode为1D、2D、3D,均同时从data buffer0、data buffer1中同时读取一个数据data0和data1,对data0、data1格式进行转换之后直接输出,不经过滤波;如果滤波方式为BAF,当mode为1D时,首先从data buffer0、data buffer1中同时依次读取1个数据,分别为data0、data1,最终的滤波结果为data0*(1.0-ratio_u0)+data2*ratio_u0;当mode为2D,从data buffer0、data buffer1中同时依次读取2个数据data0、data1、data2、data3,那么首先通过前2个数据data0、data2、得到滤波的中间结果为data0*(1.0-ratio_u0)+data2*ratio_u0;然后再通过后2个数据data1、data3得到滤波的中间结果为data1*(1.0-ratio_u0)+data3*ratio_u0,最后得到滤波的最终结果为(data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_l)+(data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_l;mode为3D时,从data buffer0、data 
buffer1同时依次读取4个数据,分别为data0、data1、data2、data3和data4、data5、data6、data7。首先通过前4个数据data0、data1、data4、data5得到滤波中间结果为:(data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data4*(1.0-ratio_u0)+data5*ratio_u0)*ratio_v0,然后再通过后4个数据data2、data3、data6、data7得到滤波中间结果分别为:(data2*(1.0-ratio_u0)+data3*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0;最后得到滤波的最终结果为:((data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data4*(1.0-ratio_u0)+data5*ratio_u0)*ratio_v0)*(1.0-ratio_w0)+((data2*(1.0-ratio_u0)+data3*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0)*ratio_w0。当执行滤波操作后,按照不同的inte_format格式,对filter的输出结果为texel_r、texel_g、texel_b、texel_a进行赋值操作,如果format为color,当inte_format只有其中的r有值,此时texel_r为滤波结果,texel_g、texel_b均为0,texel_a为1;如果format为depth、stencil,此时在不执行滤波的添加下,将结果赋值到texel_r、texel_g分量,texel_b、texel_a分量为0;
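The 2D linear_mipmap_linear case can be sketched with nested linear interpolation. This sketch assumes the four texels of each level arrive in the order (u,v), (u+1,v), (u,v+1), (u+1,v+1), which is consistent with the final 2D expression given in the text; the function names are hypothetical, and each colour channel would be filtered independently in the same way.

```python
def lerp(a, b, ratio):
    """Linear interpolation: a*(1.0-ratio) + b*ratio."""
    return a * (1.0 - ratio) + b * ratio

def bilinear(d, ratio_u, ratio_v):
    """Bilinear filter of four texels d[0..3] from one level."""
    row0 = lerp(d[0], d[1], ratio_u)   # blend along u at v
    row1 = lerp(d[2], d[3], ratio_u)   # blend along u at v+1
    return lerp(row0, row1, ratio_v)   # blend the two rows along v

def trilinear_2d(level0, level1, ratio_u0, ratio_v0,
                 ratio_u1, ratio_v1, ratio_l):
    """Blend the bilinear results of level0 and level1 by ratio_l."""
    result0 = bilinear(level0, ratio_u0, ratio_v0)
    result1 = bilinear(level1, ratio_u1, ratio_v1)
    return lerp(result0, result1, ratio_l)
```

When only level0 is valid, `bilinear` alone gives the result; the 3D case adds one more `lerp` along w between two bilinear results of the same level.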
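pixel unit U2: when border_color is enabled, the border_color data is used as the input data of the pixel stage. When the swizzle operation is not enabled, pixel_r, pixel_g, pixel_b and pixel_a equal border_color_r, border_color_g, border_color_b and border_color_a of border_color. If the swizzle operation is enabled, the data of each channel is converted according to the swizzle mode, and the four color components pixel_r, pixel_g, pixel_b and pixel_a are finally output in parallel.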
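A minimal sketch of the pixel stage follows. The text does not define how the swizzle mode is encoded, so the four-character string used here (for example `"bgra"`) is a hypothetical encoding, and falling back to the filtered texel when border_color is not enabled is likewise an assumption.

```python
def pixel_stage(texel, border_color, use_border, swizzle=None):
    """Return (pixel_r, pixel_g, pixel_b, pixel_a).

    texel and border_color are (r, g, b, a) tuples. swizzle is a
    hypothetical 4-character selector over 'r', 'g', 'b', 'a', '0', '1'.
    """
    src = border_color if use_border else texel
    if swizzle is None:
        return tuple(src)              # pass the channels through unchanged
    channel = {"r": src[0], "g": src[1], "b": src[2], "a": src[3],
               "0": 0.0, "1": 1.0}
    return tuple(channel[c] for c in swizzle)
```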
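Optionally, the FP32, FP16, FP11, FP10 and INT32 data types are supported in the Color, depth, stencil and depth_stencil modes.
Optionally, conversion between the different integer and floating-point types is also supported under the RGB/BGR formats and under the RGBA/BGRA formats.
Optionally, depth-texture comparison and stencil index computation are also supported for depth, stencil and depth_stencil.
Compared with the prior art, the beneficial effects of this disclosure are:
The dual Buffer improves the efficiency of texture index address computation: when two levels of data need to be computed at the same time, the computations start in parallel; when only one level is enabled, texels are indexed in parallel in odd/even fashion so that the data is still computed in parallel, which shortens texel indexing time and improves texel computation efficiency;
The dual Buffer improves the efficiency of texel data computation: when two levels of data need to be computed at the same time, the texels are read out through two separate pipelines so that they are computed in parallel; when only one level is enabled, the two Buffers are accessed in parallel, which improves parallel access efficiency and reduces texel computation time;
When mipmap textures are enabled and the computed lod equals the configured max_level, the address and data computation of only one buffer pipeline can be enabled, and trilinear filtering reduces to bilinear filtering, which lowers the computational complexity and the hardware power consumption;
When a border value exists and the (U,V) coordinates overflow after the wrap operation, the user-configured border_color data is used and the address and data computations are skipped, which saves texture access time and lowers the power consumption of texture mapping.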
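Brief Description of the Drawings
Figure 1 is a design diagram of the texture mapping hardware accelerator based on a dual-Buffer architecture provided in an embodiment of this disclosure;
Figure 2 shows the texture coordinate values of 2D texture coordinates in bilinear mode, provided in an embodiment of this disclosure;
Figure 3 shows the mapping of 3D texture coordinates in bilinear mode, provided in an embodiment of this disclosure;
Figure 4 shows the correspondence between texture addresses and cache lines in dual-buffer operation, provided in an embodiment of this disclosure;
Figure 5 is the computation model for 1D bilinear mode, provided in an embodiment of this disclosure;
Figure 6 is the computation model for 2D bilinear mode, provided in an embodiment of this disclosure;
Figure 7 is the computation model for 3D bilinear mode, provided in an embodiment of this disclosure;
Figure 8 is the computation model for 1D trilinear mode, provided in an embodiment of this disclosure;
Figure 9 is the computation model for 2D trilinear mode, provided in an embodiment of this disclosure;
Figure 10 is the computation model for 3D trilinear mode, provided in an embodiment of this disclosure.
Detailed Description
The technical solutions in the embodiments of this disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this disclosure. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this disclosure, without creative effort, fall within the scope of protection of this disclosure.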
请参阅图1-10,本公开提供一种基于双Buffer架构下的纹理贴图硬件加速器设计可以很好地解决纹理地址计算和数据计算过程中的时间过程纹理,并且降低纹理贴图中对color、depth、stencil不同 模式下的滤波处理。如图1所示:一种基于双Buffer架构下的纹理贴图硬件加速器设计包括:address calculationU0单元:image单元U0:对image基本信息进行存储,当使能mipmap纹理时,通过target以及不同map层作为地址,存储对应image的mode、width、height、depth、border、inte_format、format、type、base;当使能layer层时,通过target以及不同的layer层作为地址,存储对应层的mode、width、height、depth、border、inte_format、format、type、base值,当使能cubemap时,将mipmap层的一个地址进行细分为6个,分别代表0,1,2,3,4,5不同的face信息。当使能layer层没有map层信息时,不同layer层的mode、width、height、depth、border、inte_format、format、type相同,base不同;当使能layer并使能map层时的mode、width、height、depth、border、inte_format、format、type相同,base不同;支持1D、2D、3D、rectangle、cubemap、1D_ARRAY、2D_ARRAY、cubemap_array、2D_multisample、2D_multisample、2D_multisample_array模式下的寄存器配置;
LOD U1单元:完成对不同滤波模式下的level值计算,结合访问target地址得到访问image单元的地址;在计算level值之前,首先需要通过target和base_level值为level0读取image单元,获得image的基本信息作为后续level计算时的参考。然后,level值的计算考虑两种情况:当使能lod时,如果image为layer模式,此时,不同层的width、height信息相等,无论滤波模式为mag_filter还是min_filter,都取整取最靠近base_level方向的一个level值为level0读取image信息的offset,而filter_type大小与请求的滤波大小相匹配;当使能lod时,如果image为mipmap模式,此时,不同层的width、height、depth不相等,考虑mag_filter在near和linear模式下取整取最靠近base_level值读取image信息的offset,考虑min_filter模式下near、linear、near_mipmap_near、linear_mipmap_near取最不靠近base_level值为level0读取image的offset,而filter_type分别和请求的滤波模式相匹配,考虑min_filter在near_mipmap_linear、linear_mipmap_linear模式下取整取临近的两层作为读取image信息的offset,ratio_l为lod值减去level值的小数部分,此时称为lod的整数部分为level0,level0加1为level1,如果lod值为min_lod,此时level0与level1相同,所以取fiter_type分别为near_mipmap_near和linear_mipmap_near滤波。同理,当使能偏导数作为lod时,按照raster传递过来的图元类型primitive、dux、duy、dvx、dvy、dwx、dwy、delt_x、delt_y,分为polygon/point和line两种情况,分别计算得到polygon/point和line的lod,如果image为layer模式,此时,不同层的width、height信息相等,无论滤波模式为mag_filter还是min_filter,都取整取最靠近base_level方向的一个level值为level0读取image信息的offset,而filter_type大小与请求的滤波大小相匹配;如果image为mipmap模式,此时,不同层的width、height、depth不相等,考虑mag_filter在near和linear模式下取整取最靠近base_level值读取image信息的offset,考虑min_filter模式下near、linear、near_mipmap_near、linear_mipmap_near取最不靠近base_level值为level0读取image的offset,而filter_type分别和请求的滤波模式相匹配,考虑min_filter在near_mipmap_linear、linear_mipmap_linear模式下取整取临近的两层作为读取image信息的offset,ratio_l为lod值减去level值的小数部分,此时称为lod的整数部分为level0,level0加1为level1,如果lod值为min_lod,此时level0与level1相同,所以取fiter_type分别为near_mipmap_near和linear_mipmap_near滤波。如果使能了level0和level1,则考虑为trilinear滤波方式,有以下滤波方式:trilinear isotropic(near_mipmap_linear、linear_mipmap_linear)、trilinear anisotropic;如果只有level0有效,有以下滤波方式:point isotropic(near、 near_mipmap_near)、bilinear isotropic(linear、linear_mipmap_near)、bilinear anisotropic。
CoordinateU2单元:完成对fetch、sampler模式下的s、t、r、q的坐标、地址转换。当使能cubemap_array时,此时的Q坐标不为0,表示layer行号,s、t、r分别表示x,y,z方向上的大小,通过映射关系得到平面坐标内的s、t坐标;当使能rectangle模式,此时的s、t坐标不需要进行解归一化处理;如果s、t、r坐标超出了各自的表示范围,采用不同的wrap模式对坐标进行约束;当使能level0和level1时,从image单元得到level0和level1各自的width、height、depth值,分别与s、t、r相乘,得到解归一化后的纹理坐标u0,v0,w0和u1,v1,w1,当只有level0有效时,从image单元得到level0的width、height、depth值,分别与s、t、r相乘,得到解归一化后的纹理坐标u0,v0,w0;此时的ratio_u0、ratio_v0、ratio_w0分别为u0、v0、w0的小数部分,ratio_u1、ratio_v1、ratio_w1分别为u1、v1、w1的小数部分,inte_u0,inte_v0,inte_w0分别为u0、v0、w0的整数部分,inte_u1,inte_v1,inte_w1分别为u1、v1、w1的整数部分。执行wrap操作时,如果image内容中的border有效,并且此时地址已经溢出,此时disable请求纹素,并使能border_color值作为最终pixel阶段的输入;
Coordinate controller U3单元:当使能level0和level1时,filter_type为point模式时,mode为1D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferu1的数据为inte_u1;mode为2D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferv0的数据为inte_v0;写入coordinate bufferu1的数据为inte_u1,写入coordinate bufferv1的整数部分为inte_v1;mode为3D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferv0的数据为inte_v0,写入coordinate bufferw0的数据为inte_w0,写入coordinate bufferu1的数据为inte_u1,写入coordinate bufferv1的数据为inte_v1,写入coordinatew1的数据为inte_w1;filter_type为linear模式时,mode为1D时,写入coordinate bufferu1的数据为inte_u1,写入coordinate bufferu1的数据为inte_u1+1;写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferu0的数据为inte_u0+1;mode为2D时,取周围的4个点坐标,如下图2所示,写入coordinate bufferu0、coordinate bufferv0的数据依次为:(inte_u0,inte_v0)、(inte_u0+1,inte_v0)、(inte_u0,inte_v0+1)、(inte_u0+1,inte_v0+1);写入coordinate bufferu1、coordinate bufferv1的数据依次为:(inte_u1,inte_v1)、(inte_u1+1,inte_v1)、(inte_u1,inte_v1+1)、(inte_u1+1,inte_v1+1);mode为3D时,去周围8个点坐标,如下图3所示,写入coordinate bufferu0、coordinate bufferv0、coordinate bufferw0的数据依次为:(inte_u0,inte_v0,inte_w0)、(inte_u0+1,inte_v0,inte_w0)、(inte_u0,inte_v0+1,inte_w0)、(inte_u0+1,inte_v0+1,inte_w0)、(inte_u0,inte_v0,inte_w0+1)、(inte_u0+1,inte_v0,inte_w0+1)、(inte_u0,inte_v0+1,inte_w0+1)、(inte_u0+1,inte_v0+1,inte_w0+1);写入coordinate bufferu1、coordinate bufferv1、coordinate bufferw1的数据依次为:(inte_u1,inte_v1,inte_w1)、(inte_u1+1,inte_v1,inte_w1)、(inte_u1,inte_v1+1,inte_w1)、(inte_u1+1,inte_v1+1,inte_w1)、(inte_u1,inte_v1,inte_w1+1)、(inte_u1+1,inte_v1,inte_w1+1)、(inte_u1,inte_v1+1,inte_w1+1)、(inte_u1+1,inte_v1+1,inte_w1+1);当使能level0时,filter_type为point模式时,mode为1D时,写入coordinate bufferu0的数据为inte_u0;mode为2D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferv0的数据为inte_v0;mode为3D时,写入coordinate bufferu0的数据为inte_u0,写入 coordinate bufferv0的数据为inte_v0,写入coordinate bufferw0的数据为inte_w0;filter_type为linear模式时,mode为1D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferu0的数据为inte_u0+1;mode为2D时,写入coordinate bufferu0、coordinate 
bufferv0的数据依次为:(inte_u0,inte_v0)、(inte_u0+1,inte_v0)、(inte_u0,inte_v0+1)、(inte_u0+1,inte_v0+1);mode为3D时,写入coordinate bufferu0、coordinate bufferv0、coordinate bufferw0的数据依次为:(inte_u0,inte_v0,inte_w0)、(inte_u0+1,inte_v0,inte_w0)、(inte_u0,inte_v0+1,inte_w0)、(inte_u0+1,inte_v0+1,inte_w0)、(inte_u0,inte_v0,inte_w0+1)、(inte_u0+1,inte_v0,inte_w0+1)、(inte_u0,inte_v0+1,inte_w0+1)、(inte_u0+1,inte_v0+1,inte_w0+1)。
address controller U4单元:首先完成纹理坐标到纹理偏移地址的计算;对于level0有效时,mode为1D,在地址计算没有溢出时的偏移量为size*u0;mode为2D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0);mode为3D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0)+w0*width0*height0;得到最终访问texel cache的地址为base0+偏移量。然后根据偏移量的末尾与4字节对齐方式,得到不同inte_format条件下的地址个数,并将末尾数据保存于offset0buffer中;由于level1无效,所以请求texel cache时,按照双buffer操作方式,奇数个地址请求texel cache地址访问cache0,偶数个地址请求访问cache1,实现地址的并行性访问。对于level0和level1均有效时,mode为1D,在地址计算没有溢出时的偏移量为size*u0,size*u1;mode为2D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0),size*(width1*u1+v1);mode为3D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0)+w0*width0*height0,size*(width1*u1+v1)+w1*width1*height1。得到最终访问texel cache的地址为base0+level0偏移量,base1+level1偏移量。此时,并行请求cache0和cache1。texel cache U1单元包括两个直接相连cache,完成对不同texel所在cache行的索引以及cache line的store、replace操作。当level0和level1同时有效时,并行完成对cache0和cache1的读操作请求,当只有level0有效时,cache0中存储odd cache line,cache1中存储even cache line。data calculationU2单元包括:data controllerU0:与address controller单元相似,当level0和level1同时有效时,分别按照不同的inte_format,结合off0和off1,从一个cache line中完成数据的拼接任务,得到与纹理地址对应的纹理数据,并将各自数据同时写入到data buffer0和data buffer1中,并且data buffer0、data buffer1分别存放各自level的数据;当只有level0有效时,同理,从cache0和cache1中分别读出各自cache line的数据根据不同的inte_format以及off0,得到odd data和even data,按照双方式写入data buffer0和data buffer1,此时data buffer0和data buffer1中存放的是同一个level的纹素数据,如图4所示,按照顺时针方向完成对cache的读取操作。filter单元U1:首先完成截位运算,针对不同的inte_format,截位出不同位宽的r、g、b、a值,然后采用独立方式分别进行滤波计算,位宽的截取方法按照不同的inte_format执行。filter单元支持的OGL标准下的inte_format为color的有:r8_norm、r8_snorm、r8I、r8UI、r3_g3_b2_norm(大、小端)、rgba2_norm、rgba4(大、小端)、rgb5_a1_norm(大、小端)、rgb4_norm、rgb5_norm、rgb565_norm(大、小端)、r16_norm、r16_snorm、r16f、r16UI、r16I、rg8_norm、rg8_snorm、rg8UI、rg8I、srgb8(非线性)、rgb8I、rgb8UI、rgb8_snorm、rgb8_norm、rgb10_norm、rgb10_a2_norm(大、小端)、rgb10_a2UI(大、小端)、srgb8_a8_norm(大、小端)、r11f_g11f_b10f、rgb9_e5(共享)、rgba8_norm 
(大、小端)、rgba8_snorm(大、小端)、rgba8UI(大、小端)、rgba8I(大、小端)、rg16、rg16_snorm、rg16I、rg16UI、rg16f、r32I、r32UI、r32f;filter单元支持的inte_format为depth和stencil的有:depth16、depth24、depth32、depth32f、stencil_index1、stencil_index4、stencil_index8、stencil_index16、depth24_stencil8、depth32f_stencil8。针对integer数据类型(有符号和无符号)两种、float数据类型(normalized、unnormalized、非线性、共享数据类型)四种情况,在执行滤波运算之前均需要将snorm、norm、srgb、rgbae在不同的filter_type下完成滤波计算。当level0和level1均有效,此时filter_type的滤波方式有NAF(non-anisotropic)(near_mipmap_linear isotropic、linear_mipmap_linear isotropic)、BAF(bilinear-anisotropic)(无效)、TAF(trilinear-anisotropic),当level0有效,level1无效时,此时filter_type的滤波方式有NAF(non-anisotropic)(near、near_mipmap_near、linear_mipmap_near)、BAF(bilinear anisotropic)、TAF(trilinear-anisotropic)(无效)。当level0和level1同时有效,而filter_type为TAF(near_mipmap_linear)时,无论mode为1D、2D、3D,均同时从data buffer0、data buffer1中同时读取一个数据data0和data1,得到滤波结果为data0*(1.0-ratio_l)+data1*ratio_l;如果滤波方式为TAF(linear_mipmap_linear),mode为1D时,首先从data buffer0、data buffer1中同时依次读取2个数据,分别为data0、data1和data2、data3,滤波中间结果为data0*(1.0-ratio_u0)+data2*ratio_u0、data1*(1.0-ratio_u1)+data3*ratio_u1,得到滤波最终结果为(data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_l)+(data1*(1.0-ratio_u1)+data3*ratio_u1)*ratio_l,如图8所示;mode为2D,从data buffer0、data 
buffer1中同时依次读取4个数据data0、data1、data2、data3以及data4、data5、data6、data7,那么首先通过前4个数据data0、data2、data4、data6得到滤波的中间结果为data0*(1.0-ratio_u0)+data2*ratio_u0、data4*(1.0-ratio_u1)+data6*ratio_u1;然后再通过后4个数据data1、data3、data5、data7得到滤波的中间结果为data1*(1.0-ratio_u0)+data3*ratio_u0、data5*(1.0-ratio_u1)+data7*ratio_u1,最后分别得到level0和level1的最终结果为(data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data2*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0、(data4*(1.0-ratio_u1)+data5*ratio_u1)*(1.0-ratio_v1)+(data6*(1.0-ratio_u1)+data7*ratio_u1)*ratio_v1,最后的滤波结果为((data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data2*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0)*(1.0-ratio_l)+((data4*(1.0-ratio_u1)+data5*ratio_u1)*(1.0-ratio_v1)+(data6*(1.0-ratio_u1)+data7*ratio_u1)*ratio_v1)*ratio_l,如图9所示;mode为3D时,从data buffer0、data buffer1同时依次读取8个数据,分别为data0、data1、data2、data3、data4、data5、data6、data7和data8、data9、data10、data11、data12、data13、data14、data15。首先通过前8个数据data0、data1、data2、data3、data8、data9、data10、data11得到滤波中间结果分别为:((data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_v0)+((data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0、((data8*(1.0-ratio_u1)+data9*ratio_u1)*(1.0-ratio_v1)+((data10*(1.0-ratio_u1)+data11*ratio_u1)*ratio_v1,然后再通过后8个数据data4、data5、data6、data7、data12、data13、data14、data15得到滤波中间结果分别为:((data4*(1.0-ratio_u0)+data5*ratio_u0)* 
(1.0-ratio_v0)+((data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0、((data12*(1.0-ratio_u1)+data13*ratio_u1)*(1.0-ratio_v1)+((data14*(1.0-ratio_u1)+data15*ratio_u1)*ratio_v1;最后得到滤波的最终结果为:((((data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_v0)+((data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0)*(1.0-ratio_w0)+(((data4*(1.0-ratio_u0)+data5*ratio_u0)*(1.0-ratio_v0)+((data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0)*ratio_w0)*(1.0-ratio_l)+((((data8*(1.0-ratio_u1)+data9*ratio_u1)*(1.0-ratio_v1)+((data10*(1.0-ratio_u1)+data11*ratio_u1)*ratio_v1)*(1.0-ratio_w1)+(((data12*(1.0-ratio_u1)+data13*ratio_u1)*(1.0-ratio_v1)+((data14*(1.0-ratio_u1)+data15*ratio_u1)*ratio_v1)*ratio_w1)*ratio_l,如图10所示,当使能了anisotropic时,此时对data buffer0、data buffer1中的数据分别进行anisotropic计算后得到data0、data1的中间滤波结果,那么最终的滤波结果为data0*(1.0-ratio_l)+data1*ratio_l;当只有level0有效,而filter_type为near、near_mipmap_near时,无论mode为1D、2D、3D,均同时从data buffer0、data buffer1中同时读取一个数据data0和data1,对data0、data1格式进行转换之后直接输出,不经过滤波;如果滤波方式为BAF,当mode为1D时,首先从data buffer0、data buffer1中同时依次读取1个数据,分别为data0、data1,最终的滤波结果为data0*(1.0-ratio_u0)+data2*ratio_u0;当mode为2D,从data buffer0、data buffer1中同时依次读取2个数据data0、data1、data2、data3,那么首先通过前2个数据data0、data2、得到滤波的中间结果为data0*(1.0-ratio_u0)+data2*ratio_u0,如下图5所示;然后再通过后2个数据data1、data3得到滤波的中间结果为data1*(1.0-ratio_u0)+data3*ratio_u0,最后得到滤波的最终结果为(data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_l)+(data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_l,如下图6所示;mode为3D时,从data buffer0、data 
buffer1同时依次读取4个数据,分别为data0、data1、data2、data3和data4、data5、data6、data7。首先通过前4个数据data0、data1、data4、data5得到滤波中间结果为:(data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data4*(1.0-ratio_u0)+data5*ratio_u0)*ratio_v0,然后再通过后4个数据data2、data3、data6、data7得到滤波中间结果分别为:(data2*(1.0-ratio_u0)+data3*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0;最后得到滤波的最终结果为:((data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data4*(1.0-ratio_u0)+data5*ratio_u0)*ratio_v0)*(1.0-ratio_w0)+((data2*(1.0-ratio_u0)+data3*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0)*ratio_w0,如下图7所示。当执行滤波操作后,按照不同的inte_format格式,对filter的输出结果为texel_r、texel_g、texel_b、texel_a进行赋值操作,如果format为color,当inte_format只有其中的r有值,此时texel_r为滤波结果,texel_g、texel_b均为0,texel_a为1;如果format为depth、stencil,此时在不执行滤波的添加下,将结果赋值到texel_r、texel_g分量,texel_b、texel_a分量为0。pixel单元U2:当使能border_color时,采用border_color数据作为pixel阶段的输入数据,当没有使能swizzle操作时,pixel_r、pixel_g、pixel_b、pixel_a与border_color中的border_color_r、border_color_g、border_color_b、border_color_a相等,如果使能了swizzle操作,则按照swizzle模式分别对各自通道数据进行转换操作,最终并行输出4路颜色分量pixel_r、pixel_g、pixel_b、pixel_a。
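Although this disclosure has been described above with reference to embodiments, various improvements can be made and components can be replaced with equivalents without departing from the scope of this disclosure. In particular, as long as there is no structural conflict, the features of the disclosed embodiments can be combined with one another in any manner; these combinations are not described exhaustively in this specification only for the sake of brevity and to save resources. This disclosure is therefore not limited to the specific embodiments disclosed herein, but includes all technical solutions falling within the scope of the claims.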

Claims (3)

  1. A texture mapping hardware accelerator based on a dual-Buffer architecture, characterized by comprising:
    Image U0单元:对image基本信息进行存储,当使能mipmap纹理时,通过target以及不同map层作为地址,存储对应image的mode、width、height、depth、border、inte_format、format、type、base;当使能layer层时,通过target以及不同的layer层作为地址,存储对应层的mode、width、height、depth、border、inte_format、format、type、base值,当使能cubemap时,将mipmap层的一个地址进行细分为6个,分别代表0,1,2,3,4,5不同的face信息,当使能layer层没有map层信息时,不同layer层的mode、width、height、depth、border、inte_format、format、type相同,base不同;当使能layer并使能map层时的mode、width、height、depth、border、inte_format、format、type相同,base不同;支持1D、2D、3D、rectangle、cubemap、1D_ARRAY、2D_ARRAY、cubemap_array、2D_multisample、2D_multisample、2D_multisample_array模式下的寄存器配置;
    LOD U1单元:完成对不同滤波模式下的level值计算,结合访问target地址得到访问image单元的地址;在计算level值之前,首先需要通过target和base_level值为level0读取image单元,获得image的基本信息作为后续level计算时的参考,然后level值的计算考虑两种情况:当使能lod时,如果image为layer模式,此时,不同层的width、height信息相等,无论滤波模式为mag_filter还是min_filter,都取整取最靠近base_level方向的一个level值为level0读取image信息的offset,而filter_type大小与请求的滤波大小相匹配;当使能lod时,如果image为mipmap模式,此时,不同层的width、height、depth不相等,考虑mag_filter在near和linear模式下取整取最靠近base_level值读取image信息的offset,考虑min_filter模式下near、linear、near_mipmap_near、linear_mipmap_near取最不靠近base_level值为level0读取image的offset,而filter_type分别和请求的滤波模式相匹配,考虑min_filter在near_mipmap_linear、linear_mipmap_linear模式下取整取临近的两层作为读取image信息的offset,ratio_l为lod值减去level值的小数部分,此时称为lod的整数部分为level0,level0加1为level1,如果lod值为min_lod,此时level0与level1相同,所以取fiter_type分别为near_mipmap_near和linear_mipmap_near滤波;同理,当使能偏导数作为lod时,按照raster传递过来的图元类型primitive、dux、duy、dvx、dvy、dwx、dwy、delt_x、delt_y,分为polygon/point和line两种情况,分别计算得到polygon/point和line的lod,如果image为layer模式,此时,不同层的width、height信息相等,无论滤波模式为mag_filter还是min_filter,都取整取最靠近base_level方向的一个level值为level0读取image信息的offset,而filter_type大小与请求的滤波大小相匹配;如果image为mipmap模式,此时,不同层的width、height、depth不相等,考虑mag_filter在near和linear模式下取整取最靠近base_level值读取image信息的offset,考虑min_filter模式下near、linear、near_mipmap_near、linear_mipmap_near取最不靠近base_level值为level0读取image的offset,而filter_type分别和请求的滤波模式相匹配,考虑min_filter在near_mipmap_linear、linear_mipmap_linear模式下取整取临近的两层作为读取image信息的offset,ratio_l为lod值减去level值的小数部分,此时称为lod的整数部分为level0,level0加1为level1,如果lod值为min_lod,此时level0与level1相同,所以取fiter_type分别为near_mipmap_near和linear_mipmap_near滤波;如果使能了level0和level1,则考虑为trilinear滤波方式,有以下滤波方式:trilinear isotropic(near_mipmap_linear、linear_mipmap_linear)、trilinear anisotropic;如果只有level0有效,有以下滤波方式:point isotropic(near、 near_mipmap_near)、bilinear isotropic(linear、linear_mipmap_near)、bilinear anisotropic;
    CoordinateU2单元:完成对fetch、sampler模式下的s、t、r、q的坐标、地址转换;当使能cubemap_array时,此时的Q坐标不为0,表示layer行号,s、t、r分别表示x,y,z方向上的大小,通过映射关系得到平面坐标内的s、t坐标;当使能rectangle模式,此时的s、t坐标不需要进行解归一化处理;如果s、t、r坐标超出了各自的表示范围,采用不同的wrap模式对坐标进行约束;当使能level0和level1时,从image单元得到level0和level1各自的width、height、depth值,分别与s、t、r相乘,得到解归一化后的纹理坐标u0,v0,w0和u1,v1,w1,当只有level0有效时,从image单元得到level0的width、height、depth值,分别与s、t、r相乘,得到解归一化后的纹理坐标u0,v0,w0;此时的ratio_u0、ratio_v0、ratio_w0分别为u0、v0、w0的小数部分,ratio_u1、ratio_v1、ratio_w1分别为u1、v1、w1的小数部分,inte_u0,inte_v0,inte_w0分别为u0、v0、w0的整数部分,inte_u1,inte_v1,inte_w1分别为u1、v1、w1的整数部分;执行wrap操作时,如果image内容中的borde值有值,并且此时地址已经溢出,此时disable请求纹素,并使能border_color值作为最终pixel阶段的输入;
    Coordinate controller U3单元:当使能level0和level1时,filter_type为point模式时,mode为1D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferu1的数据为inte_u1;mode为2D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferv0的数据为inte_v0;写入coordinate bufferu1的数据为inte_u1,写入coordinate bufferv1的整数部分为inte_v1;mode为3D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferv0的数据为inte_v0,写入coordinate bufferw0的数据为inte_w0,写入coordinate bufferu1的数据为inte_u1,写入coordinate bufferv1的数据为inte_v1,写入coordinatew1的数据为inte_w1;filter_type为linear模式时,mode为1D时,写入coordinate bufferu1的数据为inte_u1,写入coordinate bufferu1的数据为inte_u1+1;写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferu0的数据为inte_u0+1;mode为2D时,写入coordinate bufferu0、coordinate bufferv0的数据依次为:(inte_u0,inte_v0)、(inte_u0+1,inte_v0)、(inte_u0,inte_v0+1)、(inte_u0+1,inte_v0+1);写入coordinate bufferu1、coordinate bufferv1的数据依次为:(inte_u1,inte_v1)、(inte_u1+1,inte_v1)、(inte_u1,inte_v1+1)、(inte_u1+1,inte_v1+1);mode为3D时,写入coordinate bufferu0、coordinate bufferv0、coordinate bufferw0的数据依次为:(inte_u0,inte_v0,inte_w0)、(inte_u0+1,inte_v0,inte_w0)、(inte_u0,inte_v0+1,inte_w0)、(inte_u0+1,inte_v0+1,inte_w0)、(inte_u0,inte_v0,inte_w0+1)、(inte_u0+1,inte_v0,inte_w0+1)、(inte_u0,inte_v0+1,inte_w0+1)、(inte_u0+1,inte_v0+1,inte_w0+1);写入coordinate bufferu1、coordinate bufferv1、coordinate bufferw1的数据依次为:(inte_u1,inte_v1,inte_w1)、(inte_u1+1,inte_v1,inte_w1)、(inte_u1,inte_v1+1,inte_w1)、(inte_u1+1,inte_v1+1,inte_w1)、(inte_u1,inte_v1,inte_w1+1)、(inte_u1+1,inte_v1,inte_w1+1)、(inte_u1,inte_v1+1,inte_w1+1)、(inte_u1+1,inte_v1+1,inte_w1+1);当使能level0时,filter_type为point模式时,mode为1D时,写入coordinate bufferu0的数据为inte_u0;mode为2D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferv0的数据为inte_v0;mode为3D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferv0的数据为 inte_v0,写入coordinate bufferw0的数据为inte_w0;filter_type为linear模式时,mode为1D时,写入coordinate bufferu0的数据为inte_u0,写入coordinate bufferu0的数据为inte_u0+1;mode为2D时,写入coordinate bufferu0、coordinate 
bufferv0的数据依次为:(inte_u0,inte_v0)、(inte_u0+1,inte_v0)、(inte_u0,inte_v0+1)、(inte_u0+1,inte_v0+1);mode为3D时,写入coordinate bufferu0、coordinate bufferv0、coordinate bufferw0的数据依次为:(inte_u0,inte_v0,inte_w0)、(inte_u0+1,inte_v0,inte_w0)、(inte_u0,inte_v0+1,inte_w0)、(inte_u0+1,inte_v0+1,inte_w0)、(inte_u0,inte_v0,inte_w0+1)、(inte_u0+1,inte_v0,inte_w0+1)、(inte_u0,inte_v0+1,inte_w0+1)、(inte_u0+1,inte_v0+1,inte_w0+1)。
    address controller U4单元:首先完成纹理坐标到纹理偏移地址的计算;对于level0有效时,mode为1D,在地址计算没有溢出时的偏移量为size*u0;mode为2D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0);mode为3D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0)+w0*width0*height0;得到最终访问texel cache的地址为base0+偏移量。然后根据偏移量的末尾与4字节对齐方式,得到不同inte_format条件下的地址个数,并将末尾数据保存于offset0 buffer中;由于level1无效,所以请求texel cache时,按照双buffer操作方式,奇数个地址请求texel cache地址访问cache0,偶数个地址请求访问cache1,实现地址的并行性访问。对于level0和level1均有效时,mode为1D,在地址计算没有溢出时的偏移量为size*u0,size*u1;mode为2D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0),size*(width1*u1+v1);mode为3D,在地址计算没有溢出时的偏移量为size*(width0*u0+v0)+w0*width0*height0,size*(width1*u1+v1)+w1*width1*height1。得到最终访问texel cache的地址为base0+level0偏移量,base1+level1偏移量。此时,并行请求cache0和cache1。
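  2. The texture mapping hardware accelerator based on a dual-Buffer architecture according to claim 1, characterized in that: the LOD U1 unit comprises two direct-mapped caches, which index the cache line in which each texel resides and perform the store and replace operations on cache lines; when level0 and level1 are both valid, the read requests to cache0 and cache1 are served in parallel; when only level0 is valid, cache0 stores the odd cache lines and cache1 stores the even cache lines.
  3. The texture mapping hardware accelerator based on a dual-Buffer architecture according to claim 1, characterized in that the CoordinateU2 unit comprises: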
    data controllerU0单元:当level0和level1同时有效时,分别按照不同的inte_format,结合off0和off1,从一个cache line中完成数据的拼接任务,得到与纹理地址对应的纹理数据,并将各自数据同时写入到data buffer0和data buffer1中,并且data buffer0、data buffer1分别存放各自level的数据;当只有level0有效时,同理,从cache0和cache1中分别读出各自cache line的数据根据不同的inte_format以及off0,得到odd data和even data,按照双方式写入data buffer0和data buffer1,此时data buffer0和data buffer1中存放的是同一个level的纹素数据;
    filter unit U1: first performs the truncation operation, extracting r, g, b and a values of different bit widths for the different inte_format values, and then filters each channel independently; the bit widths are truncated according to the inte_format;

    when level0 and level1 are both valid, the filtering modes available for filter_type are NAF (non-anisotropic) (near_mipmap_linear isotropic, linear_mipmap_linear isotropic), BAF (bilinear-anisotropic) (invalid) and TAF (trilinear-anisotropic); when level0 is valid and level1 is invalid, the filtering modes available for filter_type are NAF (non-anisotropic) (near, near_mipmap_near, linear_mipmap_near), BAF (bilinear-anisotropic) and TAF (trilinear-anisotropic) (invalid);

    when level0 and level1 are both valid and filter_type is TAF (near_mipmap_linear), then regardless of whether mode is 1D, 2D or 3D, one datum each, data0 and data1, is read simultaneously from data buffer0 and data buffer1, and the filtering result is
        data0*(1.0-ratio_l)+data1*ratio_l;

    if the filtering mode is TAF (linear_mipmap_linear) and mode is 1D, two data each are first read in sequence from data buffer0 and data buffer1 simultaneously, namely data0, data1 and data2, data3; the intermediate filtering results are
        data0*(1.0-ratio_u0)+data2*ratio_u0
        data1*(1.0-ratio_u1)+data3*ratio_u1
    and the final filtering result is
        (data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_l)+(data1*(1.0-ratio_u1)+data3*ratio_u1)*ratio_l;

    when mode is 2D, four data each are read in sequence from data buffer0 and data buffer1 simultaneously, namely data0, data1, data2, data3 and data4, data5, data6, data7; first, the first four data data0, data2, data4, data6 give the intermediate filtering results
        data0*(1.0-ratio_u0)+data2*ratio_u0
        data4*(1.0-ratio_u1)+data6*ratio_u1
    then the last four data data1, data3, data5, data7 give the intermediate filtering results
        data1*(1.0-ratio_u0)+data3*ratio_u0
        data5*(1.0-ratio_u1)+data7*ratio_u1
    finally, the results of level0 and level1 are respectively
        (data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data2*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0
        (data4*(1.0-ratio_u1)+data5*ratio_u1)*(1.0-ratio_v1)+(data6*(1.0-ratio_u1)+data7*ratio_u1)*ratio_v1
    and the final filtering result is
        ((data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data2*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0)*(1.0-ratio_l)+((data4*(1.0-ratio_u1)+data5*ratio_u1)*(1.0-ratio_v1)+(data6*(1.0-ratio_u1)+data7*ratio_u1)*ratio_v1)*ratio_l;

    when mode is 3D, eight data each are read in sequence from data buffer0 and data buffer1 simultaneously, namely data0, data1, data2, data3, data4, data5, data6, data7 and data8, data9, data10, data11, data12, data13, data14, data15; first, the first eight data data0, data1, data2, data3, data8, data9, data10, data11 give the intermediate filtering results
        (data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_v0)+(data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0
        (data8*(1.0-ratio_u1)+data9*ratio_u1)*(1.0-ratio_v1)+(data10*(1.0-ratio_u1)+data11*ratio_u1)*ratio_v1
    then the last eight data data4, data5, data6, data7, data12, data13, data14, data15 give the intermediate filtering results
        (data4*(1.0-ratio_u0)+data5*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0
        (data12*(1.0-ratio_u1)+data13*ratio_u1)*(1.0-ratio_v1)+(data14*(1.0-ratio_u1)+data15*ratio_u1)*ratio_v1
    and the final filtering result is
        (((data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_v0)+(data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_v0)*(1.0-ratio_w0)+((data4*(1.0-ratio_u0)+data5*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0)*ratio_w0)*(1.0-ratio_l)+(((data8*(1.0-ratio_u1)+data9*ratio_u1)*(1.0-ratio_v1)+(data10*(1.0-ratio_u1)+data11*ratio_u1)*ratio_v1)*(1.0-ratio_w1)+((data12*(1.0-ratio_u1)+data13*ratio_u1)*(1.0-ratio_v1)+(data14*(1.0-ratio_u1)+data15*ratio_u1)*ratio_v1)*ratio_w1)*ratio_l;

    when anisotropic is enabled, the data in data buffer0 and data buffer1 each undergo the anisotropic computation to give the intermediate filtering results data0 and data1, and the final filtering result is
        data0*(1.0-ratio_l)+data1*ratio_l;

    when only level0 is valid and filter_type is near or near_mipmap_near, then regardless of whether mode is 1D, 2D or 3D, one datum each, data0 and data1, is read simultaneously from data buffer0 and data buffer1, and data0 and data1 are output directly after format conversion, without filtering;

    if the filtering mode is BAF and mode is 1D, one datum each is first read in sequence from data buffer0 and data buffer1 simultaneously, namely data0 and data1, and the final filtering result is
        data0*(1.0-ratio_u0)+data1*ratio_u0;

    when mode is 2D, two data each are read in sequence from data buffer0 and data buffer1 simultaneously, namely data0, data1, data2, data3; first, the first two data data0, data2 give the intermediate filtering result
        data0*(1.0-ratio_u0)+data2*ratio_u0
    then the last two data data1, data3 give the intermediate filtering result
        data1*(1.0-ratio_u0)+data3*ratio_u0
    and the final filtering result is
        (data0*(1.0-ratio_u0)+data2*ratio_u0)*(1.0-ratio_l)+(data1*(1.0-ratio_u0)+data3*ratio_u0)*ratio_l;

    when mode is 3D, four data each are read in sequence from data buffer0 and data buffer1 simultaneously, namely data0, data1, data2, data3 and data4, data5, data6, data7; first, the first four data data0, data1, data4, data5 give the intermediate filtering result
        (data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data4*(1.0-ratio_u0)+data5*ratio_u0)*ratio_v0
    then the last four data data2, data3, data6, data7 give the intermediate filtering result
        (data2*(1.0-ratio_u0)+data3*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0
    and the final filtering result is
        ((data0*(1.0-ratio_u0)+data1*ratio_u0)*(1.0-ratio_v0)+(data4*(1.0-ratio_u0)+data5*ratio_u0)*ratio_v0)*(1.0-ratio_w0)+((data2*(1.0-ratio_u0)+data3*ratio_u0)*(1.0-ratio_v0)+(data6*(1.0-ratio_u0)+data7*ratio_u0)*ratio_v0)*ratio_w0;

    after the filtering operation, the filter outputs texel_r, texel_g, texel_b and texel_a are assigned according to the inte_format; if format is color and the inte_format carries only the r component, texel_r is the filtering result, texel_g and texel_b are both 0, and texel_a is 1; if format is depth or stencil, the result is assigned, without filtering, to the texel_r and texel_g components, and the texel_b and texel_a components are 0;
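Every filtering formula in this unit is a chain of linear interpolations of the form a*(1.0-ratio)+b*ratio. The Python sketch below shows that structure for the 2D linear_mipmap_linear case and for the r-only color assignment. The function names and the neutral texel names t00, t10, t01, t11 are assumptions introduced here; the sketch does not assert which data index in the claim corresponds to which texel position.

```python
def lerp(a: float, b: float, ratio: float) -> float:
    """Linear interpolation, a*(1.0-ratio) + b*ratio."""
    return a * (1.0 - ratio) + b * ratio


def bilinear(t00, t10, t01, t11, ratio_u, ratio_v):
    """Interpolate along u within each row, then along v between the rows."""
    return lerp(lerp(t00, t10, ratio_u), lerp(t01, t11, ratio_u), ratio_v)


def linear_mipmap_linear_2d(level0, level1, ratios0, ratios1, ratio_l):
    """Blend the bilinear results of two mip levels with ratio_l.

    level0 and level1 are (t00, t10, t01, t11) texel tuples, one per data buffer;
    ratios0 and ratios1 are the (ratio_u, ratio_v) pairs of each level.
    """
    result0 = bilinear(*level0, *ratios0)
    result1 = bilinear(*level1, *ratios1)
    return lerp(result0, result1, ratio_l)


def assign_color_r_only(filtered: float):
    """Color format with only an r component: g and b are 0, a is 1."""
    return (filtered, 0.0, 0.0, 1.0)
```

The 3D case adds one more `lerp` over ratio_w per level before the final blend, and the single-level BAF cases drop the blend over ratio_l.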
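    pixel unit U2: when border_color is enabled, the border_color data are used as the input data of the pixel stage; when the swizzle operation is not enabled, pixel_r, pixel_g, pixel_b and pixel_a are equal to border_color_r, border_color_g, border_color_b and border_color_a of border_color; if the swizzle operation is enabled, the data of each channel are converted according to the swizzle mode, and the four color components pixel_r, pixel_g, pixel_b and pixel_a are finally output in parallel.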
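The border_color path of the pixel stage can be sketched as follows. The claim does not define how a swizzle mode is encoded, so the four-selector tuple used here (one selector per output channel, drawn from "r", "g", "b", "a", "0", "1") is an assumption modeled on common graphics APIs, and `pixel_stage` is a hypothetical name.

```python
def pixel_stage(border_color, swizzle=None):
    """Return (pixel_r, pixel_g, pixel_b, pixel_a) for the border_color path.

    border_color is (border_color_r, border_color_g, border_color_b, border_color_a).
    swizzle is None when swizzle is disabled, otherwise a tuple of four selectors
    (assumed encoding, not taken from the claim).
    """
    r, g, b, a = border_color
    if swizzle is None:
        # Swizzle disabled: the outputs equal the border color components.
        return (r, g, b, a)
    source = {"r": r, "g": g, "b": b, "a": a, "0": 0.0, "1": 1.0}
    # Each output channel picks its value independently of the others.
    return tuple(source[selector] for selector in swizzle)
```

For example, the selector tuple `("b", "g", "r", "a")` exchanges the red and blue channels and leaves green and alpha untouched.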
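    The texture mapping hardware accelerator based on a double Buffer architecture according to claim 3, characterized in that: it supports the FP32, FP16, FP11, FP10 and INT32 data types in the Color, depth, stencil and depth_stencil modes.
    The texture mapping hardware accelerator based on a double Buffer architecture according to claim 3, characterized in that: it further supports conversion between the different integer and floating-point types under the RGB/BGR format and between the different integer and floating-point types under the RGBA/BGRA format.
    The texture mapping hardware accelerator based on a double Buffer architecture according to claim 3, characterized in that: it further supports comparison of depth textures and stencil index calculation for depth, stencil and depth_stencil.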
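The claim names FP11 and FP10 without defining their bit layouts. The sketch below assumes they are the unsigned small-float formats used by packed formats such as OpenGL's R11F_G11F_B10F: no sign bit, a 5-bit exponent with bias 15, and a 6-bit (FP11) or 5-bit (FP10) mantissa. That layout and the function names are assumptions, not statements from the patent.

```python
import math


def decode_small_float(bits: int, mantissa_bits: int) -> float:
    """Decode an unsigned float with a 5-bit exponent (bias 15) and no sign bit."""
    exponent = (bits >> mantissa_bits) & 0x1F
    mantissa = bits & ((1 << mantissa_bits) - 1)
    scale = float(1 << mantissa_bits)
    if exponent == 0:
        # Zero and denormals: no implicit leading one.
        return 2.0 ** -14 * (mantissa / scale)
    if exponent == 31:
        # All-ones exponent encodes infinity or NaN.
        return math.inf if mantissa == 0 else math.nan
    return 2.0 ** (exponent - 15) * (1.0 + mantissa / scale)


def decode_fp11(bits: int) -> float:
    """Assumed FP11 layout: 5-bit exponent, 6-bit mantissa."""
    return decode_small_float(bits & 0x7FF, 6)


def decode_fp10(bits: int) -> float:
    """Assumed FP10 layout: 5-bit exponent, 5-bit mantissa."""
    return decode_small_float(bits & 0x3FF, 5)
```

Converting such values to FP32 before filtering lets a single interpolation datapath serve all of the listed floating-point types.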
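Depth-texture comparison and stencil index extraction can be sketched as below. The claim does not list the comparison functions or the packing of depth_stencil data, so the comparison names (borrowed from common graphics APIs) and the 24-bit depth plus 8-bit stencil layout are assumptions, as are the function names.

```python
import operator

# Assumed set of comparison functions; each maps (reference, texel_depth) to a bool.
COMPARE = {
    "never": lambda ref, depth: False,
    "less": operator.lt,
    "lequal": operator.le,
    "equal": operator.eq,
    "greater": operator.gt,
    "gequal": operator.ge,
    "notequal": operator.ne,
    "always": lambda ref, depth: True,
}


def depth_compare(ref: float, texel_depth: float, func: str = "lequal") -> float:
    """Return 1.0 when the comparison passes and 0.0 otherwise."""
    return 1.0 if COMPARE[func](ref, texel_depth) else 0.0


def split_depth_stencil(packed: int):
    """Split a 32-bit word into (normalized depth, stencil index).

    Assumed layout: depth in the upper 24 bits, stencil in the lower 8 bits.
    """
    depth = ((packed >> 8) & 0xFFFFFF) / float((1 << 24) - 1)
    stencil = packed & 0xFF
    return depth, stencil
```

The comparison result takes the place of a filtered color value, which matches the claim text stating that depth and stencil results are assigned without filtering.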
PCT/CN2020/095464 2019-06-10 2020-06-10 Texture mapping hardware accelerator based on double Buffer architecture WO2020249026A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2021573726A JP7227404B2 (ja) 2019-06-10 2020-06-10 Texture mapping hardware accelerator based on dual Buffer architecture
KR1020227000824A KR20220019791A (ko) 2019-06-10 2020-06-10 Texture mapping hardware accelerator based on dual Buffer architecture
US17/617,596 US20220327759A1 (en) 2019-06-10 2020-06-10 Texture Mapping Hardware Accelerator Based on Double Buffer Architecture
EP20823324.7A EP3982255A4 (en) 2019-06-10 2020-06-10 DUAL-BUFFER ARCHITECTURE-BASED TEXTURE MAP HARDWARE ACCELERATOR

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910495890.0A CN112070651A (zh) 2019-06-10 2019-06-10 Texture mapping hardware accelerator based on double Buffer architecture
CN201910495890.0 2019-06-10

Publications (1)

Publication Number Publication Date
WO2020249026A1 (zh) 2020-12-17

Family

ID=73657991

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095464 WO2020249026A1 (zh) 2019-06-10 2020-06-10 Texture mapping hardware accelerator based on double Buffer architecture

Country Status (6)

Country Link
US (1) US20220327759A1 (zh)
EP (1) EP3982255A4 (zh)
JP (1) JP7227404B2 (zh)
KR (1) KR20220019791A (zh)
CN (1) CN112070651A (zh)
WO (1) WO2020249026A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433464A (zh) * 2023-06-14 2023-07-14 北京象帝先计算技术有限公司 Storage address offset calculation device and method, electronic component and electronic equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230122931A (ko) 2022-02-15 2023-08-22 연세대학교 산학협력단 Exosomal long non-coding RNA biomarker for diagnosing atrial fibrillation and use thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020171672A1 (en) * 2001-05-18 2002-11-21 Sun Microsystems, Inc. Graphics data accumulation for improved multi-layer texture performance
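CN102903146A (zh) * 2012-09-13 2013-01-30 中国科学院自动化研究所 Graphics processing method for scene rendering
CN109064535A (zh) * 2018-07-19 2018-12-21 芯视图(常州)微电子有限公司 Hardware-accelerated implementation method for texture mapping in a GPU
CN109634666A (zh) * 2018-12-11 2019-04-16 华夏芯(北京)通用处理器技术有限公司 Method for fusing a BTB under a prefetch mechanism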

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5594854A (en) * 1995-03-24 1997-01-14 3Dlabs Inc. Ltd. Graphics subsystem with coarse subpixel correction
US6331856B1 (en) * 1995-11-22 2001-12-18 Nintendo Co., Ltd. Video game system with coprocessor providing high speed efficient 3D graphics and digital audio signal processing
US6002410A (en) * 1997-08-25 1999-12-14 Chromatic Research, Inc. Reconfigurable texture cache
US6246422B1 (en) * 1998-09-01 2001-06-12 Sun Microsystems, Inc. Efficient method for storing texture maps in multi-bank memory
US6766281B1 (en) * 2000-05-12 2004-07-20 S3 Graphics Co., Ltd. Matched texture filter design for rendering multi-rate data samples
US7057623B1 (en) * 2000-11-15 2006-06-06 Micron Technology, Inc. Texture addressing circuit and method
US6778181B1 (en) * 2000-12-07 2004-08-17 Nvidia Corporation Graphics processing system having a virtual texturing array
JP4264530B2 (ja) * 2002-07-19 2009-05-20 ソニー株式会社 画像処理装置およびその方法
US6987517B1 (en) * 2004-01-06 2006-01-17 Nvidia Corporation Programmable graphics processor for generalized texturing
JP2006244426A (ja) * 2005-03-07 2006-09-14 Sony Computer Entertainment Inc テクスチャ処理装置、描画処理装置、およびテクスチャ処理方法
US8300059B2 (en) * 2006-02-03 2012-10-30 Ati Technologies Ulc Method and apparatus for selecting a mip map level based on a min-axis value for texture mapping
US20100091018A1 (en) * 2008-07-11 2010-04-15 Advanced Micro Devices, Inc. Rendering Detailed Animated Three Dimensional Characters with Coarse Mesh Instancing and Determining Tesselation Levels for Varying Character Crowd Density
US10430169B2 (en) * 2014-05-30 2019-10-01 Apple Inc. Language, function library, and compiler for graphical and non-graphical computation on a graphical processor unit
KR102329475B1 (ko) * 2014-08-27 2021-11-19 삼성전자주식회사 렌더링 퀄리티 제어 장치 및 방법
US9811875B2 (en) * 2014-09-10 2017-11-07 Apple Inc. Texture state cache
KR102264163B1 (ko) * 2014-10-21 2021-06-11 삼성전자주식회사 텍스쳐를 처리하는 방법 및 장치
US10388058B2 (en) * 2017-02-16 2019-08-20 Microsoft Technology Licensing, Llc Texture residency hardware enhancements for graphics processors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3982255A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
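CN116433464A (zh) * 2023-06-14 2023-07-14 北京象帝先计算技术有限公司 Storage address offset calculation device and method, electronic component and electronic equipment
CN116433464B (zh) * 2023-06-14 2023-11-17 北京象帝先计算技术有限公司 Storage address offset calculation device and method, electronic component and electronic equipment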

Also Published As

Publication number Publication date
CN112070651A (zh) 2020-12-11
EP3982255A1 (en) 2022-04-13
JP2022536738A (ja) 2022-08-18
US20220327759A1 (en) 2022-10-13
KR20220019791A (ko) 2022-02-17
JP7227404B2 (ja) 2023-02-21
EP3982255A4 (en) 2023-07-12

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20823324

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021573726

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20227000824

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020823324

Country of ref document: EP

Effective date: 20220110