CN106683036A - Storing and encoding method of frame buffer for efficient GPU drawing - Google Patents

Info

Publication number
CN106683036A
Authority
CN
China
Prior art keywords
units
tile
block
supertile
data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611139601.6A
Other languages
Chinese (zh)
Inventor
田泽
郑新建
任向隆
卢俊
韩立敏
张骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-12-12
Filing date
2016-12-12
Publication date
2017-05-17

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/005: General purpose rendering architectures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/04: Texture mapping
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10: Providing a specific technical effect
    • G06F2212/1016: Performance improvement
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40: Specific encoding of data in memory or cache
    • G06F2212/401: Compressed data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45: Caching of specific data in cache memory
    • G06F2212/455: Image or video data

Abstract

The invention provides a frame buffer storing and encoding method for efficient GPU drawing. The encoding proceeds at three levels. First, the image data within each Tile unit or SuperTile unit is encoded in the normal encoding order. Second, the Tile units or SuperTile units within each Block unit are encoded in a zigzag order consistent with the rasterization direction: encoding starts with the first Tile unit or SuperTile unit in the lower left corner of the Block unit and proceeds from left to right, then from bottom to top. Third, the Block units within each encoding and storage object are encoded in the same zigzag order, consistent with the rasterization direction: encoding starts with the first Block unit in the lower left corner of the object and proceeds from left to right, then from bottom to top. The method makes maximum use of the spatial locality of the frame buffer during graphics drawing, reduces the miss rate of the color, depth, and texture Caches, speeds up GPU drawing, and reduces the DDR bandwidth requirement.

Description

A frame buffer storage and encoding method for efficient GPU drawing
Technical field
The present invention relates to the field of computer hardware technology, and in particular to a frame buffer storage and encoding method for GPUs.
Background
A GPU requires very high memory bandwidth during 3D graphics drawing, mainly because of accesses to texture data, color data, and depth data. Designs commonly use a texture Cache, a color Cache, and a depth Cache, together with corresponding compression algorithms, to relieve the pressure on DDR memory bandwidth. Cache accesses are performed per Block, while compression algorithms compress and decompress per Tile, so the way the frame buffer is stored and encoded in DDR can greatly affect the efficiency of the compression algorithm, the Cache hit rate, and the Cache update efficiency.
Summary of the invention
The purpose of the invention is as follows:
The invention describes a frame buffer storage and encoding method for efficient GPU drawing. It makes maximum use of the spatial locality of the frame buffer during graphics drawing, reduces the miss rate of the color, depth, and texture Caches, speeds up GPU drawing, and reduces the DDR bandwidth demand.
The technical solution of the invention is as follows:
A frame buffer storage and encoding method for efficient GPU drawing, comprising:
The encoding and storage object is divided by a grid into several equal-sized Block units, and each Block unit is divided by a grid into several equal-sized Tile units or SuperTile units; each Tile unit or SuperTile unit contains the same amount of image data;
During encoding, the image data within each Tile unit or SuperTile unit is encoded in the normal encoding order;
The encoding order of the Tile units or SuperTile units within each Block unit is consistent with the rasterization direction and follows a zigzag shape: within each Block unit, encoding starts from the first Tile unit or SuperTile unit in the lower left corner and proceeds from left to right, then from bottom to top;
The encoding order of the Block units within each encoding and storage object is consistent with the rasterization direction and follows a zigzag shape: within each encoding and storage object, encoding starts from the first Block unit in the lower left corner and proceeds from left to right, then from bottom to top.
The encoding and storage object is texture data, color data, or depth data.
The image data is 4 rows by 4 columns, 16 texels in total; or 4 rows by 16 columns, 64 pixels in total; or 8 rows by 8 columns, 64 depth values in total.
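The three-level ordering can be turned into an address calculation. The following Python sketch is an illustration only, not the patent's implementation. It assumes that the "normal encoding order" inside a Tile is row-major, that the zigzag order means each row is traversed left to right before moving up one row (as in the abstract's "left-to-right and then bottom-to-top"), and that coordinates have their origin in the lower left corner. The function name and its parameters are hypothetical.

```python
def zigzag_offset(x, y, unit_w, unit_h, units_x, units_y, blocks_x):
    """Linear element index of (x, y), with the origin in the lower left corner.

    unit_w, unit_h   : size of one Tile/SuperTile in elements
    units_x, units_y : Tiles/SuperTiles per Block, horizontally and vertically
    blocks_x         : Blocks per row of the encoding and storage object
    """
    block_w, block_h = unit_w * units_x, unit_h * units_y

    # Level 3: which Block (left to right, then bottom to top).
    bx, by = x // block_w, y // block_h
    block_index = by * blocks_x + bx

    # Level 2: which Tile/SuperTile inside the Block (same order).
    ux, uy = (x % block_w) // unit_w, (y % block_h) // unit_h
    unit_index = uy * units_x + ux

    # Level 1: element inside the unit (assumed row-major "normal" order).
    ex, ey = x % unit_w, y % unit_h
    elem_index = ey * unit_w + ex

    unit_size = unit_w * unit_h
    block_size = unit_size * units_x * units_y
    return block_index * block_size + unit_index * unit_size + elem_index
```

With 4 x 4 texel Tiles and 4 x 4 Tiles per Block, every Block occupies 256 consecutive offsets, so a Block that is compact in two-dimensional space is also one contiguous run in memory.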
The advantages of the invention are as follows:
The encoding and storage scheme for texture data ensures optimal spatial locality for the Block data that the texture Cache accesses each time. The scheme for color data ensures the best balance among the spatial locality of the Block data that the color Cache accesses each time, the data compression scheme, and the bandwidth and buffering needed when the pixel buffer is displayed. The scheme for depth data ensures the best balance between the spatial locality of the Block data that the depth Cache accesses each time and the data compression scheme.
Description of the drawings
Fig. 1 is a schematic diagram of the texture buffer storage and encoding scheme for efficient GPU drawing in the present invention;
Fig. 2 is a schematic diagram of the color buffer storage and encoding scheme for efficient GPU drawing in the present invention;
Fig. 3 is a schematic diagram of the depth buffer storage and encoding scheme for efficient GPU drawing in the present invention.
Detailed description
The technical solution of the invention is described clearly and completely below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
A frame buffer storage and encoding method for efficient GPU drawing, comprising:
The encoding and storage object is divided by a grid into several equal-sized Block units, and each Block unit is divided by a grid into several equal-sized Tile units or SuperTile units; each Tile unit or SuperTile unit contains the same amount of image data;
During encoding, the image data within each Tile unit or SuperTile unit is encoded in the normal encoding order;
The encoding order of the Tile units or SuperTile units within each Block unit is consistent with the rasterization direction and follows a zigzag shape: within each Block unit, encoding starts from the first Tile unit or SuperTile unit in the lower left corner and proceeds from left to right, then from bottom to top;
The encoding order of the Block units within each encoding and storage object is consistent with the rasterization direction and follows a zigzag shape: within each encoding and storage object, encoding starts from the first Block unit in the lower left corner and proceeds from left to right, then from bottom to top.
The encoding and storage object is texture data, color data, or depth data.
The image data is 4 rows by 4 columns, 16 texels in total; or 4 rows by 16 columns, 64 pixels in total; or 8 rows by 8 columns, 64 depth values in total.
Embodiment
Fig. 1 shows the texture buffer storage and encoding scheme for efficient GPU drawing. When the texture buffer is accessed during texture mapping, a pixel may need to be fitted from 2, 4, 8, or even more spatially adjacent texels, depending on the texture mapping mode, so the spatial locality of the texture Cache that supplies texel data for texture mapping is especially important. The design uses a Tile-based zigzag encoding and storage scheme: one Cache Block contains 16 Tiles in two-dimensional space, 16 rows by 16 columns, 256 texels in total, which makes its spatial locality optimal.
The texture Cache is a read-only Cache. During texture mapping, multiple texels that are adjacent in two-dimensional space are read from the Cache; if a texture Cache miss occurs, the data of one whole Block must be fetched from DDR at once. With the Tile-based zigzag encoding and storage scheme, the data of a Block that is adjacent in two-dimensional space is stored contiguously in DDR, so the fetch can be completed by a single burst transfer. This reduces the DDR access bandwidth demand while keeping texel spatial locality optimal.
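To see why Block-contiguous storage helps texture filtering, the sketch below counts how many 256-texel fetch units a filter footprint touches under two layouts: the 16 x 16 texel Block layout described above, and a plain row-major layout that serves here only as a hypothetical baseline. The 256-texel row-major line size and all function names are assumptions made for illustration.

```python
BLOCK_W = BLOCK_H = 16   # one Block: 16 x 16 texels (from the text)
LINE_TEXELS = 256        # assumed fetch unit for the row-major baseline


def blocks_touched_tiled(x0, y0, w, h, tex_w):
    """Blocks touched by a w x h footprint at (x0, y0) in the Block layout."""
    blocks_x = tex_w // BLOCK_W
    return {(y // BLOCK_H) * blocks_x + (x // BLOCK_W)
            for y in range(y0, y0 + h)
            for x in range(x0, x0 + w)}


def lines_touched_linear(x0, y0, w, h, tex_w):
    """256-texel lines touched by the same footprint in a row-major layout."""
    return {(y * tex_w + x) // LINE_TEXELS
            for y in range(y0, y0 + h)
            for x in range(x0, x0 + w)}


if __name__ == "__main__":
    # 2 x 2 bilinear footprint inside a 256-texel-wide texture.
    print(len(blocks_touched_tiled(5, 5, 2, 2, 256)))   # Blocks in tiled layout
    print(len(lines_touched_linear(5, 5, 2, 2, 256)))   # lines in linear layout
```

A footprint that lies inside one Block needs a single fetch in the Block layout, while the row-major baseline needs one fetch per row of the footprint. The Block layout is not free of worst cases: a 2 x 2 footprint that straddles a Block corner touches four Blocks.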
Fig. 2 shows the color buffer storage and encoding scheme for efficient GPU drawing. The color buffer stores the final result of the fragments drawn by the GPU during graphics drawing, and is read by the display module and shown on the screen. The encoding format of the color buffer must therefore take into account not only the access characteristics of the color Cache during drawing, but also the display characteristics when the display module reads it. In addition, to reduce the DDR bandwidth demand, the color buffer usually needs a lossless compression algorithm based on Tiles or SuperTiles. The design uses a SuperTile-based zigzag encoding and storage scheme in which one SuperTile is one compression block, compressed with a lossless compression algorithm. To improve compressibility, the SuperTile size is set to 4 rows by 16 columns of pixels. Because the final color buffer has to be displayed line by line, an encoding with optimal two-dimensional spatial locality would require the display module to buffer at least 16 rows of data when reading the color buffer, whereas line-by-line display does not actually need such a large buffer. To obtain the best balance between optimal two-dimensional spatial locality for color Cache accesses and the display read buffer capacity, one Cache Block is designed to contain 4 SuperTiles in two-dimensional space, 256 pixels in total.
When the color Cache reads or writes the buffer and a color Cache miss occurs, the data of one whole Block must be fetched from DDR at once. With the Tile-based zigzag encoding and storage scheme, the data of a Block that is adjacent in two-dimensional space is stored contiguously in DDR, so the fetch can be completed by a single burst transfer. This reduces the DDR access bandwidth demand while keeping pixel spatial locality optimal. The display module, by contrast, does not need to read by Block: it only needs to read the data of one 4-row by 16-column SuperTile at a time. This saves internal buffer space and achieves the best balance among data spatial locality, the data compression scheme, and the bandwidth and buffering needed when the pixel buffer is displayed.
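The display-side trade-off can be made concrete with a small sketch. The text fixes the SuperTile at 4 rows by 16 columns and the Block at 4 SuperTiles, but it does not state how the 4 SuperTiles are arranged inside a Block; the 2 x 2 arrangement below is an assumption, as are the function names and the requirement that the screen width be a multiple of the Block width. Offsets are counted in uncompressed pixels.

```python
ST_W, ST_H = 16, 4          # SuperTile: 4 rows x 16 columns (from the text)
ST_PIXELS = ST_W * ST_H     # 64 pixels per SuperTile


def supertile_offsets_for_band(band, screen_w, st_x=2, st_y=2):
    """Pixel offsets of the SuperTiles covering one 4-row band (0 = bottom).

    st_x, st_y: SuperTiles per Block horizontally/vertically (assumed 2 x 2).
    screen_w must be a multiple of ST_W * st_x.
    """
    blocks_x = screen_w // (ST_W * st_x)
    block_pixels = ST_PIXELS * st_x * st_y
    by, uy = divmod(band, st_y)          # Block row, SuperTile row in Block
    offsets = []
    for col in range(screen_w // ST_W):  # SuperTile columns, left to right
        bx, ux = divmod(col, st_x)
        offsets.append((by * blocks_x + bx) * block_pixels
                       + (uy * st_x + ux) * ST_PIXELS)
    return offsets


def scanout_buffer_pixels(screen_w, rows):
    """Pixels the display side must hold when its fetch unit spans `rows` rows."""
    return screen_w * rows


if __name__ == "__main__":
    print(supertile_offsets_for_band(0, 64))   # SuperTiles of the bottom band
    print(scanout_buffer_pixels(1920, 4))      # SuperTile-granular read
    print(scanout_buffer_pixels(1920, 16))     # 16-row-tall unit, as in the text
```

Reading one 4-row band at a time means the display side holds 4 rows of pixels rather than the 16 rows that a fully two-dimensional layout would require, at the cost of fetching SuperTiles that are not all adjacent in memory.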
Fig. 3 shows the depth buffer storage and encoding scheme for efficient GPU drawing. The depth buffer stores the depth values of the fragments drawn by the GPU during graphics drawing, and the per-fragment operations use it to perform the depth test on subsequent fragments, which determines which fragments can be shown on the screen. To reduce the DDR bandwidth demand, the depth buffer usually needs a lossless compression algorithm based on Tiles or SuperTiles. The design uses a SuperTile-based zigzag encoding and storage scheme in which one SuperTile is one compression block, compressed with a lossless compression algorithm. To improve compressibility, the SuperTile size is set to 8 rows by 8 columns of fragment depth values, and one Cache Block is designed to contain 4 SuperTiles in two-dimensional space, 256 fragment depth values in total.
When the depth Cache reads or writes the buffer and a depth Cache miss occurs, the data of one whole Block must be fetched from DDR at once. With the Tile-based zigzag encoding and storage scheme, the data of a Block that is adjacent in two-dimensional space is stored contiguously in DDR, so the fetch can be completed by a single burst transfer. This reduces the DDR access bandwidth demand while keeping pixel spatial locality optimal.
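The single-burst property on a depth Cache miss can be sketched as follows. The text gives the SuperTile size (8 x 8) and the number of SuperTiles per Block (4); the 2 x 2 arrangement, the 4 bytes per depth value, and the function names are assumptions for illustration, and compression is ignored, so the byte counts describe uncompressed data only.

```python
ST = 8                     # SuperTile edge: 8 x 8 depth values (from the text)
ST_PER_SIDE = 2            # assumed 2 x 2 SuperTiles per Block (text says 4 total)
BLOCK = ST * ST_PER_SIDE   # 16 x 16 = 256 depth values per Block


def supertile_order():
    """Lower-left corners of the SuperTiles in a Block, in encoding order."""
    return [(ux * ST, uy * ST)
            for uy in range(ST_PER_SIDE)      # bottom to top
            for ux in range(ST_PER_SIDE)]     # left to right


def block_burst(x, y, buf_w, bytes_per_value=4):
    """Return (start_byte, length_bytes) of the Block holding fragment (x, y).

    buf_w must be a multiple of BLOCK; the origin is the lower left corner.
    """
    blocks_x = buf_w // BLOCK
    block_index = (y // BLOCK) * blocks_x + (x // BLOCK)
    length = BLOCK * BLOCK * bytes_per_value
    return block_index * length, length


if __name__ == "__main__":
    print(supertile_order())
    print(block_burst(17, 3, 64))
```

Because each Block maps to one contiguous byte range, a miss on any fragment in the Block can be served by one sequential transfer starting at `start_byte`.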
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (3)

1. A frame buffer storage and encoding method for efficient GPU drawing, characterized by comprising:
The encoding and storage object is divided by a grid into several equal-sized Block units, and each Block unit is divided by a grid into several equal-sized Tile units or SuperTile units; each Tile unit or SuperTile unit contains the same amount of image data;
During encoding, the image data within each Tile unit or SuperTile unit is encoded in the normal encoding order;
The encoding order of the Tile units or SuperTile units within each Block unit is consistent with the rasterization direction and follows a zigzag shape: within each Block unit, encoding starts from the first Tile unit or SuperTile unit in the lower left corner and proceeds from left to right, then from bottom to top;
The encoding order of the Block units within each encoding and storage object is consistent with the rasterization direction and follows a zigzag shape: within each encoding and storage object, encoding starts from the first Block unit in the lower left corner and proceeds from left to right, then from bottom to top.
2. The frame buffer storage and encoding method for efficient GPU drawing according to claim 1, characterized in that the encoding and storage object is texture data, color data, or depth data.
3. The frame buffer storage and encoding method for efficient GPU drawing according to claim 1, characterized in that the image data is 4 rows by 4 columns, 16 texels in total; or 4 rows by 16 columns, 64 pixels in total; or 8 rows by 8 columns, 64 depth values in total.
CN201611139601.6A 2016-12-12 2016-12-12 Storing and encoding method of frame buffer for efficient GPU drawing Pending CN106683036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611139601.6A CN106683036A (en) 2016-12-12 2016-12-12 Storing and encoding method of frame buffer for efficient GPU drawing

Publications (1)

Publication Number Publication Date
CN106683036A true CN106683036A (en) 2017-05-17

Family

ID=58869322

Country Status (1)

Country Link
CN (1) CN106683036A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2017-05-17)
RJ01 Rejection of invention patent application after publication