CN106683036A - Storing and encoding method of frame buffer for efficient GPU drawing - Google Patents
Storing and encoding method of frame buffer for efficient GPU drawing
- Publication number
- CN106683036A (application CN201611139601.6A)
- Authority
- CN
- China
- Prior art keywords
- units
- tile
- block
- supertile
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0875—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/40—Specific encoding of data in memory or cache
- G06F2212/401—Compressed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/45—Caching of specific data in cache memory
- G06F2212/455—Image or video data
Abstract
The invention provides a storing and encoding method of a frame buffer for efficient GPU drawing. The method comprises the following steps: during encoding, the image data in each Tile unit or SuperTile unit is encoded in the normal encoding order; the encoding order of the Tile units or SuperTile units in each Block unit is consistent with the rasterization direction and follows a zigzag pattern, starting from the first Tile unit or SuperTile unit in the lower left corner of each Block unit and proceeding from left to right and then from bottom to top; and the encoding order of the Block units in each encoded storage object is consistent with the rasterization direction and follows a zigzag pattern, starting from the first Block unit in the lower left corner of each encoded storage object and proceeding from left to right and then from bottom to top. According to the invention, the spatial locality of the frame buffer during graphics drawing can be exploited to the maximum, the miss rate of the color, depth and texture caches can be reduced, GPU drawing can be sped up, and the DDR bandwidth requirement can be reduced.
Description
Technical field
The present invention relates to the field of computer hardware, and in particular to a frame buffer storage encoding method for GPUs.
Background
A GPU needs very high memory bandwidth during 3D graphics rendering, mainly because of accesses to texture data, color data and depth data. Designs commonly use a texture cache, a color cache and a depth cache, together with corresponding compression algorithms, to relieve the pressure on DDR memory bandwidth. Caches are accessed in units of Blocks, while compression algorithms compress and decompress in units of Tiles, so the way the frame buffer is stored and encoded in DDR strongly affects the efficiency of the compression algorithm, the cache hit rate and the cache update efficiency.
Summary of the invention
The purpose of the present invention is:
The present invention describes a frame buffer storage encoding method for efficient GPU drawing that makes maximal use of the spatial locality of the frame buffer during graphics rendering, reduces the miss rate of the color, depth and texture caches, speeds up GPU drawing, and reduces the DDR bandwidth requirement.
The technical solution of the present invention is:
A frame buffer storage encoding method for efficient GPU drawing, comprising:
The encoded storage object is divided by a grid into several equally sized Block units, and each Block unit is divided by a grid into several equally sized Tile units or SuperTile units; each Tile unit or SuperTile unit contains the same amount of image data;
During encoding, the image data within each Tile unit or SuperTile unit is encoded in the normal encoding order;
The encoding order of the Tile units or SuperTile units within each Block unit is consistent with the rasterization direction and follows a zigzag ("之"-shaped) pattern: within each Block unit, encoding starts from the first Tile unit or SuperTile unit in the lower left corner and proceeds from left to right, then from bottom to top;
The encoding order of the Block units within each encoded storage object is consistent with the rasterization direction and follows a zigzag pattern: within each encoded storage object, encoding starts from the first Block unit in the lower left corner and proceeds from left to right, then from bottom to top.
The encoded storage object is texture data, color data or depth data.
The image data is 16 texels in 4 rows and 4 columns, or 64 pixels in 4 rows and 16 columns, or 64 depth values in 8 rows and 8 columns.
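The three-level ordering described above (elements within a Tile, Tiles within a Block, Blocks within the object) can be sketched as a single index computation. This is only an illustrative sketch under stated assumptions: the function name `zigzag_offset` and its parameters are hypothetical, the "normal encoding order" inside a Tile is assumed to be row-major, and y = 0 is taken to be the bottom row so that "left to right, then bottom to top" becomes plain row-major order at every level.

```python
def zigzag_offset(x, y, tile_w, tile_h, tiles_x, tiles_y, blocks_x):
    """Linear element index of (x, y); y = 0 is the bottom row (assumption).

    tile_w, tile_h   : Tile (or SuperTile) size in elements
    tiles_x, tiles_y : Tiles per Block, horizontally and vertically
    blocks_x         : Blocks per row of the encoded storage object
    """
    block_w = tile_w * tiles_x
    block_h = tile_h * tiles_y

    # Level 1: which Block, counted left to right, then bottom to top.
    bx, by = x // block_w, y // block_h
    block_index = by * blocks_x + bx

    # Level 2: which Tile inside that Block, same ordering.
    tx, ty = (x % block_w) // tile_w, (y % block_h) // tile_h
    tile_index = ty * tiles_x + tx

    # Level 3: element inside the Tile (row-major order is an assumption).
    ex, ey = x % tile_w, y % tile_h
    elem_index = ey * tile_w + ex

    tile_size = tile_w * tile_h
    block_size = tile_size * tiles_x * tiles_y
    return block_index * block_size + tile_index * tile_size + elem_index


# Texture-style parameters: 4x4 Tiles, 4x4 Tiles per Block, 2 Blocks per row.
print(zigzag_offset(4, 0, 4, 4, 4, 4, 2))   # first element of the second Tile
print(zigzag_offset(16, 0, 4, 4, 4, 4, 2))  # first element of the second Block
```

Because every level uses the same ordering, all elements of one Tile, and all Tiles of one Block, occupy one contiguous index range.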
The advantages of the present invention are:
The storage encoding of the texture data ensures that the Block data fetched by each texture cache access has optimal spatial locality. The storage encoding of the color data achieves the best balance among the spatial locality of the Block data fetched by each color cache access, the data compression scheme, and the bandwidth and buffering required when the pixel buffer is displayed. The storage encoding of the depth data achieves the best balance between the spatial locality of the Block data fetched by each depth cache access and the data compression scheme.
Description of the drawings
Fig. 1 is a schematic diagram of the texture buffer storage encoding for efficient GPU drawing in the present invention;
Fig. 2 is a schematic diagram of the color buffer storage encoding for efficient GPU drawing in the present invention;
Fig. 3 is a schematic diagram of the depth buffer storage encoding for efficient GPU drawing in the present invention.
Detailed description
The technical solution of the present invention is described clearly and completely below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
A frame buffer storage encoding method for efficient GPU drawing, comprising:
The encoded storage object is divided by a grid into several equally sized Block units, and each Block unit is divided by a grid into several equally sized Tile units or SuperTile units; each Tile unit or SuperTile unit contains the same amount of image data;
During encoding, the image data within each Tile unit or SuperTile unit is encoded in the normal encoding order;
The encoding order of the Tile units or SuperTile units within each Block unit is consistent with the rasterization direction and follows a zigzag pattern: within each Block unit, encoding starts from the first Tile unit or SuperTile unit in the lower left corner and proceeds from left to right, then from bottom to top;
The encoding order of the Block units within each encoded storage object is consistent with the rasterization direction and follows a zigzag pattern: within each encoded storage object, encoding starts from the first Block unit in the lower left corner and proceeds from left to right, then from bottom to top.
The encoded storage object is texture data, color data or depth data.
The image data is 16 texels in 4 rows and 4 columns, or 64 pixels in 4 rows and 16 columns, or 64 depth values in 8 rows and 8 columns.
Embodiment
Fig. 1 shows the texture buffer storage encoding for efficient GPU drawing. Depending on the texture mapping mode, a pixel may need 2, 4, 8 or even more spatially adjacent texels to be interpolated when the texture buffer is accessed during texture mapping, so the spatial locality of the texture cache that supplies texel data for texture mapping is especially important. The design uses a Tile-based zigzag storage encoding in which one cache Block contains 16 Tiles in two-dimensional space, 16 rows by 16 columns for a total of 256 texels, so that its spatial locality is optimal.
The texture cache is a read-only cache. During texture mapping, multiple texels that are adjacent in two-dimensional space are read from the cache; if a texture cache miss occurs, the data of one Block must be fetched from DDR in a single access. With the Tile-based zigzag storage encoding, the data of a Block that is contiguous in two-dimensional space is stored contiguously in DDR, so the DDR fetch can be completed with a single burst transfer, which reduces the DDR access bandwidth requirement while keeping texel spatial locality optimal.
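The locality argument for the texture layout can be checked with a small experiment. The sketch below is illustrative only: it counts how many 256-texel cache Blocks a 2x2 filtering footprint touches on average, once for a plain row-major layout and once for the 4x4-texel Tile, 16-Tiles-per-Block layout described above. The function names, the 256x64 texture size and the row-major order inside a Tile are assumptions made for the example.

```python
TILE = 4             # a Tile is 4x4 texels (from the text)
TILES_PER_SIDE = 4   # 16 Tiles per Block, i.e. 16 rows by 16 columns of texels
BLOCK = TILE * TILES_PER_SIDE


def tiled_offset(x, y, width):
    """Texel index in the Block/Tile layout; y = 0 is the bottom row."""
    blocks_x = width // BLOCK
    bx, by = x // BLOCK, y // BLOCK
    tx, ty = (x % BLOCK) // TILE, (y % BLOCK) // TILE
    ex, ey = x % TILE, y % TILE
    return ((by * blocks_x + bx) * BLOCK * BLOCK
            + (ty * TILES_PER_SIDE + tx) * TILE * TILE
            + ey * TILE + ex)


def linear_offset(x, y, width):
    """Texel index in a plain row-major layout, for comparison."""
    return y * width + x


def avg_blocks_touched(offset_fn, width, height, block_texels=256):
    """Average number of distinct cache Blocks hit by a 2x2 footprint."""
    total = 0
    count = 0
    for y in range(height - 1):
        for x in range(width - 1):
            blocks = {offset_fn(x + dx, y + dy, width) // block_texels
                      for dy in (0, 1) for dx in (0, 1)}
            total += len(blocks)
            count += 1
    return total / count


# Hypothetical 256x64 texture.
print("linear:", avg_blocks_touched(linear_offset, 256, 64))
print("tiled :", avg_blocks_touched(tiled_offset, 256, 64))
```

In the row-major case each texture row fills exactly one 256-texel Block, so every 2x2 footprint straddles two Blocks; in the tiled case most footprints fall inside a single Block.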
Fig. 2 shows the color buffer storage encoding for efficient GPU drawing. The color buffer stores the final results of the fragments drawn while the GPU renders graphics, and is read by the display module and shown on the screen. The encoding format of the color buffer must therefore take into account not only the access characteristics of the color cache during rendering but also the display characteristics when the display module reads it. In addition, to reduce the DDR bandwidth requirement, the color buffer usually needs a lossless compression algorithm based on Tiles or SuperTiles. The design uses a SuperTile-based zigzag storage encoding in which one SuperTile is one compression block, compressed with a lossless compression algorithm. To improve compressibility, the SuperTile size is set to 4 rows by 16 columns of pixels. Because the final color buffer must be displayed line by line, an encoding with optimal two-dimensional spatial locality would force the display module to buffer at least 16 rows of data when reading the color buffer, whereas line-by-line display readout does not actually need such a large buffer. To strike the best balance between optimal two-dimensional spatial locality for color cache accesses and the display read buffer capacity, one cache Block is designed to contain 4 SuperTiles in two-dimensional space, 256 pixels in total.
When the color cache reads or writes the buffer and a color cache miss occurs, the data of one Block must be fetched from DDR in a single access. With the Tile-based zigzag storage encoding, the data of a Block that is contiguous in two-dimensional space is stored contiguously in DDR, so the DDR fetch can be completed with a single burst transfer, which reduces the DDR access bandwidth requirement while keeping pixel spatial locality optimal. The display module, by contrast, does not need to read by Block: it only needs to read one SuperTile of 4 rows by 16 columns at a time, which saves internal buffer space and achieves the best balance among data spatial locality, the data compression scheme, and the bandwidth and buffering required when the pixel buffer is displayed.
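The buffering trade-off for display readout comes down to simple arithmetic: a display module that emits scanlines one at a time has to hold as many full scanlines as its fetch unit is tall. The sketch below compares the 4-row SuperTile with a 16-row unit; the function name, the 1920-pixel screen width and the 4 bytes per pixel are hypothetical values chosen only for illustration.

```python
def display_buffer_bytes(screen_width, rows_per_fetch_unit, bytes_per_pixel=4):
    """Bytes the display module must buffer before it can emit a scanline.

    A fetch unit that spans `rows_per_fetch_unit` rows delivers pixels for
    all of those rows at once, so that many full scanlines must be held.
    """
    return screen_width * rows_per_fetch_unit * bytes_per_pixel


WIDTH = 1920            # hypothetical screen width in pixels

SUPERTILE_ROWS = 4      # SuperTile is 4 rows by 16 columns (from the text)
SQUARE_UNIT_ROWS = 16   # a unit with optimal 2D locality would span 16 rows

small = display_buffer_bytes(WIDTH, SUPERTILE_ROWS)
large = display_buffer_bytes(WIDTH, SQUARE_UNIT_ROWS)
print(small, large, large // small)
```

Under these assumptions the 4-row SuperTile needs a quarter of the line buffer that a 16-row unit would, which is the balance the paragraph above describes.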
Fig. 3 shows the depth buffer storage encoding for efficient GPU drawing. The depth buffer stores the depth values of the fragments drawn while the GPU renders graphics, and the per-fragment operations use it to depth-test subsequent fragments and determine which fragments can be displayed on the screen. To reduce the DDR bandwidth requirement, the depth buffer usually needs a lossless compression algorithm based on Tiles or SuperTiles. The design uses a SuperTile-based zigzag storage encoding in which one SuperTile is one compression block, compressed with a lossless compression algorithm. To improve compressibility, the SuperTile size is set to 8 rows by 8 columns of fragment depth values, and one cache Block is designed to contain 4 SuperTiles in two-dimensional space, 256 fragment depth values in total.
When the depth cache reads or writes the buffer and a depth cache miss occurs, the data of one Block must be fetched from DDR in a single access. With the Tile-based zigzag storage encoding, the data of a Block that is contiguous in two-dimensional space is stored contiguously in DDR, so the DDR fetch can be completed with a single burst transfer, which reduces the DDR access bandwidth requirement while keeping pixel spatial locality optimal.
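The single-burst property can be illustrated for the depth layout by listing the storage indices covered by one Block and checking that they form one unbroken range. This is a sketch, not the patented implementation: the 2x2 arrangement of the 4 SuperTiles inside a Block, the row-major order inside a SuperTile, the 64-wide buffer and the function names are all assumptions.

```python
ST = 8            # a SuperTile is 8x8 depth values (from the text)
ST_PER_SIDE = 2   # assumption: the 4 SuperTiles of a Block form a 2x2 grid
BLOCK = ST * ST_PER_SIDE


def depth_offset(x, y, width):
    """Depth-value index in the Block/SuperTile layout; y = 0 is the bottom."""
    blocks_x = width // BLOCK
    bx, by = x // BLOCK, y // BLOCK
    sx, sy = (x % BLOCK) // ST, (y % BLOCK) // ST
    ex, ey = x % ST, y % ST
    return ((by * blocks_x + bx) * BLOCK * BLOCK
            + (sy * ST_PER_SIDE + sx) * ST * ST
            + ey * ST + ex)


def block_offsets(bx, by, width):
    """Sorted storage indices of every depth value in Block (bx, by)."""
    return sorted(depth_offset(bx * BLOCK + dx, by * BLOCK + dy, width)
                  for dy in range(BLOCK) for dx in range(BLOCK))


# Block (1, 2) of a hypothetical 64-wide depth buffer.
offsets = block_offsets(1, 2, 64)
start, end = offsets[0], offsets[-1]
is_contiguous = offsets == list(range(start, end + 1))
print(start, end, is_contiguous)
```

A contiguous range means a cache miss can be served by one sequential read of 256 values rather than by many short strided reads.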
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (3)
1. A frame buffer storage encoding method for efficient GPU drawing, characterized by comprising:
The encoded storage object is divided by a grid into several equally sized Block units, and each Block unit is divided by a grid into several equally sized Tile units or SuperTile units; each Tile unit or SuperTile unit contains the same amount of image data;
During encoding, the image data within each Tile unit or SuperTile unit is encoded in the normal encoding order;
The encoding order of the Tile units or SuperTile units within each Block unit is consistent with the rasterization direction and follows a zigzag pattern: within each Block unit, encoding starts from the first Tile unit or SuperTile unit in the lower left corner and proceeds from left to right, then from bottom to top;
The encoding order of the Block units within each encoded storage object is consistent with the rasterization direction and follows a zigzag pattern: within each encoded storage object, encoding starts from the first Block unit in the lower left corner and proceeds from left to right, then from bottom to top.
2. The frame buffer storage encoding method for efficient GPU drawing according to claim 1, characterized in that the encoded storage object is texture data, color data or depth data.
3. The frame buffer storage encoding method for efficient GPU drawing according to claim 1, characterized in that the image data is 16 texels in 4 rows and 4 columns, or 64 pixels in 4 rows and 16 columns, or 64 depth values in 8 rows and 8 columns.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611139601.6A CN106683036A (en) | 2016-12-12 | 2016-12-12 | Storing and encoding method of frame buffer for efficient GPU drawing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611139601.6A CN106683036A (en) | 2016-12-12 | 2016-12-12 | Storing and encoding method of frame buffer for efficient GPU drawing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106683036A (en) | 2017-05-17 |
Family
ID=58869322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611139601.6A Pending CN106683036A (en) | 2016-12-12 | 2016-12-12 | Storing and encoding method of frame buffer for efficient GPU drawing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106683036A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107993184A (en) * | 2017-11-24 | 2018-05-04 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of graphics processor depth value shifts to an earlier date test circuit |
CN108009978A (en) * | 2017-11-24 | 2018-05-08 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of non-parallel triangle rasterization cellular construction of obstruction |
CN109614086A (en) * | 2018-11-14 | 2019-04-12 | 西安翔腾微电子科技有限公司 | TLM model and realization structure are stored towards GPU texture buffer data based on SystemC |
CN110223369A (en) * | 2019-06-06 | 2019-09-10 | 西安博图希电子科技有限公司 | Frame buffer write-back method, device and the computer storage medium of TBR framework |
WO2020190797A1 (en) * | 2019-03-15 | 2020-09-24 | Intel Corporation | Systems and methods for updating memory side caches in a multi-gpu configuration |
WO2020190776A1 (en) * | 2019-03-15 | 2020-09-24 | Intel Corporation | Synchronizing encrypted workloads across multiple graphics processing units |
CN112734897A (en) * | 2020-12-05 | 2021-04-30 | 西安翔腾微电子科技有限公司 | Graphics processor depth data prefetching method triggered by primitive rasterization |
WO2022095010A1 (en) * | 2020-11-09 | 2022-05-12 | Qualcomm Incorporated | Methods and apparatus for rasterization of compute workloads |
US11842423B2 (en) | 2019-03-15 | 2023-12-12 | Intel Corporation | Dot product operations on sparse matrix elements |
US11934342B2 (en) | 2019-03-15 | 2024-03-19 | Intel Corporation | Assistance for hardware prefetch in cache access |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101243611A (en) * | 2005-08-12 | 2008-08-13 | 微软公司 | Efficient coding and decoding of transform blocks |
CN103220507A (en) * | 2012-01-19 | 2013-07-24 | 中兴通讯股份有限公司 | Method and system for video coding and decoding |
CN103793893A (en) * | 2012-10-26 | 2014-05-14 | 辉达公司 | Primitive re-ordering between world-space and screen-space pipelines with buffer limited processing |
CN106210729A (en) * | 2015-05-06 | 2016-12-07 | 扬智科技股份有限公司 | Decoding video stream system and method for decoding video stream |
Non-Patent Citations (1)
Title |
---|
YACINE AMARA ET AL.: "A GPU Tile-Load-Map architecture for terrain rendering: theory and applications", 《THE VISUAL COMPUTER》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108009978B (en) * | 2017-11-24 | 2021-04-20 | 中国航空工业集团公司西安航空计算技术研究所 | Non-blocking parallel triangular rasterization unit structure |
CN108009978A (en) * | 2017-11-24 | 2018-05-08 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of non-parallel triangle rasterization cellular construction of obstruction |
CN107993184A (en) * | 2017-11-24 | 2018-05-04 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of graphics processor depth value shifts to an earlier date test circuit |
CN109614086A (en) * | 2018-11-14 | 2019-04-12 | 西安翔腾微电子科技有限公司 | TLM model and realization structure are stored towards GPU texture buffer data based on SystemC |
CN109614086B (en) * | 2018-11-14 | 2022-04-05 | 西安翔腾微电子科技有限公司 | GPU texture buffer area data storage hardware and storage device based on SystemC and TLM models |
WO2020190797A1 (en) * | 2019-03-15 | 2020-09-24 | Intel Corporation | Systems and methods for updating memory side caches in a multi-gpu configuration |
WO2020190776A1 (en) * | 2019-03-15 | 2020-09-24 | Intel Corporation | Synchronizing encrypted workloads across multiple graphics processing units |
US11709793B2 (en) | 2019-03-15 | 2023-07-25 | Intel Corporation | Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format |
US11842423B2 (en) | 2019-03-15 | 2023-12-12 | Intel Corporation | Dot product operations on sparse matrix elements |
US11899614B2 (en) | 2019-03-15 | 2024-02-13 | Intel Corporation | Instruction based control of memory attributes |
US11934342B2 (en) | 2019-03-15 | 2024-03-19 | Intel Corporation | Assistance for hardware prefetch in cache access |
US11954063B2 (en) | 2019-03-15 | 2024-04-09 | Intel Corporation | Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format |
US11954062B2 (en) | 2019-03-15 | 2024-04-09 | Intel Corporation | Dynamic memory reconfiguration |
CN110223369A (en) * | 2019-06-06 | 2019-09-10 | 西安博图希电子科技有限公司 | Frame buffer write-back method, device and the computer storage medium of TBR framework |
WO2022095010A1 (en) * | 2020-11-09 | 2022-05-12 | Qualcomm Incorporated | Methods and apparatus for rasterization of compute workloads |
CN112734897A (en) * | 2020-12-05 | 2021-04-30 | 西安翔腾微电子科技有限公司 | Graphics processor depth data prefetching method triggered by primitive rasterization |
CN112734897B (en) * | 2020-12-05 | 2024-04-02 | 西安翔腾微电子科技有限公司 | Graphics processor depth data prefetching method triggered by primitive rasterization |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106683036A (en) | Storing and encoding method of frame buffer for efficient GPU drawing | |
TWI544751B (en) | Reformatting data to decrease bandwidth between a video encoder and a buffer | |
US8139075B2 (en) | Color packing glyph textures with a processor | |
Hasselgren et al. | Efficient depth buffer compression | |
TW200917228A (en) | Compression of multiple-sample-anti-aliasing tile data in a graphics pipeline | |
CN108022269A (en) | A kind of modeling structure of GPU compressed textures storage Cache | |
US20130278601A1 (en) | Method and Apparatus for Processing Texture Mapping in Computer Graphics by Biasing Level of Detail According to Image Content and Computer Readable Storage Medium Storing the Method | |
US20140028693A1 (en) | Techniques to request stored data from a memory | |
GB2564466A (en) | Graphics processing systems | |
CN109064535B (en) | Hardware acceleration implementation method for texture mapping in GPU | |
CN110214338A (en) | Application of the increment color compressed to video | |
US10466915B2 (en) | Accessing encoded blocks of data in memory | |
GB2604266A (en) | Compression techniques for pixel write data | |
CN107993184A (en) | A kind of graphics processor depth value shifts to an earlier date test circuit | |
CN102176205A (en) | File format for storage of chain code image sequence and decoding algorithm | |
CN101795410A (en) | Texture compression and synthesis method with fine granularity and high compression rate | |
CN104113759A (en) | Video system and method and device for buffering and recompressing/decompressing video frames | |
CN104954749B (en) | A kind of information recording method | |
CN104883573B (en) | A kind of signal high-efficient treatment method | |
CN106780638A (en) | A kind of high speed camera compresses image fast reconstructing method | |
US8515187B2 (en) | Method, compressor, decompressor and signal representation for lossless compression of pixel block values using row and column slope codewords | |
US10418002B2 (en) | Merged access units in frame buffer compression | |
CN101662684A (en) | Data storage method and device for video image coding and decoding | |
KR20220166198A (en) | Methods of and apparatus for storing data in memory in graphics processing systems | |
Yela et al. | S3Dc: A 3Dc-based Volume Compression Algorithm. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170517 |