CN105849780B - Optimized multi-pass rendering in a tile-based architecture - Google Patents


Info

Publication number
CN105849780B
Authority
CN
China
Prior art keywords
time
over
inquiry
condition
gpu
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480070397.XA
Other languages
Chinese (zh)
Other versions
CN105849780A (en)
Inventor
穆拉特·巴尔契 (Murat Balci)
克里斯托弗·保罗·弗拉斯卡蒂 (Christopher Paul Frascati)
阿温阿什·赛塔拉迈亚 (Avinash Seetharamaiah)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Application filed by Qualcomm Inc
Publication of CN105849780A
Application granted
Publication of CN105849780B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/7335 Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The present disclosure provides systems and methods for multi-pass rendering on a tile-based architecture, including: executing a query pass with a graphics processing unit (GPU); executing, with the GPU, a condition-true pass based on the query pass without executing a flush operation; executing, with the GPU, a condition-false pass based on the query pass without executing a flush operation; and executing a flush operation with the GPU in response to executing the condition-true pass and the condition-false pass.

Description

Optimized multi-pass rendering in a tile-based architecture
This application claims the benefit of U.S. Provisional Application No. 61/921,145, filed December 27, 2013, the entire content of which is incorporated herein by reference.
Technical field
This disclosure relates to techniques for graphics processing, and more specifically to techniques for rendering primitives in graphics processing.
Background
A graphics processing unit (GPU) may perform tile-based rendering and may be used to render a three-dimensional scene. Because such rendering of three-dimensional scenes can be very memory-bandwidth intensive, a dedicated graphics memory (GMEM) is located close to the GPU core. The GPU core generally renders a scene using GMEM. The GPU or a central processing unit (CPU) may then resolve the contents of the GMEM containing the scene to system memory. In other words, data representing the scene may be transferred from GMEM to system memory. Because the size of GMEM in a mobile environment may be limited by physical area constraints and memory bandwidth, the GPU may split a scene to be rendered into smaller portions so that those portions can be rendered individually. Specifically, the GPU may render the scene by dividing it into portions that can be rendered in GMEM and rendering each portion of the scene into GMEM in turn.
Summary of the invention
In general, this disclosure describes techniques for optimizing graphics rendering for tile-based graphics processing unit (GPU) architectures. By reducing the communication and data transfer between a central processing unit (CPU) and the GPU when executing rendering instructions, a tile-based GPU can improve its performance in rendering graphical objects and scenes. Specifically, a GPU configured for tile-based rendering may perform a larger share of the rendering of a graphics scene on the GPU itself, without having to wait for CPU interaction, which may improve the rendering performance of the GPU.
In one example, this disclosure describes a method that includes: executing a query pass with a graphics processing unit (GPU); executing, with the GPU, a condition-true pass based on the query pass without executing a flush operation; executing, with the GPU, a condition-false pass based on the query pass without executing a flush operation; and executing a flush operation with the GPU in response to executing the condition-true pass and the condition-false pass.
In another example, this disclosure describes a device that includes a GPU configured to: execute a query pass; execute a condition-true pass based on the query pass without executing a flush operation; execute a condition-false pass based on the query pass without executing a flush operation; and execute a flush operation in response to executing the condition-true pass and the condition-false pass.
In another example, this disclosure describes a device that includes means for executing a query pass with a graphics processing unit (GPU), means for executing, with the GPU, a condition-true pass based on the query pass without executing a flush operation, means for executing, with the GPU, a condition-false pass based on the query pass without executing a flush operation, and means for executing a flush operation with the GPU in response to executing the condition-true pass and the condition-false pass.
In another example, this disclosure describes a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to: execute a query pass; execute a condition-true pass based on the query pass without executing a flush operation; execute a condition-false pass based on the query pass without executing a flush operation; and execute a flush operation in response to executing the condition-true pass and the condition-false pass.
Brief description of the drawings
Fig. 1 is a block diagram illustrating a processor, a graphics processing unit, and a system memory for rendering a three-dimensional scene, according to some aspects of this disclosure.
Fig. 2 is a conceptual diagram illustrating tiles of a scene in a tile-based rendering architecture, according to some aspects of this disclosure.
Fig. 3 is a conceptual diagram showing primitives divided among bins, according to aspects of this disclosure.
Fig. 4 is a conceptual diagram illustrating techniques for performing multi-pass rendering in accordance with the techniques of this disclosure.
Fig. 5 is a conceptual diagram illustrating an example of functions that may be performed by hardware, in accordance with one or more examples described in this disclosure.
Fig. 6 is a flowchart illustrating an example method for multi-pass rendering on a tile-based architecture, in accordance with one or more examples described in this disclosure.
Fig. 7 is a block diagram illustrating an example of a device that may be configured to implement one or more aspects of this disclosure.
Detailed description
A graphics processing unit (GPU) may be used to render three-dimensional (3D) scenes. Because such rendering of 3D scenes can be very memory-bandwidth intensive, a dedicated graphics memory (GMEM) may be used. GMEM is located close to the graphics processing core of the GPU so that it has very high memory bandwidth (i.e., read and write accesses to GMEM are relatively fast). A scene may be rendered into GMEM by the graphics processing core of the GPU, and the scene may then be resolved from GMEM to memory (e.g., a frame buffer) so that it can be shown on a display device. However, because the size of GMEM may be limited by physical area constraints, GMEM may not have enough capacity to hold an entire three-dimensional scene (e.g., a frame).
In some examples, a GPU or other processing unit may be configured to split a 3D scene into tiles so that each tile making up the scene fits into GMEM. This is referred to as tile-based rendering, or "binning." As one example, if GMEM can store 512 kB of data, the scene may be divided into tiles such that the pixel data contained in each tile is less than or equal to 512 kB. In this way, the GPU or other processor may render the scene by dividing it into tiles that can be rendered in GMEM, rendering each tile of the scene individually into GMEM, storing the rendered tile from GMEM to a frame buffer, and repeating the rendering and storing for each tile of the scene. Accordingly, the GPU or other processor may render the scene tile by tile, using multiple rendering passes to render each tile of the scene.
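The 512 kB example can be made concrete. The sketch below is one hypothetical way to choose a tile grid so that a single tile's pixel data fits in GMEM; the function name, the "split the larger tile dimension" policy, and the 4-bytes-per-pixel figure in the usage line are assumptions, not the method the patent prescribes.

```python
import math


def choose_tile_grid(width, height, bytes_per_pixel, gmem_bytes):
    """Return (cols, rows, tile_w, tile_h) so one tile's pixel data fits in GMEM.

    Hypothetical policy: keep splitting the currently larger tile dimension
    until a single tile fits. Real drivers use their own heuristics.
    """
    if bytes_per_pixel > gmem_bytes:
        raise ValueError("GMEM cannot hold even a single pixel")
    cols, rows = 1, 1
    while True:
        tile_w = math.ceil(width / cols)
        tile_h = math.ceil(height / rows)
        if tile_w * tile_h * bytes_per_pixel <= gmem_bytes:
            return cols, rows, tile_w, tile_h
        # Split along whichever tile dimension is currently larger.
        if tile_w >= tile_h:
            cols += 1
        else:
            rows += 1


# Usage: 1920x1080 frame, 4 bytes of color per pixel, 512 kB of GMEM.
print(choose_tile_grid(1920, 1080, 4, 512 * 1024))  # (6, 3, 320, 360)
```

Counting only color data, this yields 18 tiles of 460,800 bytes each; a real GMEM budget would usually also have to hold depth/stencil data, which would shrink the tiles further.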
In some examples, tile-based rendering may be performed in several steps. For example, a GPU implementing a tile-based architecture may first process, or preprocess, the entire scene during a binning pass to define a number of bins, also referred to as "tiles." The binning pass may be followed by a series of rendering passes, during which each of the defined tiles is rendered. In some examples, each rendering pass is completed in three stages: (1) clear/unresolve, (2) render, and (3) resolve. During the clear/unresolve stage, the GPU may initialize GMEM for a new tile and store values read from external memory into GMEM. During rendering, the GPU may recreate the polygons associated with the current tile and generate pixel values to finish rendering the current tile, so that the tile can be shown on a display. The resolve step may involve the GPU copying the contents of the on-chip memory (GMEM) to memory external to the GPU, for example a buffer used by the display to show the finished scene.
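The three per-tile stages can be sketched as a loop that touches only a small tile-sized buffer standing in for GMEM. This is a toy model under stated assumptions: "system memory" is a Python list of rows, tiles are end-exclusive rectangles, and the per-pixel `draw` callback is hypothetical.

```python
def render_tiled(framebuffer, tiles, draw, clear_value=None):
    """Render a frame tile by tile through a simulated on-chip GMEM.

    framebuffer: list of rows in "system memory" (mutated in place).
    tiles: iterable of (x0, y0, x1, y1) rectangles, end-exclusive.
    draw: hypothetical callback draw(x, y, old_value) -> new pixel value.
    clear_value: if given, GMEM is cleared; otherwise it is unresolved
                 (loaded) from system memory so untouched pixels survive.
    """
    for (x0, y0, x1, y1) in tiles:
        w, h = x1 - x0, y1 - y0
        # (1) Clear / unresolve: initialize GMEM for this tile.
        if clear_value is not None:
            gmem = [[clear_value] * w for _ in range(h)]
        else:
            gmem = [framebuffer[y0 + j][x0:x1] for j in range(h)]  # slices copy
        # (2) Render: shade every pixel of the tile, writing only to GMEM.
        for j in range(h):
            for i in range(w):
                gmem[j][i] = draw(x0 + i, y0 + j, gmem[j][i])
        # (3) Resolve: copy the tile to system memory only after it is complete.
        for j in range(h):
            framebuffer[y0 + j][x0:x1] = gmem[j]
    return framebuffer


fb = [[0] * 4 for _ in range(2)]
render_tiled(fb, [(0, 0, 2, 2), (2, 0, 4, 2)], lambda x, y, old: old + x)
print(fb)  # [[0, 1, 2, 3], [0, 1, 2, 3]]
```

Passing `clear_value` corresponds to the clear path; omitting it corresponds to unresolve, which preserves any pixels the draw callback leaves unchanged.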
During the binning pass, the GPU may generate the polygons (e.g., triangles) that make up the scene and sort them into a plurality of "bins." As described herein, the bins defined during the binning pass are synonymous with the tiles of the final scene presented on the display (sometimes referred to as "screen tiles"). For example, each bin represents a portion, or tile, of the final scene (e.g., a predefined portion of a frame of video data, a computer-generated graphics image, a still image, or the like). Accordingly, the terms "bin" and "tile" may be used interchangeably herein. The tiles making up the scene may each be associated with a bin in memory that stores the primitives included in the respective tile. A bin is a portion of memory, or a portion of a picture or frame, for example the primitives in a tile of a picture or frame. Rendering a tile of the scene into GMEM may include executing commands to render the primitives in the associated bin into GMEM. The binning pass of the GPU may sort the primitives making up the scene into the appropriate bins. The binning pass may also create a visibility stream for each bin, indicating whether any primitive in the bin will be visible in the final rendered scene. A visibility stream is a bit stream that indicates, for each tile, whether each primitive is visible or invisible when the primitives are rendered.
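The text does not say how primitives are sorted into bins. A common, conservative approach, used here purely as an assumption, is to test each triangle's screen-space bounding box against the tile grid. A triangle may then be listed in a bin it does not actually cover, which costs some work but never drops a primitive. The function name and data layout are hypothetical.

```python
def bin_primitives(triangles, tile_w, tile_h, cols, rows):
    """Sort triangles into bins using a conservative screen-space bounding box.

    triangles: list of ((x, y), (x, y), (x, y)) in pixel coordinates.
    Returns {(col, row): [triangle indices]} for every bin the box overlaps.
    """
    bins = {(c, r): [] for r in range(rows) for c in range(cols)}
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Map the bounding box to bin indices, clamped to the screen;
        # fully off-screen triangles produce an empty range and no bins.
        c0 = max(0, int(min(xs) // tile_w))
        c1 = min(cols - 1, int(max(xs) // tile_w))
        r0 = max(0, int(min(ys) // tile_h))
        r1 = min(rows - 1, int(max(ys) // tile_h))
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                bins[(c, r)].append(idx)
    return bins


tris = [((1, 1), (8, 1), (1, 8)), ((5, 5), (15, 5), (5, 15))]
print(bin_primitives(tris, 10, 10, 2, 2)[(0, 0)])  # [0, 1]
```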
Commands for rendering the primitives in a bin may be loaded into an indirect buffer. The indirect buffer may be, for example, part of GMEM, a frame buffer, or other memory. In general, however, the indirect buffer is part of system memory. The GPU may execute the commands stored in the indirect buffer to render the primitives contained in the bin. If the visibility stream for a bin indicates that the bin contains no visible primitives (i.e., all primitives in the bin will be invisible in the final rendered scene), the GPU can improve performance by skipping execution of the commands in the indirect buffer associated with that bin, so that the primitives in that bin are not rendered.
In some examples of multi-pass rendering, a scene and its associated objects may be rendered multiple times. Each time an object is rendered, an additional aspect of the object's appearance may be computed and combined with the previous results. In general, this may involve a coarse initial rendering pass followed by a detailed second rendering pass based on the query result of the first, coarse pass. The query pass result may include data such as a counter value or heuristic indicating whether a binning pass should be performed. For example, if the object to be rendered is (relatively) simple, it may be advantageous to perform only a rendering pass after the query pass. Alternatively, if the object to be rendered is (relatively) complex, it may be advantageous to perform a binning pass and a rendering pass after the query pass.
In some examples, the GPU may also be configured to perform operations during the binning pass to determine which polygons are visible in the scene, for example by performing a depth test to determine whether one polygon occludes another. Once it has determined which polygons are visible in the scene, the GPU may generate a data stream referred to as a "visibility stream." The visibility stream may include a value for each polygon in the scene, and the value may indicate whether the polygon is visible (e.g., a value of "1" may indicate that the polygon is visible and a value of "0" that it is invisible).
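A visibility stream can be illustrated with a simplified depth test. In this sketch each primitive is reduced to a set of covered pixels plus a single depth value (an assumption; real hardware tests depth per sample), and a primitive gets a 1 if it is nearest at one or more of its pixels. The helper `bin_needs_rendering` shows the skip described earlier for a bin whose stream is all zeros. Both function names are hypothetical.

```python
def visibility_stream(prims):
    """Build a per-bin visibility stream with a simplified depth test.

    prims: list of (covered_pixels, depth) for the primitives in one bin,
           where covered_pixels is a set of (x, y) and a smaller depth is nearer.
    Returns 1 for each primitive that wins the depth test on at least one
    pixel (visible) and 0 for each fully occluded primitive.
    """
    nearest = {}  # pixel -> (depth, index of the nearest primitive so far)
    for idx, (pixels, depth) in enumerate(prims):
        for px in pixels:
            if px not in nearest or depth < nearest[px][0]:
                nearest[px] = (depth, idx)
    winners = {idx for _, idx in nearest.values()}
    return [1 if idx in winners else 0 for idx in range(len(prims))]


def bin_needs_rendering(stream):
    # An all-zero stream means no visible primitives, so the bin's
    # indirect-buffer commands can be skipped.
    return any(stream)


# A near primitive fully covers a farther one at pixel (0, 0).
print(visibility_stream([({(0, 0), (1, 0)}, 0.1), ({(0, 0)}, 0.5)]))  # [1, 0]
```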
After the binning pass, the GPU may render each tile separately by processing each tile again. In some examples, the GPU uses the visibility stream generated during binning to omit, or skip rendering, invisible polygons. Thus, only visible polygons, that is, those that contribute to the final scene, are processed and rendered. The GPU may perform the rendering process on each tile in three stages: (1) clear/unresolve, (2) render, and (3) resolve.
During the clear/unresolve stage, the GPU may initialize local memory resources for a new tile (e.g., memory that is local to the GPU or on-chip GPU memory, also referred to as GMEM). In some examples, the GPU initializes GMEM by performing a clear process. In other examples, the GPU may initialize GMEM by performing an unresolve process, during which it reads values from external memory into GMEM. The GPU may use the unresolve process when only a portion of a scene is being updated with new data. For example, unresolve may be used to retain, across more than one scene (e.g., more than one frame of graphics data), pixel data that does not change from one scene to the next.
During rendering, the GPU may recreate the polygons associated with the current tile and generate pixel values to finish rendering the current tile, so that the tile can be shown on a display. For example, the GPU may generate the appropriate pixel values during the render stage so that the displayed pixel data accurately represents the scene. In some examples, the GPU may store the final pixel values in memory local to the GPU or in on-chip GPU memory (i.e., GMEM).
After rendering, the GPU may resolve the current tile by copying the contents of the on-chip memory to memory external to the GPU (e.g., a buffer used by the display to show finished scenes). The GPU generally must wait until pixel data has finished rendering before resolving it. For example, if the GPU resolved, or copied, pixel data from GMEM to external memory before the pixels were fully rendered, the resulting scene would not exhibit the proper attributes of the intended scene when displayed.
In some examples, the GPU may wait to resolve a tile until the entire tile has finished rendering. For example, the GPU may wait until the entire tile is ready for display before copying it from GMEM to external memory. The GPU repeats the process, clearing/unresolving GMEM for the next tile, rendering the next tile, and resolving the next tile, until the entire scene is finished.
On a tile-based architecture, binning runs on top of both passes and may generate binning-related data for both of them, which may involve additional flush points and mid-scene resolves. This can be the case even when the multi-pass approach improves the application's performance relative to a single-pass scenario. Thus, in some cases, multi-pass rendering may include a first-pass rendering, a query, and a second-pass rendering. A query may be any request for information triggered by the application. For example, the application renders something on the API side and flushes the rendered object to the graphics card, which completes that particular rendering. A query may then be sent. The query may ask for the number of pixels passed when the rendering was flushed from the API to the graphics card. More generally, a query may be any hardware-supported request for information about the last rendering performed or about the current state of a rendering operation. In some cases, the application renders something, triggers a query, and sends data only on the basis of the query result. According to some examples of this application, the application may send the query together with multiple level-2 indirect buffers (IB2s). An IB2 contains commands for various aspects of the rendering pipeline. For example, an IB2 may contain preamble commands executable by the GPU, such as commands that initialize the static state of the GPU and set its initial rendering state. The rendering state of the GPU may include GPU settings that can change depending on the particular application. An IB2 may include a series of state commands and draw commands for drawing triangles in a loaded bin.
Each draw command may instruct the GPU to draw a triangle according to the graphics processing pipeline. IB2 68 may affect the behavior of the graphics processing pipeline executed by the GPU. For example, state commands may change the color, the polygon mode (e.g., points instead of solids or lines), blending (on/off), depth testing (on/off), texturing (on/off), culling, clipping, and other logical operations. IB2 state commands may be issued on a per-triangle (or per-primitive) basis.
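How IB2 state commands affect later draw commands can be modeled as a small interpreter: each draw captures whatever state is in effect when it executes. The tuple-based command encoding and the state keys are hypothetical stand-ins for a real GPU packet format.

```python
def execute_ib2(commands, initial_state):
    """Replay a simulated IB2.

    commands: list of ("state", {key: value}) or ("draw", triangle_id).
    initial_state: the state set up by the preamble commands.
    Returns [(triangle_id, state_snapshot)] in draw order.
    """
    state = dict(initial_state)  # copy so the caller's preamble state is untouched
    draws = []
    for kind, payload in commands:
        if kind == "state":
            state.update(payload)  # e.g. toggle blending or depth testing
        elif kind == "draw":
            draws.append((payload, dict(state)))  # snapshot state for this draw
        else:
            raise ValueError(f"unknown command kind: {kind}")
    return draws


ib2 = [("draw", 0), ("state", {"blend": True}), ("draw", 1)]
for tri, st in execute_ib2(ib2, {"blend": False, "depth_test": True}):
    print(tri, st)
```

Because state is applied in order, issuing a state command between two draws is what makes per-triangle (or per-primitive) state changes possible.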
In one example, the application may send one IB2 for each possible query result, so that commands for the various aspects of the rendering pipeline are available for every possible result. In one example, two query results, "true" and "false," are possible. Accordingly, the application may send two IB2s: one containing the rendering-pipeline commands for a "true" query result, and one containing the rendering-pipeline commands for a "false" query result.
Because the application sends both the IB2 for the "true" query result and the IB2 for the "false" query result, it does not need to wait for the query result before sending data. Instead, the GPU receives both IB2s from the application and can itself wait for the result of the query. Thus, rather than the application waiting, the GPU may execute the query pass and wait for its result. The GPU may execute the condition-true pass if the result of the query pass is "true," and the condition-true pass may be executed without executing a flush operation. The GPU may execute the condition-false pass if the result of the query pass is "false," and the condition-false pass may likewise be executed without executing a flush operation. After conditionally executing the condition-true pass or the condition-false pass based on the result of the query, the GPU may then execute a flush operation.
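The difference between the application waiting on the query and the GPU waiting on it can be modeled by counting flushes. In this toy model (the class, function, and pass names are hypothetical), the host-driven path must flush once to read the query result and again after submitting the chosen pass. The GPU-driven path submits the query pass and both conditional IB2s up front, lets the GPU select one based on the query, and flushes once.

```python
class SimGpu:
    """Toy command queue that counts flushes (hypothetical model)."""

    def __init__(self):
        self.queue = []
        self.executed = []
        self.flushes = 0
        self.query_result = None

    def submit(self, item):
        self.queue.append(item)

    def flush(self):
        # Execute everything queued so far, then drain the queue.
        self.flushes += 1
        for kind, payload in self.queue:
            if kind == "query":
                self.query_result = payload()  # evaluates to True or False
                self.executed.append("query")
            elif kind == "cond":
                true_ib2, false_ib2 = payload
                # GPU-side selection: only one of the two IB2s takes effect.
                self.executed.append(true_ib2 if self.query_result else false_ib2)
            else:
                self.executed.append(payload)
        self.queue.clear()


def host_driven(gpu, query):
    gpu.submit(("query", query))
    gpu.flush()  # the application must flush and wait to read the result
    ib2 = "detailed" if gpu.query_result else "coarse_only"
    gpu.submit(("pass", ib2))
    gpu.flush()
    return gpu


def gpu_driven(gpu, query):
    gpu.submit(("query", query))
    # Condition-true and condition-false IB2s are submitted together;
    # no flush separates the query pass from the conditional pass.
    gpu.submit(("cond", ("detailed", "coarse_only")))
    gpu.flush()  # single flush after the conditional passes
    return gpu


print(host_driven(SimGpu(), lambda: True).flushes)  # 2
print(gpu_driven(SimGpu(), lambda: True).flushes)   # 1
```

The model ignores latency; it only illustrates that moving the wait into the GPU removes the flush between the query pass and the conditional pass.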
Accordingly, in some aspects of this disclosure, waiting for the query result may be moved from the application to the GPU. In other words, the GPU, rather than the application, may wait for the result of the query. This is possible because, in some examples, the application may send the GPU both the IB2 for the condition "true" case and the IB2 for the condition "false" case. The GPU therefore has the content the application wishes to render for both the "true" and "false" cases, because each IB2 contains the commands for the various aspects of the rendering pipeline for its respective condition, i.e., the condition "true" case for a "true" query result and the condition "false" case for a "false" query result.
Additional queries and subsequent passes may follow. For a tile-based system, this may correspond to the following sequence: (1) a first-pass rendering, also referred to as a coarse pass or query pass, in which the tile-based system may generate a visibility stream and handle the binning pass, loads, renders, and stores for the first-pass rendering; (2) a query check (which may be performed by the application), in which the result of the query pass (the first pass) is checked; and (3) a second pass. The second pass may include all rendering done based on the query result of the first pass. In this example, the tile-based system may perform a binning pass, generate a visibility stream, and perform the loads, renders, and stores for this second pass, which will most likely have a different set of detailed geometry depending on the application's behavior. Thus, in a tile-based system, a bottleneck may occur because of the bus accesses that binned rendering triggers for both passes. Any optimization obtained by using the visibility stream may therefore be greatly diminished, because an application implementing these steps may force an additional flush so that the query pass can be executed to determine data such as a counter value or heuristic indicating whether a binning pass should be performed. These counter values or heuristics may also be referred to as query-pass rendering statistics. In addition, data such as a counter value or heuristic indicating whether to perform a binning pass may generally be generated, or determined, as part of the initial pass.
In general, in some instances, graphics application program triggering inquiry reproduces coarse time time (first pass), and Then terminate to inquire.Graphics application program can check Query Value (that is, detecting whether to need the pixel of detailed reproduction through transmitting Number).Based on query result (graphics application program can trigger second time time).When inquiry is true, detailed field reproduce Scape;It is fictitious time when inquiring, the scene can not be reproduced completely or reproduce coarse but color pipeline and realize scene.Therefore, second It may include all reproductions all over time, the query result based on first pass time may or may not execute all reproductions.
Some examples may modify the application's behavior to take advantage of the tile-based architecture for multi-pass rendering. Some examples may define new execution points and transfer full control to the GPU and the graphics driver. For example, some examples may: (1) call Start_Query_Pass, introducing the query condition in the first pass, (2) submit the rendering calls for query_pass, (3) end query_pass (call Query_Pass_End), (4) call the Start_Condition_true pass, (5) submit the rendering calls for condition_true_pass, (6) call End_Condition_true_pass, (7) call the Start_Condition_false pass, (8) submit the rendering calls for condition_false_pass, and (9) call End_condition_false_pass.
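The nine-step sequence can be sketched as a recorder that brackets each pass with its start and end execution points. The marker strings are taken from the text (including their inconsistent capitalization); the `CommandStream` class, its methods, and the `draw:` entry format are hypothetical.

```python
from contextlib import contextmanager


class CommandStream:
    """Records driver-visible execution points in submission order."""

    def __init__(self):
        self.entries = []

    @contextmanager
    def bracket(self, start, end):
        # Emit the start marker, let the caller record draws, then the end marker.
        self.entries.append(start)
        try:
            yield self
        finally:
            self.entries.append(end)

    def draw(self, name):
        self.entries.append(f"draw:{name}")


def build_multipass(stream, query_draws, true_draws, false_draws):
    with stream.bracket("Start_Query_Pass", "Query_Pass_End"):
        for d in query_draws:
            stream.draw(d)
    with stream.bracket("Start_Condition_true", "End_Condition_true_pass"):
        for d in true_draws:
            stream.draw(d)
    with stream.bracket("Start_Condition_false", "End_condition_false_pass"):
        for d in false_draws:
            stream.draw(d)
    return stream.entries


for entry in build_multipass(CommandStream(), ["coarse"], ["detail"], []):
    print(entry)
```

Both conditional passes are recorded unconditionally; choosing between them is left to the GPU and driver once the query result is known.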
In one example, indirect buffer 1 (IB1) may call the query pass, the binning pass, or the rendering pass. The query pass, binning pass, and rendering pass may be part of indirect buffer 2 (IB2). IB1 and IB2 are buffers, e.g., levels of a multi-level buffer. Commands in the top-level buffer (IB1) may be used to call an entire set of commands in a lower-level buffer (IB2). In one example, the query pass may be executed by a command in IB1 that calls the query-pass IB2, which may contain all commands for the query pass. Another IB2 may contain all commands for binning, another IB2 all commands for rendering, and so on. For example, the query pass, binning pass, and rendering pass may each be a separate IB2, i.e., a separate entity within the IB2-level buffers. The query pass may run before the binning pass, and the binning pass may run before the rendering pass. In some cases, the binning pass may be skipped so that the rendering pass runs immediately after the query pass. In some cases, neither the binning pass nor the rendering pass is executed after the query pass.
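A two-level indirect buffer can be modeled as a top-level list of calls, each expanding into an entire lower-level command list. This is a hypothetical simulation: the IB2 names and command strings are placeholders, and skipping binning simply means the IB1 does not call the binning IB2.

```python
def run_ib1(ib1, ib2s):
    """Expand a top-level IB1 into the commands of the IB2s it calls.

    ib1: IB2 names in call order, e.g. ["query", "binning", "render"].
    ib2s: {name: [command, ...]}, the simulated lower-level buffers.
    """
    executed = []
    for name in ib1:
        executed.extend(ib2s[name])  # one IB1 entry runs a whole IB2
    return executed


ib2s = {"query": ["q0"], "binning": ["b0", "b1"], "render": ["r0"]}
print(run_ib1(["query", "binning", "render"], ib2s))  # ['q0', 'b0', 'b1', 'r0']
print(run_ib1(["query", "render"], ib2s))             # ['q0', 'r0'] (binning skipped)
```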
For example, inquiry can return to data all over secondary, for example, indicating whether to execute binning all over secondary Counter Value or examination Spy method.In one example, condition, which executes determination unit, can be determined whether to execute binning all over secondary or reproduction time time.To execution binning It can be based on the complexity of object to be reproduced all over secondary determination.For example, for simple object, it can skip binning all over secondary. Conversely, for more complex object, binning can be performed all over secondary.Therefore, for better simply object, can skip binning all over time with So that being reproduced all over secondary operation later all over secondary immediately in inquiry.In addition, individually reproduction can be executed iteration for simple object, For example, entire screen can it is single all over time rather than be written with a series of pieces.This can be for very simply repeating Screen reproduction It is possible.
Fig. 1 is a block diagram illustrating a processor 102, a graphics processing unit (GPU) 120, and a system memory 118 for rendering a three-dimensional (3D) scene, according to some aspects of this disclosure. Processor 102 may execute a software application 112, an operating system (OS) 114, and a graphics driver 116. System memory 118 may include indirect buffers that store command streams for rendering primitives, as well as secondary commands to be executed by GPU 120. GPU 120 may include GMEM 122, which may be the GMEM described above. In some examples, GMEM 122 may be "on-chip" with GPU 120. In some cases, all of the hardware elements shown in Fig. 1 may be on-chip, for example in a system-on-chip (SoC) design.
In the example of Fig. 1, processor 102, system memory 118, and GPU 120 may be part of a device. Examples of the device include, but are not limited to, video devices, media players, set-top boxes, wireless handsets (e.g., mobile telephones and so-called smartphones), personal digital assistants (PDAs), desktop computers, laptop computers, gaming consoles, video conferencing units, tablet computing devices, and the like.
Processor 102 may be a central processing unit (CPU). GPU 120 may be a processing unit configured to perform graphics-related functions, such as generating and outputting graphics data for presentation on a display, as well as non-graphics-related functions that exploit the massive processing parallelism provided by GPU 120. For example, GPU 120 may execute both graphics applications and non-graphics applications. Because GPU 120 may provide general-purpose processing capabilities in addition to graphics capabilities, GPU 120 may be referred to as a general-purpose GPU (GP-GPU).
Examples of processor 102 and GPU 120 include, but are not limited to, a digital signal processor (DSP), a general-purpose microprocessor, an application-specific integrated circuit (ASIC), a field-programmable logic array (FPGA), or other equivalent integrated or discrete logic circuitry. In some examples, GPU 120 may be a special-purpose microprocessor designed, for example, to provide massively parallel processing for graphics and for executing non-graphics-related applications. Furthermore, although processor 102 and GPU 120 are illustrated as separate components, aspects of the invention are not so limited. For example, processor 102 and GPU 120 may reside in a common integrated circuit (IC).
Software application 112 executing on processor 102 may include one or more graphics rendering instructions that instruct processor 102 to cause graphics data to be rendered to a display (not shown). In some examples, the graphics rendering instructions may include software instructions that conform to a graphics application programming interface (API), such as the Open Graphics Library (OpenGL) API, the Open Graphics Library Embedded Systems (OpenGL ES) API, the Direct3D API, the X3D API, the RenderMan API, the WebGL API, or any other public or proprietary standard graphics API. To process the graphics rendering instructions, processor 102 may issue one or more graphics rendering commands to GPU 120 (e.g., via graphics driver 116) to cause GPU 120 to perform some or all of the rendering of the graphics data. In some examples, the graphics data to be rendered may include a list of graphics primitives, such as points, lines, triangles, quadrilaterals, and triangle strips.
GPU 120 may be configured to perform graphics operations to render one or more graphics primitives to the display. Thus, when one of the software applications executing on processor 102 requires graphics processing, processor 102 may provide graphics commands and graphics data to GPU 120 for rendering to the display. The graphics data may include, for example, draw commands, state information, primitive information, and texture information. In some cases, GPU 120 is built with a highly parallel structure that processes complex graphics-related operations more efficiently than processor 102. For example, GPU 120 may include a plurality of processing elements configured to operate on multiple vertices or pixels in parallel. In some cases, the highly parallel nature of GPU 120 allows it to draw graphics images (e.g., GUIs and two-dimensional (2D) and/or three-dimensional (3D) graphics scenes) onto the display more quickly than processor 102 could draw the scenes directly to the display.
GPU 120 may be directly coupled to GMEM 122. In other words, GPU 120 may process data locally using local storage instead of off-chip memory. This allows GPU 120 to operate more efficiently by eliminating the need to read and write data over, for example, a shared bus, which may carry heavy traffic. In some cases, however, GPU 120 may not include a separate memory and may instead use system memory 118. GMEM 122 may include one or more volatile or non-volatile memories or storage devices, such as random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), and one or more registers.
Processor 102 and/or GPU 120 may store rendered image data in frame buffer 124. Frame buffer 124 may be an independent memory or may be allocated within system memory 118. A display processor (not shown) may retrieve the rendered image data from frame buffer 124 and show it on a display.
System memory 118 may be memory within the device that resides external to processor 102 and GPU 120, i.e., off-chip with respect to both processor 102 and GPU 120. System memory 118 may store applications executed by processor 102 and GPU 120. In addition, system memory 118 may store data upon which the executed applications operate, as well as data generated by the applications. However, not all such data needs to be stored in system memory 118 in every example. In some cases, data may be stored locally on processor 102 or GPU 120. For example, some or all of the data may be stored locally in on-chip GPU memory (e.g., graphics memory, GMEM 122).
System memory 118 may store program modules and/or instructions that processor 102 can access for execution, and/or data used by programs executing on processor 102. For example, system memory 118 may store a window manager application that processor 102 uses to present a graphical user interface (GUI) on a display. In addition, system memory 118 may store user applications and the application surface data associated with them. System memory 118 may act as device memory for GPU 120 and may store data to be operated on by GPU 120 as well as data resulting from operations performed by GPU 120. For example, system memory 118 may store any combination of texture buffers, depth buffers, stencil buffers, vertex buffers, frame buffers, or the like.
System memory 118 may be an example of a computer-readable storage medium. For example, system memory 118 may store instructions that cause processor 102 and GPU 120 to perform the functions ascribed to each of them in this disclosure. System memory 118 may be considered a computer-readable storage medium comprising instructions that cause one or more processors (e.g., processor 102 or GPU 120) to perform various functions.
Examples of system memory 118 include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer or processor. As one example, system memory 118 may be removed from the device and moved to another device. As another example, a storage device substantially similar to system memory 118 may be inserted into the device.
According to the techniques of this disclosure, some examples may modify the behavior of an application so that it can take advantage of a tile-based architecture for multi-pass rendering. In some examples, the application may use new execution points, backdoors, or extensions that hand full control to the GPU and the graphics driver.
For example, the GPU may "expose" entry points to the application so that the application can give the driver an indication of what it is submitting. "Exposing" refers to providing an entry point into a block of code, with functional attributes that can trigger that code block when needed. In general, when the application calls a start step, i.e., StartXXX, the call terminates in the driver, and in some examples the subsequent render/state calls belong to pass XXX until the end call (i.e., EndXXX) is made. All render/state calls between the start and end of a pass can therefore be accumulated and used to build the indirect buffers for that pass. An example call sequence follows. Start_Query_Pass (the first pass) is introduced for the query condition. The GPU submits the render calls for query_pass as usual and may call Query_Pass_End. The GPU may then call Start_Condition_true pass and set up the query, and submit the render calls for condition_true_pass. Next, the GPU may call End_condition_true_pass and Start_Condition_false pass, submit the render calls for condition_false_pass, and call End_condition_false_pass. Thus, in one example, only one flush is needed. A flush is the submission, or transmission, of all accumulated render commands to the operating system. When a graphics application triggers a render command, that command is not sent directly to the hardware (e.g., the screen). Instead, the commands are accumulated by the graphics driver (and translated as needed). A flush call marks the boundary at which rendering must be handled: the driver sends, or submits, all accumulated commands and buffers to the hardware through the operating system kernel.
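The accumulate-then-flush behavior can be sketched in Python as follows. This is a hypothetical model rather than any real driver interface: `Driver`, `start_pass`, `end_pass`, `draw`, and `flush` are illustrative names standing in for the StartXXX/EndXXX entry points, and a "command" is just a string.

```python
class Driver:
    """Toy driver that accumulates render calls per pass and flushes once."""

    def __init__(self):
        self.buffers = {}      # pass name -> accumulated commands (one indirect buffer each)
        self.current = None    # name of the pass currently open, if any
        self.submitted = []    # what each flush handed to the "kernel"
        self.flush_count = 0

    def start_pass(self, name):            # plays the role of StartXXX
        if self.current is not None:
            raise RuntimeError(f"pass {self.current!r} is still open")
        self.current = name
        self.buffers[name] = []

    def end_pass(self, name):              # plays the role of EndXXX
        if self.current != name:
            raise RuntimeError(f"end of {name!r} without a matching start")
        self.current = None

    def draw(self, command):
        # Render/state calls are recorded, not sent to hardware.
        if self.current is None:
            raise RuntimeError("render call outside of any pass")
        self.buffers[self.current].append(command)

    def flush(self):
        # The single submission point: every accumulated buffer goes at once.
        self.submitted.append(self.buffers)
        self.buffers = {}
        self.flush_count += 1


driver = Driver()
for name, cmds in [("query_pass", ["draw_bbox"]),
                   ("condition_true_pass", ["draw_a", "draw_b"]),
                   ("condition_false_pass", ["draw_c"])]:
    driver.start_pass(name)
    for cmd in cmds:
        driver.draw(cmd)
    driver.end_pass(name)
driver.flush()
print(driver.flush_count)  # 1
```

Keeping one buffer per bracketed pass is what lets a single flush carry the query pass and both conditional passes together.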
In some examples, the application may not need to flush until all data has been sent. In addition, the application may not need to check the query data explicitly. In one example, the hardware may execute the query pass, the binning pass, and the rendering pass. The binning pass can thus be triggered for the rendering pass that matches the query result. Generally, for a well-written application, this pass may take more processing cycles than the other passes. It is therefore preferable to complete the operation at a single flush point while making good use of the hardware resources. Various examples can thus eliminate unnecessary loads/stores (resolves/unresolves) and flush points.
Fig. 2 is a conceptual diagram illustrating the tiles of a scene in a tile-based rendering architecture. As shown in Fig. 2, a 3D graphics object 206 to be rendered by a GPU (e.g., GPU 120 shown in Fig. 1) may be composed of primitives, such as primitive 208. In the example shown in Fig. 2, the primitives are triangles with three vertices each. In other examples, the primitives may be points, lines, and the like. A 3D scene 202 containing graphics object 206 may be divided into tiles, such as tile 204. The size of each tile of scene 202 (e.g., tile 204) may be determined based at least in part on the size of the GMEM. For example, each tile of scene 202 may be sized so that the portion of scene 202 it contains can be rendered entirely in a graphics memory such as GMEM 122 shown in Fig. 1. Each tile of scene 202 may be considered a bin containing the triangles in that tile. In one example, the bin width and height may be aligned to 32 pixels. Because scene 202 is divided into a 5×5 grid of tiles, scene 202 as shown in Fig. 2 has 25 tiles in total.
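One way to choose a tile size that fits in GMEM while keeping 32-pixel alignment is sketched below. The sketch rests on several assumptions: the halving strategy, the function names, and the byte figures in the usage are hypothetical, and real drivers may size bins differently. With a 160×160 target, 8 bytes per pixel (for example, color plus depth), and 8 KiB of GMEM, it produces 32×32 tiles and the 5×5 = 25-tile grid of the Fig. 2 example.

```python
import math


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def choose_tile_size(width, height, gmem_bytes, bytes_per_pixel, align=32):
    """Shrink an aligned tile until one tile's pixels fit in GMEM."""
    tw, th = align_up(width, align), align_up(height, align)
    while tw * th * bytes_per_pixel > gmem_bytes:
        if tw == align and th == align:
            raise ValueError("GMEM cannot hold even one aligned tile")
        # Halve the longer side, keeping both sides 32-pixel aligned.
        if tw >= th:
            tw = max(align, align_up(tw // 2, align))
        else:
            th = max(align, align_up(th // 2, align))
    return tw, th


def tile_grid(width, height, tw, th):
    """Number of tile columns and rows needed to cover the frame."""
    return math.ceil(width / tw), math.ceil(height / th)


tw, th = choose_tile_size(160, 160, gmem_bytes=8 * 1024, bytes_per_pixel=8)
cols, rows = tile_grid(160, 160, tw, th)
print(tw, th, cols, rows, cols * rows)  # 32 32 5 5 25
```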
GPU 120 may render a triangle by executing the commands for rendering that triangle. Thus, GPU 120 may render graphics object 206 by executing the commands for rendering each of the triangles that make up graphics object 206. GPU 120 may sort the triangles of the scene into bins, so that each bin includes a command stream (a set of commands) for rendering the triangles it contains. Because scene 202 has 25 tiles in total, it has 25 corresponding bins. The command stream for each bin may be stored in an indirect buffer in memory (e.g., system memory 118 shown in Fig. 1). GPU 120 renders graphics object 206 by executing the command stream of each bin, which renders the triangles in that bin onto GMEM 122.
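Sorting triangles into per-bin lists can be sketched with a conservative bounding-box test. This is a simplified stand-in that assumes screen-space vertex coordinates and the 32×32 tiles of the Fig. 2 example. Each bin's list of triangle indices plays the role of that bin's command stream, and `bin_triangles` is a hypothetical name. For simplicity, the sketch lists a triangle in every bin its bounding box overlaps.

```python
def bin_triangles(triangles, tile_w, tile_h, cols, rows):
    """Map each (col, row) bin to the indices of triangles whose bounding box
    overlaps it. Conservative: a triangle may land in a bin it does not touch."""
    bins = {(c, r): [] for r in range(rows) for c in range(cols)}
    for tri_id, verts in enumerate(triangles):
        xs = [x for x, _ in verts]
        ys = [y for _, y in verts]
        # Clamp the covered tile range to the grid; off-screen triangles get none.
        c0 = max(0, int(min(xs) // tile_w))
        c1 = min(cols - 1, int(max(xs) // tile_w))
        r0 = max(0, int(min(ys) // tile_h))
        r1 = min(rows - 1, int(max(ys) // tile_h))
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                bins[(c, r)].append(tri_id)
    return bins


tris = [[(1, 1), (10, 1), (1, 10)],      # inside bin (0, 0)
        [(20, 20), (40, 20), (20, 40)]]  # straddles four bins
bins = bin_triangles(tris, 32, 32, cols=5, rows=5)
print(bins[(0, 0)], bins[(1, 1)], bins[(4, 4)])  # [0, 1] [1] []
```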
In some examples, to render a scene, GPU 120 performs a first, coarse pass and then a second, fine pass. During the first, coarse pass, GPU 120 may determine whether the triangles in each of the bins are visible. In earlier techniques, the CPU performs a flush operation after the GPU completes the first pass. The flush operation stores the results of the first pass and returns them to the CPU. The results may include, for example, which triangles are visible and which are not.
Based on the results of the query operation (e.g., which triangles are visible and which are not), the CPU generates parameters for the second pass. During the second pass, the GPU performs a second binning pass and generates a new visibility stream. Using this new visibility stream, the GPU performs a second rendering pass. After the second pass, the GPU performs another flush operation. In this second flush operation, the contents of the GMEM may be written to a graphics buffer or to system memory 118.
After GPU 120 renders the portion of scene 202 contained in a bin onto GMEM 122, the rendered portion of scene 202 may be loaded from GMEM 122 into memory, such as frame buffer 124 shown in Fig. 1. GPU 120 may repeat this process for each bin: executing the command stream, rendering the triangles of the bin onto GMEM 122, and loading the rendered portion of scene 202 from GMEM 122 into frame buffer 124, until the entire scene 202 has been rendered.
As described herein, "binning," or "tile-based rendering," is a way of rendering a 3D scene in smaller portions. Because 3D rendering requires a large amount of memory bandwidth, it benefits from GMEM, a dedicated graphics memory with high bandwidth located close to the 3D core. In mobile environments, however, the size of GMEM is limited by area constraints. The scene may therefore need to be split into smaller portions so that each portion can be rendered separately.
In another example, a facedness stream may be used, but applied separately to each bin. In other examples, the facedness stream may include per-triangle bit data indicating whether each triangle is front-facing or back-facing. In this example, the idea may be extended to a visibility stream, in which each bit indicates whether a triangle is visible at all in a given bin. Each bin has one visibility stream, which enumerates the triangles visible in that bin. Several factors may be used to compute the visibility value: (1) whether the triangle is back-face culled, (2) whether the triangle hits the bin region (including in Z), and (3) whether the triangle is occluded according to a low-resolution Z check.
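A per-triangle, per-bin visibility bit that combines the three factors above might look like the sketch below. It is illustrative only. It assumes counter-clockwise front faces in a y-up coordinate system, uses a 2D bounding-box overlap as the "hits the bin" test (ignoring the Z extent), and takes the low-resolution Z result as a precomputed boolean. `visibility_bit` and its helpers are hypothetical names.

```python
def signed_area2(tri):
    """Twice the signed area; positive for counter-clockwise vertex order."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)


def facedness_bit(tri):
    """1 for front-facing (counter-clockwise), 0 for back-facing or degenerate."""
    return 1 if signed_area2(tri) > 0 else 0


def hits_bin(tri, bin_rect):
    """Conservative overlap test between the triangle's bounding box and a
    half-open bin rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bin_rect
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    return min(xs) < x1 and max(xs) >= x0 and min(ys) < y1 and max(ys) >= y0


def visibility_bit(tri, bin_rect, passes_lrz):
    """1 only if the triangle survives back-face culling, touches the bin,
    and is not rejected by the low-resolution Z check."""
    return int(facedness_bit(tri) == 1 and hits_bin(tri, bin_rect) and passes_lrz)


front = [(0, 0), (10, 0), (0, 10)]
back = [(0, 0), (0, 10), (10, 0)]
print(visibility_bit(front, (0, 0, 32, 32), True))   # 1
print(visibility_bit(back, (0, 0, 32, 32), True))    # 0
print(visibility_bit(front, (32, 0, 64, 32), True))  # 0
```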
In one example, multiple visibility streams are created during the binning pass, one for each bin. During the rendering pass, only one visibility stream is read: the stream for the current bin. The visibility streams are also compressed, which may reduce memory consumption. Compression also makes it possible to skip invisible triangles quickly during the render stage.
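The per-bin streams and their compression can be sketched with one bit per triangle and run-length encoding. Run-length encoding is a swapped-in example; the text does not say which compression scheme is used. `encode_stream` and `visible_triangles` are hypothetical names.

```python
def encode_stream(bits):
    """Run-length encode a per-triangle visibility bit list as (bit, length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [tuple(run) for run in runs]


def visible_triangles(stream):
    """Yield the indices of visible triangles, skipping invisible runs whole."""
    index = 0
    for bit, length in stream:
        if bit:
            yield from range(index, index + length)
        index += length


# One stream per bin; during rendering only the current bin's stream is read.
bins_bits = {
    (0, 0): [1, 1, 0, 0, 0, 0, 0, 1],
    (1, 0): [0, 0, 0, 0, 0, 0, 0, 0],
}
streams = {key: encode_stream(bits) for key, bits in bins_bits.items()}
print(streams[(0, 0)])                           # [(1, 2), (0, 5), (1, 1)]
print(list(visible_triangles(streams[(0, 0)])))  # [0, 1, 7]
print(streams[(1, 0)])                           # [(0, 8)]
```

Long runs of invisible triangles collapse to a single pair, which is where both the memory saving and the fast skipping come from.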
In one example, the visibility streams may be generated during the binning pass. This may involve processing the command stream of the entire scene, although, in general, no pixel shading is performed. Creating the streams may include the following stages: (1) vertex shading with the binning shader, (2) low-resolution rasterization, (3) a low-resolution Z test, and (4) visibility stream compression.
In some examples, the binning pass requires a dedicated binning shader. This may be a modified version of the vertex shader whose only output is the vertex position. All parameter outputs, and any computations related to them, may be removed from the binning shader. In some examples, however, no binning-specific shader code needs to be added. (During the early stages of driver development, a regular vertex shader may also be used as the binning shader. In that case, an appropriate pixel shader should also be present, but it will never receive any pixels.)
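The relationship between a full vertex shader and its position-only binning variant can be sketched in plain Python. The vertex layout (`position`, `uv`, `normal`), the row-major 4×4 matrix, and both function names are assumptions made for illustration; real shaders would be written in a shading language.

```python
def transform(matrix, position):
    """Multiply a row-major 4x4 matrix by the homogeneous point (x, y, z, 1)."""
    x, y, z = position
    v = (x, y, z, 1.0)
    return tuple(sum(matrix[r][c] * v[c] for c in range(4)) for r in range(4))


def vertex_shader(vertex, mvp):
    """Full shader: clip-space position plus varyings for the pixel shader."""
    return {
        "position": transform(mvp, vertex["position"]),
        "uv": vertex["uv"],
        "normal": vertex["normal"],
    }


def binning_shader(vertex, mvp):
    """Binning variant: position only; parameter outputs and their math removed."""
    return {"position": transform(mvp, vertex["position"])}


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
v = {"position": (1.0, 2.0, 3.0), "uv": (0.5, 0.5), "normal": (0.0, 0.0, 1.0)}
print(binning_shader(v, identity))  # {'position': (1.0, 2.0, 3.0, 1.0)}
```

Because both variants share the same position math, the binning pass sees exactly the geometry that the rendering pass will later rasterize.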
From the shaded vertices, the rasterizer generates a low-resolution representation of each triangle, in which each pixel corresponds to a 4×4 pixel region of the final image. Each generated low-resolution pixel can have one of two values: partially covered or fully covered. The rasterizer applies the same culling rules as regular rasterization (facedness, frustum, and so on), so it produces only the triangles that actually contribute to the scene.
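Classifying a 4×4 block as fully covered, partially covered, or untouched can be sketched with edge functions evaluated at the block corners. This is a conservative approximation chosen for illustration, not necessarily how the hardware rasterizer works: it may report "partial" for a block that does not actually overlap the triangle, but it never reports "full" or "none" wrongly. `classify_block` is a hypothetical name.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: its sign tells which side of line a->b the point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def classify_block(tri, bx, by, size=4):
    """Return 'full', 'partial', or 'none' for the size x size block at (bx, by)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return "none"                    # degenerate triangle covers nothing
    sign = 1 if area > 0 else -1         # normalize winding so inside is >= 0
    corners = [(bx, by), (bx + size, by), (bx, by + size), (bx + size, by + size)]
    edges = [((x0, y0), (x1, y1)), ((x1, y1), (x2, y2)), ((x2, y2), (x0, y0))]
    all_inside = True
    for (ax, ay), (cx, cy) in edges:
        values = [sign * edge(ax, ay, cx, cy, px, py) for px, py in corners]
        if all(v < 0 for v in values):
            return "none"                # whole block lies outside this edge
        if any(v < 0 for v in values):
            all_inside = False           # block straddles this edge
    # All four corners inside every edge means the convex block is inside.
    return "full" if all_inside else "partial"


tri = [(0, 0), (100, 0), (0, 100)]
print(classify_block(tri, 0, 0))    # full
print(classify_block(tri, 48, 48))  # partial
print(classify_block(tri, 96, 96))  # none
```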
The third stage of the process is the low-resolution Z test. During the binning pass, GMEM can also serve as a Z buffer. Because rendering is done in 4×4 pixel blocks, the Z buffer in GMEM is also at this resolution. Furthermore, no color buffer is needed in GMEM. As a result, the low-resolution Z buffer (LRZ buffer) can cover a much larger screen area than a full-resolution buffer could. Because the LRZ buffer does not operate at full resolution, LRZ processing must be conservative. Only pixels fully covered by a triangle are written to the LRZ buffer; partially covered pixels do not contribute Z writes. This also means that the LRZ buffer is not entirely accurate, because there may be gaps at triangle edges. At the end of the binning pass, the LRZ buffer may be written out to external memory and later used to initialize the Z buffer during the rendering pass. This improves Z testing during rendering.
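The conservative LRZ rule can be sketched as follows, assuming a LESS depth test (smaller z is closer) and one depth value per 4×4 block. `LRZBuffer` and its methods are hypothetical. The key point taken from the text is that only fully covered blocks update the buffer.

```python
import math


class LRZBuffer:
    """One conservative depth per 4x4 block, assuming a LESS depth test."""

    def __init__(self, blocks_x, blocks_y):
        # "Infinitely far" until a fully covering triangle is seen.
        self.z = [[math.inf] * blocks_x for _ in range(blocks_y)]

    def occluded(self, bx, by, tri_min_z):
        """Reject only when even the triangle's nearest point cannot pass."""
        return tri_min_z >= self.z[by][bx]

    def write(self, bx, by, coverage, tri_max_z):
        """Update only on full coverage; a partial block may still have gaps."""
        if coverage == "full":
            # Every pixel in the block is now at most tri_max_z deep.
            self.z[by][bx] = min(self.z[by][bx], tri_max_z)


lrz = LRZBuffer(2, 2)
lrz.write(0, 0, "full", tri_max_z=0.4)
lrz.write(1, 0, "partial", tri_max_z=0.1)   # ignored: partial coverage
print(lrz.occluded(0, 0, tri_min_z=0.5))    # True: behind the occluder
print(lrz.occluded(0, 0, tri_min_z=0.3))    # False: may be in front
print(lrz.occluded(1, 0, tri_min_z=0.9))    # False: no full-coverage write yet
```

Storing the occluder's farthest depth, rather than its nearest, is what keeps the test conservative: a rejected triangle is guaranteed to fail the full-resolution test everywhere in the block.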
Fig. 3 is a conceptual diagram showing primitives divided among bins, according to aspects of the invention. As shown in Fig. 3, bins 302, 304, 306, and 308, each containing a 4×4 grid of pixels, are rendered/rasterized to contain a plurality of pixels 310. In general, rendering is the process of generating an image from existing objects or models. Rasterization generally takes an image described in a vector graphics format (e.g., shapes) and converts it into a raster image (e.g., pixels or dots) for output on a video display or printer, or for storage in a bitmap file format.
One or more graphics primitives may be visible in each bin. For example, portions of triangle A (Tri A) are visible in both bin 302 and bin 306. Portions of triangle B (Tri B) are visible in each of bins 302, 304, 306, and 308. Triangle C (Tri C) is visible only in bin 304. During the rendering pass, GPU 120 may split the scene into bins and assign triangles to the bins. If a triangle is visible in more than one bin, GPU 120 may assign it to just one of the bins in which it is visible, so that the triangle is not rendered multiple times as each of bins 302, 304, 306, and 308 is rendered.
GPU 120 may also determine which triangles in a bin are actually visible in the final rendered scene. For example, some triangles may lie behind one or more other triangles and will not be visible in the final rendered scene. Invisible triangles therefore need not be rendered for that bin.
When a particular rendering pass is performed, the pixel data for the bin associated with that pass may be stored in a graphics memory, such as GMEM 122 shown in Fig. 1 (sometimes called a bin buffer). After performing the rendering pass, GPU 120 may transfer the contents of GMEM 122 to frame buffer 124. In some cases, GPU 120 may overwrite a portion of the data in frame buffer 124 with the data stored in GMEM 122. In other cases, GPU 120 may composite or combine the data in frame buffer 124 with the data stored in GMEM 122. After transferring the contents of GMEM 122 to frame buffer 124, GPU 120 may initialize GMEM 122 to default values and begin a subsequent rendering pass for a different bin.
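The per-bin loop (render into GMEM, resolve to the frame buffer, reinitialize GMEM) can be sketched with a grayscale frame buffer held in nested lists. This is a simplified model under stated assumptions: `render_bins`, the command signature `cmd(gmem)`, the `fill` helper, and the optional `blend` callback are hypothetical. Passing `blend=None` overwrites the frame buffer; passing a callback composites onto it.

```python
def render_bins(frame, tile_w, tile_h, bin_commands, blend=None, clear_value=0.0):
    """Render each bin into a tile-sized GMEM, then resolve it into `frame`.

    blend=None overwrites frame-buffer pixels; otherwise blend(dst, src)
    composites the GMEM pixel onto the existing frame-buffer pixel.
    """
    height, width = len(frame), len(frame[0])
    for (col, row), commands in bin_commands.items():
        # Initialize GMEM to a default value before rendering this bin.
        gmem = [[clear_value] * tile_w for _ in range(tile_h)]
        for cmd in commands:
            cmd(gmem)                      # each command writes into GMEM only
        x0, y0 = col * tile_w, row * tile_h
        for ty in range(tile_h):
            for tx in range(tile_w):
                fx, fy = x0 + tx, y0 + ty
                if fx < width and fy < height:   # clip partial edge tiles
                    src = gmem[ty][tx]
                    frame[fy][fx] = src if blend is None else blend(frame[fy][fx], src)
    return frame


def fill(value):
    """Hypothetical draw command that paints the whole tile with one value."""
    def cmd(gmem):
        for line in gmem:
            line[:] = [value] * len(line)
    return cmd


frame = [[0.0] * 4 for _ in range(4)]
render_bins(frame, 2, 2, {(0, 0): [fill(1.0)], (1, 1): [fill(2.0)]})
print(frame[0][0], frame[3][3], frame[0][3])  # 1.0 2.0 0.0
```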
Fig. 4 is a conceptual diagram illustrating a technique for performing multi-pass rendering in accordance with the techniques of this disclosure. In general, this technique allows the flush to be performed only once. The functions in Fig. 4 may be performed from top to bottom and from left to right, in the orientation shown. More specifically, as illustrated in Fig. 4, condition-true pass 404 and condition-false pass 406 may be performed after query pass 400, and flush operation 408 is performed once all three passes (query pass 400, condition-true pass 404, and condition-false pass 406) are complete. Thus, in some examples, the techniques of this disclosure may eliminate the flush operation that would otherwise follow the completion of the query pass by GPU 120, by modifying the behavior of an application that uses a tile-based GPU architecture. Specifically, in some examples, the techniques may include new execution points, backdoor commands, and/or extensions. These allow GPU 120 and the graphics driver to eliminate the second query operation described above. In some examples, the techniques modify the DirectX 11 graphics API to include additional rendering commands that allow GPU 120 to eliminate flush commands.
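The difference in flush count between the earlier flow (a flush after the query pass, the CPU reads the result, then a second flush) and the Fig. 4 flow can be sketched as two event traces. The event names and both functions are hypothetical labels for the steps described in the text, not driver or API calls. The sketch also assumes, as in the description of the GPU-side wait, that only the pass matching the query result does work.

```python
def legacy_trace(query_is_true):
    """Earlier flow: the CPU needs the query result before building the next pass."""
    trace = ["query_pass", "flush", "cpu_reads_result"]
    trace.append("condition_true_pass" if query_is_true else "condition_false_pass")
    trace.append("flush")
    return trace


def single_flush_trace(query_is_true):
    """Fig. 4 flow: both conditional passes are recorded up front and the GPU
    executes the one selected by the query, so only one flush is needed."""
    selected = "condition_true_pass" if query_is_true else "condition_false_pass"
    return ["query_pass", "query_check", selected, "flush"]


for result in (True, False):
    print(legacy_trace(result).count("flush"), single_flush_trace(result).count("flush"))
# prints "2 1" on each of two lines
```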
When performing multi-pass tile-based rendering, GPU 120 generally performs query pass 400 and query check 402, followed by condition-true pass 404 and condition-false pass 406. Query check 402 may be any request for information triggered by the application. For example, the application renders something on the API side and flushes the rendered objects to the graphics card, which completes that particular rendering. A query may then be sent. The query may be, for example, the number of pixels passed from the API to the graphics card when the rendering was flushed. More generally, the query may be any hardware-supported request for information about the state of the last or current rendering operation. In some cases, the application renders something, triggers a query, and sends further data only depending on the query result.
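One concrete kind of query is an occlusion query in the "samples passed" style, which counts the pixels that would pass the depth test. The sketch below assumes a LESS depth test, fragments given as (x, y, z), and a read-only depth buffer. `occlusion_query` is a hypothetical name, and the true/false predicate is simply "count > 0".

```python
def occlusion_query(depth_buffer, fragments):
    """Count fragments that would pass a LESS depth test, without writing depth."""
    passed = 0
    for x, y, z in fragments:
        if z < depth_buffer[y][x]:
            passed += 1
    return passed


depth = [[0.5, 0.5],
         [0.5, 0.5]]
count = occlusion_query(depth, [(0, 0, 0.4), (1, 1, 0.6), (1, 0, 0.2)])
query_is_true = count > 0   # predicate later used to pick the conditional pass
print(count, query_is_true)  # 2 True
```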
According to some examples of the present application, the application may send the query together with multiple IB2s. As described above, an IB2 contains commands for various aspects of the rendering pipeline.
In one example, two query results, "true" and "false", are possible. The application may therefore send two IB2s: one for a "true" query result and one for a "false" query result. In this way, the application sends the commands for the various aspects of the rendering pipeline for both possible outcomes.
Because the application sends both the IB2 for a "true" query result and the IB2 for a "false" query result, it does not need to wait for the query result before sending data. Instead, the GPU accepts both IB2s from the application, and the GPU waits for the query result. In other words, rather than the application waiting, the GPU executes the query pass and waits for its result. If the query pass result is "true", the GPU may execute the condition-true pass without performing flush operation 408. If the query pass result is "false", the GPU may execute the condition-false pass, also without performing flush operation 408. After conditionally executing either the condition-true pass or the condition-false pass based on the result of the query pass, the GPU may then perform flush operation 408.
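GPU-side selection between the two IB2s can be modeled as below. The `Gpu` class, `submit_conditional`, and the representation of an IB2 as a list of callables that append to a trace are all hypothetical. The sketch shows only the control flow described above: the application hands over both IB2s, the GPU obtains the query result itself, runs the matching IB2, and flushes once.

```python
class Gpu:
    """Toy GPU: commands are callables that append to an execution trace."""

    def __init__(self):
        self.trace = []
        self.flushes = 0

    def execute(self, ib2):
        for command in ib2:
            command(self.trace)

    def submit_conditional(self, query_pass, ib2_true, ib2_false):
        """The application submits both IB2s up front and does not wait.
        The GPU runs the query pass, picks one IB2, then flushes once."""
        result = bool(query_pass(self.trace))
        self.execute(ib2_true if result else ib2_false)
        self.flushes += 1
        return result


def query(trace):
    trace.append("query_pass")
    return 0            # e.g. zero samples passed -> "false"


gpu = Gpu()
gpu.submit_conditional(query,
                       ib2_true=[lambda t: t.append("true_draw")],
                       ib2_false=[lambda t: t.append("false_draw")])
print(gpu.trace, gpu.flushes)  # ['query_pass', 'false_draw'] 1
```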
Thus, in some aspects of the invention, the wait for the query result can be moved from the application to the GPU. In other words, the GPU, rather than the application, may wait for the query result. This is possible because, in some examples, the application may send the GPU both the IB2 for the condition "true" case and the IB2 for the condition "false" case. The GPU therefore already has what the application wants rendered in both cases, because each IB2 contains the commands for the various aspects of the rendering pipeline for its respective condition: the condition "true" case for a "true" query result and the condition "false" case for a "false" query result.
Query pass 400 determines whether triangles are visible or invisible and sets up the condition for condition-true pass 404 and condition-false pass 406. Once GPU 120 completes query pass 400, it performs condition-true pass 404 and then condition-false pass 406. The condition-true and condition-false passes render different data and rendering commands, which are accumulated in each pass according to the application's rendering sequence.
The techniques of this disclosure may include rendering instructions that specify the beginning and end of the rendering passes performed by GPU 120. Thus, some example techniques include rendering instructions that may specify, for example, the beginning and end of query pass 400, condition-true pass 404, and condition-false pass 406. Specifically, some example techniques include entry points such as the Start_Query_Pass command, the End_Query_Pass command, Start_Condition_True_pass and End_Condition_true_pass, and the End_condition_false_pass command. These are "exposed" to allow access to the entry points of different code subroutines. As noted above, "exposing" refers to providing an entry point into a block of code, with functional attributes that can trigger that code block when needed. As described herein, these are entry points into the driver that are exposed to the application so that the application can indicate to the driver what content it is submitting.
In one example, between each pair of corresponding commands (e.g., query pass start and query pass end, condition-true start and condition-true end, condition-false start and condition-false end), the graphics driver or the application specifies the rendering commands that GPU 120 executes during that rendering pass. Once all passes are complete, GPU 120 performs a flush command. The flush command may write the results of the three passes (query pass 400, condition-true pass 404, and condition-false pass 406) to system memory 118.
As described herein, query pass 400 may run before the binning pass, and the binning pass may run before the rendering pass (e.g., the rendering for condition-true pass 404 or for condition-false pass 406). (The binning pass is not shown in Fig. 4.) In some cases, the binning pass may be skipped so that rendering pass 404 or 406 runs immediately after query pass 400. In some cases, neither a binning pass nor a rendering pass is performed after query pass 400 (e.g., for the condition-true rendering 404 or the condition-false rendering 406). For example, a condition-false pass may result in neither a binning pass nor a rendering pass being performed, although this is not always the case; in some examples, a binning pass and a rendering pass should be performed for the condition-false pass. A binning pass and a rendering pass may likewise be performed for the condition-true pass. The main difference between the condition-true pass and the condition-false pass is that different data and rendering commands are accumulated in each, according to the application's rendering sequence.
Query pass 400 may return data indicating whether a binning pass should be performed. In one example, a conditional-execution determination unit may determine whether a binning pass or a rendering pass should be performed, e.g., for the rendering of condition-true pass 404 or condition-false pass 406. The decision to perform a binning pass may be based on the complexity of the objects to be rendered. For example, the binning pass may be skipped for simple objects and performed for more complex objects. As described herein, both condition-true rendering and condition-false rendering may be performed. The query pass, condition-true rendering, and condition-false rendering may all be performed before flush 408.
As described herein, some examples do not flush until all data has been queued. For example, query pass 400, rendering pass 404, and rendering pass 406 may each be queued for flush 408, rather than a flush being performed after each of them. A single flush 408 may therefore be performed. This may be conditional and may depend on the query. As described herein, in some examples, GPU 120 completes query pass 400, rendering pass 404, and rendering pass 406, and then performs flush 408. Flush 408 may send data to processor 102 for use by operating system 114. That data may come from the accumulated rendering commands. As described herein, in some examples a flush is the submission, or transmission, of all accumulated rendering commands to operating system 114. When a graphics application triggers rendering commands, the graphics driver does not send them directly to the hardware. Instead, the graphics driver accumulates the rendering commands (and translates them as needed).
Furthermore, in some examples, the actual query value does not matter. It may therefore be unnecessary to lock a memory location to prevent values from being written to it, because whether that location is overwritten may be unimportant. A fetch call may likewise be unnecessary, and so on. In some examples, the driver may perform a pre-binning pass or query pass that does not contribute to the visibility stream. In some examples, the driver may perform the binning pass, which is executed conditionally. The query result may be returned from query check 402 as a "true" or "false" value, and the scene may be rendered based on that result. In other words, the binning pass is performed conditionally based on the true or false query result. A true query result may lead to rendering pass 404, and a false query result may lead to rendering pass 406.
When the condition or value returned by the binning pass is true, condition-true render 404 may contribute to the visibility stream. Alternatively, if the condition or value returned by the binning pass is false, condition-false render 406 contributes to the visibility stream as an IB2 render. Rendering passes 404 and 406 are executed conditionally. A binning pass may be triggered for the correct rendering pass, after which the correct visibility stream and optimizations may be applied. The rendering pass may be executed only for the correct geometry, without executing another query pass. The operation may be completed with a single flush point.
For example, some devices may perform a method for multi-pass graphics rendering on a tile-based architecture. Such a device may include a GPU that executes a query pass (for example, query check 402), executes condition-true pass 404 based on the query pass without performing flush operation 408, and executes condition-false pass 406 based on the query pass without performing flush operation 408. In general, either condition-true pass 404 or condition-false pass 406 is executed, depending on the result of a given query check 402. In response to executing the condition-true pass and the condition-false pass, the GPU may perform flush operation 408. In some examples, the condition-true or condition-false pass may lead to a binning pass that generates a visibility stream. Alternatively, either of them (condition-true or condition-false) may render the scene using a direct rendering pass.
In some examples, the query pass (for example, query check 402) may include a first query pass. Executing the first query pass may include executing a graphics rendering command that indicates the start of the first query pass. In addition, in some examples, executing the first query pass further includes executing a graphics rendering command that indicates the end of the first query pass. In some examples, executing the condition-false pass further includes executing a graphics command that indicates the end of the condition-false pass. In some examples, executing the start-condition pass further involves a graphics rendering command that indicates the end of the start-condition pass. In some examples, executing the condition-true pass further includes executing a graphics rendering command that indicates the end of the first query pass. In some examples, executing the condition-false pass further includes executing a graphics command that indicates the start of the condition-false pass.
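The begin/end commands described above can be pictured as markers that bracket each pass in a single command stream. The sketch below is illustrative only: the `Op` marker names, `build_stream`, and `execute` are hypothetical, and real GPU command-stream encodings differ. It shows the query pass always running, only one conditional branch taking effect, and a single flush closing the stream.

```python
from enum import Enum, auto
from typing import List, Tuple, Union


class Op(Enum):
    # Hypothetical marker opcodes, not a real command-stream encoding.
    QUERY_BEGIN = auto()
    QUERY_END = auto()
    COND_TRUE_BEGIN = auto()
    COND_TRUE_END = auto()
    COND_FALSE_BEGIN = auto()
    COND_FALSE_END = auto()
    FLUSH = auto()


Item = Union[Op, Tuple[str, str]]  # a marker, or ("DRAW", draw_name)


def build_stream(query: List[str], true_: List[str], false_: List[str]) -> List[Item]:
    """Bracket each pass with begin/end markers and end with one flush."""
    stream: List[Item] = [Op.QUERY_BEGIN, *[("DRAW", d) for d in query], Op.QUERY_END]
    stream += [Op.COND_TRUE_BEGIN, *[("DRAW", d) for d in true_], Op.COND_TRUE_END]
    stream += [Op.COND_FALSE_BEGIN, *[("DRAW", d) for d in false_], Op.COND_FALSE_END]
    stream.append(Op.FLUSH)
    return stream


def execute(stream: List[Item], query_result: bool) -> List[str]:
    """Return the draws actually executed; the branch that does not apply is skipped."""
    executed: List[str] = []
    skip = False
    for item in stream:
        if item is Op.COND_TRUE_BEGIN:
            skip = not query_result
        elif item is Op.COND_FALSE_BEGIN:
            skip = query_result
        elif item is Op.COND_TRUE_END or item is Op.COND_FALSE_END:
            skip = False
        elif isinstance(item, tuple) and not skip:
            executed.append(item[1])
    return executed


s = build_stream(["q0"], ["t0", "t1"], ["f0"])
print(execute(s, True))   # ['q0', 't0', 't1']
print(execute(s, False))  # ['q0', 'f0']
```

Both branches are recorded up front, so the host never has to wait for the query result before building the stream.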
FIG. 5 is a conceptual diagram illustrating an example of functions that may be performed by hardware, in accordance with one or more examples described in this disclosure. In the illustrated example of FIG. 5, query pass 500 may be executed in hardware. Binning pass 502 may also be executed in hardware. In addition, rendering pass 504 may be executed in hardware. In some examples, the hardware may include GPU 120 or other processing hardware. Query pass 500, binning pass 502, and rendering pass 504 may be controlled by operating system 506. Operating system 506 may launch query pass 500. In some examples, the rendering pass includes executing one of a condition-true pass and a condition-false pass based on the query result. The result may control both the second binning pass and the rendering.
Query pass 500 may be executed in query block 508, which may return the query result to a predetermined memory location or a predetermined register. Operating system 506 may cause the query result to be stored in the predetermined memory or the predetermined register. In addition, the query result stored there may be used by binning pass 502, rendering pass 504, or both. For example, the query result may be used in conjunction with multi-pass rendering of a scene and its associated objects. In multi-pass rendering, the scene and associated objects may be rendered multiple times. Each time an object is drawn, an additional aspect of the object's appearance may be computed and combined with the previous results. In general, this may involve a coarse initial rendering followed by a detailed second rendering pass based on the query result of the first, coarse pass. The query result may be checked during the query check, and may yield a condition-true query result or a condition-false query result. As described above, the query may be any request for information triggered by the application. The query result may then cause condition-true graphics rendering 404, by executing the condition-true queue, or condition-false graphics rendering 406, by executing the condition-false queue.
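To illustrate how a query result written to a fixed location can steer later passes, the sketch below keeps the result in a named slot that both the binning and rendering passes read. It then combines a coarse first result with a detail contribution only when the query says so. All names (`QueryRegisterGpu`, the `QUERY_RESULT` slot, the additive color model) are hypothetical simplifications, not the described hardware.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryRegisterGpu:
    """Toy GPU whose query block writes to a predetermined slot (hypothetical)."""
    registers: Dict[str, int] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def query_pass(self, visible_samples: int) -> None:
        # Query block 508 stores its result at a fixed, pre-agreed location.
        self.registers["QUERY_RESULT"] = visible_samples

    def binning_pass(self) -> None:
        # The binning pass reads the same slot to decide whether to run.
        if self.registers.get("QUERY_RESULT", 0) > 0:
            self.log.append("binning")

    def rendering_pass(self, coarse: float, detail: float) -> float:
        # Multi-pass: start from the coarse result and add detail only if needed.
        color = coarse
        if self.registers.get("QUERY_RESULT", 0) > 0:
            color += detail
            self.log.append("detail")
        return color


gpu = QueryRegisterGpu()
gpu.query_pass(visible_samples=10)
gpu.binning_pass()
print(gpu.rendering_pass(0.25, 0.5), gpu.log)  # 0.75 ['binning', 'detail']
```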
Binning pass 502 is conditional. During binning pass 502, the GPU may generate the polygons (for example, triangles) that make up the scene and sort them into multiple "bins." As described herein, the bins defined during binning pass 502 may correspond directly to tiles of the final scene presented on the display (sometimes referred to as "screen tiles"). For example, each bin represents a portion, or tile, of the final scene (for example, a predefined portion of a frame of video data, a computer-generated graphics image, a still image, or the like). Therefore, the terms "bin" and "tile" may be used interchangeably herein.
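One common way to perform the sorting step described above is conservative bounding-box binning. The sketch below assumes screen-space triangles, rectangular tiles of `tile_w` by `tile_h` pixels, and the hypothetical function name `bin_triangles`. The text does not prescribe this particular binning method.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float]
Triangle = Tuple[Vertex, Vertex, Vertex]


def bin_triangles(tris: List[Triangle], width: int, height: int,
                  tile_w: int, tile_h: int) -> Dict[Tuple[int, int], List[int]]:
    """Assign each triangle index to every bin (tile) its bounding box overlaps.

    Bounding-box binning is conservative: a triangle may be listed in a bin it
    does not actually cover, but never omitted from a bin it does cover.
    """
    cols = (width + tile_w - 1) // tile_w
    rows = (height + tile_h - 1) // tile_h
    bins: Dict[Tuple[int, int], List[int]] = {
        (c, r): [] for r in range(rows) for c in range(cols)
    }
    for i, tri in enumerate(tris):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Convert the bounding box to tile indices, clamped to the screen.
        c0 = max(0, int(min(xs)) // tile_w)
        c1 = min(cols - 1, int(max(xs)) // tile_w)
        r0 = max(0, int(min(ys)) // tile_h)
        r1 = min(rows - 1, int(max(ys)) // tile_h)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                bins[(c, r)].append(i)
    return bins


tris = [((1, 1), (10, 1), (1, 10)), ((20, 20), (40, 20), (20, 40))]
bins = bin_triangles(tris, width=64, height=64, tile_w=32, tile_h=32)
print(bins)  # {(0, 0): [0, 1], (1, 0): [1], (0, 1): [1], (1, 1): [1]}
```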
In some examples, the GPU also performs operations during binning pass 502 to determine which of the polygons are visible in the scene, for example, by performing a depth test to determine whether one polygon occludes another. After determining which polygons are visible in the scene, the GPU may generate a data stream referred to as a "visibility stream." The visibility stream may include a value for each of the polygons in the scene, and the value may indicate whether the polygon is visible (for example, a value of "1" may indicate that the polygon is visible, and a value of "0" may indicate that it is not).
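The visibility stream can be illustrated with a per-pixel depth test. To keep the sketch short, it assumes each primitive is an axis-aligned rectangle at a constant depth, where a smaller depth is nearer. Real hardware rasterizes triangles, and the name `visibility_stream` is illustrative.

```python
from typing import List, Tuple

# Hypothetical simplified primitive: axis-aligned rectangle (x0, y0, x1, y1, z)
# covering pixels x0 <= x < x1 and y0 <= y < y1 at constant depth z (smaller is nearer).
Rect = Tuple[int, int, int, int, float]


def visibility_stream(prims: List[Rect], width: int, height: int) -> List[int]:
    """Return one value per primitive: 1 if it is visible in the final image, else 0."""
    depth = [[float("inf")] * width for _ in range(height)]
    owner = [[-1] * width for _ in range(height)]
    for i, (x0, y0, x1, y1, z) in enumerate(prims):
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                if z < depth[y][x]:  # depth test: keep the nearest primitive
                    depth[y][x] = z
                    owner[y][x] = i
    stream = [0] * len(prims)
    for row in owner:
        for i in row:
            if i >= 0:
                stream[i] = 1  # primitive survives on at least one pixel
    return stream


prims = [(0, 0, 8, 8, 0.5),    # large background quad
         (2, 2, 4, 4, 0.9),    # fully behind the background quad
         (6, 6, 10, 10, 0.1)]  # partly on screen and in front
print(visibility_stream(prims, width=8, height=8))  # [1, 0, 1]
```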
Rendering pass 504 is also conditional. During rendering pass 504, each of the defined tiles is rendered. In some examples, each rendering pass may be completed in three stages: (1) clear/unresolve, (2) render, and (3) resolve.
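The three stages of a per-tile rendering pass can be sketched as below. This assumes an on-chip tile buffer (`gmem`) that is cleared, drawn into, and then resolved (copied) to a framebuffer standing in for system memory. Primitives are axis-aligned rectangles drawn in submission order without depth testing, and primitives marked 0 in the visibility stream are skipped. The unresolve variant, which would load existing framebuffer contents into the tile instead of clearing it, is noted in a comment but not implemented.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # x0, y0, x1, y1 in screen pixels (half-open)


def render_tiles(bins: Dict[Tuple[int, int], List[int]], vis: List[int],
                 rects: List[Rect], colors: List[int], tile: int,
                 width: int, height: int, clear_color: int = 0) -> List[List[int]]:
    """Render every bin in three stages: (1) clear, (2) render, (3) resolve."""
    framebuffer = [[clear_color] * width for _ in range(height)]  # "system memory"
    for (c, r), prim_ids in bins.items():
        ox, oy = c * tile, r * tile  # tile origin in screen space
        # (1) Clear the on-chip tile buffer. An unresolve would instead load
        #     the tile's current framebuffer contents here.
        gmem = [[clear_color] * tile for _ in range(tile)]
        # (2) Render, in submission order, only primitives marked visible.
        for i in prim_ids:
            if not vis[i]:
                continue
            x0, y0, x1, y1 = rects[i]
            for y in range(max(y0, oy), min(y1, oy + tile)):
                for x in range(max(x0, ox), min(x1, ox + tile)):
                    gmem[y - oy][x - ox] = colors[i]
        # (3) Resolve: copy the finished tile out to the framebuffer.
        for ty in range(tile):
            for tx in range(tile):
                y, x = oy + ty, ox + tx
                if y < height and x < width:
                    framebuffer[y][x] = gmem[ty][tx]
    return framebuffer


bins = {(0, 0): [0], (1, 0): [0, 1], (0, 1): [], (1, 1): [1]}
rects = [(0, 0, 3, 2), (2, 1, 4, 4)]
fb = render_tiles(bins, vis=[1, 1], rects=rects, colors=[5, 7], tile=2, width=4, height=4)
for row in fb:
    print(row)
```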
In some examples, the driver may perform a pre-binning pass that does not contribute to the visibility stream. In some examples, the driver may perform binning pass 502. Binning pass 502 is executed conditionally; for example, binning pass 502 may be skipped for direct rendering. In some examples, when binning pass 502 is executed conditionally, it may return a value indicating whether it contributes to the visibility stream. Alternatively, if the condition or value returned by binning pass 502 is false, binning pass 502 contributes to the visibility stream as an IB2 render. Rendering pass 504 is also executed conditionally. When the condition is true, rendering pass 504 contributes to the visibility stream. When rendering pass 504 is executed conditionally, it may also return a "true" or "false" value. When the condition is false, rendering pass 504 contributes to the visibility stream as an IB2 render. Binning pass 502 may be triggered for the correct rendering pass 504, after which the correct visibility stream and optimizations may be applied. Rendering pass 504 may be executed only for the correct geometry, without executing another query pass 500. The operation may be completed with a single flush point.
In one example, indirect buffer 1 (IB1) may call query pass 500, binning pass 502, or rendering pass 504. Query pass 500, binning pass 502, and rendering pass 504 may be part of indirect buffer 2 (IB2). For example, query pass 500, binning pass 502, and rendering pass 504 may each be a separate IB2, that is, a separate entity within the IB2-level buffers. Query pass 500 may run before binning pass 502, and binning pass 502 may run before rendering pass 504. In some cases, binning pass 502 may be skipped so that rendering pass 504 runs immediately after query pass 500. In some cases, neither binning pass 502 nor rendering pass 504 is executed after query pass 500.
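The IB1/IB2 arrangement can be pictured as an IB1 that is simply an ordered list of calls into separate IB2s. In this hypothetical sketch, the IB2 contents are placeholder command names and the skip decisions are passed in as booleans; in the described design, that decision would come from the query pass rather than from the caller.

```python
from typing import Dict, List

# Each IB2 is a separate command buffer; contents here are placeholder names.
IB2S: Dict[str, List[str]] = {
    "query": ["QUERY_BEGIN", "DRAW bounding volumes", "QUERY_END"],
    "binning": ["BIN_BEGIN", "DRAW scene (binning)", "BIN_END"],
    "render": ["RENDER_BEGIN", "DRAW scene", "RENDER_END"],
}


def execute_ib1(run_binning: bool, run_render: bool) -> List[str]:
    """Hypothetical IB1: an ordered list of calls into the IB2s above."""
    ib1 = ["query"]                  # query pass 500 runs first
    if run_binning:
        ib1.append("binning")        # binning pass 502 precedes rendering
    if run_render:
        ib1.append("render")         # rendering pass 504 runs last
    executed: List[str] = []
    for name in ib1:
        executed.extend(IB2S[name])  # "calling" an IB2 executes its commands
    return executed


print(execute_ib1(run_binning=False, run_render=True))
```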
For example, query pass 500 may return data, such as a counter value or heuristic, indicating whether binning pass 502 should be performed. In one example, a conditional execution determination unit may determine whether binning pass 502 or rendering pass 504 should be executed. The determination to perform binning pass 502 may be based on the complexity of the object to be rendered: binning pass 502 may be skipped for simple objects and performed for more complex objects. Thus, for simpler objects, binning pass 502 may be skipped so that rendering pass 504 runs immediately after query pass 500. In addition, for simple objects, rendering pass 504 may be executed in a single iteration; for example, the entire screen may be written in a single pass rather than as a series of blocks. This may be possible, for example, for very simple, repetitive screen rendering.
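The complexity-based decision could take a form like the following. The counters in `QueryData` and both thresholds are invented for illustration; the text mentions only "a counter value or heuristic" without specifying one.

```python
from dataclasses import dataclass


@dataclass
class QueryData:
    # Hypothetical counters the query pass might return.
    primitive_count: int
    overdraw_estimate: float


# Hypothetical thresholds; a real driver would tune such values per GPU.
PRIM_THRESHOLD = 1000
OVERDRAW_THRESHOLD = 1.5


def choose_path(q: QueryData) -> str:
    """Decide between a binned (tiled) render and a single direct pass."""
    if q.primitive_count < PRIM_THRESHOLD and q.overdraw_estimate < OVERDRAW_THRESHOLD:
        # Simple content: skip binning pass 502 and render the whole screen at once.
        return "direct"
    # Complex content: run binning pass 502, then render tile by tile.
    return "binned"


print(choose_path(QueryData(primitive_count=10, overdraw_estimate=1.0)))  # direct
```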
As described herein, some examples defer the flush until all data has been sent. In addition, in some examples, the actual query value is not important. Therefore, it may be unnecessary to lock, for example, a memory location, because whether the memory location is overwritten may not matter. Calls to fetch the query result may likewise be unnecessary. In some examples, the driver may perform a pre-binning pass/query pass that does not contribute to the visibility stream. In some examples, the driver may perform binning pass 502, which is executed conditionally. In some examples, when binning pass 502 is executed conditionally, it may return a value indicating whether it contributes to the visibility stream. Alternatively, if the condition or value returned by binning pass 502 is false, binning pass 502 contributes to the visibility stream as an IB2 render. Rendering pass 504 is also executed conditionally. When the condition is true, rendering pass 504 contributes to the visibility stream. When rendering pass 504 is executed conditionally, it may also return a "true" or "false" value. When the condition is false, rendering pass 504 contributes to the visibility stream as an IB2 render. A binning pass may be triggered for the correct rendering pass, after which the correct visibility stream and optimizations may be applied. Rendering pass 504 may be executed only for the correct geometry, without executing another query pass 500. The operation may be completed with a single flush point.
For example, some devices may perform a method for multi-pass graphics rendering on a tile-based architecture. Such a device may include a GPU that executes query pass 500, executes a condition-true pass based on query pass 500 without performing a flush operation, and executes a condition-false pass based on query pass 500 without performing a flush operation. In response to executing the condition-true pass and the condition-false pass, the GPU may perform a flush operation.
In some examples, query pass 500 may include a first query pass. Executing the first query pass may include executing a graphics rendering command that indicates the start of the first query pass. In addition, in some examples, executing the first query pass further includes executing a graphics rendering command that indicates the end of the first query pass. In some examples, executing the condition-false pass further includes executing a graphics command that indicates the end of the condition-false pass. In some examples, executing the start-condition pass further involves a graphics rendering command that indicates the end of the start-condition pass. In some examples, executing the condition-true pass further includes executing a graphics rendering command that indicates the end of the first query pass. In some examples, executing the condition-false pass further includes executing a graphics command that indicates the start of the condition-false pass.
FIG. 6 is a flowchart illustrating an example method for multi-pass graphics rendering on a tile-based architecture, in accordance with one or more examples described in this disclosure. GPU 120 generates a query pass (600). Query pass 500 may further include a first query pass. In addition, executing the first query pass may include executing a graphics rendering command that indicates the start of the first query pass. In some examples, executing the first query pass further includes executing a graphics rendering command that indicates the end of the first query pass.
GPU 120 generates a condition-true pass based on query pass 500 without performing a flush operation (602). In some examples, executing the start-condition pass further involves a graphics rendering command that indicates the end of the start-condition pass. Executing the condition-true pass may also further include executing a graphics rendering command that indicates the end of the first query pass.
GPU 120 generates a condition-false pass based on query pass 500 without performing a flush operation (604). In some examples, executing the condition-false pass further includes executing a graphics command that indicates the end of the condition-false pass. In some examples, executing the condition-false pass further includes executing a graphics command that indicates the start of the condition-false pass.
GPU 120 performs a flush operation in response to executing the condition-true pass and the condition-false pass (606). The flush operation may be performed once all three passes (query pass 400, condition-true pass 404, and condition-false pass 406) have completed. In general, this allows a single flush operation to be used. Flush command 408 may write the results of the three passes (query pass 400, condition-true pass 404, and condition-false pass 406) to system memory 118.
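The FIG. 6 flow (steps 600 to 606) can be summarized by a toy model in which pass results stay on chip until one flush writes them to a list standing in for system memory 118. Class and method names are hypothetical. The point illustrated is that neither conditional pass flushes, and exactly one flush follows both of them.

```python
from typing import List


class Fig6Gpu:
    """Toy GPU following the FIG. 6 flow: three passes, then exactly one flush."""

    def __init__(self) -> None:
        self.on_chip: List[str] = []        # results not yet written out
        self.system_memory: List[str] = []  # stands in for system memory 118
        self.flushes = 0

    def query_pass(self, result: bool) -> bool:               # step 600
        self.on_chip.append(f"query={result}")
        return result

    def condition_true_pass(self, predicate: bool) -> None:   # step 602, no flush
        if predicate:
            self.on_chip.append("condition-true render")

    def condition_false_pass(self, predicate: bool) -> None:  # step 604, no flush
        if not predicate:
            self.on_chip.append("condition-false render")

    def flush(self) -> None:                                  # step 606
        self.system_memory.extend(self.on_chip)
        self.on_chip.clear()
        self.flushes += 1


def run_fig6(query_result: bool) -> Fig6Gpu:
    gpu = Fig6Gpu()
    predicate = gpu.query_pass(query_result)
    gpu.condition_true_pass(predicate)
    gpu.condition_false_pass(predicate)
    gpu.flush()  # the only flush, issued after both conditional passes
    return gpu


g = run_fig6(True)
print(g.flushes, g.system_memory)  # 1 ['query=True', 'condition-true render']
```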
FIG. 7 is a block diagram illustrating an example of a device that may be configured to implement one or more aspects of this disclosure. For example, FIG. 7 illustrates device 702. Examples of device 702 include, but are not limited to, video devices, media players, set-top boxes, wireless handsets (for example, mobile telephones and so-called smartphones), personal digital assistants (PDAs), desktop computers, laptop computers, gaming consoles, video conferencing units, tablet computing devices, and the like.
In the example of FIG. 7, device 702 may include processor 102, system memory 118, and GPU 120. For brevity, processor 102, system memory 118, and GPU 120 are not described further with respect to FIG. 7, because these components were previously described with respect to FIG. 1. Device 702 may also include display processor 724, transceiver module 726, user interface 728, and display 730. Transceiver module 726 and display processor 724 may both be part of the same integrated circuit (IC) as processor 102 and/or GPU 120. In another example, transceiver module 726 and display processor 724 may both be external to the IC or ICs that include processor 102 and/or GPU 120. In yet another example, transceiver module 726 and display processor 724 may be formed in an IC that is external to the IC that includes processor 102 and/or GPU 120.
Device 702 may include additional modules or units that are not shown in FIG. 7 for purposes of clarity. For example, device 702 may include a speaker and a microphone, neither of which is shown in FIG. 7. In examples where device 702 is a mobile wireless telephone, the speaker and microphone may be used to carry out telephone communication. When device 702 is a media player, it may include a speaker to provide sound output, or it may include an accessory power outlet. Device 702 may also include a video camera. Furthermore, the various modules and units shown in device 702 may not be necessary in every example of device 702. For example, in examples where device 702 is a desktop computer or another device equipped to interface with an external user interface or display, user interface 728 and display 730 may be external to device 702.
Examples of user interface 728 include, but are not limited to, a touch screen, a trackball, a mouse, a keyboard, and other types of input devices. User interface 728 may also be a touch screen incorporated as part of display 730. Transceiver module 726 may include circuitry that allows wireless or wired communication between device 702 and another device or a network. Transceiver module 726 may include modulators, demodulators, amplifiers, and other such circuitry for wired or wireless communication.
In some examples, GPU 120 may store a fully formed image in system memory 118. Display processor 724 may retrieve the image from system memory 118 and output values that cause the pixels of display 730 to illuminate and display the image. Display 730 may be the display of device 702 that shows the image content generated by GPU 120. Display 730 may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a cathode ray tube (CRT) display, a plasma display, or another type of display device.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media may include computer data storage media or communication media, including any medium that facilitates transfer of a computer program from one place to another. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or to any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (that is, a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit, or provided by a collection of interoperative hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Various examples have been described.These and other example is in the scope of the following claims.

Claims (22)

1. A method for multi-pass graphics rendering on a tile-based architecture, the method comprising:
receiving, with a graphics processing unit (GPU), from an application executing on a processor other than the GPU, a first set of commands for a condition-true pass and a second set of commands for a condition-false pass;
executing, with the GPU, a query pass without executing a flush operation to flush a result of the query pass to the processor;
in response to the result of the query pass indicating a true condition, executing, with the GPU and based on the query pass, the condition-true pass without executing the flush operation, wherein executing the condition-true pass comprises executing the first set of commands;
in response to the result of the query pass indicating a false condition, executing, with the GPU and based on the query pass, the condition-false pass without executing the flush operation, wherein executing the condition-false pass comprises executing the second set of commands; and
in response to executing the condition-true pass and the condition-false pass, executing the flush operation with the GPU.
2. The method of claim 1, wherein the query pass comprises a first query pass, and executing the first query pass comprises executing a graphics rendering command indicating a start of the first query pass.
3. The method of claim 2, wherein executing the first query pass further comprises executing a graphics rendering command indicating an end of the first query pass.
4. The method of claim 1, wherein executing the condition-false pass further comprises executing a graphics command indicating an end of the condition-false pass.
5. The method of claim 1, further comprising executing, with the GPU, a direct rendering pass to render a scene.
6. The method of claim 5, wherein executing the condition-true pass further comprises executing a graphics rendering command indicating an end of a first query pass.
7. The method of claim 1, wherein executing the condition-false pass further comprises executing a graphics command indicating a start of the condition-false pass.
8. A device for multi-pass graphics rendering on a tile-based architecture, comprising:
a memory; and
a GPU configured to:
receive, from an application executing on a processor other than the GPU, a first set of commands for a condition-true pass and a second set of commands for a condition-false pass;
execute a query pass without executing a flush operation to flush a result of the query pass to the processor;
in response to the result of the query pass indicating a true condition, execute the condition-true pass based on the query pass without executing the flush operation, wherein executing the condition-true pass comprises executing the first set of commands;
in response to the result of the query pass indicating a false condition, execute the condition-false pass based on the query pass without executing the flush operation, wherein executing the condition-false pass comprises executing the second set of commands; and
execute the flush operation in response to executing the condition-true pass and the condition-false pass.
9. The device of claim 8, wherein the query pass comprises a first query pass, and executing the first query pass comprises executing a graphics rendering command indicating a start of the first query pass.
10. The device of claim 9, wherein executing the first query pass further comprises executing a graphics rendering command indicating an end of the first query pass.
11. The device of claim 8, wherein executing the condition-false pass further comprises executing a graphics command indicating an end of the condition-false pass.
12. The device of claim 8, wherein the GPU is further configured to execute a direct rendering pass to render a scene.
13. The device of claim 12, wherein executing the condition-true pass further comprises executing a graphics rendering command indicating an end of a first query pass.
14. The device of claim 8, wherein executing the condition-false pass further comprises executing a graphics command indicating a start of the condition-false pass.
15. A device for multi-pass graphics rendering on a tile-based architecture, comprising:
means for receiving, for a graphics processing unit (GPU), from an application executing on a processor other than the GPU, a first set of commands for a condition-true pass and a second set of commands for a condition-false pass;
means for executing, with the GPU, a query pass without executing a flush operation to flush a result of the query pass to the processor;
means for executing, with the GPU and based on the query pass, the condition-true pass without executing the flush operation in response to the result of the query pass indicating a true condition, the means comprising means for executing the first set of commands;
means for executing, with the GPU and based on the query pass, the condition-false pass without executing the flush operation in response to the result of the query pass indicating a false condition, the means comprising means for executing the second set of commands; and
means for executing, with the GPU, the flush operation in response to executing the condition-true pass and the condition-false pass.
16. The device of claim 15, wherein the query pass comprises a first query pass, and executing the first query pass comprises executing a graphics rendering command indicating a start of the first query pass.
17. The device of claim 16, wherein executing the first query pass further comprises means for executing a graphics rendering command indicating an end of the first query pass.
18. The device of claim 15, wherein executing the condition-false pass further comprises executing a graphics command indicating an end of the condition-false pass.
19. The device of claim 15, further comprising means for executing, with the GPU, a direct rendering pass to render a scene.
20. The device of claim 19, wherein executing the condition-true pass further comprises executing a graphics rendering command indicating an end of a first query pass.
21. The device of claim 15, wherein executing the condition-false pass further comprises executing a graphics command indicating a start of the condition-false pass.
22. A computer-readable storage medium having stored thereon instructions for multi-pass graphics rendering on a tile-based architecture that, when executed, cause one or more graphics processing units (GPUs) to:
receive, from an application executing on a processor other than the GPU, a first set of commands for a condition-true pass and a second set of commands for a condition-false pass;
execute a query pass without executing a flush operation to flush a result of the query pass to the processor;
in response to the result of the query pass indicating a true condition, execute the condition-true pass based on the query pass without executing the flush operation, wherein executing the condition-true pass comprises executing the first set of commands;
in response to the result of the query pass indicating a false condition, execute the condition-false pass based on the query pass without executing the flush operation, wherein executing the condition-false pass comprises executing the second set of commands; and
execute the flush operation in response to executing the condition-true pass and the condition-false pass.
CN201480070397.XA 2013-12-27 2014-12-04 Optimized multi-pass rendering on tile-based architectures Active CN105849780B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201361921145P 2013-12-27 2013-12-27
US61/921,145 2013-12-27
US14/154,996 2014-01-14
US14/154,996 US9280845B2 (en) 2013-12-27 2014-01-14 Optimized multi-pass rendering on tiled base architectures
PCT/US2014/068573 WO2015099970A1 (en) 2013-12-27 2014-12-04 Optimized multi-pass rendering on tiled base architectures

Publications (2)

Publication Number Publication Date
CN105849780A CN105849780A (en) 2016-08-10
CN105849780B true CN105849780B (en) 2019-01-22

Family

ID=52118029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480070397.XA Active CN105849780B (en) 2013-12-27 2014-12-04 Optimized multi-pass rendering in tile-based architectures

Country Status (6)

Country Link
US (2) US9280845B2 (en)
EP (1) EP3087553B1 (en)
JP (1) JP6073533B1 (en)
KR (1) KR101721861B1 (en)
CN (1) CN105849780B (en)
WO (1) WO2015099970A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9280845B2 (en) * 2013-12-27 2016-03-08 Qualcomm Incorporated Optimized multi-pass rendering on tiled base architectures
US9842424B2 (en) * 2014-02-10 2017-12-12 Pixar Volume rendering using adaptive buckets
GB2526598B (en) 2014-05-29 2018-11-28 Imagination Tech Ltd Allocation of primitives to primitive blocks
DE102015115605A1 (en) * 2014-09-16 2016-03-17 Jeffrey A. Bolz Techniques for passing on dependencies in an API
US9600926B2 (en) * 2014-12-15 2017-03-21 Intel Corporation Apparatus and method decoupling visibility bins and render tile dimensions for tiled rendering
US9922449B2 (en) * 2015-06-01 2018-03-20 Intel Corporation Apparatus and method for dynamic polygon or primitive sorting for improved culling
US10535114B2 (en) * 2015-08-18 2020-01-14 Nvidia Corporation Controlling multi-pass rendering sequences in a cache tiling architecture
US9842376B2 (en) 2015-09-29 2017-12-12 Qualcomm Incorporated Graphics processing unit preemption with pixel tile level granularity
US10096147B2 (en) 2016-03-10 2018-10-09 Qualcomm Incorporated Visibility information modification
US20170352182A1 (en) * 2016-06-06 2017-12-07 Qualcomm Incorporated Dynamic low-resolution z test sizes
CN106708594B (en) * 2016-12-12 2020-06-09 AVIC Xi'an Aeronautics Computing Technique Research Institute Implementation method of hierarchical OpenGL runtime compiling software
US10607390B2 (en) * 2016-12-14 2020-03-31 Nvidia Corporation Techniques for tiling compute work with graphics work
US10699368B1 (en) * 2017-08-30 2020-06-30 Apple Inc. Memory allocation techniques for graphics shader
CN110058926B (en) * 2018-01-18 2023-03-14 EMC IP Holding Company LLC Method, apparatus, and computer-readable medium for processing GPU tasks
CN108510430A (en) * 2018-03-27 2018-09-07 Changsha Jingjia Microelectronics Co., Ltd. Method for implementing resource sharing in a GPU based on tiled rendering
US11315225B2 (en) 2019-06-20 2022-04-26 Samsung Electronics Co., Ltd. Coarse depth culling during binning
US11373267B2 (en) * 2019-11-04 2022-06-28 Qualcomm Incorporated Methods and apparatus for reducing the transfer of rendering information
US11263718B2 (en) * 2020-02-03 2022-03-01 Sony Interactive Entertainment Inc. System and method for efficient multi-GPU rendering of geometry by pretesting against in interleaved screen regions before rendering
US11373268B2 (en) * 2020-09-30 2022-06-28 Qualcomm Incorporated Apparatus and method for graphics processing unit hybrid rendering
US20230140640A1 (en) * 2021-11-03 2023-05-04 Intel Corporation 3d graphics driver to split frames into multiple command buffer submissions based on analysis of previous frames
US20240104684A1 (en) * 2022-09-23 2024-03-28 Qualcomm Incorporated Visibility generation in tile based gpu architectures

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7962923B2 (en) * 2005-12-30 2011-06-14 Level 3 Communications, Llc System and method for generating a lock-free dual queue
US8266232B2 (en) * 2005-10-15 2012-09-11 International Business Machines Corporation Hardware processing of commands within virtual client computing environment

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778432A (en) * 1996-07-01 1998-07-07 Motorola, Inc. Method and apparatus for performing different cache replacement algorithms for flush and non-flush operations in response to a cache flush control bit register
US6646639B1 (en) 1998-07-22 2003-11-11 Nvidia Corporation Modified method and apparatus for improved occlusion culling in graphics systems
US20030167460A1 (en) * 2002-02-26 2003-09-04 Desai Vipul Anil Processor instruction set simulation power estimation method
US20030212728A1 (en) * 2002-05-10 2003-11-13 Amit Dagan Method and system to perform complex number multiplications and calculations
US6952206B1 (en) 2002-08-12 2005-10-04 Nvidia Corporation Graphics application program interface system and method for accelerating graphics processing
US7554538B2 (en) * 2004-04-02 2009-06-30 Nvidia Corporation Video processing, such as for hidden surface reduction or removal
US20050257026A1 (en) * 2004-05-03 2005-11-17 Meeker Woodrow L Bit serial processing element for a SIMD array processor
US7721069B2 (en) * 2004-07-13 2010-05-18 3Plus1 Technology, Inc Low power, high performance, heterogeneous, scalable processor architecture
US20060095894A1 (en) * 2004-09-15 2006-05-04 Wilde Myles J Method and apparatus to provide graphical architecture design for a network processor having multiple processing elements
US8633936B2 (en) * 2008-04-21 2014-01-21 Qualcomm Incorporated Programmable streaming processor with mixed precision instruction execution
GB0810311D0 (en) 2008-06-05 2008-07-09 Advanced Risc Mach Ltd Graphics processing systems
US9354887B2 (en) * 2010-06-28 2016-05-31 International Business Machines Corporation Instruction buffer bypass of target instruction in response to partial flush
US20120084539A1 (en) * 2010-09-29 2012-04-05 Nyland Lars S Method and sytem for predicate-controlled multi-function instructions
US9330430B2 (en) 2011-03-21 2016-05-03 Apple Inc. Fast queries in a multithreaded queue of a graphics system
US9117302B2 (en) 2011-11-30 2015-08-25 Qualcomm Incorporated Switching between direct rendering and binning in graphics processing using an overdraw tracker
US10242481B2 (en) * 2012-03-15 2019-03-26 Qualcomm Incorporated Visibility-based state updates in graphical processing units
US10535185B2 (en) * 2012-04-04 2020-01-14 Qualcomm Incorporated Patched shading in graphics processing
US8941676B2 (en) * 2012-10-26 2015-01-27 Nvidia Corporation On-chip anti-alias resolve in a cache tiling architecture
US9280845B2 (en) * 2013-12-27 2016-03-08 Qualcomm Incorporated Optimized multi-pass rendering on tiled base architectures


Also Published As

Publication number Publication date
WO2015099970A1 (en) 2015-07-02
JP6073533B1 (en) 2017-02-01
US20160148338A1 (en) 2016-05-26
US20150187117A1 (en) 2015-07-02
EP3087553B1 (en) 2020-10-28
CN105849780A (en) 2016-08-10
KR101721861B1 (en) 2017-03-31
JP2017505476A (en) 2017-02-16
US9836810B2 (en) 2017-12-05
EP3087553A1 (en) 2016-11-02
KR20160096719A (en) 2016-08-16
US9280845B2 (en) 2016-03-08


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant