CN117058288A - Graphics processor, multi-core graphics processing system, electronic device, and apparatus - Google Patents

Publication number
CN117058288A
Authority
CN
China
Prior art keywords
block
graphics
core
primitives
processing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210490158.6A
Other languages
Chinese (zh)
Inventor
武杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangdixian Computing Technology Chongqing Co ltd
Original Assignee
Xiangdixian Computing Technology Chongqing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangdixian Computing Technology Chongqing Co ltd filed Critical Xiangdixian Computing Technology Chongqing Co ltd
Priority to CN202210490158.6A priority Critical patent/CN117058288A/en
Publication of CN117058288A publication Critical patent/CN117058288A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/50 Lighting effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The present disclosure provides a graphics processor, a graphics processing method, a multi-core graphics processing system, an electronic device, and an electronic apparatus. The graphics processor includes: a geometry processing module configured to perform geometry processing on the primitives assigned to the graphics processor; and a tile partitioning module configured to perform tile division on the primitives assigned to the graphics processor, save the resulting tile information of each tile into the first tile list of that tile corresponding to the graphics processor, and save the index of the first tile list of each tile into the second tile list of that tile, wherein the second tile list of a tile stores the indexes of that tile's first tile lists corresponding to the plurality of graphics processors of the multi-core graphics processing system.

Description

Graphics processor, multi-core graphics processing system, electronic device, and apparatus
Technical Field
The present disclosure relates to the technical field of GPUs (Graphics Processing Units), and in particular to a graphics processor, a multi-core graphics processing system, an electronic device, an electronic apparatus, and a graphics processing method.
Background
GPUs are widely used in personal computers, workstations, servers, embedded systems, electronic gaming machines, and the like. Because the GPU is designed as a highly parallel architecture, it has an advantage over a general-purpose CPU (Central Processing Unit) for massively parallel algorithms, which makes it highly efficient for graphics processing.
A multi-core graphics processing system (also called a multi-core GPU) is a GPU product that implements graphics processing functions jointly through multiple GPUs (also called GPU cores). Each GPU in a multi-core graphics processing system has the same, complete GPU functionality.
The typical flow of tile-based GPU graphics processing can be divided into several stages: host-side processing, geometry processing, tile partitioning, rasterization, and pixel processing.
In the prior art, a multi-core GPU with a tile-based rendering architecture must complete the rendering operations that precede pixel processing within the main core; only after the tile division of the primitives has been determined are tiles distributed to the individual GPU cores, each of which performs rasterization and pixel processing on its own tiles.
The current rendering process therefore struggles to fully exploit the advantages of multiple cores, because geometry processing and tile division are not parallelized across cores.
Disclosure of Invention
The disclosure aims to provide a graphics processor, a multi-core graphics processing system, an electronic device, an electronic apparatus, and a graphics processing method that parallelize geometry processing and tile division across multiple cores.
According to one aspect of the present disclosure, there is provided a graphics processor applied to a multi-core graphics processing system, the graphics processor including at least:
a geometry processing module configured to: perform geometry processing on the primitives assigned to the graphics processor;
a tile partitioning module configured to: perform tile division on the primitives assigned to the graphics processor, save the resulting tile information of each tile into the first tile list of that tile corresponding to the graphics processor, and save the index of the first tile list of each tile into the second tile list of that tile, wherein the second tile list of a tile stores the indexes of that tile's first tile lists corresponding to the plurality of graphics processors of the multi-core graphics processing system.
If the graphics processor is the primary core in a multi-core graphics processing system, the graphics processor may further include a primitive assignment module configured to: primitives in the image frames are assigned to a plurality of graphics processors in a multi-core graphics processing system.
On the basis of any of the above graphics processor embodiments, the allocating primitives in the image frame to a plurality of graphics processors in the multi-core graphics processing system may include: the primitives in the image frames are grouped according to a predetermined rule, and each group of primitives is distributed to a plurality of graphics processors in the multi-core graphics processing system according to a predetermined load balancing strategy.
Based on any of the graphics processor embodiments described above, the geometry processing module may include a primitive assignment sub-module and a plurality of geometry processing sub-modules. The primitive assignment sub-module is configured to: distribute the primitives assigned to the graphics processor among the plurality of geometry processing sub-modules. Each geometry processing sub-module is configured to: perform geometry processing on the primitives assigned to that geometry processing sub-module.
Based on any of the graphics processor embodiments described above, the tile partitioning module may be further configured to:
the index of the first tile list of each tile is saved to the second tile list of each tile, respectively.
According to another aspect of the present disclosure, there is also provided a graphics processing system including at least two graphics processors, each graphics processor configured to:
perform geometry processing on the primitives assigned to the graphics processor;
perform tile division on the primitives assigned to the graphics processor, save the resulting tile information of each tile into the first tile list of that tile corresponding to the graphics processor, and save the index of the first tile list of each tile into the second tile list of that tile, wherein the second tile list of a tile stores the indexes of that tile's first tile lists corresponding to the at least two graphics processors.
Of the at least two graphics processors of the multi-core graphics processing system, one graphics processor is the master core and the other graphics processors are slave cores. The master core may be further configured to: assign the primitives in an image frame to the plurality of graphics processors in the multi-core graphics processing system.
Further, assigning primitives in the image frame to a plurality of graphics processors in a multi-core graphics processing system may include: the primitives in the image frames are grouped according to a predetermined rule, and each group of primitives is distributed to a plurality of graphics processors in the multi-core graphics processing system according to a predetermined load balancing strategy.
Based on any of the above embodiments of the multi-core graphics processing system, performing geometry processing on the primitives assigned to the graphics processor may include: performing geometry processing on those primitives in parallel. For a specific implementation of parallel geometry processing, reference may be made to the graphics processor embodiments described above; the present disclosure does not limit it.
In accordance with any of the multi-core graphics processing system embodiments described above, each graphics processor may be further configured to: the index of the first tile list of each tile is saved to the second tile list of each tile, respectively.
According to another aspect of the present disclosure, there is also provided an electronic device including the multi-core graphics processing system described in any one of the embodiments above. In some use cases, the product form of the electronic device is embodied as a graphics card; in other use scenarios, the product form of the electronic device is embodied as a CPU motherboard.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including the above-described electronic device. In some use scenarios, the product form of the electronic apparatus is a portable electronic apparatus, such as a smart phone, a tablet computer, or a VR device; in other use scenarios, the product form of the electronic apparatus is a personal computer, a game console, or the like.
According to another aspect of the present disclosure, there is also provided a graphics processing method applied to a graphics processor in a multi-core graphics processing system, the graphics processing method including at least the following operations:
performing geometry processing on the primitives assigned to the graphics processor;
performing tile division on the primitives assigned to the graphics processor, saving the resulting tile information of each tile into the first tile list of that tile corresponding to the graphics processor, and saving the index of the first tile list of each tile into the second tile list of that tile, wherein the second tile list of a tile stores the indexes of that tile's first tile lists corresponding to the plurality of graphics processors of the multi-core graphics processing system.
If the graphics processor is a main core in the multi-core graphics processing system, the graphics processing method may further include: primitives in the image frames are assigned to a plurality of graphics processors in a multi-core graphics processing system.
Further, assigning primitives in the image frame to a plurality of graphics processors in a multi-core graphics processing system may include: the primitives in the image frames are grouped according to a predetermined rule, and each group of primitives is distributed to a plurality of graphics processors in the multi-core graphics processing system according to a predetermined load balancing strategy.
On the basis of any graphics processing method embodiment, the geometric processing of the graphics primitives allocated to the graphics processor may include: the primitives assigned to the graphics processor are geometrically processed in parallel.
On the basis of any of the above graphics processing method embodiments, the graphics processing method may further include: the index of the first tile list of each tile is saved to the second tile list of each tile, respectively.
Drawings
FIG. 1 is a diagram illustrating a first tile list and primitive data structure according to one embodiment of the present disclosure;
FIG. 2 is a diagram of a second tile list structure provided by one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of parallel geometry processing and tile partitioning flow provided by one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a graphics processing system architecture based on a multi-core GPU architecture according to one embodiment of the present disclosure;
FIG. 5 is a flow chart of a graphics processing method according to one embodiment of the disclosure.
Detailed Description
Before describing embodiments of the present disclosure, it should be noted that:
some embodiments of the disclosure are described as process flows, in which the various operational steps of the flows may be numbered sequentially, but may be performed in parallel, concurrently, or simultaneously.
The terms "first," "second," and the like may be used in embodiments of the present disclosure to describe various features, but these features should not be limited by these terms. These terms are only used to distinguish one feature from another.
The term "and/or" may be used in embodiments of the present disclosure to include any and all combinations of one or more of the associated features listed.
It will be understood that when two elements are described in a connected or communicating relationship, unless a direct connection or direct communication between the two elements is explicitly stated, connection or communication between the two elements may be understood as direct connection or communication, as well as indirect connection or communication via intermediate elements.
In order to make the technical solutions and advantages of the embodiments of the present disclosure more apparent, exemplary embodiments of the present disclosure are described in detail below in conjunction with the accompanying drawings. The described embodiments are only some of the embodiments of the present disclosure, not an exhaustive list of all of them. Provided there is no conflict, the embodiments of the present disclosure and the features of those embodiments may be combined with each other.
Because tile partitioning requires centralized processing, the corresponding tile information must be stored centrally, so the GPU operations prior to rasterization must be completed within the main core. Only after the tile partitioning of the primitives has been determined are the tiles distributed among the GPU cores, each of which processes the tiles assigned to it. Because the problem of centralized storage of tile information has not been overcome, the prior art does not perform multi-core parallel processing for geometry processing and tile division.
It is an object of the present disclosure to provide an implementation in which geometry processing and tile partitioning are performed in parallel by multiple cores in a multi-core graphics processing system employing a tile-based rendering architecture. The multiple cores in the graphics processing system are multiple GPUs. A GPU here is a processor implemented in hardware that has computing functions and includes computing units, caches, and other components; it may be a GPGPU (general-purpose graphics processing unit) or a GPU.
One embodiment of the present disclosure provides a graphics processor, applied to a multi-core graphics processing system and suited to tile-based rendering architectures such as TBR (Tile-Based Rendering), TBDR (Tile-Based Deferred Rendering), and the like.
The graphics processor includes at least a geometry processing module and a tile partitioning module.
Wherein the geometry processing module is configured to: the primitives assigned to the present graphics processor are geometrically processed.
The geometry processing module is typically implemented using custom hardware together with at least one programmable shader, although this disclosure is not limited in this regard. In practical applications, the geometry processing function may also be implemented entirely in custom hardware.
Wherein the tile partitioning module is configured to: perform tile division on the primitives assigned to the graphics processor, save the resulting tile information of each tile into the first tile list of that tile corresponding to the graphics processor, and save the index of the first tile list of each tile into the second tile list of that tile, wherein the second tile list of a tile stores the indexes of that tile's first tile lists corresponding to the plurality of graphics processors of the multi-core graphics processing system.
The tile division process includes determining the primitives covered by each tile. The graphics processor saves the primitive indexes of the primitives covered by each tile, as determined by tile division, into the first tile list of that tile corresponding to the graphics processor, so that a module executing a subsequent rendering stage (e.g., a rasterization module) can read the primitive information of the primitives covered by a tile according to their primitive indexes.
The primitive indexes of the primitives covered by each tile, as determined by tile division, are the tile information of that tile.
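The coverage determination described above can be illustrated with a minimal sketch. The disclosure does not specify how coverage is tested; the sketch assumes a conservative test in which a primitive is treated as covering every tile its screen-space bounding box overlaps. `TILE_SIZE`, `bin_primitives`, and the primitive names are hypothetical, and coordinates are assumed to be non-negative.

```python
from collections import defaultdict

TILE_SIZE = 32  # hypothetical tile edge length in pixels


def bin_primitives(primitives, tile_size=TILE_SIZE):
    """Map each tile (tx, ty) to the indexes of the primitives whose
    screen-space bounding box overlaps it (assumed conservative test)."""
    tile_lists = defaultdict(list)
    for prim_index, vertices in primitives.items():
        xs = [x for x, _ in vertices]
        ys = [y for _, y in vertices]
        tx_min, tx_max = int(min(xs)) // tile_size, int(max(xs)) // tile_size
        ty_min, ty_max = int(min(ys)) // tile_size, int(max(ys)) // tile_size
        for ty in range(ty_min, ty_max + 1):
            for tx in range(tx_min, tx_max + 1):
                tile_lists[(tx, ty)].append(prim_index)
    return dict(tile_lists)


# Two triangles in screen space, keyed by a hypothetical primitive index.
primitives = {
    "prim_a": [(2.0, 3.0), (20.0, 5.0), (10.0, 25.0)],     # stays inside tile (0, 0)
    "prim_b": [(28.0, 10.0), (40.0, 12.0), (35.0, 30.0)],  # spans tiles (0, 0) and (1, 0)
}

# The per-tile lists play the role of this graphics processor's first tile lists.
first_tile_lists = bin_primitives(primitives)
```

A bounding-box test may list a primitive for a tile it does not actually touch; an exact edge test would tighten the lists at extra cost.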
Wherein the first tile list may be, but is not limited to being, stored in system memory.
The primitive information is generated in a geometric stage, and the primitive information includes vertex attribute information (such as vertex color, vertex coordinates and the like) of the primitive and basic information (such as primitive identification and the like) of the primitive.
By way of example and not limitation, as shown in FIG. 1, in some embodiments the data structure holding primitive information is the primitive block, and each primitive block includes the primitive information of a plurality of primitives. Each tile corresponds to a first tile list, and each first tile list includes the primitive indexes of the primitives covered by that tile, where a primitive index indicates the storage address of the primitive information of the primitive.
For example, the first tile, Tile0, covers four primitives: primitive 1-0, primitive 1-1, primitive 2-0, and primitive 3-1. The primitive information of primitive 1-0 is primitive information 0 in the first primitive block, that of primitive 1-1 is primitive information 1 in the first primitive block, that of primitive 2-0 is primitive information 0 in the second primitive block, and that of primitive 3-1 is primitive information 1 in the third primitive block. The first tile list corresponding to Tile0 therefore includes the primitive index of primitive 1-0 (in this embodiment, the storage address of primitive information 0 in the first primitive block), the primitive index of primitive 1-1 (the storage address of primitive information 1 in the first primitive block), the primitive index of primitive 2-0 (the storage address of primitive information 0 in the second primitive block), and the primitive index of primitive 3-1 (the storage address of primitive information 1 in the third primitive block).
As another example, the second tile, Tile1, covers three primitives: primitive 1-1, primitive 2-0, and primitive 3-0. Primitives 1-1 and 2-0 are as described above and are not repeated here. The primitive information of primitive 3-0 is primitive information 0 in the third primitive block. The first tile list corresponding to Tile1 therefore includes the primitive index of primitive 1-1, the primitive index of primitive 2-0, and the primitive index of primitive 3-0 (in this embodiment, the storage address of primitive information 0 in the third primitive block).
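The FIG. 1 example can be written out as a small sketch. It models a primitive index as a hypothetical `(primitive block number, slot)` pair instead of a raw storage address, and the strings standing in for primitive information, as well as the helper `read_tile`, are illustrative assumptions.

```python
# Primitive blocks: each holds the primitive information of several primitives.
# Block numbers and slots follow the "primitive <block>-<slot>" naming of the example.
primitive_blocks = {
    1: ["info 1-0", "info 1-1"],
    2: ["info 2-0"],
    3: ["info 3-0", "info 3-1"],
}

# First tile lists: one per tile, each entry a primitive index.
# Here an index is a (block, slot) pair standing in for a storage address.
first_tile_lists = {
    "Tile0": [(1, 0), (1, 1), (2, 0), (3, 1)],
    "Tile1": [(1, 1), (2, 0), (3, 0)],
}


def read_tile(tile):
    """Follow every primitive index of a tile to its primitive information."""
    return [primitive_blocks[block][slot] for block, slot in first_tile_lists[tile]]
```

Primitives 1-1 and 2-0 appear in both lists but their primitive information is stored only once, which is why the tile lists hold indexes and not copies.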
In practical application, the primitive identifier may also be used as a primitive index.
As can be seen from the above description, the tile information produced by each graphics processor is stored in that processor's own first tile lists, which is a decentralized storage scheme. However, a rendering stage that works in units of tiles (e.g., rasterization) needs to read the primitive information of all primitives covered by one tile at a time. To make this possible while the tile information remains decentralized, the present disclosure adopts a two-level tile-list storage structure: the primitive indexes are stored in a distributed manner in the first tile lists corresponding to each graphics processor, and the indexes of those first tile lists are stored centrally in the second tile list. Accordingly, a rendering stage that works in units of tiles (e.g., rasterization) can read the indexes of the first tile lists from the second tile list, follow them to the first tile lists, and from there follow the primitive indexes to the corresponding primitive information.
For simplicity of illustration, consider a division into 4 tiles, and assume that N graphics processors in the graphics processing system participate in geometry processing and tile division. As shown in FIG. 2, the second tile list of a tile stores the index of that tile's first tile list for each graphics processor. For example, the second tile list of the first tile, Tile0, stores a pointer to the first tile list Core0 Tile-list0 of Tile0 corresponding to graphics processor Core0, a pointer to the first tile list Core1 Tile-list0 of Tile0 corresponding to graphics processor Core1, and so on, up to a pointer to the first tile list Core n Tile-list0 of Tile0 corresponding to graphics processor Core n; the second tile list of the second tile, Tile1, stores a pointer to the first tile list Core0 Tile-list1 of Tile1 corresponding to graphics processor Core0, a pointer to the first tile list Core1 Tile-list1 of Tile1 corresponding to graphics processor Core1, and so on, up to a pointer to the first tile list Core n Tile-list1 of Tile1 corresponding to graphics processor Core n; the second tile list of the third tile, Tile2, stores a pointer to the first tile list Core0 Tile-list2 of Tile2 corresponding to graphics processor Core0, a pointer to the first tile list Core1 Tile-list2 of Tile2 corresponding to graphics processor Core1, and so on, up to a pointer to the first tile list Core n Tile-list2 of Tile2 corresponding to graphics processor Core n; and the second tile list of the fourth tile, Tile3, stores a pointer to the first tile list Core0 Tile-list3 of Tile3 corresponding to graphics processor Core0, a pointer to the first tile list Core1 Tile-list3 of Tile3 corresponding to graphics processor Core1, and so on, up to a pointer to the first tile list Core n Tile-list3 of Tile3 corresponding to graphics processor Core n.
The above description uses a pointer as the index by way of example; in practical applications, the present disclosure does not limit the specific form of the index.
The above description stores the first tile list indexes in the second tile list in order of graphics processor number; in practical applications, the first tile list indexes may also be stored in other orders, which is not limited in this disclosure.
With the above two-level tile-list storage structure, subsequent rendering stages that work in units of tiles (e.g., rasterization) can still address each tile through a single second tile list, but each second tile list consists of the indexes of the first tile lists at the next level down.
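The two-level lookup can be sketched as follows, assuming three graphics processors and four tiles. A `(core, tile)` pair stands in for the pointer to a first tile list, and the primitive index strings, `first_lists`, `second_lists`, and `primitives_for_tile` are hypothetical names chosen for illustration.

```python
NUM_CORES = 3  # assumed number of graphics processors
NUM_TILES = 4  # four tiles, as in the illustration

# first_lists[core][tile]: primitive indexes that `core` binned into `tile`.
# Each core writes only its own lists, so no cross-core synchronization is modeled.
first_lists = [[[] for _ in range(NUM_TILES)] for _ in range(NUM_CORES)]
first_lists[0][0] = ["c0-p0", "c0-p1"]
first_lists[1][0] = ["c1-p0"]
first_lists[2][1] = ["c2-p0"]
first_lists[0][3] = ["c0-p2"]

# second_lists[tile]: one entry per core, stored in core-number order.
# A (core, tile) pair stands in for a pointer to that first tile list.
second_lists = [
    [(core, tile) for core in range(NUM_CORES)] for tile in range(NUM_TILES)
]


def primitives_for_tile(tile):
    """Gather every primitive index of one tile across all cores."""
    result = []
    for core, t in second_lists[tile]:
        result.extend(first_lists[core][t])
    return result
```

A tile-based stage such as rasterization only needs the address of `second_lists[tile]`; an empty first tile list simply contributes nothing to the result.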
In this disclosure, after determining that the tile partitioning modules of all graphics processors have completed tile division, the main core may send the storage addresses of the second tile lists to a subsequent rendering module that works in units of tiles (e.g., a rasterization module). That module then reads, tile by tile, the indexes of the first tile lists from each second tile list according to its storage address, reads the primitive indexes from each first tile list according to those indexes, and reads the primitive information according to the primitive indexes.
In addition to any of the graphics processor embodiments described above, the graphics processor may further include a primitive assignment module configured to: assign the primitives in an image frame to the plurality of graphics processors in the multi-core graphics processing system. The primitive assignment module operates when the graphics processor acts as the master core and does not operate when the graphics processor acts as a slave core.
It should be noted that primitive assignment may also be implemented by other modules in the graphics processing system. By way of example and not limitation, the assignment of primitives to the graphics processors may be implemented by an application processor in the graphics processing system.
The present disclosure does not limit the primitive assignment policy. In practical application, the primitive allocation strategy can be formulated according to the needs or practical situations. By way of example and not limitation, primitives may be allocated in a round robin fashion, or may be allocated using a preset load balancing policy.
In some embodiments, primitives in an image frame may be grouped according to a predetermined rule and each group of primitives may be assigned to multiple graphics processors in a multi-core graphics processing system according to a predetermined load balancing policy.
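A minimal sketch of grouping followed by load-balanced assignment is given below. The disclosure leaves both the grouping rule and the load balancing policy open; the sketch assumes fixed-size groups in submission order and a greedy policy that gives each group to the currently least-loaded graphics processor, with group length as the cost estimate. `group_primitives` and `assign_groups` are hypothetical names.

```python
import heapq


def group_primitives(primitives, group_size):
    """Assumed grouping rule: consecutive fixed-size batches in submission order."""
    return [primitives[i:i + group_size] for i in range(0, len(primitives), group_size)]


def assign_groups(groups, num_cores, cost=len):
    """Assumed policy: give each group to the least-loaded core so far."""
    heap = [(0, core) for core in range(num_cores)]  # (accumulated load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    for group in groups:
        load, core = heapq.heappop(heap)
        assignment[core].append(group)
        heapq.heappush(heap, (load + cost(group), core))
    return assignment


primitives = list(range(10))  # ten primitives identified by number
groups = group_primitives(primitives, group_size=4)
assignment = assign_groups(groups, num_cores=2)
```

Passing a different `cost` function (for example an estimated vertex count) changes the balance without changing the structure; a round-robin policy would replace the heap with `core = i % num_cores`.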
In the embodiments of the disclosure, assigning a primitive to a graphics processor may be, but is not limited to, sending the vertex data corresponding to that primitive to the graphics processor.
In some embodiments, geometry processing may be performed in parallel through multiple geometry pipelines; that is, one graphics processor has multiple geometry pipelines. Accordingly, in any of the above graphics processor embodiments, the geometry processing module may include a primitive assignment sub-module and a plurality of geometry processing sub-modules. The primitive assignment sub-module is configured to: distribute the primitives assigned to the graphics processor among the plurality of geometry processing sub-modules. Each geometry processing sub-module is configured to: perform geometry processing on the primitives assigned to that geometry processing sub-module.
Correspondingly, as shown in FIG. 3, the main core performs a first-level assignment of the primitives. Each graphics processor (the main core and the slave cores alike) then performs a second-level assignment, distributing the primitives assigned to it among the geometry pipelines (corresponding to the geometry processing sub-modules) inside that graphics processor. After the geometry pipelines have performed geometry processing in parallel, each graphics processor performs tile division on the primitives assigned to it.
Through the parallel geometry processing and tile division shown in FIG. 3, the parallelism of multiple cores can be fully exploited and graphics processing performance is improved.
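The two-level assignment of FIG. 3 can be sketched in software as follows. This is only an illustration of the control flow: real geometry pipelines are hardware units, whereas the sketch uses thread pools, assumes round-robin assignment at both levels, and uses a stand-in `geometry_process` that merely translates vertices. All function names are hypothetical, and the per-core tile division that would follow is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def geometry_process(primitive):
    """Stand-in for geometry processing: translate every vertex by (1, 1)."""
    return [(x + 1, y + 1) for x, y in primitive]


def process_batch(batch):
    """Work done by one geometry pipeline on the primitives assigned to it."""
    return [geometry_process(p) for p in batch]


def run_core(primitives, num_pipelines):
    """Second-level assignment: round-robin over this core's geometry pipelines."""
    batches = [primitives[i::num_pipelines] for i in range(num_pipelines)]
    with ThreadPoolExecutor(max_workers=num_pipelines) as pool:
        processed = list(pool.map(process_batch, batches))
    # Tile division for this core would run here on the processed primitives.
    return [p for batch in processed for p in batch]


def run_frame(frame_primitives, num_cores, num_pipelines):
    """First-level assignment: round-robin over the graphics processors."""
    per_core = [frame_primitives[i::num_cores] for i in range(num_cores)]
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        return list(pool.map(lambda prims: run_core(prims, num_pipelines), per_core))


frame = [[(i, i)] for i in range(4)]  # four single-vertex "primitives"
result = run_frame(frame, num_cores=2, num_pipelines=2)
```

Because each core's output is independent of the others, the subsequent per-core tile division needs no coordination until the second tile lists are read.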
The description of the implementation of the primitive assignment sub-module may refer to the description of the primitive assignment module, which is not repeated herein. In practical application, the function of the primitive distribution submodule can be realized by adopting the existing implementation mode or implementation principle.
The embodiments of the present disclosure are not limited to a specific implementation of geometric processing, and may be implemented using existing geometric processing implementations.
Based on any of the graphics processor embodiments described above, the second tile list may be pre-configured, in which case the index of each first tile list is fixed. Alternatively, the index of each first tile list in the second tile list may change dynamically, in which case the tile partitioning module may be further configured to: save the index of the first tile list of each tile into the second tile list of that tile.
The disclosed embodiments also provide a multi-core graphics processing system of a tile-based rendering architecture, the graphics processing system comprising at least two graphics processors, wherein the functionality implemented by each graphics processor may be as described with reference to any of the graphics processor embodiments above.
The multi-core graphic processing system provided by the real-time method has the advantages of fully delivering the parallelism of multiple cores in the geometric processing stage and the image block dividing stage, and better realizing the performance expansion and improvement.
In an embodiment of the present disclosure, the product form of the graphics processing system may be an SOC (System on Chip) chip.
The graphics processing system in the embodiments of the present disclosure may be a single die SOC chip or a multi die interconnect SOC chip.
The architecture and the working principle of the graphics processing system provided in the present disclosure are described below by taking one die as an example.
In one embodiment shown in fig. 4, the single die graphics processing system includes multiple GPU cores (i.e., the graphics processors described above), with one GPU core serving as the master core and the other GPU cores serving as slave cores.
The GPU core is used to process drawing instructions and to execute the image rendering pipeline according to the drawing instructions, and may also be used to execute other operation instructions. The GPU core further includes:
a computing unit, which executes instructions compiled from shaders, is a programmable module, and consists of a large number of ALUs;
a cache, for caching GPU core data to reduce accesses to memory;
a rasterization module, which is a fixed-function stage of the 3D rendering pipeline and further includes a primitive information calculation module and a pixel information processing module;
a tile division (tiling) module, which divides a frame into tiles in TBR and TBDR GPU architectures;
a clipping module, which is a fixed-function stage of the 3D rendering pipeline and clips away primitives that lie outside the viewing range or whose back faces are not displayed;
a post-processing module, for performing operations such as scaling, cropping, and rotating on the drawn image;
a microcore, for scheduling among the pipeline hardware modules on the GPU core, or for task scheduling across multiple GPU cores.
The GPU core is connected to a network on chip. The network on chip is used for data exchange between the various masters and slaves on the graphics processing system; in this embodiment, the network on chip includes a configuration bus, a data communication network, a communication bus, and so on.
As shown in fig. 4, the graphics processing system may further include:
a general-purpose DMA (Direct Memory Access) engine, for moving data between the host side and the graphics processing system memory (e.g., graphics card memory), such as moving the vertex data of a 3D drawing from the host side to the graphics processing system memory;
a PCIe controller, which serves as the interface for communicating with the host and implements the PCIe protocol, so that the graphics processing system is connected to the host through a PCIe interface; programs such as the graphics API and the graphics card driver run on the host;
an application processor, for scheduling the tasks of the modules on the graphics processing system; for example, the GPU notifies the application processor after rendering a frame of image, and the application processor then starts the display controller to display the image drawn by the GPU on the screen;
a memory controller, for connecting memory devices and storing data on the SOC;
a display controller, for controlling the frame buffer in the memory to be output to a display through a display interface (HDMI, DP, etc.);
a video decoder, for decoding encoded video on the host hard disk into pictures that can be displayed;
a video encoder, for encoding the original video stream on the host hard disk into a specified format and returning it to the host.
Based on the multi-core graphics processing system architecture shown in FIG. 4, in one embodiment, the graphics rendering process is as follows:
The graphics API on the host (in practical applications, for a graphics processing system in a mobile terminal, this may instead be software on the application processor) sends a drawing instruction to the SOC chip, requesting that an image frame be rendered.
Wherein the image frame includes at least one object therein.
The general-purpose DMA transfers the vertex coordinate information of each object in the image frame from the host side to the graphics processing system memory.
After acquiring the drawing instruction, the computing unit of the main core decodes it.
The primitive assignment module of the main core (the function of which is implemented by the computing unit) assigns primitives in the image frame to a plurality of GPU cores including the main core according to a predetermined assignment policy.
The primitive assignment submodule (the function of which is implemented by the computing unit) of each GPU core assigns the primitives assigned to that GPU core to a plurality of geometric pipelines of the GPU core, and the geometric pipelines perform geometric processing in parallel. In each pipeline, the vertex shader (whose function is implemented by the computing unit) acquires from the system memory the vertex coordinate information corresponding to the primitives allocated to the present GPU core and transmits it to the geometry shader (whose function is implemented by the computing unit), which converts the 3D coordinates of the vertices into expanded texture coordinates (i.e., (u, v) coordinates). In addition, the computing unit performs primitive assembly according to the vertex coordinate information. The value stored in the texture map at the texture coordinate corresponding to a vertex coordinate is the color information of that vertex.
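Two of the operations mentioned above, primitive assembly and the lookup of a color at a (u, v) texture coordinate, can be illustrated with a short Python sketch. The function names, the triangle-list layout of the index buffer, and the nearest-texel sampling rule are assumptions chosen for the example; the disclosure does not specify them.

```python
def assemble_triangles(vertices, indices):
    """Group an index buffer into triangles (assumes a plain triangle list)."""
    if len(indices) % 3 != 0:
        raise ValueError("triangle list needs a multiple of 3 indices")
    return [tuple(vertices[i] for i in indices[k:k + 3])
            for k in range(0, len(indices), 3)]


def sample_nearest(texture, u, v):
    """Return the texel at (u, v), with u and v assumed to lie in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)    # clamp u == 1.0 to the last column
    y = min(int(v * height), height - 1)  # clamp v == 1.0 to the last row
    return texture[y][x]


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
triangles = assemble_triangles(vertices, [0, 1, 2, 2, 1, 3])

texture = [["red", "green"],
           ["blue", "white"]]
corner_color = sample_nearest(texture, 1.0, 1.0)
```

Here `triangles` holds two primitives that share an edge, and `corner_color` is the texel selected for the texture coordinate (1.0, 1.0).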
Vertex coordinate information and vertex texture coordinates of the primitives are saved to a data structure of the primitives in system memory.
After the geometric processing is finished, the tile division module of each GPU core performs tile division on the primitives in the image frame according to the size of the depth buffer, and stores the result in a first tile list. As shown in fig. 1, the first tile list contains the primitive indexes of the primitives covered by the tile, and each primitive index indicates the storage address of the primitive information of that primitive.
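As a rough illustration of tile division, the Python sketch below bins triangles into tiles by their screen-space bounding boxes and records, per tile, the indexes of the primitives that may cover it. The bounding-box test is a conservative simplification chosen for brevity, and the names `tiles_covered`, `bin_primitives`, and `tile_size` are hypothetical; the disclosure only states that the tile size follows from the depth buffer size.

```python
def tiles_covered(vertices, tile_size, tiles_x, tiles_y):
    """Ids of the tiles overlapped by a triangle's 2D bounding box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    tx0 = max(int(min(xs) // tile_size), 0)
    ty0 = max(int(min(ys) // tile_size), 0)
    tx1 = min(int(max(xs) // tile_size), tiles_x - 1)
    ty1 = min(int(max(ys) // tile_size), tiles_y - 1)
    return [ty * tiles_x + tx
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)]


def bin_primitives(primitives, tile_size, tiles_x, tiles_y):
    """Build {tile_id: [primitive index, ...]} for one GPU core."""
    first_tile_lists = {}
    for prim_index, vertices in primitives.items():
        for tile_id in tiles_covered(vertices, tile_size, tiles_x, tiles_y):
            first_tile_lists.setdefault(tile_id, []).append(prim_index)
    return first_tile_lists


# A 64x64 frame split into 4x4 tiles of 16x16 pixels.
primitives = {
    0: [(2, 2), (10, 3), (4, 12)],      # fits inside tile 0
    1: [(40, 40), (50, 41), (45, 60)],  # spans four tiles
}
first_tile_lists = bin_primitives(primitives, tile_size=16, tiles_x=4, tiles_y=4)
```

A bounding box can list a tile that the triangle does not actually touch; this only costs extra work later, since the pixel coverage test during rasterization rejects the uncovered pixels.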
Once tile division has been completed for all objects to be displayed in a frame, the rasterization process is started.
According to the requirements of deferred rendering, the rasterization module processes the tiles one by one. For the current tile, it reads from the second tile list of that tile the index of the first tile list corresponding to each GPU core, and then, using that index, reads the primitive indexes (primitive identifiers or primitive information storage addresses) stored in the first tile list. The rasterization module reads the primitive information of each primitive through its primitive index, performs a pixel coverage test using that information to determine the pixels covered by the primitive, and then performs a pixel interpolation calculation and at least one pixel test (by way of example and not limitation, the pixel tests may include a depth test, a stencil test, etc.).
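The per-tile coverage test, interpolation, and depth test can be sketched as follows. This Python example uses the common edge-function (barycentric) formulation, samples at pixel centers, and treats a smaller depth value as nearer; these conventions and the names `edge` and `rasterize_tile` are assumptions for illustration, since the disclosure relies on existing techniques without naming one.

```python
def edge(a, b, p):
    """Signed area term: its sign tells which side of edge a->b point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_tile(tile_origin, tile_size, triangles):
    """Return {(px, py): (depth, triangle index)} for the nearest triangle
    at each covered pixel of one tile. Vertices are (x, y, z) tuples."""
    nearest = {}
    ox, oy = tile_origin
    for index, (v0, v1, v2) in enumerate(triangles):
        area = edge(v0, v1, v2)
        if area == 0:
            continue  # degenerate triangle covers no pixels
        for py in range(oy, oy + tile_size):
            for px in range(ox, ox + tile_size):
                p = (px + 0.5, py + 0.5)  # sample at the pixel center
                # Barycentric weights; dividing by area handles both windings.
                w0 = edge(v1, v2, p) / area
                w1 = edge(v2, v0, p) / area
                w2 = edge(v0, v1, p) / area
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue  # coverage test failed
                z = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]  # interpolated depth
                if (px, py) not in nearest or z < nearest[(px, py)][0]:
                    nearest[(px, py)] = (z, index)  # depth test passed
    return nearest


triangles = [
    ((0, 0, 0.8), (8, 0, 0.8), (0, 8, 0.8)),  # far, covers the whole 4x4 tile
    ((0, 0, 0.2), (2, 0, 0.2), (0, 2, 0.2)),  # near, covers only a corner
]
covered = rasterize_tile((0, 0), 4, triangles)
```

In a full pipeline the `triangles` argument would be gathered by following the second tile list of the tile to each core's first tile list; the sketch takes the list directly to stay self-contained.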
In the present disclosure, pixel coverage testing, pixel interpolation calculation, and pixel testing are implemented using existing techniques.
The rasterization processing outputs the information of the pixels.
Once all tiles of a frame have been rasterized, pixel processing is invoked.
Specifically, the fragment shader of the GPU core (whose functions are implemented by the computing unit) performs shading calculations (e.g., illumination calculations) for corresponding pixels on a tile-by-tile basis according to texture coordinates of the primitive-covered pixels.
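As one possible example of an illumination calculation performed tile by tile, the Python sketch below applies Lambertian diffuse shading to the pixels of a tile. The choice of the Lambert model, the per-pixel surface normal, and the names `lambert` and `shade_tile` are assumptions made for illustration; the disclosure does not say which lighting model is used. The albedo stands in for the color fetched from the texture map at the pixel's texture coordinate.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(albedo, normal, light_dir):
    """Diffuse color: albedo scaled by max(0, N . L)."""
    n = normalize(normal)
    l = normalize(light_dir)
    intensity = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(c * intensity for c in albedo)


def shade_tile(fragments, light_dir):
    """fragments: {(px, py): (albedo, normal)} for the pixels of one tile."""
    return {pixel: lambert(albedo, normal, light_dir)
            for pixel, (albedo, normal) in fragments.items()}


fragments = {
    (0, 0): ((1.0, 0.5, 0.0), (0.0, 0.0, 1.0)),   # surface facing the light
    (1, 0): ((1.0, 0.5, 0.0), (0.0, 0.0, -1.0)),  # surface facing away
}
shaded = shade_tile(fragments, light_dir=(0.0, 0.0, 2.0))
```

Shading one tile at a time keeps the working set small, which is the usual motivation for tile-based architectures.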
The disclosed embodiments also provide an electronic device including the graphics processing system described in any of the above embodiments. In some use cases, the product form of the electronic device is embodied as a graphics card; in other use scenarios, the product form of the electronic device is embodied as a CPU motherboard.
The embodiments of the present disclosure also provide electronic equipment, which comprises the above electronic device. In some use scenarios, the product form of the electronic equipment is a portable electronic device, such as a smart phone, a tablet computer, or a VR device; in other use scenarios, the product form of the electronic equipment is a personal computer, game console, workstation, server, etc.
Based on the same inventive concept, the embodiments of the present disclosure also provide a graphics processing method, which is applied to a graphics processor in a multi-core graphics processing system, as shown in fig. 5, and at least includes the following operations:
step 501, performing geometric processing on primitives allocated to the graphic processor;
step 502, performing tile division processing on the primitives allocated to the graphics processor, saving the obtained tile information of each tile into the first tile list of that tile corresponding to the graphics processor, and saving the index of the first tile list of each tile in the second tile list of that tile, wherein the second tile list of each tile stores the indexes of that tile's first tile lists corresponding to the plurality of graphics processors of the multi-core graphics processing system.
If the graphics processor is a main core in the multi-core graphics processing system, the graphics processing method may further include: primitives in the image frames are assigned to a plurality of graphics processors in a multi-core graphics processing system.
Further, assigning primitives in the image frame to a plurality of graphics processors in a multi-core graphics processing system may include: the primitives in the image frames are grouped according to a predetermined rule, and each group of primitives is distributed to a plurality of graphics processors in the multi-core graphics processing system according to a predetermined load balancing strategy.
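The grouping and load-balanced assignment can be sketched in Python as follows. The grouping rule (fixed-size batches in submission order) and the balancing strategy (each group goes to the currently least-loaded graphics processor) are assumptions chosen for illustration, as are the function names; the disclosure only requires a predetermined rule and a predetermined load balancing strategy.

```python
import heapq


def group_primitives(primitive_ids, group_size):
    """Assumed grouping rule: consecutive batches of group_size primitives."""
    return [primitive_ids[i:i + group_size]
            for i in range(0, len(primitive_ids), group_size)]


def assign_groups(groups, n_gpus):
    """Assumed strategy: give each group to the least-loaded GPU so far."""
    heap = [(0, gpu) for gpu in range(n_gpus)]  # (primitive count, gpu id)
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(n_gpus)}
    for group in groups:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].extend(group)
        heapq.heappush(heap, (load + len(group), gpu))
    return assignment


groups = group_primitives(list(range(10)), group_size=4)
assignment = assign_groups(groups, n_gpus=2)
```

Load is measured here simply by primitive count; a real strategy might weight primitives by estimated cost, which would only change the value pushed back onto the heap.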
On the basis of any graphics processing method embodiment, the geometric processing of the graphics primitives allocated to the graphics processor may include: the primitives assigned to the graphics processor are geometrically processed in parallel.
On the basis of any of the above graphics processing method embodiments, the graphics processing method may further include: the index of the first tile list of each tile is saved to the second tile list of each tile, respectively.
It should be noted that the above-described graphics processing method is based on the same inventive concept as the above-described graphics processor. Therefore, the specific implementation manner of each step in the method and the explanation of the related nouns can refer to the description of the above embodiments, which are not repeated here.
While the preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the disclosure.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present disclosure without departing from the spirit or scope of the disclosure. Thus, the present disclosure is intended to include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (17)

1. A graphics processor for use in a multi-core graphics processing system, the graphics processor comprising at least:
a geometry processing module configured to: perform geometric processing on the primitives allocated to the graphics processor;
a tile partitioning module configured to: perform tile division processing on the primitives allocated to the graphics processor, save the obtained tile information of each tile into a first tile list of that tile corresponding to the graphics processor, and save the index of the first tile list of each tile into a second tile list of that tile, wherein the second tile list of each tile stores the indexes of that tile's first tile lists corresponding to a plurality of graphics processors of the multi-core graphics processing system.
2. The graphics processor of claim 1, further comprising a primitive assignment module if the graphics processor is a master core in the multi-core graphics processing system, the primitive assignment module configured to: primitives in the image frames are assigned to a plurality of graphics processors in the multi-core graphics processing system.
3. The graphics processor of claim 2, the assigning primitives in an image frame to a plurality of graphics processors in the multi-core graphics processing system, comprising: grouping the primitives in the image frame according to a preset rule, and distributing each group of primitives to a plurality of graphics processors in the multi-core graphics processing system according to a preset load balancing strategy.
4. A graphics processor according to any one of claims 1 to 3, the geometry processing module comprising a primitive assignment sub-module and a plurality of geometry processing sub-modules;
the primitive assignment submodule is configured to: assigning primitives assigned to the present graphics processor to the plurality of geometry processing sub-modules;
the geometry processing submodule is configured to: and performing geometric processing on the primitives distributed to the geometric processing submodule.
5. A graphics processor according to any one of claims 1 to 3, the tile partitioning module further configured to:
save the index of the first tile list of each tile to the second tile list of that tile.
6. A multi-core graphics processing system comprising at least two graphics processors, each graphics processor configured to:
perform geometric processing on the primitives allocated to the graphics processor;
perform tile division processing on the primitives allocated to the graphics processor, save the obtained tile information of each tile into a first tile list of that tile corresponding to the graphics processor, and save the index of the first tile list of each tile into a second tile list of that tile, wherein the second tile list of each tile stores the indexes of that tile's first tile lists corresponding to the at least two graphics processors.
7. The multi-core graphics processing system of claim 6, one of the at least two graphics processors being a master core and the other graphics processors being slave cores;
the primary core is further configured to: primitives in the image frames are assigned to a plurality of graphics processors in the multi-core graphics processing system.
8. The multi-core graphics processing system of claim 7, the assigning primitives in an image frame to a plurality of graphics processors in the multi-core graphics processing system, comprising: grouping the primitives in the image frame according to a preset rule, and distributing each group of primitives to a plurality of graphics processors in the multi-core graphics processing system according to a preset load balancing strategy.
9. The multi-core graphics processing system of any of claims 6 to 8, the geometrically processing primitives assigned to the present graphics processor, comprising: the primitives assigned to the graphics processor are geometrically processed in parallel.
10. The multi-core graphics processing system of any of claims 6 to 8, each graphics processor further configured to: save the index of the first tile list of each tile to the second tile list of that tile.
11. An electronic apparatus comprising the multi-core graphics processing system of any of claims 6 to 10.
12. An electronic device comprising the electronic apparatus of claim 11.
13. A graphics processing method, applied to a graphics processor in a multi-core graphics processing system, the graphics processing method comprising at least:
performing geometric processing on the primitives allocated to the graphics processor;
performing tile division processing on the primitives allocated to the graphics processor, saving the obtained tile information of each tile into a first tile list of that tile corresponding to the graphics processor, and saving the index of the first tile list of each tile into a second tile list of that tile, wherein the second tile list of each tile stores the indexes of that tile's first tile lists corresponding to a plurality of graphics processors of the multi-core graphics processing system.
14. The graphics processing method of claim 13, further comprising, if the graphics processor is the primary core in the multi-core graphics processing system:
primitives in the image frames are assigned to a plurality of graphics processors in the multi-core graphics processing system.
15. The graphics processing method of claim 14, the assigning primitives in an image frame to a plurality of graphics processors in the multi-core graphics processing system, comprising: grouping the primitives in the image frame according to a preset rule, and distributing each group of primitives to a plurality of graphics processors in the multi-core graphics processing system according to a preset load balancing strategy.
16. A method of graphics processing according to any one of claims 13 to 15, the geometrically processing primitives allocated to the graphics processor comprising: the primitives assigned to the graphics processor are geometrically processed in parallel.
17. The graphics processing method of any one of claims 13 to 15, the method further comprising:
saving the index of the first tile list of each tile to the second tile list of that tile.
CN202210490158.6A 2022-05-07 2022-05-07 Graphics processor, multi-core graphics processing system, electronic device, and apparatus Pending CN117058288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210490158.6A CN117058288A (en) 2022-05-07 2022-05-07 Graphics processor, multi-core graphics processing system, electronic device, and apparatus

Publications (1)

Publication Number Publication Date
CN117058288A true CN117058288A (en) 2023-11-14

Family

ID=88666851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210490158.6A Pending CN117058288A (en) 2022-05-07 2022-05-07 Graphics processor, multi-core graphics processing system, electronic device, and apparatus

Country Status (1)

Country Link
CN (1) CN117058288A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252751A (en) * 2023-11-17 2023-12-19 摩尔线程智能科技(北京)有限责任公司 Geometric processing method, device, equipment and storage medium
CN117252751B (en) * 2023-11-17 2024-02-13 摩尔线程智能科技(北京)有限责任公司 Geometric processing method, device, equipment and storage medium
CN117472840A (en) * 2023-12-28 2024-01-30 摩尔线程智能科技(北京)有限责任公司 Multi-core system and method for data processing of multi-core system
CN117472840B (en) * 2023-12-28 2024-04-16 摩尔线程智能科技(北京)有限责任公司 Multi-core system and method for data processing of multi-core system
CN117934260A (en) * 2024-03-25 2024-04-26 摩尔线程智能科技(北京)有限责任公司 Rendering method, block distribution device, graphics processing device and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination