CN116917925A - Graphics processor, image processing method, and electronic apparatus - Google Patents


Info

Publication number
CN116917925A
Authority
CN
China
Prior art keywords
block
blocks
polygon
cores
load
Prior art date
Legal status
Pending
Application number
CN202180093752.5A
Other languages
Chinese (zh)
Inventor
赵学军
张雷
肖潇
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN116917925A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Abstract

The application discloses a graphics processor, an image processing method, and an electronic device, relating to the field of graphics processing and intended to shorten the time a multi-core GPU takes to process an image. The graphics processor includes a polygon list generator, a task scheduler, and a plurality of cores. The polygon list generator is configured to acquire a plurality of blocks of a frame of image and a polygon list corresponding to each block, and to calculate the load of each block, where the polygon list indicates the polygons included in the corresponding block. The task scheduler is configured to schedule the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load, so as to draw the image.

Description

Graphics processor, image processing method, and electronic apparatus
Technical Field
The present application relates to the field of graphics processing, and in particular, to a graphics processor, an image processing method, and an electronic device.
Background
Graphics processors (graphics processing units, GPUs) are increasingly used in personal computers, workstations, gaming machines, and some mobile devices (e.g., tablet computers, smartphones, etc.). A GPU is an application-specific integrated circuit dedicated to accelerating image and graphics operations, which improves processing speed. In addition, to improve parallel processing capability, a GPU may include a plurality of cores: the GPU may divide a frame of image into a plurality of blocks, and each core processes different blocks randomly or in block order, improving the processing speed for a frame of image.
However, because the processing complexity of each block differs, the processing time of the cores is unbalanced, so the GPU as a whole takes a long time to process one frame of image.
Disclosure of Invention
Embodiments of the present application provide a GPU, an image processing method, and an electronic device, which are used to shorten the time a multi-core GPU takes to process an image.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a graphics processor is provided, comprising: a polygon list generator, a task scheduler, and a plurality of cores. The polygon list generator is configured to acquire a plurality of blocks of a frame of image and a polygon list corresponding to each block, and to calculate the load of each block, where the polygon list indicates the polygons included in the corresponding block. The task scheduler is configured to schedule the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load, so as to draw the image.
In the graphics processor provided by embodiments of the present application, the polygon list generator can also obtain the load of each block while generating the polygon lists, and the task scheduler can assign blocks to the plurality of cores in descending order of load. Blocks with larger loads are therefore processed first, and the smaller-load blocks that follow can be distributed more evenly, averaging out the processing time of the cores as much as possible. This mitigates the processing-time imbalance of a multi-core graphics processor and shortens the time it takes to process an image.
In one possible embodiment, the load of each block includes at least one of the following: the number A of polygons in the block, and the complexity B of the drawing instructions for the points of the polygons in the block.
In one possible embodiment, the load of each block is A×W1+B×W2, where W1 and W2 are non-negative weight values. When W1 is 0, the load is represented only by the complexity B of the drawing instructions for the points of the polygons in the block; when W2 is 0, the load is represented only by the number A of polygons in the block; when W1 and W2 take other values, the load is a weighted combination of the polygon count A and the drawing-instruction complexity B.
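The weighted load formula above can be sketched in a few lines of Python; the function name `block_load` and the default weights are illustrative assumptions, not part of the disclosure:

```python
def block_load(num_polygons, draw_complexity, w1=1.0, w2=1.0):
    """Weighted load of one block: A*W1 + B*W2.

    num_polygons    -- A, the polygon count in the block
    draw_complexity -- B, summed drawing-instruction complexity of the
                       polygon points in the block
    w1, w2          -- non-negative weights; w1=0 uses only B, w2=0 only A
    """
    if w1 < 0 or w2 < 0:
        raise ValueError("weights must be non-negative")
    return num_polygons * w1 + draw_complexity * w2
```

Setting one weight to zero recovers the two single-metric cases described in the text.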
In one possible implementation, the task scheduler being configured to schedule the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load includes: the task scheduler schedules idle cores among the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load. The reason this implementation improves core balance is that, as long as an idle core exists, it is given the block with the largest load among the remaining blocks. This maximizes the utilization of each core's processing capability and balances the cores.
In one possible implementation, the task scheduler being configured to schedule the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load includes: the task scheduler alternately schedules the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load, where alternate scheduling means that, in any two adjacent rounds, the task scheduler reverses the order in which it schedules the cores relative to the previous round. The reason this implementation improves core balance: taking core 0 as an example, core 0 processes the block with the largest load in the first round and the block with the smallest load in the second round, so the average processing time of each core over two adjacent rounds is close, and after averaging over multiple rounds the processing times of the cores are relatively balanced.
In one possible implementation, the task scheduler being configured to schedule the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load includes: the task scheduler is configured to schedule the plurality of cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks. The reason this implementation improves core balance is that the blocks with larger loads are concentrated in the Top N, so distributing the remaining small-load blocks sequentially or randomly has little effect on balance. The task scheduler can therefore distribute the Top N blocks first and then the other blocks, and still achieve balance across the cores.
In a second aspect, an image processing method is provided, applied to the graphics processor according to the first aspect and any implementation thereof, the method including: acquiring a plurality of blocks of a frame of image and a polygon list corresponding to each block, and calculating the load of each block, where the polygon list indicates the polygons included in the corresponding block; and scheduling a plurality of cores of the graphics processor to process the polygon lists corresponding to the blocks in descending order of load, so as to draw the image.
In one possible embodiment, the load of each block includes at least one of the following: the number A of polygons in the block, and the complexity B of the drawing instructions for the points of the polygons in the block.
In one possible embodiment, the load of each block is A×W1+B×W2, where W1 and W2 are weight values.
In one possible implementation, scheduling the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load includes: scheduling idle cores among the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load.
In one possible implementation, scheduling the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load includes: alternately scheduling the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load, where alternate scheduling means that, in any two adjacent rounds, the order in which the cores are scheduled is reversed relative to the previous round.
In one possible implementation, scheduling the plurality of cores to process the polygon lists corresponding to the blocks in descending order of load includes: scheduling the plurality of cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.
In a third aspect, an electronic device is provided, comprising the graphics processor according to the first aspect and any implementation thereof, and a display screen; the graphics processor is configured to draw images on the display screen.
In a fourth aspect, a computer-readable storage medium is provided, having instructions stored therein which, when run on a graphics processor, cause the graphics processor to perform the method according to the second aspect and any of its implementations.
In a fifth aspect, a computer program product is provided, comprising instructions which, when run on a graphics processor, cause the graphics processor to perform the method according to the second aspect and any of its implementations.
For the technical effects of the second to fifth aspects, refer to the technical effects described for the first aspect and any of its implementations; they are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of a GPU according to an embodiment of the present application;
Fig. 2 is a schematic diagram of blocks of a frame of image and the polygons in the blocks according to an embodiment of the present application;
Fig. 3 is a schematic diagram of core imbalance in a GPU according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another GPU according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of an image processing method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a task scheduler scheduling multiple cores according to the loads of multiple blocks, according to an embodiment of the present application;
Fig. 7 is a schematic diagram of another way in which a task scheduler schedules multiple cores according to the loads of multiple blocks, according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device including the GPU according to an embodiment of the present application.
Detailed Description
One possible structure of a GPU is first described with reference to Fig. 1. The GPU can be used in devices such as vehicle dashboards, smart watches, mobile phones, personal computers, and tablet computers.
As shown in fig. 1, the GPU includes: a polygon list generator (polygon list generator, PLG) 101, a level two cache (L2) 102, a task scheduler 103, and a plurality of cores (also referred to as GPU cores) 104.
The functions of each module in the GPU are described below in connection with the process by which the GPU draws an image:
A frame of image may be considered to be made up of multiple polygons (e.g., points, lines, triangles, etc.); the image is drawn once the GPU has drawn all of the polygons and stitched them together. At present, one type of GPU processes a frame of image in two stages: a binning pass and a rendering pass.
As shown in Fig. 2, in the binning pass, PLG 101 slices an input frame of image into a plurality of blocks (bins) of the same size and shape. Each block includes parts of a plurality of polygons (e.g., triangles), each vertex of which corresponds to a pixel. PLG 101 obtains a corresponding polygon list for each block, and buffers the position information of each block and its polygon list into L2 102. The position information of a block may include the coordinates or the serial number of the block. The polygon list indicates information about the polygons included in the corresponding block; taking triangles as an example, it may include the numbers, coordinates, colors, transparency, etc. of the triangle vertices.
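The binning step described above can be modeled in a short sketch that assigns each triangle to every bin its bounding box overlaps. This is a software illustration only; the function name, the bounding-box test, and the dictionary return type are assumptions, and a real PLG operates on hardware data structures:

```python
def build_polygon_lists(width, height, bin_size, triangles):
    """Assign each triangle to every bin its bounding box overlaps.

    triangles -- list of triangles, each a list of three (x, y) vertices
    Returns {(bin_x, bin_y): [triangle indices]}, i.e. one polygon list
    per non-empty bin.
    """
    bins = {}
    for idx, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        bx0, bx1 = int(min(xs)) // bin_size, int(max(xs)) // bin_size
        by0, by1 = int(min(ys)) // bin_size, int(max(ys)) // bin_size
        # clamp to the grid and record the triangle in every touched bin
        for bx in range(max(bx0, 0), min(bx1, (width - 1) // bin_size) + 1):
            for by in range(max(by0, 0), min(by1, (height - 1) // bin_size) + 1):
                bins.setdefault((bx, by), []).append(idx)
    return bins
```

A triangle spanning two bins appears in both polygon lists, consistent with each bin holding the polygons that intersect it.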
In the rendering pass, the task scheduler 103 schedules the plurality of cores 104 to process the blocks: after acquiring the position information of each block from L2 102, it distributes the position information to the cores 104. Each core 104 is allocated a certain number of blocks; according to the position information of its allocated blocks, each core 104 obtains the corresponding polygon lists from L2 102 and draws the polygons in those blocks on the display screen. After the cores 104 have processed all of the blocks, the drawing of one frame of image on the display screen is complete.
Currently, when a task scheduler allocates blocks to the cores, it does so either sequentially according to the position information of the blocks, or randomly. This may cause blocks with large computational complexity (i.e., large loads) to be concentrated on some cores, or cause such a block to be scheduled onto a core at the end of a frame, producing a long tail, so that the load and processing time are unbalanced across cores and the GPU takes a long time to process a frame of image. Actual measurements show that this can affect benchmark scores and game performance to a certain extent.
For example, as shown in Fig. 3, assume the GPU has 4 cores and a frame of image is divided into 16 blocks; the task scheduler allocates block 0 to core 0, block 1 to core 1, block 2 to core 2, block 3 to core 3, block 4 to core 0, and so on, in the order of the blocks' position information. If block 13, which has greater computational complexity (a heavier load), is scheduled onto core 1 at the end of the frame, the other cores finish while block 13 still needs a relatively long time to process. The load and processing time are thus unbalanced, and the GPU takes longer to process the frame.
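The imbalance in this example can be quantified with a small sketch. Assuming unit load for every block except block 13, which we give a hypothetical load of 8, strict position-order dealing (block i to core i mod 4) leaves core 1 finishing long after the others:

```python
def round_robin_finish_times(loads, num_cores):
    """Finish time of each core when block i is dealt to core i % num_cores,
    as in the position-order allocation of Fig. 3."""
    finish = [0] * num_cores
    for i, load in enumerate(loads):
        finish[i % num_cores] += load
    return finish

loads = [1] * 16
loads[13] = 8  # hypothetical heavy block
# cores 0, 2, 3 finish at time 4; core 1 (which receives block 13) at 11
```

Cores 0, 2, and 3 sit idle from time 4 to time 11 while core 1 works through block 13, which is exactly the long tail described above.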
To this end, as shown in Fig. 4, the present application provides another GPU, in which PLG 101 also obtains the load (workload) of each block when generating the polygon lists. The task scheduler 103 allocates blocks to the cores 104 in descending order of load, so blocks with larger loads are processed first and the smaller-load blocks that follow can be distributed more evenly, averaging out the processing time of the cores 104 as much as possible. This mitigates the processing-time imbalance of a multi-core GPU and shortens the time the GPU takes to process an image.
Specifically, the GPU performs the image processing method shown in Fig. 5, which includes:
s501, PLG 101 acquires a plurality of blocks of a frame image and a polygon list corresponding to each block, and calculates the load amount of each block.
This step may be performed in the binning pass. For how PLG 101 obtains the blocks (bins) and the polygon list corresponding to each block, refer to the foregoing description; it is not repeated here.
The load of each block includes at least one of the following: the number A of polygons in the block, and the complexity B of the drawing instructions for the points (all or some of the points) of the polygons in the block.
That is, the load of each block may be A×W1+B×W2, where W1 and W2 are non-negative weight values. When W1 is 0, the load is represented only by the complexity B of the drawing instructions for the points of the polygons in the block; when W2 is 0, the load is represented only by the number A of polygons in the block; when W1 and W2 take other values, the load is a weighted combination of the polygon count A and the drawing-instruction complexity B.
For the number of polygons in each block: as shown in Fig. 2, for example, the block in the upper left corner contains 2 polygons, the block in the upper right corner 3, the block in the lower left corner 1, and the block in the lower right corner 2.
As for the complexity of the drawing instructions for the points of the polygons in each block: because the colors and transparency of the points may differ, the complexity of the corresponding drawing instructions also differs. For example, drawing instructions for a point with some transparency are more complex than those for an opaque point, and drawing instructions for color are more complex than those for black and white.
A first cache may be added to PLG 101 to store a count of the number of polygons in each block, and a second cache to store a count of the complexity of the drawing instructions for the points of the polygons in each block. PLG 101 then calculates the load of each block as follows:
For each polygon in a block being processed, PLG 101 looks up the polygon-count value of the corresponding block in the first cache. On a hit, it increments the count by 1; on a miss, it creates a new count entry for the block in the first cache and sets it to 1.
Similarly, for each polygon in a block being processed, PLG 101 looks up the drawing-instruction-complexity count of the corresponding block in the second cache. On a hit, it adds the complexity of the drawing instructions for the points of the newly added polygon to the count; on a miss, it creates a new complexity-count entry for the block in the second cache and initializes it with the complexity of the drawing instructions for the points of the newly added polygon.
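The two-cache counting scheme of these paragraphs can be modeled with dictionaries standing in for the hardware caches. The class and method names are illustrative; `defaultdict` collapses the hit/miss distinction, since a miss simply creates the entry at zero before the first addition:

```python
from collections import defaultdict

class LoadCounters:
    """Sketch of PLG-side counters: polygon count (first cache) and
    summed drawing-instruction complexity (second cache) per block."""

    def __init__(self):
        self.poly_count = defaultdict(int)   # stands in for the first cache
        self.complexity = defaultdict(int)   # stands in for the second cache

    def record(self, block_id, point_complexities):
        """Account for one polygon of a block; point_complexities is the
        per-point drawing-instruction complexity of that polygon."""
        self.poly_count[block_id] += 1
        self.complexity[block_id] += sum(point_complexities)

    def load(self, block_id, w1=1, w2=1):
        """Combine the two counts per the formula A*W1 + B*W2."""
        return self.poly_count[block_id] * w1 + self.complexity[block_id] * w2
```

After all polygons of a frame are recorded, `load` yields the per-block value that would be buffered into L2.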
After counting, PLG 101 may buffer either the count in the first cache or the count in the second cache into L2 102 as the load of each block, or combine the counts in the two caches according to the foregoing formula A×W1+B×W2 to obtain the load of each block and buffer it into L2 102.
Optionally, PLG 101 may also sort the blocks by load, e.g., from largest to smallest. The application does not limit the sorting algorithm; it can be, for example, bubble sort, selection sort, quicksort, merge sort, heap sort, a Top N selection, and so on.
For example, taking Top N selection, N groups of registers may be added to PLG 101, each group storing the load of one block. PLG 101 compares the load of the block under comparison with the loads in the N registers; if it is greater than any of them, the smallest load among the N registers is replaced with the load of the block under comparison. Finally, the loads in the N register groups are sorted, yielding the Top N block loads in order.
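A software sketch of this register-based Top N selection follows; the list `regs` stands in for the N register groups, and the function name is an assumption:

```python
def top_n_loads(block_loads, n):
    """Keep the n largest (load, block_id) pairs using n 'register' slots:
    a new load replaces the smallest stored load when it is larger.

    block_loads -- iterable of (block_id, load) pairs
    """
    regs = []  # each entry: (load, block_id)
    for block_id, load in block_loads:
        if len(regs) < n:
            regs.append((load, block_id))
        else:
            smallest = min(range(n), key=lambda i: regs[i][0])
            if load > regs[smallest][0]:
                regs[smallest] = (load, block_id)
    return sorted(regs, reverse=True)  # final sort of the n registers
```

Only the final sort touches more than N values, which is why this fits a small fixed set of registers.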
Note that N may be equal to the number of cores in the GPU, and a fuller ordering can be obtained over multiple Top N passes.
Optionally, PLG 101 may also buffer the load of each block, the position information of each block, and the polygon list corresponding to each block into L2 102 in the sorted order.
S502: the task scheduler 103 schedules the cores 104 to process the polygon lists corresponding to the blocks in descending order of block load, so as to draw the image.
This step may be performed in the rendering pass.
The task scheduler 103 may read the loads and position information of the blocks from L2 102, and send the position information to the cores 104 in descending order of load. The cores 104 obtain the polygon lists corresponding to their allocated blocks from L2 102 according to the position information. Here, the cores 104 processing the polygon lists means: the cores 104 draw the polygons in the blocks on the display screen according to the polygon lists, and once all blocks have been processed, the drawing of one frame of image on the display screen is complete.
Alternatively, the task scheduler 103 may itself sort the blocks by load; for the details, refer to the description of PLG 101 above, which is not repeated here. Note that only one of the task scheduler 103 and PLG 101 needs to sort the blocks by load.
In one possible implementation, the task scheduler 103 may schedule idle cores among the cores 104 to process the polygon lists corresponding to the blocks in descending order of block load.
Illustratively, as shown in Fig. 6: in Fig. 6A (i.e., Fig. 3 above), the task scheduler 103 allocates blocks to the cores in the order of the blocks' position information. In Fig. 6B, the task scheduler 103 allocates blocks to the idle cores in descending order of load. Sorted by load from largest to smallest, the blocks are: 13, 1, 2, 3, 5, 6, 7, 9, 10, 11, 0, 14, 15, 4, 8, 12. In the first round, the task scheduler 103 allocates blocks 13, 1, 2, and 3 to cores 0-3 in turn, i.e., schedules cores 0-3 to process the polygon lists of blocks 13, 1, 2, and 3. In the second round, core 0 has not yet finished but the other cores are idle, so the task scheduler 103 allocates blocks 5, 6, and 7 to the idle cores 1-3 in turn. In the third round, core 0 has still not finished and the other cores are idle, so the task scheduler 103 allocates blocks 9, 10, and 11 to the idle cores 1-3 in turn. In the fourth round, cores 0-3 are idle, so the task scheduler 103 allocates blocks 4, 8, and 12 to the idle cores 0-2 in turn. It can be seen that Fig. 6B saves a substantial amount of time relative to Fig. 6A.
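The idle-core policy of this implementation amounts to list scheduling with the blocks pre-sorted by descending load: whenever a core becomes idle, it receives the largest remaining block. A minimal sketch using a heap of core finish times (the function name and the numeric loads below are illustrative assumptions):

```python
import heapq

def greedy_schedule(loads, num_cores):
    """Whenever a core becomes idle, hand it the largest remaining block.
    Returns per-core finish times, indexed by core id."""
    order = sorted(range(len(loads)), key=lambda b: loads[b], reverse=True)
    cores = [(0, c) for c in range(num_cores)]  # (finish_time, core_id)
    heapq.heapify(cores)
    for b in order:
        t, c = heapq.heappop(cores)             # earliest-idle core
        heapq.heappush(cores, (t + loads[b], c))
    return [t for t, c in sorted(cores, key=lambda e: e[1])]
```

Assuming 16 blocks of unit load except one hypothetical block of load 8, position-order dealing finishes the frame at time 11, while this policy finishes at time 8, which is the minimum possible since that one block alone takes 8.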
A core 104 may send an indication (e.g., an interrupt) to the task scheduler 103 when it has finished, or is about to finish, processing the polygon list of a block, to request a new block, i.e., to request that the polygon list of a new block be scheduled to it. Alternatively, the task scheduler 103 may poll the cores to determine whether each has finished processing the polygon list of its block, and if so, allocate a new block to that core, i.e., schedule the core to process the polygon list of the new block. Both approaches make full use of a core's processing capability, reducing or even eliminating the time a core sits idle between one block and the next (corresponding, respectively, to the "has finished" and "is about to finish" cases above).
The reason this implementation improves core balance is that, as long as an idle core exists, it is given the block with the largest load among the remaining blocks. This maximizes the utilization of each core's processing capability and balances the cores.
In another possible implementation, the task scheduler 103 may alternately schedule the cores 104 to process the polygon lists corresponding to the blocks in descending order of load. Alternate scheduling means that, in any two adjacent rounds, the task scheduler 103 reverses the order in which it schedules the cores relative to the previous round.
Illustratively, as shown in Fig. 7: in Fig. 7A, the task scheduler 103 allocates blocks to the cores in the order of the blocks' position information. In Fig. 7B, the task scheduler 103 allocates blocks to the cores alternately, in descending order of load. Sorted by load from largest to smallest, the blocks are: 9, 13, 5, 1, 12, 10, 8, 4, 0, 2, 3, 6, 7, 11, 14, 15. In the first round, the task scheduler 103 allocates blocks 9, 13, 5, and 1 to cores 0-3 in turn. In the second round, it reverses the scheduling order relative to the first round and allocates blocks 12, 10, 8, and 4 to cores 3-0 in turn. In the third round, it reverses the order again relative to the second round and allocates blocks 0, 2, 3, and 6 to cores 0-3 in turn. In the fourth round, it reverses the order relative to the third round and allocates blocks 7, 11, 14, and 15 to cores 3-0 in turn. It can be seen that Fig. 7B saves a substantial amount of time relative to Fig. 7A.
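The zigzag dealing pattern of this example can be sketched directly (function name assumed; the block order below is the descending-load order given in the example):

```python
def alternating_schedule(sorted_blocks, num_cores):
    """Deal blocks (pre-sorted by descending load) to cores in a zigzag:
    cores 0..n-1 in even rounds, cores n-1..0 in odd rounds."""
    assignment = [[] for _ in range(num_cores)]
    for start in range(0, len(sorted_blocks), num_cores):
        round_blocks = sorted_blocks[start:start + num_cores]
        cores = list(range(num_cores))
        if (start // num_cores) % 2 == 1:   # odd round: reverse core order
            cores.reverse()
        for c, b in zip(cores, round_blocks):
            assignment[c].append(b)
    return assignment

# Descending-load block order from the Fig. 7 example
order = [9, 13, 5, 1, 12, 10, 8, 4, 0, 2, 3, 6, 7, 11, 14, 15]
```

Running this on the example order gives core 0 the blocks 9, 4, 0, 15 and core 3 the blocks 1, 12, 6, 7, matching the round-by-round description of Fig. 7B.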
The reason this implementation improves core balance: taking core 0 as an example, core 0 processes the block with the largest load in the first round and the block with the smallest load in the second round, so the average processing time of each core over two adjacent rounds is close, and after averaging over multiple rounds the processing times of the cores are relatively balanced.
In yet another possible implementation, the task scheduler 103 may schedule the cores 104 to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.
The reason this implementation improves core balance is that the blocks with larger loads are concentrated in the Top N, so distributing the remaining small-load blocks sequentially or randomly has little effect on balance. The task scheduler 103 can therefore distribute the Top N blocks first and then the other blocks, and still achieve balance across the cores.
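A sketch of the Top N first dispatch order (the function name is an assumption; ties among equal loads keep block-index order, since Python's sort is stable):

```python
def topn_first_order(loads, n):
    """Dispatch order: the n heaviest blocks first, in descending load,
    then the remaining blocks in plain position (index) order."""
    heavy = sorted(range(len(loads)), key=lambda b: loads[b], reverse=True)[:n]
    heavy_set = set(heavy)
    rest = [b for b in range(len(loads)) if b not in heavy_set]
    return heavy + rest
```

The tail of the order can just as well be randomized; per the principle above, the small-load blocks matter little for balance.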
If two blocks have the same load, the task scheduler 103 may schedule idle cores among the cores to process their polygon lists in the order of the blocks' position information.
In the GPU and the image processing method provided by embodiments of the present application, the PLG can also obtain the load of each block when generating the polygon lists, and the task scheduler can distribute blocks to the cores in descending order of load. Blocks with larger loads are therefore processed first, and the smaller-load blocks that follow can be distributed more evenly, averaging out the processing time of the cores as much as possible. This mitigates the processing-time imbalance of a multi-core GPU and shortens the time the multi-core GPU takes to process an image.
As shown in FIG. 8, an embodiment of the present application further provides an electronic device 80, comprising the GPU 801 described above and a display screen 802, the GPU 801 being configured to draw images on the display screen 802.
Embodiments of the present application also provide a computer-readable storage medium having instructions stored therein which, when run on a GPU, cause the GPU to perform the method corresponding to FIG. 5.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a GPU, cause the GPU to perform the method corresponding to FIG. 5.
For the technical effects of the electronic device, the computer-readable storage medium, and the computer program product provided by the embodiments of the present application, refer to the technical effects described above for the GPU and the image processing method; they are not repeated here.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (Digital Subscriber Line, DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

  1. A graphics processor, comprising: a polygon list generator, a task scheduler, and a plurality of cores;
    the polygon list generator is configured to acquire a plurality of blocks of a frame of image and a polygon list corresponding to each block, and to calculate a load of each block, wherein the polygon list indicates the polygons included in the corresponding block;
    and the task scheduler is configured to schedule the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load, so as to draw the image.
  2. The graphics processor of claim 1, wherein the load of each block comprises at least one of the following:
    the number A of polygons in the block, and the complexity B of the drawing instructions for each point of the polygons in the block.
  3. The graphics processor of claim 2, wherein the load of each block is A × W1 + B × W2, wherein W1 and W2 are weight values.
  4. The graphics processor of any one of claims 1-3, wherein the scheduling, by the task scheduler, of the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load comprises:
    the task scheduler schedules idle cores among the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load.
  5. The graphics processor of any one of claims 1-3, wherein the scheduling, by the task scheduler, of the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load comprises:
    the task scheduler alternately schedules the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load, wherein alternate scheduling means that, in two adjacent rounds of scheduling, the order in which the cores are scheduled in the later round is the reverse of the order used in the previous round.
  6. The graphics processor of any one of claims 1-3, wherein the scheduling, by the task scheduler, of the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load comprises:
    the task scheduler schedules the plurality of cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.
  7. An image processing method, comprising:
    acquiring a plurality of blocks of a frame of image and a polygon list corresponding to each block, and calculating a load of each block, wherein the polygon list indicates the polygons included in the corresponding block;
    and scheduling a plurality of cores of a graphics processor to process the polygon lists corresponding to the plurality of blocks in descending order of load, so as to draw the image.
  8. The method of claim 7, wherein the load of each block comprises at least one of the following:
    the number A of polygons in the block, and the complexity B of the drawing instructions for each point of the polygons in the block.
  9. The method of claim 8, wherein the load of each block is A × W1 + B × W2, wherein W1 and W2 are weight values.
  10. The method of any one of claims 7-9, wherein the scheduling of the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load comprises:
    scheduling idle cores among the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load.
  11. The method of any one of claims 7-9, wherein the scheduling of the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load comprises:
    alternately scheduling the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load, wherein alternate scheduling means that, in two adjacent rounds of scheduling, the order in which the cores are scheduled in the later round is the reverse of the order used in the previous round.
  12. The method of any one of claims 7-9, wherein the scheduling of the plurality of cores to process the polygon lists corresponding to the plurality of blocks in descending order of load comprises:
    scheduling the plurality of cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.
  13. An electronic device, comprising the graphics processor of any one of claims 1-6 and a display screen, wherein the graphics processor is configured to draw an image on the display screen.
CN202180093752.5A 2021-02-19 2021-02-19 Graphics processor, image processing method, and electronic apparatus Pending CN116917925A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/076921 WO2022174395A1 (en) 2021-02-19 2021-02-19 Graphics processing unit, graphics processing method, and electronic device

Publications (1)

Publication Number Publication Date
CN116917925A true CN116917925A (en) 2023-10-20

Family

ID=82931906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180093752.5A Pending CN116917925A (en) 2021-02-19 2021-02-19 Graphics processor, image processing method, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN116917925A (en)
WO (1) WO2022174395A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9710876B2 (en) * 2015-01-16 2017-07-18 Intel Corporation Graph-based application programming interface architectures with equivalency classes for enhanced image processing parallelism
CN105630441B (en) * 2015-12-11 2018-12-25 中国航空工业集团公司西安航空计算技术研究所 A kind of GPU system based on unified staining technique
US10445852B2 (en) * 2016-12-22 2019-10-15 Apple Inc. Local image blocks for graphics processing
CN109886407B (en) * 2019-02-27 2021-10-22 上海商汤智能科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN111601078A (en) * 2020-05-12 2020-08-28 西安创腾星泰电子科技有限公司 Satellite-borne video compression system and method for video data direct transmission to ground

Also Published As

Publication number Publication date
WO2022174395A1 (en) 2022-08-25

Similar Documents

Publication Publication Date Title
RU2425412C2 (en) Requirement-multithreaded multimedia processor
US8988442B2 (en) Asynchronous notifications for concurrent graphics operations
US8350864B2 (en) Serializing command streams for graphics processors
CN110489213B (en) Task processing method and processing device and computer system
US8269780B2 (en) Batching graphics operations with time stamp tracking
CN1036877C (en) Dynamic load balancing for multiprocessor pipeline
US8085280B2 (en) Asymmetric two-pass graphics scaling
KR101609079B1 (en) Instruction culling in graphics processing unit
CN104050033A (en) System and method for hardware scheduling of indexed barriers
CN114020470B (en) Resource allocation method and device, readable medium and electronic equipment
US8531470B2 (en) Deferred deletion and cleanup for graphics resources
CN111274015A (en) Configuration method and device and data processing server
CN111078436B (en) Data processing method, device, equipment and storage medium
CN111158874A (en) Data processing method and device, electronic equipment and storage medium
CN103207774A (en) Method And System For Resolving Thread Divergences
US20110161965A1 (en) Job allocation method and apparatus for a multi-core processor
CN110532100B (en) Method, device, terminal and storage medium for scheduling resources
CN111737019A (en) Method and device for scheduling video memory resources and computer storage medium
US20210026696A1 (en) Scheduling of a plurality of graphic processing units
CN111813541B (en) Task scheduling method, device, medium and equipment
CN110297708A (en) A kind of method, server and the platform of drawing a bill of draw a bill processing and scheduling
CN116917925A (en) Graphics processor, image processing method, and electronic apparatus
CN116795503A (en) Task scheduling method, task scheduling device, graphic processor and electronic equipment
US9405470B2 (en) Data processing system and data processing method
CN116578416A (en) Signal-level simulation acceleration method based on GPU virtualization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination