WO2022174395A1 - Graphics processor, image processing method and electronic device - Google Patents


Info

Publication number
WO2022174395A1
WO2022174395A1 (PCT/CN2021/076921)
Authority
WO
WIPO (PCT)
Prior art keywords
block
load
polygon
blocks
task scheduler
Prior art date
Application number
PCT/CN2021/076921
Other languages
English (en)
French (fr)
Inventor
赵学军
张雷
肖潇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2021/076921
Priority to CN202180093752.5A
Publication of WO2022174395A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining

Definitions

  • the present application relates to the field of graphics processing, and in particular, to a graphics processor, an image processing method and an electronic device.
  • the GPU can include multiple cores, and the GPU can divide a frame of image into multiple blocks, and each core processes different blocks randomly or in the order of the blocks, thereby improving the processing speed of a frame of image.
  • Embodiments of the present application provide a GPU, an image processing method, and an electronic device, which are used to shorten the time for a multi-core GPU to process images.
  • a graphics processor, including: a polygon list generator, a task scheduler, and multiple cores; the polygon list generator is used to obtain multiple blocks of a frame of image and the polygon list corresponding to each block, and to calculate the load of each block, wherein the polygon list indicates the polygons included in the corresponding block; the task scheduler is used to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load, so as to draw an image.
  • the load of each block can also be obtained when the polygon lists are generated, and the task scheduler in the graphics processor can assign the blocks to the multiple cores in descending order of load, so that blocks with a larger load are processed first and the remaining blocks with a smaller load can be distributed more evenly. This tends to equalize the processing time of the cores, solving the processing-time imbalance of multi-core graphics processors and shortening the time a multi-core graphics processor takes to process an image.
  • the load amount of each block includes at least one item of the following information: the number A of polygons in each block, and the complexity B of drawing instructions for each point of the polygons in each block.
  • the load of each block is A*W1+B*W2, where W1 and W2 are weight values.
  • When W1 is 0, the load of each block is the complexity B of the drawing instructions of each point of the polygons in the block.
  • When W2 is 0, the load of each block is the number A of polygons in the block.
  • W1 and W2 can be non-negative values.
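The weighted load described above can be sketched as follows. This is a minimal illustration; the function name and the example complexity values are hypothetical, not taken from the patent:

```python
def block_load(num_polygons, point_complexities, w1=1.0, w2=1.0):
    """Load of one block: A*W1 + B*W2, where A is the polygon count
    and B is the summed drawing-instruction complexity of the
    polygons' points, per the embodiment above."""
    a = num_polygons
    b = sum(point_complexities)
    return a * w1 + b * w2

# With W2 = 0 the load reduces to the polygon count A:
assert block_load(3, [2.5, 1.0, 4.0], w1=1.0, w2=0.0) == 3.0
# With W1 = 0 the load reduces to the complexity B:
assert block_load(3, [2.5, 1.0, 4.0], w1=0.0, w2=1.0) == 7.5
```

For other non-zero weights the load is a proportional combination of A and B, as the text states.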
  • the task scheduler is configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load, including: the task scheduler schedules idle cores among the multiple cores, in descending order of load, to process the polygon lists corresponding to the multiple blocks.
  • The principle by which this embodiment promotes balance among the cores is that, as long as a core is idle, the block with the largest load among the remaining blocks is processed first, maximizing the processing capacity of each core and achieving balance among the cores.
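This greedy policy can be pictured with a small software simulation (an illustrative sketch, not the patent's hardware implementation; here a block's load doubles as its processing time):

```python
import heapq

def schedule_greedy(loads, num_cores):
    """Assign blocks (largest load first) to whichever core becomes
    idle next; returns each core's total finish time."""
    order = sorted(range(len(loads)), key=lambda i: -loads[i])
    # (time at which the core becomes idle, core id)
    cores = [(0.0, c) for c in range(num_cores)]
    heapq.heapify(cores)
    finish = [0.0] * num_cores
    for i in order:
        t, c = heapq.heappop(cores)   # next idle core
        finish[c] = t + loads[i]      # it takes the largest remaining block
        heapq.heappush(cores, (finish[c], c))
    return finish

# 16 blocks with uneven loads on 4 cores: placing the large blocks
# first keeps the longest core close to the ideal sum(loads)/4 = 10.5.
loads = [9, 1, 1, 1, 8, 1, 1, 1, 7, 1, 1, 1, 6, 1, 1, 1]
print(max(schedule_greedy(loads, 4)))  # 11.0
```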
  • the task scheduler is configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load, including: the task scheduler alternately schedules the multiple cores, in descending order of load, to process the polygon lists corresponding to the multiple blocks; alternate scheduling means that when the task scheduler schedules cores in two adjacent rounds, the order in which cores are scheduled in the latter round is the reverse of the previous round.
  • The principle by which this embodiment promotes balance among the cores is that, taking core 0 as an example, core 0 processes the block with the largest load in the first round and the block with the smallest load in the second round, so the average processing time of each core over two adjacent rounds is close, and after multiple rounds the average processing times of the cores reach a relative balance.
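The alternating (reversed every other round) order can be sketched as follows; the function name is illustrative and the blocks are assumed to be pre-sorted by load:

```python
def snake_rounds(sorted_blocks, num_cores):
    """Assign blocks (already sorted by load, largest first) to cores
    round by round, reversing the core order every other round."""
    assignment = {c: [] for c in range(num_cores)}
    for r in range(0, len(sorted_blocks), num_cores):
        round_blocks = sorted_blocks[r:r + num_cores]
        cores = range(num_cores)
        if (r // num_cores) % 2 == 1:   # reverse core order in odd rounds
            cores = reversed(range(num_cores))
        for c, b in zip(cores, round_blocks):
            assignment[c].append(b)
    return assignment

# 8 blocks with loads sorted descending on 4 cores: core 0 gets the
# largest block of round 1 and the smallest block of round 2.
a = snake_rounds([8, 7, 6, 5, 4, 3, 2, 1], 4)
print(a[0])  # [8, 1]
print(a[3])  # [5, 4]
```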
  • the task scheduler is configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load, including: the task scheduler schedules the multiple cores to process the polygon lists corresponding to the N blocks with the largest loads first, and the polygon lists corresponding to the other blocks afterwards.
  • The principle by which this embodiment promotes balance among the cores is that the blocks with larger loads are concentrated in the Top N, so assigning the remaining small-load blocks sequentially or randomly has little effect on core balance.
  • The task scheduler can therefore assign the Top N blocks first and the other blocks afterwards, which also achieves balance among the cores.
  • an image processing method is provided, applied to the graphics processor according to the first aspect and any one of its embodiments; the method includes: obtaining multiple blocks of a frame of image and the polygon list corresponding to each block, and calculating the load of each block, where the polygon list indicates the polygons included in the corresponding block; and scheduling multiple cores of the graphics processor, in descending order of load, to process the polygon lists corresponding to the multiple blocks, so as to draw the image.
  • the load amount of each block includes at least one item of the following information: the number A of polygons in each block, and the complexity B of drawing instructions for each point of the polygons in each block.
  • the load of each block is A*W1+B*W2, where W1 and W2 are weight values.
  • scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: scheduling idle cores among the multiple cores, in descending order of load, to process the polygon lists corresponding to the multiple blocks.
  • scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: alternately scheduling the multiple cores, in descending order of load, to process the polygon lists corresponding to the multiple blocks; alternate scheduling means that when cores are scheduled in two adjacent rounds, the order in which cores are scheduled in the latter round is the reverse of the previous round.
  • scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: scheduling the multiple cores to process the polygon lists corresponding to the N blocks with the largest loads first, and the polygon lists corresponding to the other blocks afterwards.
  • an electronic device comprising the graphics processor according to the first aspect and any of its embodiments, and a display screen, where the graphics processor is used to draw an image on the display screen.
  • a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; when the instructions are executed on a graphics processor, the graphics processor executes the method according to the second aspect and any one of its embodiments.
  • a computer program product comprising instructions; when the instructions are executed on a graphics processor, the graphics processor executes the method according to the second aspect and any one of its embodiments.
  • FIG. 1 is a schematic structural diagram of a GPU according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a block of a frame of image and a polygon in the block provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the imbalance of each core of a GPU according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another GPU provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a task scheduler scheduling multiple cores according to the loads of multiple blocks according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of another task scheduler scheduling multiple cores according to the loads of multiple blocks according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device including the above-mentioned GPU according to an embodiment of the present application.
  • the GPU can be applied to in-vehicle dashboards, smart watches, mobile phones, personal computers, tablet computers and other devices.
  • the GPU includes: a polygon list generator (PLG) 101 , a level two cache (L2) 102 , a task scheduler 103 and a plurality of cores (also called GPU cores) 104 .
  • a frame of image can be considered to be composed of multiple polygons (such as points, lines, triangles, etc.), and when the GPU draws all these polygons and puts them together, a frame of image can be drawn.
  • a class of GPUs includes two process stages when processing a frame of image: a block process (Binning Pass) and a rendering process (Rendering Pass).
  • the PLG 101 divides an input frame of image into multiple blocks (Bins) of the same size and shape; each block includes multiple polygons (for example, triangles), and each vertex of a polygon corresponds to a pixel.
  • the PLG 101 obtains a corresponding polygon list for each block, and caches the position information of each block and the polygon list corresponding to each block in the L2 102.
  • the location information of the block may include the coordinates of the block or the serial number of the block.
  • the polygon list indicates information of polygons included in the corresponding block. Taking the polygon as a triangle as an example, the polygon list may include the numbers, coordinates, colors, transparency, etc. of each vertex of the triangle.
  • the task scheduler 103 schedules multiple cores 104 to process multiple blocks respectively, that is, after obtaining the location information of each block from the L2 102, it distributes it to each core 104.
  • Each kernel 104 is allocated a certain number of blocks; each kernel 104 obtains the polygon list corresponding to a block from the L2 102 according to the position information of the allocated block and draws the polygons in the block on the display screen according to the polygon list. The multiple cores 104 finish drawing a frame of image on the display screen after processing all the blocks.
  • the blocks are allocated to the cores in order according to the position information of the blocks, or the blocks are randomly allocated to the cores. This is likely to cause blocks with higher computational complexity (a heavier workload) to be concentrated on some cores, or to be dispatched to cores only at the end of a frame, producing a long tail. This leaves the processing load and processing time of the cores unbalanced and lengthens the GPU's processing time for one frame of image; actual measurement shows a certain impact on Benchmark performance tests and game performance.
  • the GPU has 4 cores, and a frame of image is divided into 16 blocks.
  • the task scheduler sequentially assigns block 0 to kernel 0, block 1 to kernel 1, and so on, according to the order of the blocks' location information.
  • the present application provides another GPU, and the PLG 101 can also obtain the workload of each block when generating the polygon list.
  • the task scheduler 103 can allocate the blocks to the multiple cores 104 in descending order of each block's load, so that blocks with a larger load are processed first and the remaining blocks with a smaller load can be distributed more evenly. This tends to equalize the processing time of the cores 104, solving the processing-time imbalance of multi-core GPUs and shortening the time a multi-core GPU takes to process an image.
  • the GPU executes the image processing method shown in FIG. 5 , and the method includes:
  • the PLG 101 acquires multiple blocks of a frame of image and a polygon list corresponding to each block, and calculates the load of each block.
  • This step can be performed in a block process (Binning Pass).
  • For how the PLG 101 obtains the blocks (Bins) and the polygon list corresponding to each block in the binning pass, refer to the previous description, which is not repeated here.
  • the load of each block includes at least one of the following information:
  • the load of each block may be A*W1+B*W2, where W1 and W2 are weight values.
  • When W1 is 0, the load of each block is the complexity B of the drawing instructions of each point of the polygons in the block.
  • When W2 is 0, the load of each block is the number A of polygons in the block.
  • W1 and W2 can be non-negative values.
  • the number of polygons in the upper left block is 2, the number of polygons in the upper right block is 3, the number of polygons in the lower left block is 1, and the number of polygons in the lower right block is 2.
  • Regarding the complexity of the drawing instructions of each point of the polygons in each block: since the color and transparency of each point of a polygon may differ, the complexity of the corresponding drawing instructions also differs. For example, the drawing instruction for a point with some transparency is more complex than that for a point without transparency, and a colored drawing instruction is more complex than a black-and-white one.
  • a first buffer may be added to the PLG 101 for storing the count value of the number of polygons in each block.
  • a second cache may also be added to the PLG 101 for storing the count value of the complexity of the drawing instructions for each point of the polygon in each block.
  • the PLG 101 calculates the load per block as follows:
  • the PLG 101 may look up the count value of the number of polygons in the corresponding block in the first cache; if found (hit), it increments the count value by 1, and if not found (miss), it creates the count value for the corresponding block in the first cache and initializes it to 1.
  • the PLG 101 may look up the count value of the complexity of the drawing instructions of each point of the polygons in the corresponding block in the second cache; if found (hit), it adds the complexity of the drawing instructions of each point of the newly added polygon to the count value, and if not found (miss), it creates the count value for the corresponding block in the second cache and initializes it to the complexity of the drawing instructions of each point of the newly added polygon.
  • the PLG 101 may cache the count value in the first cache or the count value in the second cache in the L2 102 as the load of each block, or the PLG 101 may combine the count values of the first cache and the second cache according to the formula A*W1+B*W2 described above to obtain the load of each block, and cache the load in the L2 102.
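The two per-block counters described above can be sketched with plain dictionaries standing in for the first and second caches (an illustration only; the class name is hypothetical and the hit/miss behavior of the real hardware caches is abstracted away by the defaultdict):

```python
from collections import defaultdict

class LoadCounter:
    """Accumulates, per block, the polygon count ("first cache") and
    the summed per-point drawing-instruction complexity ("second cache")."""
    def __init__(self, w1=1.0, w2=1.0):
        self.polygon_count = defaultdict(int)        # first cache
        self.point_complexity = defaultdict(float)   # second cache
        self.w1, self.w2 = w1, w2

    def add_polygon(self, block_id, per_point_complexities):
        # hit: increment; miss: defaultdict creates the entry at 0 first
        self.polygon_count[block_id] += 1
        self.point_complexity[block_id] += sum(per_point_complexities)

    def load(self, block_id):
        return (self.polygon_count[block_id] * self.w1
                + self.point_complexity[block_id] * self.w2)

lc = LoadCounter()
lc.add_polygon(0, [1.0, 1.0, 2.0])   # one triangle in block 0
lc.add_polygon(0, [0.5, 0.5, 0.5])   # a second triangle in block 0
print(lc.load(0))  # 2 polygons * W1 + 5.5 complexity * W2 = 7.5
```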
  • the PLG 101 may also sort each block according to the load amount of each block, for example, sort each block according to the load amount from large to small.
  • This application does not limit the algorithm used for sorting, for example, it can be a bubble sort algorithm, a selection sort algorithm, a quick sort algorithm, a merge sort algorithm, a heap sort algorithm, a Top N sort, and the like.
  • N groups of registers can be added to the PLG 101, each group storing the load of one block. The PLG 101 compares the load of each candidate block with the N registers; if it is greater than any of them, the smallest load in the N registers is replaced with the load of the candidate block. Finally, the loads of the blocks in the N groups of registers are sorted, yielding the loads of the Top N blocks in order.
  • N can be equal to the number of cores in the GPU; a longer ordering can be obtained by performing the Top N sort multiple times.
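The register-based Top N selection can be mimicked in software (a sketch under the description above; a real PLG would use N hardware register groups and comparators rather than a Python list):

```python
def top_n_loads(block_loads, n):
    """Keep the n largest (load, block_id) pairs, replacing the current
    minimum whenever a larger load arrives, then sort the survivors,
    mirroring the N register groups described above."""
    registers = []                      # stands in for the N registers
    for block_id, load in enumerate(block_loads):
        if len(registers) < n:
            registers.append((load, block_id))
        elif load > min(registers)[0]:  # larger than the smallest kept load
            registers[registers.index(min(registers))] = (load, block_id)
    return sorted(registers, reverse=True)

loads = [3, 9, 1, 7, 4, 8, 2, 6]
print(top_n_loads(loads, 4))  # [(9, 1), (8, 5), (7, 3), (6, 7)]
```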
  • the PLG 101 may also cache the load amount of each block, the position information of each block, and the polygon list corresponding to each block into the L2 102 according to the arranged order.
  • the task scheduler 103 schedules the multiple kernels 104 to process the polygon lists corresponding to the multiple blocks according to the descending order of the load of each block, so as to draw the image.
  • This step can be performed in the Rendering Pass.
  • the task scheduler 103 can read the load amount and location information of multiple blocks from the L2 102.
  • the task scheduler 103 can send the location information of the multiple blocks to the multiple cores 104 in descending order of the blocks' loads, and each core reads from the L2 102 the polygon list corresponding to its block. The multiple kernels 104 processing the polygon lists corresponding to the multiple blocks means that the multiple kernels 104 draw the polygons in the blocks on the display screen according to the polygon lists; after the multiple kernels 104 have processed all the blocks, a frame of image has been drawn on the display screen.
  • the task scheduler 103 may also sort each block according to the load amount of each block. For details, please refer to the content of sorting each block according to the load amount of each block in the previous PLG 101, which will not be repeated here. It should be noted that, only one of the task scheduler 103 or the PLG 101 needs to sort each block according to the load amount of each block.
  • the task scheduler 103 may schedule the idle cores among the multiple cores 104 to process the polygon lists corresponding to the multiple blocks according to the descending order of the loads of the multiple blocks.
  • the task scheduler 103 allocates blocks to each core according to the order of the location information of the blocks.
  • the task scheduler 103 allocates blocks to idle cores in descending order of the loads of the multiple blocks, where the blocks sorted by load from large to small are block 13, block 1, block 2, block 3, block 5, block 6, block 7, block 9, block 10, block 11, block 0, block 14, block 15, block 4, block 8, block 12.
  • In the first round, the task scheduler 103 sequentially assigns block 13, block 1, block 2, and block 3 to kernel 0 through kernel 3, that is, it schedules kernel 0 through kernel 3 to process the polygon lists corresponding to block 13, block 1, block 2, and block 3. In the second round, since kernel 0 has not finished and the other kernels are idle, the task scheduler 103 assigns block 5, block 6, and block 7 to the idle kernel 1 through kernel 3. In the third round, since kernel 0 has still not finished and the other kernels are idle, the task scheduler 103 assigns block 9, block 10, and block 11 to the idle kernel 1 through kernel 3. In the fourth round, kernel 0 through kernel 3 are all idle, so the task scheduler 103 sequentially allocates the remaining blocks, including block 4, block 8, and block 12, to the idle kernels in the same descending order of load.
  • the kernel 104 may send instruction information (eg, an interrupt) to the task scheduler 103 to request allocation of a new block, that is, request scheduling to process the polygon list corresponding to the new block.
  • the task scheduler 103 may poll each kernel to ask whether the polygon list corresponding to its block has been processed; if processing is complete or about to complete, the task scheduler 103 allocates a new block to the kernel, that is, it schedules the kernel to process the polygon list corresponding to the new block.
  • Both of these methods make full use of the kernels' processing capacity: they reduce the time a kernel sits idle between one block and the next (corresponding to the "processing is complete" case above), or ensure that no kernel is idle between blocks at all (corresponding to the "about to complete" case above).
  • The principle by which this embodiment promotes balance among the cores is that, as long as a core is idle, the block with the largest load among the remaining blocks is processed first, maximizing the processing capacity of each core and achieving balance among the cores.
  • the task scheduler 103 may alternately schedule multiple kernels 104 to process polygon lists corresponding to multiple blocks in descending order of the loads of multiple blocks. Alternate scheduling means that when the task scheduler 103 schedules kernels in two adjacent rounds, the order of scheduling kernels in the latter round is reversed relative to the previous round.
  • the task scheduler 103 allocates blocks to the cores in the order of the blocks' location information.
  • the task scheduler 103 alternately allocates blocks to the cores in descending order of the loads of the multiple blocks, where the blocks sorted by load from large to small are block 9, block 13, block 5, block 1, block 12, block 10, block 8, block 4, block 0, block 2, block 3, block 6, block 7, block 11, block 14, block 15.
  • In the first round, the task scheduler 103 assigns block 9, block 13, block 5, and block 1 to kernel 0 through kernel 3 in sequence, that is, it schedules kernel 0 through kernel 3 to process the polygon lists corresponding to block 9, block 13, block 5, and block 1. In the second round, the task scheduler 103 reverses the scheduling order relative to the first round and assigns block 12, block 10, block 8, and block 4 to kernel 3 through kernel 0 in sequence. In the third round, the task scheduler 103 reverses the order again relative to the second round and assigns block 0, block 2, block 3, and block 6 to kernel 0 through kernel 3 in sequence. In the fourth round, the task scheduler 103 reverses the order relative to the third round and assigns block 7, block 11, block 14, and block 15 to kernel 3 through kernel 0 in sequence.
  • The principle by which this embodiment promotes balance among the cores is that, taking core 0 as an example, core 0 processes the block with the largest load in the first round and the block with the smallest load in the second round, so the average processing time of each core over two adjacent rounds is close, and after multiple rounds the average processing times of the cores reach a relative balance.
  • the task scheduler 103 may schedule multiple cores 104 to process the polygon lists corresponding to the N blocks with the largest load first, and then process the polygon lists corresponding to other blocks.
  • The principle by which this embodiment promotes balance among the cores is that the blocks with larger loads are concentrated in the Top N, so assigning the other small-load blocks sequentially or randomly has little effect on core balance.
  • The task scheduler 103 can therefore allocate the Top N blocks first and the other blocks afterwards, which also achieves balance among the cores.
  • the task scheduler 103 may schedule an idle core among the multiple cores to process the polygon lists corresponding to the multiple blocks according to the order of the location information of the multiple blocks.
  • the load of each block can also be obtained, and the task scheduler in the GPU can allocate the blocks to the multiple cores in descending order of each block's load, so that blocks with larger loads are processed first and the remaining blocks with smaller loads can be distributed more evenly. This tends to equalize the processing time of the cores, solving the processing-time imbalance of multi-core GPUs and shortening the time a multi-core GPU takes to process an image.
  • an embodiment of the present application further provides an electronic device 80, including the GPU 801 and the display screen 802 as described above, and the GPU 801 is used to draw an image on the display screen 802.
  • Embodiments of the present application further provide a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and the instructions are executed on the GPU, so that the GPU executes the corresponding method in FIG. 5 .
  • the embodiment of the present application further provides a computer program product including instructions, and the instructions are executed on the GPU, so that the GPU executes the corresponding method in FIG. 5 .
  • the size of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed systems, devices and methods may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical functional division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • In the above embodiments, implementation may be wholly or partly by software, hardware, firmware, or any combination thereof.
  • When implemented with a software program, it may be realized wholly or partly in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center via wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) means.
  • the computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media.
  • the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (eg, a Solid State Disk (SSD)), and the like.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The present application discloses a graphics processor, an image processing method and an electronic device, relating to the field of graphics processing and used to shorten the time a multi-core GPU takes to process images. The graphics processor includes a polygon list generator, a task scheduler and multiple cores. The polygon list generator is used to obtain multiple blocks of a frame of image and the polygon list corresponding to each block, and to calculate the load of each block, where the polygon list indicates the polygons included in the corresponding block. The task scheduler is used to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load, so as to draw the image.

Description

图形处理器、图像处理方法和电子设备 技术领域
本申请涉及图形处理领域,尤其涉及一种图形处理器、图像处理方法和电子设备。
背景技术
目前,图形处理器(graphics processing unit,GPU)在个人电脑、工作站、游戏机和一些移动设备(如平板电脑、智能手机等)等设备上的使用越来越普遍。使用专用集成电路的GPU对图像和图形的运算进行加速,可以提升处理速度。而且为了提高GPU的并行处理能力,GPU可以包括多个内核,GPU可以将一帧图像分为多个块,各个内核随机或按照块的顺序处理不同块,从而可以提高一帧图像的处理速度。
但是由于各个块的处理复杂度是不同的,所以存在各个内核处理时间不平衡的问题,使得整个GPU对一帧图像的处理时间较长。
发明内容
本申请实施例提供一种GPU、图像处理方法和电子设备,用于缩短多核的GPU处理图像的时间。
为达到上述目的,本申请的实施例采用如下技术方案:
第一方面,提供了一种图形处理器,包括:多边形列表产生器、任务调度器和多个内核;多边形列表产生器用于获取一帧图像的多个块以及每个块对应的多边形列表,并计算每个块的负载量,其中,多边形列表指示对应的块包括的多边形;任务调度器用于根据负载量从大到小的顺序,调度多个内核处理多个块对应的多边形列表,以绘制图像。
本申请实施例提供的图形处理器,图形处理器中的多边形列表产生器在生成多边形列表时,还可以得到各个块的负载量,图形处理器中的任务调度器可以根据各个块的负载量从大到小的顺序分配给多个内核,使得负载量较大的块优先被处理,后续负载量较小的块可以分配得更均匀,以尽量平均化各个内核的处理时间,以解决多内核的图形处理器处理时间不平衡的问题,缩短多核的图形处理器处理图像的时间。
In a possible implementation, the load of each block includes at least one of the following items of information: the number of polygons A in the block, and the complexity B of the drawing instructions of the points of the polygons in the block.

In a possible implementation, the load of each block is A*W1+B*W2, where W1 and W2 are weight values. If W1 is 0, the load of the block is the complexity B of the drawing instructions of the points of the polygons in the block; if W2 is 0, the load of the block is the number of polygons A in the block; if W1 and W2 take other values, the load of the block is a combination, in a certain proportion, of the number of polygons A in the block and the complexity B of the drawing instructions of the points of the polygons in the block. W1 and W2 may be non-negative values.

In a possible implementation, the task scheduler being configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: the task scheduler schedules idle cores among the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load. The principle by which this implementation promotes balance among the cores is that, whenever a core is idle, the block with the largest load among the remaining blocks is processed first, which maximizes the processing capability of each core and achieves balance among the cores.

In a possible implementation, the task scheduler being configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: the task scheduler alternately schedules the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load, where alternate scheduling means that, in two adjacent scheduling rounds, the task scheduler reverses in the later round the order in which the cores were scheduled in the previous round. The principle by which this implementation promotes balance among the cores is that, taking core 0 as an example, in the first round core 0 processes the block with the largest load of that round, and in the second round core 0 processes the block with the smallest load of that round; averaged over two adjacent rounds the processing times of the cores are therefore close, and after multiple rounds the average processing times of the cores reach a relative balance.

In a possible implementation, the task scheduler being configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: the task scheduler is configured to schedule the multiple cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks. The principle by which this implementation promotes balance among the cores is that, because the blocks with larger loads are concentrated in the top N, sequentially or randomly assigning the other blocks with small loads has little effect on the balance among the cores. The task scheduler can therefore assign the top N blocks first and the other blocks afterwards, which also achieves balance among the cores.
In a second aspect, an image processing method is provided, applied to the graphics processor according to the first aspect or any one of its implementations, the method including: obtaining multiple blocks of a frame of image and a polygon list corresponding to each block, and calculating the load of each block, where a polygon list indicates the polygons included in the corresponding block; and scheduling multiple cores of the graphics processor to process the polygon lists corresponding to the multiple blocks in descending order of load, so as to draw the image.

In a possible implementation, the load of each block includes at least one of the following items of information: the number of polygons A in the block, and the complexity B of the drawing instructions of the points of the polygons in the block.

In a possible implementation, the load of each block is A*W1+B*W2, where W1 and W2 are weight values.

In a possible implementation, scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: scheduling idle cores among the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load.

In a possible implementation, scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: alternately scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load, where alternate scheduling means that, in two adjacent scheduling rounds, the order in which the cores are scheduled is reversed in the later round relative to the previous round.

In a possible implementation, scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of load includes: scheduling the multiple cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.
In a third aspect, an electronic device is provided, including the graphics processor according to the first aspect or any one of its implementations and a display screen, the graphics processor being configured to draw images on the display screen.

In a fourth aspect, a computer-readable storage medium is provided, storing instructions that, when run on a graphics processor, cause the graphics processor to perform the method corresponding to the second aspect or any one of its implementations.

In a fifth aspect, a computer program product containing instructions is provided; when the instructions are run on a graphics processor, they cause the graphics processor to perform the method corresponding to the second aspect or any one of its implementations.

For the technical effects of the second to fifth aspects, refer to the technical effects described for the first aspect and any one of its implementations, which are not repeated here.
Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a GPU provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the blocks of a frame of image and the polygons in the blocks provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of imbalance among the cores of a GPU provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of another GPU provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a task scheduler scheduling multiple cores according to the loads of multiple blocks provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of another task scheduler scheduling multiple cores according to the loads of multiple blocks provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an electronic device including the above GPU provided by an embodiment of the present application.
Detailed Description of the Embodiments

First, a possible structure of a GPU is introduced with reference to FIG. 1. The GPU can be applied to devices such as vehicle dashboards, smart watches, mobile phones, personal computers and tablets.

As shown in FIG. 1, the GPU includes: a polygon list generator (PLG) 101, a level two cache (L2) 102, a task scheduler 103 and multiple cores (also called GPU cores) 104.

The function of each module in the GPU is described below in conjunction with the process by which the GPU draws an image.

A frame of image can be regarded as being composed of multiple polygons (e.g., points, lines, triangles); once the GPU has drawn all these polygons and pieced them together, the frame of image is drawn. At present, one class of GPU processes a frame of image in two stages: a binning pass and a rendering pass.

As shown in FIG. 2, in the binning pass, the PLG 101 divides an input frame of image into multiple blocks (bins) of the same size and shape, each block including multiple polygons (e.g., triangles), each vertex of a polygon being a pixel. The PLG 101 obtains a corresponding polygon list for each block, and caches the position information of each block and the polygon list corresponding to each block in the L2 102. The position information of a block may include the coordinates or the sequence number of the block. A polygon list indicates information about the polygons included in the corresponding block; taking triangles as an example, the polygon list may include the number, coordinates, color, transparency and the like of each vertex of the triangles.

In the rendering pass, the task scheduler 103 schedules the multiple cores 104 to process the blocks separately; that is, after obtaining the position information of each block from the L2 102, it distributes that information to the cores 104. Each core 104 is assigned a certain number of blocks; according to the position information of an assigned block, a core 104 obtains the polygon list corresponding to that block from the L2 102 and draws the polygons in the block on the display screen according to the polygon list. After the cores 104 have processed all the blocks, the drawing of a frame of image on the display screen is complete.
At present, when the task scheduler assigns blocks to the cores, it assigns them either in order according to the blocks' position information or randomly. This can easily cause blocks of high computational complexity (i.e., heavy workload) to be concentrated on a few cores, or cause such a block at the end of a frame to be scheduled to a core only at the last moment, producing a trailing effect, so that the workload and processing time of the cores are unbalanced and the GPU takes a long time to process a frame of image. Actual measurements show that this affects benchmark test results and game performance to a certain extent.

For example, as shown in FIG. 3, suppose the GPU has 4 cores and a frame of image is divided into 16 blocks. In the order of the blocks' position information, the task scheduler assigns block 0 to core 0, block 1 to core 1, block 2 to core 2, block 3 to core 3, block 4 to core 0, and so on. Suppose block 13, which has high computational complexity (i.e., a heavy load), is scheduled to core 1 only at the end of the frame; the other cores have already finished while block 13 still needs a considerable amount of processing time. The workload and processing times are unbalanced, so the GPU takes a long time to process the frame.

To this end, as shown in FIG. 4, the present application provides another GPU. When generating the polygon lists, the PLG 101 can also obtain the workload of each block. The task scheduler 103 can assign the blocks to the multiple cores 104 in descending order of load, so that blocks with larger loads are processed first and subsequent blocks with smaller loads can be distributed more evenly, so as to equalize the processing time of the cores 104 as far as possible, solve the problem of unbalanced processing times in a multi-core GPU, and shorten the time a multi-core GPU takes to process images.

Specifically, the GPU performs the image processing method shown in FIG. 5, which includes:
S501: The PLG 101 obtains multiple blocks of a frame of image and the polygon list corresponding to each block, and calculates the load of each block.

This step can be performed in the binning pass. For how the PLG 101 obtains the blocks (bins) and the polygon lists corresponding to the blocks, refer to the foregoing description, which is not repeated here.

The load of each block includes at least one of the following items of information:

the number of polygons A in the block, and the complexity B of the drawing instructions of the points (which may be all or some of the points) of the polygons in the block.

That is, the load of each block may be A*W1+B*W2, where W1 and W2 are weight values. If W1 is 0, the load of the block is the complexity B of the drawing instructions of the points of the polygons in the block; if W2 is 0, the load of the block is the number of polygons A in the block; if W1 and W2 take other values, the load of the block is a combination, in a certain proportion, of the number of polygons A and the complexity B. W1 and W2 may be non-negative values.
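The weighted load above can be written as a small function (an illustrative Python sketch, not part of the patent; the name `block_load` and the default weights are assumptions):

```python
def block_load(polygon_count, instruction_complexity, w1=1.0, w2=1.0):
    """Combined per-block load as described: A*W1 + B*W2.

    polygon_count (A) and instruction_complexity (B) come from the PLG's
    per-block counters; w1 and w2 are non-negative weights. Setting w2=0
    reduces the load to the polygon count alone; setting w1=0 reduces it
    to the drawing-instruction complexity alone.
    """
    assert w1 >= 0 and w2 >= 0, "weights are non-negative"
    return polygon_count * w1 + instruction_complexity * w2
```

For example, `block_load(2, 5, w1=2, w2=1)` combines a count of 2 polygons with a complexity of 5 into a load of 9.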
Regarding the number of polygons in each block: for example, as shown in FIG. 2, the number of polygons in the upper-left block is 2, in the upper-right block 3, in the lower-left block 1, and in the lower-right block 2.

Regarding the complexity of the drawing instructions of the points of the polygons in each block: since the color and transparency of the points of a polygon are not necessarily identical, the complexity of the corresponding drawing instructions also differs. For example, the drawing instruction of a point with some transparency is more complex than that of a point without transparency, and a color drawing instruction is more complex than a black-and-white one.

A first cache can be added to the PLG 101 to store the count of the number of polygons in each block, and a second cache can be added to store the count of the complexity of the drawing instructions of the points of the polygons in each block. The PLG 101 calculates the load of each block as follows:

Each time a polygon of a block is processed, the PLG 101 can look up the polygon count of the corresponding block in the first cache. On a hit, the count is incremented by 1; on a miss, a polygon count for the corresponding block is created in the first cache and initialized to 1.

Similarly, each time a polygon of a block is processed, the PLG 101 can look up in the second cache the complexity count of the drawing instructions of the points of the polygons of the corresponding block. On a hit, the complexity of the drawing instructions of the points of the newly added polygon is added to the count; on a miss, a complexity count for the corresponding block is created in the second cache and initialized to the complexity of the drawing instructions of the points of the newly added polygon.
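The hit/miss bookkeeping of the two counters can be sketched with dictionaries standing in for the two caches (an illustrative Python sketch; the function and variable names are assumptions, not from the patent):

```python
def update_block_counters(polygon_counts, complexity_counts,
                          block_id, polygon_complexity):
    """Hit/miss update of the two per-block counters kept by the PLG.

    polygon_counts maps block_id -> number of polygons (counter A);
    complexity_counts maps block_id -> accumulated drawing-instruction
    complexity (counter B). A dict lookup stands in for the cache
    search: a hit increments/accumulates, a miss creates the entry.
    """
    if block_id in polygon_counts:        # hit: increment by 1
        polygon_counts[block_id] += 1
    else:                                 # miss: create with value 1
        polygon_counts[block_id] = 1
    if block_id in complexity_counts:     # hit: add the new polygon's complexity
        complexity_counts[block_id] += polygon_complexity
    else:                                 # miss: create with that complexity
        complexity_counts[block_id] = polygon_complexity
```

Processing each polygon of each block once through this routine leaves the two dictionaries holding A and B for every block.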
After the statistics are completed, the PLG 101 can cache the counts in the first cache or the counts in the second cache in the L2 102 as the load of each block; alternatively, the PLG 101 can compute the load of each block from the counts in the first and second caches according to the aforementioned formula A*W1+B*W2, and cache that load in the L2 102.

Optionally, the PLG 101 can also sort the blocks according to their loads, for example in descending order of load. The present application does not limit the sorting algorithm; for example, it may be bubble sort, selection sort, quicksort, merge sort, heapsort, Top-N sorting, and the like.

For example, taking Top-N sorting of the blocks as an example, N groups of registers can be added to the PLG 101, each group storing the load of one block. The PLG 101 compares the load of the block under comparison with the loads in the N registers; if it is greater than any load in the N registers, the smallest load in the N registers is replaced by the load of the block under comparison. Finally, the loads in the N register groups are sorted, yielding the Top-N sorted block loads.
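The register-based Top-N selection can be mimicked in software as follows (an illustrative Python sketch; a list plays the role of the N register groups, and the name `top_n_loads` is an assumption):

```python
def top_n_loads(loads, n):
    """Top-N selection over block loads, mimicking the N register groups.

    `registers` holds at most n (block, load) pairs; each incoming load
    replaces the current minimum in the registers if it is larger. The
    result is the n largest loads, sorted in descending order.
    """
    registers = []
    for block, load in loads.items():
        if len(registers) < n:
            registers.append((block, load))
        else:
            # index of the smallest load currently held in the registers
            min_i = min(range(n), key=lambda i: registers[i][1])
            if load > registers[min_i][1]:
                registers[min_i] = (block, load)
    # final sort of the register contents gives the Top-N order
    return sorted(registers, key=lambda entry: entry[1], reverse=True)
```

For example, `top_n_loads({0: 5, 1: 9, 2: 1, 3: 7}, 2)` keeps only the two heaviest blocks, 1 and 3.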
It should be noted that N may be equal to the number of cores in the GPU; in that case, several rounds of Top-N sorting suffice.

Optionally, the PLG 101 can also cache the load of each block, the position information of each block and the polygon list corresponding to each block in the L2 102 in the sorted order.
S502: The task scheduler 103 schedules the multiple cores 104 to process the polygon lists corresponding to the multiple blocks in descending order of each block's load, so as to draw the image.

This step can be performed in the rendering pass.

The task scheduler 103 can read the loads and position information of the blocks from the L2 102, and can send the position information of the blocks to the cores 104 in descending order of load; the cores 104 then obtain the polygon lists corresponding to the assigned blocks from the L2 102 according to the assigned blocks' position information. The cores 104 processing the polygon lists corresponding to the blocks means that the cores 104 draw the polygons in the blocks on the display screen according to the polygon lists; after the cores 104 have processed all the blocks, the drawing of a frame of image on the display screen is complete.

Optionally, the task scheduler 103 can also sort the blocks according to their loads. For details, refer to the foregoing content on the PLG 101 sorting the blocks according to their loads, which is not repeated here. It should be noted that only one of the task scheduler 103 and the PLG 101 needs to sort the blocks by load.

In a possible implementation, the task scheduler 103 can schedule idle cores among the multiple cores 104 to process the polygon lists corresponding to the multiple blocks in descending order of the blocks' loads.

For example, as shown in FIG. 6, in part A of FIG. 6 (i.e., FIG. 3 above) the task scheduler 103 assigns blocks to the cores in the order of the blocks' position information, while in part B of FIG. 6 it assigns blocks to idle cores in descending order of the blocks' loads. Sorted by load in descending order, the blocks are block 13, block 1, block 2, block 3, block 5, block 6, block 7, block 9, block 10, block 11, block 0, block 14, block 15, block 4, block 8, block 12. In the first round, the task scheduler 103 assigns blocks 13, 1, 2 and 3 to cores 0 to 3 in turn, that is, it schedules cores 0 to 3 to process the polygon lists corresponding to blocks 13, 1, 2 and 3. In the second round, core 0 has not yet finished while the other cores are idle, so the task scheduler 103 assigns blocks 5, 6 and 7 to the idle cores 1 to 3 in turn. In the third round, core 0 has still not finished while the other cores are idle, so the task scheduler 103 assigns blocks 9, 10 and 11 to the idle cores 1 to 3 in turn. In the fourth round, cores 0 to 3 are all idle, so the task scheduler 103 assigns blocks 4, 8 and 12 to the idle cores 0 to 2 in turn. It can be seen that part B of FIG. 6 saves considerable time compared with part A.
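The "next idle core takes the largest remaining block" policy is essentially greedy longest-processing-time-first scheduling. A small simulation (an illustrative Python sketch, not from the patent; block loads double as processing times, and all names are assumptions):

```python
import heapq

def schedule_idle_first(block_loads, num_cores):
    """Simulate 'largest remaining block goes to the next idle core'.

    block_loads: {block_id: load}, with load doubling as processing time.
    Returns (per-core assignment, makespan). A min-heap keyed by the
    time at which each core next becomes idle picks the idle core;
    blocks are handed out in descending order of load.
    """
    heap = [(0.0, core) for core in range(num_cores)]  # (idle_time, core)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    for block in sorted(block_loads, key=block_loads.get, reverse=True):
        idle_time, core = heapq.heappop(heap)          # earliest-idle core
        assignment[core].append(block)
        heapq.heappush(heap, (idle_time + block_loads[block], core))
    makespan = max(t for t, _ in heap)                 # last core to finish
    return assignment, makespan
```

With loads {13: 8, 1: 4, 2: 4, 3: 4, 5: 2, 6: 2, 7: 2} on 4 cores, the heavy block 13 is handed out first and the light blocks fill in around it, so the frame finishes when block 13 does.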
When a core 104 has finished, or is about to finish, processing the polygon list corresponding to a block, it can send indication information (e.g., an interrupt) to the task scheduler 103 to request a new block, that is, to request to be scheduled to process the polygon list corresponding to a new block. Alternatively, the task scheduler 103 can poll the cores to query whether they have finished processing the polygon list corresponding to a block; if a core has finished or is about to finish, the task scheduler 103 assigns it a new block, that is, schedules it to process the polygon list corresponding to a new block. Both approaches make full use of the cores' processing capability, reducing core idleness between one block and the next (the "has finished" case above) or avoiding it altogether (the "is about to finish" case above).

The principle by which this implementation promotes balance among the cores is that, whenever a core is idle, the block with the largest load among the remaining blocks is processed first, which maximizes the processing capability of each core and achieves balance among the cores.
In another possible implementation, the task scheduler 103 can alternately schedule the multiple cores 104 to process the polygon lists corresponding to the multiple blocks in descending order of the blocks' loads. Alternate scheduling means that, in two adjacent scheduling rounds, the task scheduler 103 reverses in the later round the order in which the cores were scheduled in the previous round.

For example, as shown in FIG. 7, in part A of FIG. 7 the task scheduler 103 assigns blocks to the cores in the order of the blocks' position information, while in part B of FIG. 7 it alternately assigns blocks to the cores in descending order of the blocks' loads. Sorted by load in descending order, the blocks are block 9, block 13, block 5, block 1, block 12, block 10, block 8, block 4, block 0, block 2, block 3, block 6, block 7, block 11, block 14, block 15. In the first round, the task scheduler 103 assigns blocks 9, 13, 5 and 1 to cores 0 to 3 in turn, that is, it schedules cores 0 to 3 to process the polygon lists corresponding to blocks 9, 13, 5 and 1. In the second round, the task scheduler 103 reverses the scheduling order relative to the first round and assigns blocks 12, 10, 8 and 4 to cores 3 to 0 in turn. In the third round, it reverses the order again relative to the second round and assigns blocks 0, 2, 3 and 6 to cores 0 to 3 in turn. In the fourth round, it reverses the order relative to the third round and assigns blocks 7, 11, 14 and 15 to cores 3 to 0 in turn. It can be seen that part B of FIG. 7 saves considerable time compared with part A.
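The alternating (boustrophedon) assignment can be sketched as follows (an illustrative Python sketch; the name `alternate_schedule` is an assumption). Feeding it the FIG. 7 ordering reproduces the assignment described above:

```python
def alternate_schedule(blocks_desc, num_cores):
    """Boustrophedon assignment over blocks sorted by descending load.

    Round 1 gives blocks to cores 0..n-1, round 2 to cores n-1..0,
    and so on. Returns {core: [assigned blocks in order]}.
    """
    assignment = {core: [] for core in range(num_cores)}
    for start in range(0, len(blocks_desc), num_cores):
        chunk = blocks_desc[start:start + num_cores]   # one round of blocks
        order = range(num_cores)
        if (start // num_cores) % 2 == 1:              # odd round: reverse cores
            order = reversed(range(num_cores))
        for core, block in zip(order, chunk):
            assignment[core].append(block)
    return assignment
```

Averaged over a pair of adjacent rounds, each core then receives one heavier and one lighter block, which is the balancing effect the alternation aims at.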
The principle by which this implementation promotes balance among the cores is that, taking core 0 as an example, in the first round core 0 processes the block with the largest load of that round, and in the second round core 0 processes the block with the smallest load of that round; averaged over two adjacent rounds the processing times of the cores are therefore close, and after multiple rounds the average processing times of the cores reach a relative balance.

In yet another possible implementation, the task scheduler 103 can schedule the multiple cores 104 to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.

The principle by which this implementation promotes balance among the cores is that, because the blocks with larger loads are concentrated in the top N, sequentially or randomly assigning the other blocks with small loads has little effect on the balance among the cores. The task scheduler 103 can therefore assign the top N blocks first and the other blocks afterwards, which also achieves balance among the cores.

It should be noted that if two blocks have the same load, the task scheduler 103 can schedule idle cores among the multiple cores to process the polygon lists corresponding to the blocks in the order of the blocks' position information.

With the GPU and the image processing method provided by the embodiments of the present application, the PLG in the GPU can, while generating the polygon lists, also obtain the load of each block, and the task scheduler in the GPU can assign the blocks to the multiple cores in descending order of load, so that blocks with larger loads are processed first and subsequent blocks with smaller loads can be distributed more evenly. This equalizes the processing time of the cores as far as possible, solves the problem of unbalanced processing times in a multi-core GPU, and shortens the time a multi-core GPU takes to process images.
As shown in FIG. 8, an embodiment of the present application further provides an electronic device 80, including the GPU 801 described above and a display screen 802, the GPU 801 being configured to draw images on the display screen 802.

An embodiment of the present application further provides a computer-readable storage medium storing instructions that, when run on a GPU, cause the GPU to perform the method corresponding to FIG. 5.

An embodiment of the present application further provides a computer program product containing instructions that, when run on a GPU, cause the GPU to perform the method corresponding to FIG. 5.

For the technical effects of the electronic device, the computer-readable storage medium and the computer program product involved in the embodiments of the present application, refer to the foregoing technical effects of the GPU and the image processing method, which are not repeated here.
It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and design constraints of the technical solution. A skilled person may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is merely a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or of other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by a software program, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

  1. A graphics processor, characterized by comprising: a polygon list generator, a task scheduler and multiple cores;
    wherein the polygon list generator is configured to obtain multiple blocks of a frame of image and a polygon list corresponding to each block, and to calculate the load of each block, wherein the polygon list indicates the polygons included in the corresponding block;
    and the task scheduler is configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load, so as to draw the image.
  2. The graphics processor according to claim 1, characterized in that the load of each block includes at least one of the following items of information:
    the number of polygons A in each block, and the complexity B of the drawing instructions of the points of the polygons in each block.
  3. The graphics processor according to claim 2, characterized in that the load of each block is A*W1+B*W2, wherein W1 and W2 are weight values.
  4. The graphics processor according to any one of claims 1 to 3, characterized in that the task scheduler being configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load comprises:
    the task scheduler schedules idle cores among the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load.
  5. The graphics processor according to any one of claims 1 to 3, characterized in that the task scheduler being configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load comprises:
    the task scheduler alternately schedules the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load; wherein the alternate scheduling means that, in two adjacent scheduling rounds, the task scheduler reverses in the later round the order in which the cores were scheduled in the previous round.
  6. The graphics processor according to any one of claims 1 to 3, characterized in that the task scheduler being configured to schedule the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load comprises:
    the task scheduler is configured to schedule the multiple cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.
  7. An image processing method, characterized by comprising:
    obtaining multiple blocks of a frame of image and a polygon list corresponding to each block, and calculating the load of each block, wherein the polygon list indicates the polygons included in the corresponding block;
    and scheduling multiple cores of a graphics processor to process the polygon lists corresponding to the multiple blocks in descending order of the load, so as to draw the image.
  8. The method according to claim 7, characterized in that the load of each block includes at least one of the following items of information:
    the number of polygons A in each block, and the complexity B of the drawing instructions of the points of the polygons in each block.
  9. The method according to claim 8, characterized in that the load of each block is A*W1+B*W2, wherein W1 and W2 are weight values.
  10. The method according to any one of claims 7 to 9, characterized in that the scheduling of the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load comprises:
    scheduling idle cores among the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load.
  11. The method according to any one of claims 7 to 9, characterized in that the scheduling of the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load comprises:
    alternately scheduling the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load; wherein the alternate scheduling means that, in two adjacent scheduling rounds, the order in which the cores are scheduled is reversed in the later round relative to the previous round.
  12. The method according to any one of claims 7 to 9, characterized in that the scheduling of the multiple cores to process the polygon lists corresponding to the multiple blocks in descending order of the load comprises:
    scheduling the multiple cores to first process the polygon lists corresponding to the N blocks with the largest loads, and then process the polygon lists corresponding to the other blocks.
  13. An electronic device, characterized by comprising the graphics processor according to any one of claims 1 to 6 and a display screen, the graphics processor being configured to draw images on the display screen.
PCT/CN2021/076921 2021-02-19 2021-02-19 Graphics processor, image processing method and electronic device WO2022174395A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/076921 WO2022174395A1 (zh) 2021-02-19 2021-02-19 Graphics processor, image processing method and electronic device
CN202180093752.5A CN116917925A (zh) 2021-02-19 2021-02-19 Graphics processor, image processing method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/076921 WO2022174395A1 (zh) 2021-02-19 2021-02-19 Graphics processor, image processing method and electronic device

Publications (1)

Publication Number Publication Date
WO2022174395A1 true WO2022174395A1 (zh) 2022-08-25

Family

ID=82931906

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076921 WO2022174395A1 (zh) 2021-02-19 2021-02-19 图形处理器、图像处理方法和电子设备

Country Status (2)

Country Link
CN (1) CN116917925A (zh)
WO (1) WO2022174395A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630441A (zh) * 2015-12-11 2016-06-01 AVIC Xi'an Aeronautics Computing Technique Research Institute A GPU architecture based on unified shading technology
US20160210721A1 (en) * 2015-01-16 2016-07-21 Intel Corporation Graph-based application programming interface architectures with equivalency classes for enhanced image processing parallelism
CN109886407A (zh) * 2019-02-27 2019-06-14 Shanghai SenseTime Intelligent Technology Co., Ltd. Data processing method and apparatus, electronic device, and computer-readable storage medium
CN109964244A (zh) * 2016-12-22 2019-07-02 Apple Inc. Local image blocks for graphics processing
CN111601078A (zh) * 2020-05-12 2020-08-28 Xi'an Chuangteng Xingtai Electronic Technology Co., Ltd. Spaceborne video compression system and method for direct transmission of video data to the ground


Also Published As

Publication number Publication date
CN116917925A (zh) 2023-10-20

Similar Documents

Publication Publication Date Title
RU2425412C2 (ru) Multimedia processor multithreaded on demand
US8988442B2 (en) Asynchronous notifications for concurrent graphics operations
US9734546B2 (en) Split driver to control multiple graphics processors in a computer system
US11941434B2 (en) Task processing method, processing apparatus, and computer system
US8350864B2 (en) Serializing command streams for graphics processors
US8525841B2 (en) Batching graphics operations with time stamp tracking
US10026145B2 (en) Resource sharing on shader processor of GPU
EP2823459B1 (en) Execution of graphics and non-graphics applications on a graphics processing unit
US8085280B2 (en) Asymmetric two-pass graphics scaling
KR101609079B1 (ko) Instruction culling in a graphics processing unit
US8531470B2 (en) Deferred deletion and cleanup for graphics resources
CN103207774A (zh) Method and system for resolving thread divergences
CN101040270A (zh) Command transmission control apparatus and command transmission control method
EP3794449A1 (en) Scheduling of a plurality of graphic processing units
CN111813541B (zh) Task scheduling method, apparatus, medium and device
WO2022174395A1 (zh) Graphics processor, image processing method and electronic device
US20240005446A1 (en) Methods, systems, and non-transitory storage media for graphics memory allocation
US9165396B2 (en) Graphics processing unit with a texture return buffer and a texture queue
US9405470B2 (en) Data processing system and data processing method
EP2853985A1 (en) Sampler load balancing
CN115269131A (zh) Task scheduling method and apparatus
CN110187957A (zh) Queuing method and apparatus for download tasks, and electronic device
CN103164838B (zh) Graphics data processing method
CN117873664A (zh) Task scheduling module, processor, electronic apparatus, device and method
CN114168300A (zh) Thread scheduling method, processor and electronic apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21926115

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180093752.5

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21926115

Country of ref document: EP

Kind code of ref document: A1