CN117893394A - Method for accelerating mixing process in graphics pipeline by adopting AI computing power - Google Patents

Method for accelerating mixing process in graphics pipeline by adopting AI computing power Download PDF

Info

Publication number
CN117893394A
CN117893394A CN202311839795.0A CN202311839795A CN117893394A CN 117893394 A CN117893394 A CN 117893394A CN 202311839795 A CN202311839795 A CN 202311839795A CN 117893394 A CN117893394 A CN 117893394A
Authority
CN
China
Prior art keywords
color
alpha
calculation
src
mixing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311839795.0A
Other languages
Chinese (zh)
Inventor
徐瑞
项天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Graphichina Electronic Technology Co ltd
Original Assignee
Suzhou Graphichina Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Graphichina Electronic Technology Co ltd filed Critical Suzhou Graphichina Electronic Technology Co ltd
Priority to CN202311839795.0A priority Critical patent/CN117893394A/en
Publication of CN117893394A publication Critical patent/CN117893394A/en
Pending legal-status Critical Current

Links

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Generation (AREA)

Abstract

The invention discloses a method for accelerating the color blending process in a graphics pipeline by using AI computing power. In the invention, the data structure is first tiled: fragments are stored directly in on-chip memory until a certain number of batches N is reached, and the whole color blending process is then completed by the AI computing unit in one pass. The whole screen is divided into a number of blocks; when fragments fall within a block, the fragment data of N batches for that block are stored in memory in the format of FIG. 2, and all color components of the regions of the block not covered by fragments are set to 0. The block size is an empirical value related to the application scenario. The AI computing unit is used to accelerate the color blending calculation, which reduces the related dedicated hardware modules and lowers the area cost of the GPU chip.

Description

Method for accelerating mixing process in graphics pipeline by adopting AI computing power
Technical Field
The invention belongs to the technical field of graphics processing, and particularly relates to a method for accelerating a blending process in a graphics pipeline by adopting AI computing power.
Background
A graphics processing unit (Graphics Processing Unit, GPU) is a component of a computer dedicated to graphics-related computing tasks. Compared with a CPU, a GPU allocates more computing resources to the computation part than to the control part and exploits the parallelism in graphics processing tasks as far as possible, so it offers far higher computing power on graphics processing tasks than a CPU and is therefore also widely used in artificial-intelligence-related computation. With the rise of AI, GPUs typically integrate, in addition to the original processing units for general-purpose computing and graphics processing, a neural processing unit (Neural Processing Unit, NPU), tensor cores (Tensor Core) or similar units dedicated to AI-related computation; because these are designed to accelerate AI-related computation, they are collectively referred to as AI computing units in the present invention.
At present, however, color blending is usually performed by a dedicated ROP unit, which inevitably adds chip area and data-transfer cost; and when only graphics-related computation is being performed, the AI computing unit sits idle, which is also a huge waste of performance.
Disclosure of Invention
The aim of the invention is as follows: in order to solve the above-mentioned problems, a method for accelerating the blending process in a graphics pipeline by using AI computing power is provided.
The technical scheme adopted by the invention is as follows: a method for accelerating a blending process in a graphics pipeline using AI computing power, the method for accelerating a blending process in a graphics pipeline using AI computing power comprising the steps of:
S1: tile the data structure according to steps S2-S4, and store the fragments directly in the on-chip memory until a certain number of batches N is reached;
S2: divide the whole screen space into a number of blocks; when fragments fall within a block, store the fragment data of N batches for that block in memory in the format of FIG. 2, and set all color components of the regions of the block not covered by fragments to 0;
S3: the block size is an empirical value related to the application scenario; blocks in which no fragment falls at all are not computed;
S4: if no fragment of a certain batch falls within a certain block, remove the layer corresponding to that batch from the data stored for that block (a control-flow sketch of steps S1 to S4 is given after step S10);
S5: calculate the weight of each element in each batch, i.e. the proportion that the corresponding element's color contributes to the blended color; the subtraction in the blending equation can be treated directly as part of the weight, which is equivalent to making the weight negative;
S6: use the AI computing unit to complete the corresponding computation; during the computation, the color component data of all fragments of each batch within each block are organized as matrices, with R_n, G_n, B_n and A_n denoting the four matrices of the n-th batch;
S7: first calculate the source and destination blending factors F of all batches; except for the blending function SRC_ALPHA_SATURATE, all blending functions can be completed with element-by-element addition operations on the AI computing unit;
S8: SRC_ALPHA_SATURATE can be completed with an element-by-element addition followed by an element-by-element comparison on the AI computing unit; then compute (1-F) from the blending factors of all layers except the layer with the minimum depth, using element-by-element addition on the AI computing unit; finally, use element-by-element multiplication on the AI computing unit to obtain the weight matrices used by all batches, taking the case where the source blending function is SRC_ALPHA and the destination blending function is ONE_MINUS_SRC_ALPHA as an example,
where W_n denotes the weight matrix, n denotes the corresponding batch, and A_n^src denotes the source opacity matrix of the corresponding batch, all operations being element-by-element operations;
S9: after the weight calculation is completed, perform the blending calculation;
S10: finally, clamp the result to the range 0 to 1 using an element-by-element comparison operation; this completes the color blending and ends the whole process of accelerating the blending stage of the graphics pipeline with AI computing power.
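The buffering described in steps S1 to S4 can be pictured with the following minimal Python sketch. It is an illustrative reading of the specification rather than an implementation: the class name TileBuffer, the default block size and batch count, and the fragment-tuple format are assumptions introduced here for clarity.

import numpy as np

class TileBuffer:
    """Per-block staging of N batches of fragment color data (steps S1-S4)."""

    def __init__(self, block_size=16, n_batches=8):
        self.h = self.w = block_size      # block size: an empirical, scene-dependent value (S3)
        self.n = n_batches                # number of batches N buffered before dispatch (S1)
        self.batches_seen = 0
        self.blocks = {}                  # (bx, by) -> list of per-batch {R, G, B, A} layers

    def add_batch(self, batch_fragments):
        """batch_fragments: {(bx, by): [(x, y, r, g, b, a), ...]} for one batch,
        with (x, y) given as block-local coordinates."""
        self.batches_seen += 1
        for key, frags in batch_fragments.items():
            # Regions of the block not covered by fragments stay 0 (S2).
            layer = {c: np.zeros((self.h, self.w), np.float32) for c in "RGBA"}
            for x, y, r, g, b, a in frags:
                layer["R"][y, x], layer["G"][y, x] = r, g
                layer["B"][y, x], layer["A"][y, x] = b, a
            self.blocks.setdefault(key, []).append(layer)
        # A batch contributing no fragment to a block simply adds no layer there (S4),
        # and blocks never touched by any fragment are never created and never computed (S3).

    def ready(self):
        """True once N batches are buffered; the AI computing unit then blends
        every stored block in one pass."""
        return self.batches_seen >= self.n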
In a preferred embodiment, in step S1, batching in this way gives the color blending process sufficient computational parallelism; although this requires more on-chip memory space, the aforementioned shared on-chip memory capacity fully satisfies the requirement.
In a preferred embodiment, in step S2, the data structure of the fragment data in the on-chip memory space is shown in fig. 2; the three indices of an element are, from left to right, the batch z (since the depth test has already been performed, the depth relation of fragments between batches is fixed; in the invention, the smaller the batch z, the smaller the depth of the corresponding fragment), the ordinate y in the screen coordinate system, and the abscissa x in the screen coordinate system; each element is one component of a fragment's color and may be one of red R, green G, blue B and opacity A.
In a preferred embodiment, in step S4, the same color component of all N batches of fragments is stored contiguously in memory as one array, ordered so that the coordinate x increases first, then y, and finally z; the arrays of different color components are not required to be contiguous with one another;
as shown in fig. 3, the primitive shapes within a batch may be irregular, so the resulting fragments are not necessarily stored as regularly as shown in fig. 2.
In a preferred embodiment, in step S5, for the cases where the blending function is ZERO, ONE, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA or ONE_MINUS_CONSTANT_ALPHA, the weights of all fragments in the same batch are identical, so the weights only need to be computed iteratively, once per batch; for example, in the case where the source blending factor is CONSTANT_ALPHA and the destination blending factor is ONE_MINUS_CONSTANT_ALPHA,
where w_n denotes the weight, n denotes the corresponding batch, and A_n^const denotes the constant opacity used by the corresponding batch, determined by the blending function;
in this case, since the amount of computation is small and can be determined beforehand, the calculation can be completed in advance.
In a preferred embodiment, in step S5, for the cases where the blending function is SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA or SRC_ALPHA_SATURATE, the weight of each fragment in the same batch is different and depends on the screen coordinates x and y of the fragment.
In a preferred embodiment, in step S9, the blending calculation is performed as follows: the color component matrices are taken as the feature tensor and the corresponding weight matrices as the weight tensor, and a two-dimensional convolution with a convolution kernel size equal to the block size is performed.
In a preferred embodiment, in step S9, the blending calculation is performed as follows: the AI computing unit multiplies the color component matrices and the weight matrices element by element, batch by batch, and the AI computing unit then adds the results element by element, layer by layer.
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
in the invention, a method for performing color mixing calculation by using an AI (advanced technology) computing unit is provided, so that the AI computing unit is used for realizing acceleration of color mixing calculation, reducing the area cost of a GPU (graphics processing Unit) chip and reducing the area cost of the GPU chip, and the AI computing unit is used for realizing acceleration of color mixing calculation, reducing the related special hardware module and reducing the area cost of the GPU chip.
Drawings
FIG. 1 is a schematic diagram of a system abstraction model according to the present invention;
FIG. 2 is a schematic diagram of a GPU architecture for accelerating discontinuous global memory accesses in accordance with the present invention;
FIG. 3 is a block diagram of a screen according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
With reference to figures 1-3 of the drawings,
examples:
The processing units used for general purpose computing in the GPU are collectively referred to herein as general purpose computing units.
The neural processor (Neural Processing Unit, NPU) or Tensor Core (Tensor Core) dedicated to AI-related computation is referred to as an AI computing unit in the present invention.
With the advent of AI, the integration of AI computing units dedicated to AI-related computation in GPUs has become a trend.
AI computing power units typically have the ability to accelerate general matrix multiplication (General Matrix Multiplication, GEMM) and element-wise multiplication, addition and comparison (element-wise Product/Sum/Max).
AI computing power units are generally capable of providing computing power far beyond that of general-purpose computing units.
The AI computing power unit is generally capable of supporting computation on data types such as 8-bit signed integers, 16-bit floating-point numbers and 32-bit floating-point numbers.
GPUs with AI computing power units typically have large amounts of on-chip memory for caching various types of computing data.
The computing power provided by the general-purpose computing unit is referred to as general computing power in the present invention.
The computing power provided by the AI computing power unit is referred to as AI computing power in the present invention.
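As a concrete reference for the element-wise operations relied upon below, the following minimal Python/NumPy sketch models the three element-wise primitives (product, sum, comparison) assumed to be offered by an AI computing unit; the function names ew_mul, ew_add and ew_max are illustrative placeholders and do not correspond to any real accelerator API.

import numpy as np

# Hypothetical stand-ins for the element-wise primitives of an AI computing unit;
# on real hardware these would be dispatched to the NPU / tensor cores.
def ew_mul(a, b):
    """Element-wise product of two equally shaped arrays."""
    return np.multiply(a, b)

def ew_add(a, b):
    """Element-wise sum of two equally shaped arrays."""
    return np.add(a, b)

def ew_max(a, b):
    """Element-wise comparison (maximum) of two equally shaped arrays."""
    return np.maximum(a, b)

# Example over a 16x16 block in 32-bit floating point.
x = np.full((16, 16), 0.25, dtype=np.float32)
y = np.full((16, 16), 0.50, dtype=np.float32)
print(ew_add(ew_mul(x, y), y)[0, 0])   # 0.625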
AI computing power units are generally capable of providing computing power far beyond general purpose computing power units.
The AI power-calculation unit is generally capable of supporting calculation of 8-bit signed integer, 16-bit floating point number, 32-bit floating point number and other data types.
GPUs with AI computing power units typically have large amounts of on-chip memory for caching various types of computing data.
In a graphics pipeline, color Blending (Blending) is the process of fusing two or more colors. The blending process typically occurs in the rendering stage, i.e., before the graphical elements are drawn onto the screen.
There are many ways of color blending, each with a different effect. According to the OpenGL ES 2.0 manual, the blending mode is determined by a blending equation and blending functions; the blending equations comprise the three types FUNC_ADD, FUNC_SUBTRACT and FUNC_REVERSE_SUBTRACT, and the blending functions include ZERO, ONE, SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA, ONE_MINUS_CONSTANT_ALPHA and SRC_ALPHA_SATURATE.
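For reference, the short Python sketch below evaluates the conventional per-pixel blend that a fixed-function ROP would perform, using the FUNC_ADD equation with SRC_ALPHA as the source blending function and ONE_MINUS_SRC_ALPHA as the destination blending function; it is a plain illustration of the standard OpenGL ES semantics, not code from the specification.

import numpy as np

def blend_func_add(c_src, a_src, c_dst):
    """FUNC_ADD: C_out = C_src * F_src + C_dst * F_dst, with
    F_src = SRC_ALPHA and F_dst = ONE_MINUS_SRC_ALPHA."""
    c_src = np.asarray(c_src, dtype=np.float32)   # source RGB in [0, 1]
    c_dst = np.asarray(c_dst, dtype=np.float32)   # destination RGB in [0, 1]
    f_src = a_src                                  # source blending factor
    f_dst = 1.0 - a_src                            # destination blending factor
    out = c_src * f_src + c_dst * f_dst            # blending equation
    return np.clip(out, 0.0, 1.0)                  # clamp to [0, 1]

# Example: a 60 % opaque red fragment over a green destination pixel.
print(blend_func_add([1.0, 0.0, 0.0], 0.6, [0.0, 1.0, 0.0]))   # -> [0.6, 0.4, 0.0]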
A method for accelerating a blending process in a graphics pipeline using AI computing power, the method for accelerating a blending process in a graphics pipeline using AI computing power comprising the steps of:
S1: tile the data structure according to steps S2-S4, and store the fragments directly in the on-chip memory until a certain number of batches N is reached;
S2: divide the whole screen space into a number of blocks; when fragments fall within a block, store the fragment data of N batches for that block in memory in the format of FIG. 2, and set all color components of the regions of the block not covered by fragments to 0;
S3: the block size is an empirical value related to the application scenario; blocks in which no fragment falls at all are not computed;
S4: if no fragment of a certain batch falls within a certain block, remove the layer corresponding to that batch from the data stored for that block;
S5: calculate the weight of each element in each batch, i.e. the proportion that the corresponding element's color contributes to the blended color; the subtraction in the blending equation can be treated directly as part of the weight, which is equivalent to making the weight negative;
S6: use the AI computing unit to complete the corresponding computation. During the computation, the color component data of all fragments of each batch within each block are organized as matrices, with R_n, G_n, B_n and A_n denoting the four matrices of the n-th batch;
S7: first calculate the source and destination blending factors F of all batches; except for the blending function SRC_ALPHA_SATURATE, all blending functions can be completed with element-by-element addition operations on the AI computing unit;
S8: SRC_ALPHA_SATURATE can be completed with an element-by-element addition followed by an element-by-element comparison on the AI computing unit. Then compute (1-F) from the blending factors of all layers except the layer with the minimum depth, using element-by-element addition on the AI computing unit; finally, use element-by-element multiplication on the AI computing unit to obtain the weight matrices used by all batches, taking the case where the source blending function is SRC_ALPHA and the destination blending function is ONE_MINUS_SRC_ALPHA as an example,
where W_n denotes the weight matrix, n denotes the corresponding batch, and A_n^src denotes the source opacity matrix of the corresponding batch, all operations being element-by-element operations (a sketch of this weight computation is given after step S10);
S9: after the weight calculation is completed, perform the blending calculation;
S10: finally, clamp the result to the range 0 to 1 using an element-by-element comparison operation; this completes the color blending and ends the whole process of accelerating the blending stage of the graphics pipeline with AI computing power.
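To make the weight computation of steps S7-S8 concrete, the Python sketch below builds the per-batch weight matrices for the SRC_ALPHA / ONE_MINUS_SRC_ALPHA example using only element-wise operations. It assumes, as one reading of the specification, that batch index 0 is the minimum-depth (front-most) layer and that the weight of each deeper layer accumulates the (1 - F) factors of the layers in front of it; both the ordering assumption and all names are illustrative only.

import numpy as np

def src_alpha_weights(alpha_src):
    """alpha_src: array of shape (N, H, W) holding the source opacity matrices
    A_src of N batches for one block, batch 0 assumed to be the front-most
    (minimum-depth) layer.

    Returns W of shape (N, H, W) with
        W[0] = A_src[0]
        W[n] = A_src[n] * (1 - A_src[0]) * ... * (1 - A_src[n - 1]),
    built purely from element-wise additions and multiplications (steps S7-S8)."""
    alpha_src = np.asarray(alpha_src, dtype=np.float32)
    one_minus = 1.0 - alpha_src                 # element-wise (1 - F) of every layer
    weights = np.empty_like(alpha_src)
    acc = np.ones_like(alpha_src[0])            # running product of (1 - F) of the layers in front
    for n in range(alpha_src.shape[0]):
        weights[n] = alpha_src[n] * acc         # element-wise multiplication
        acc = acc * one_minus[n]
    return weights

# Example: three batches over a 2x2 block.
a = np.array([[[0.5, 0.0], [1.0, 0.25]],
              [[0.5, 0.5], [0.0, 0.25]],
              [[1.0, 1.0], [1.0, 1.0]]], dtype=np.float32)
print(src_alpha_weights(a))

For SRC_ALPHA_SATURATE, the source factor would additionally be limited by an element-by-element comparison (a minimum of the source alpha and one minus the destination alpha), matching the extra comparison step described in S8.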
In step S1, batching in this way gives the color blending process sufficient computational parallelism; although this requires more on-chip memory space, the aforementioned shared on-chip memory capacity fully satisfies the requirement.
In step S2, the data structure of the fragment data in the on-chip memory space is shown in fig. 2. The three indices of an element are, from left to right, the batch z (since the depth test has already been performed, the depth relation of fragments between batches is fixed; in the invention, the smaller the batch z, the smaller the depth of the corresponding fragment), the ordinate y in the screen coordinate system, and the abscissa x in the screen coordinate system. Each element is one component of a fragment's color and may be one of red R, green G, blue B and opacity A.
In step S4, the same color component of all N batches of fragments is stored contiguously in memory as one array, ordered so that the coordinate x increases first, then y, and finally z. The arrays of different color components are not required to be contiguous with one another;
as shown in fig. 3, the primitive shapes within a batch may be irregular, so the resulting fragments are not necessarily stored as regularly as shown in fig. 2.
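The storage order just described can be illustrated with the following Python sketch, which packs one color component of N batches of a block into a single contiguous array with x varying fastest, then y, then z; the function name and the sparse-dictionary input format are assumptions made here for illustration.

import numpy as np

def flatten_component(component, n_batches, block_h, block_w):
    """component: dict {(z, y, x): value} holding one color component (e.g. R)
    of the fragments of one block, for batches z = 0 .. n_batches - 1.

    Returns a contiguous 1-D array ordered x-fastest, then y, then z (fig. 2).
    Positions not covered by any fragment stay 0, so irregular primitives
    (fig. 3) still produce a regular, dense layout."""
    flat = np.zeros(n_batches * block_h * block_w, dtype=np.float32)
    for (z, y, x), value in component.items():
        flat[(z * block_h + y) * block_w + x] = value
    return flat

# Example: two batches of a 4x4 block with only a few covered pixels;
# the other color components (G, B, A) each get their own, independent array.
red = {(0, 0, 0): 1.0, (0, 1, 2): 0.5, (1, 3, 3): 0.25}
print(flatten_component(red, n_batches=2, block_h=4, block_w=4).reshape(2, 4, 4))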
In step S5, for the cases where the blending function is ZERO, ONE, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA or ONE_MINUS_CONSTANT_ALPHA, the weights of all fragments in the same batch are identical, so the weights only need to be computed iteratively, once per batch. For example, in the case where the source blending factor is CONSTANT_ALPHA and the destination blending factor is ONE_MINUS_CONSTANT_ALPHA,
where w_n denotes the weight, n denotes the corresponding batch, and A_n^const denotes the constant opacity used by the corresponding batch, determined by the blending function.
In this case, since the amount of computation is small and can be determined beforehand, the calculation can be completed in advance.
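Under the assumption, mirroring the SRC_ALPHA example above, that each batch's constant alpha is composed with the ONE_MINUS_CONSTANT_ALPHA factors of the batches in front of it, these per-batch scalar weights can be precomputed with a short iteration such as the Python sketch below; this is an illustrative reading, not a reproduction of the specification's formula.

def constant_alpha_weights(const_alpha):
    """const_alpha: constant opacities of batches 0 .. N-1, batch 0 assumed front-most.
    Returns one scalar weight per batch:
        w[0] = c[0],  w[n] = c[n] * (1 - c[0]) * ... * (1 - c[n - 1]).
    Because the weights are scalars, they can be computed entirely in advance (step S5)."""
    weights, acc = [], 1.0
    for c in const_alpha:
        weights.append(c * acc)    # this batch's share of the blended color
        acc *= (1.0 - c)           # share left over for the batches behind it
    return weights

print(constant_alpha_weights([0.5, 0.5, 1.0]))   # -> [0.5, 0.25, 0.25]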
In step S5, for the cases where the blending function is SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA or SRC_ALPHA_SATURATE, the weight of each fragment in the same batch is different and depends on the screen coordinates x and y of the fragment.
In step S9, the blending calculation can be performed as follows: the color component matrices are taken as the feature tensor and the corresponding weight matrices as the weight tensor, and a two-dimensional convolution with a convolution kernel size equal to the block size is performed.
In step S9, the blending calculation can alternatively be performed as follows: the AI computing unit multiplies the color component matrices and the weight matrices element by element, batch by batch, and the AI computing unit then adds the results element by element, layer by layer.
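A Python sketch of this second variant, combined with the clamp of step S10, is given below for one color component of one block; NumPy stands in for the AI computing unit's element-wise operations, and the array shapes are assumptions made for illustration.

import numpy as np

def blend_block(colors, weights):
    """colors, weights: arrays of shape (N, H, W) holding one color component
    (e.g. R) and the corresponding weight matrices for the N batches of a block.

    Multiplies color and weight element by element for each batch, adds the
    layers element by element, then clamps the result to [0, 1] with
    element-wise comparisons (steps S9-S10)."""
    colors = np.asarray(colors, dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    acc = np.zeros_like(colors[0])
    for n in range(colors.shape[0]):
        acc = acc + colors[n] * weights[n]         # one layer at a time
    return np.minimum(np.maximum(acc, 0.0), 1.0)   # clamp via element-wise max / min

# Example: two layers over a 2x2 block; in practice this repeats per color component.
c = np.array([[[1.0, 0.2], [0.4, 0.9]],
              [[0.0, 1.0], [0.6, 0.1]]], dtype=np.float32)
w = np.array([[[0.7, 0.5], [0.5, 0.3]],
              [[0.3, 0.5], [0.5, 0.7]]], dtype=np.float32)
print(blend_block(c, w))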
In the invention, a method for performing the color blending calculation with an AI computing unit is provided, so that the AI computing unit is used to accelerate the color blending calculation, the related dedicated hardware modules are reduced, and the area cost of the GPU chip is lowered.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for accelerating a blending process in a graphics pipeline using AI computing power, characterized in that the method comprises the following steps:
S1: according to steps S2-S4, tiling the data structure and storing the fragments directly in the on-chip memory until a certain number of batches N is reached;
S2: dividing the whole screen space into a plurality of blocks; when fragments fall within a block, storing the fragment data of N batches for that block in memory in the format of FIG. 2, and setting all color components of the regions of the block not covered by fragments to 0;
S3: the block size is an empirical value related to the application scenario; for a block in which no fragment falls, no calculation is performed;
S4: if no fragment of a certain batch falls within a certain block, removing the layer corresponding to that batch from the data stored for that block;
S5: calculating the weight of each element in each batch, namely the proportion that the corresponding element's color contributes to the blended color; the subtraction in the blending equation can be regarded directly as part of the weight, which is equivalent to making the weight negative;
S6: adopting an AI computing unit to complete the corresponding computation; in the calculation process, the color component data of all fragments of each batch within each block are organized as matrices, with R_n, G_n, B_n and A_n denoting the four matrices of the n-th batch;
S7: firstly calculating the source and destination blending factors F of all batches, wherein all blending functions other than the blending function SRC_ALPHA_SATURATE can be completed by performing element-by-element addition operations with the AI computing unit;
S8: SRC_ALPHA_SATURATE can be completed by performing an element-by-element addition followed by an element-by-element comparison with the AI computing unit; then computing (1-F) from the blending factors of all layers except the layer with the minimum depth by element-by-element addition with the AI computing unit; finally performing element-by-element multiplication with the AI computing unit to obtain the weight matrices used by all batches, taking the case where the source blending function is SRC_ALPHA and the destination blending function is ONE_MINUS_SRC_ALPHA as an example,
wherein W_n denotes the weight matrix, n denotes the corresponding batch, and A_n^src denotes the source opacity matrix of the corresponding batch, all operations being element-by-element operations;
S9: after the weight calculation is completed, performing the blending calculation;
S10: finally, clamping the result to the range 0 to 1 using an element-by-element comparison operation, thereby completing the color blending and ending the whole process of accelerating the blending stage of the graphics pipeline with AI computing power.
2. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S1, the color mixing process can have sufficient computational parallelism.
3. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S2, the data structure of the fragment data in the on-chip memory space is shown in fig. 2; the three indices of an element are, from left to right, the batch z (since the depth test has already been performed, the depth relation of fragments between batches is fixed; in the invention, the smaller the batch z, the smaller the depth of the corresponding fragment), the ordinate y in the screen coordinate system, and the abscissa x in the screen coordinate system; each element is one component of a fragment's color and may be one of red R, green G, blue B and opacity A.
4. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S4, the same color component of all N batches of fragments is stored contiguously in memory as one array, ordered so that the coordinate x increases first, then y, and finally z; the arrays of different color components are not required to be contiguous with one another.
5. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S5, for the cases where the blending function is ZERO, ONE, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA or ONE_MINUS_CONSTANT_ALPHA, the weights of all fragments in the same batch are identical, so the weights only need to be computed iteratively, once per batch; for example, in the case where the source blending factor is CONSTANT_ALPHA and the destination blending factor is ONE_MINUS_CONSTANT_ALPHA,
wherein w_n denotes the weight, n denotes the corresponding batch, and A_n^const denotes the constant opacity used by the corresponding batch, determined by the blending function;
in this case, since the amount of computation is small and can be determined beforehand, the calculation can be completed in advance.
6. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S5, for the cases where the blending function is SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA or SRC_ALPHA_SATURATE, the weight of each fragment in the same batch is different and depends on the screen coordinates x and y of the fragment.
7. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S9, the blending calculation is performed as follows: the color component matrices are taken as the feature tensor and the corresponding weight matrices as the weight tensor, and a two-dimensional convolution with a convolution kernel size equal to the block size is performed.
8. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S9, the blending calculation is performed as follows: the AI computing unit multiplies the color component matrices and the weight matrices element by element, batch by batch, and the AI computing unit then adds the results element by element, layer by layer.
CN202311839795.0A 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power Pending CN117893394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311839795.0A CN117893394A (en) 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311839795.0A CN117893394A (en) 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power

Publications (1)

Publication Number Publication Date
CN117893394A true CN117893394A (en) 2024-04-16

Family

ID=90645148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311839795.0A Pending CN117893394A (en) 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power

Country Status (1)

Country Link
CN (1) CN117893394A (en)

Similar Documents

Publication Publication Date Title
US10884734B2 (en) Generalized acceleration of matrix multiply accumulate operations
TWI811291B (en) Deep learning accelerator and method for accelerating deep learning operations
US11106261B2 (en) Optimal operating point estimator for hardware operating under a shared power/thermal constraint
US9262797B2 (en) Multi-sample surface processing using one sample
US11816482B2 (en) Generalized acceleration of matrix multiply accumulate operations
US20190179635A1 (en) Method and apparatus for tensor and convolution operations
US9665958B2 (en) System, method, and computer program product for redistributing a multi-sample processing workload between threads
EP3594905B1 (en) Scalable parallel tessellation
US11645533B2 (en) IR drop prediction with maximum convolutional neural network
KR101609079B1 (en) Instruction culling in graphics processing unit
EP3678037A1 (en) Neural network generator
CN110807827A (en) System generation of stable barycentric coordinates and direct plane equation access
US10114755B2 (en) System, method, and computer program product for warming a cache for a task launch
CN113822975B (en) Techniques for efficient sampling of images
WO2021120577A1 (en) Method for data computation in neural network model, image processing method, and device
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN116795324A (en) Mixed precision floating-point multiplication device and mixed precision floating-point number processing method
CN117893394A (en) Method for accelerating mixing process in graphics pipeline by adopting AI computing power
CN112801276A (en) Data processing method, processor and electronic equipment
US20230334758A1 (en) Methods and hardware logic for writing ray tracing data from a shader processing unit of a graphics processing unit
TWI798591B (en) Convolutional neural network operation method and device
US20240160406A1 (en) Low-precision floating-point datapath in a computer processor
GB2625797A (en) Retrieving a block of data items in a processor
CN107527320A (en) A kind of method for accelerating bilinear interpolation to calculate
GB2614098A (en) Methods and hardware logic for writing ray tracing data from a shader processing unit of a graphics processing unit

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination