CN117893394A - Method for accelerating mixing process in graphics pipeline by adopting AI computing power - Google Patents

Method for accelerating mixing process in graphics pipeline by adopting AI computing power Download PDF

Info

Publication number
CN117893394A
CN117893394A CN202311839795.0A CN202311839795A CN117893394A CN 117893394 A CN117893394 A CN 117893394A CN 202311839795 A CN202311839795 A CN 202311839795A CN 117893394 A CN117893394 A CN 117893394A
Authority
CN
China
Prior art keywords
color
alpha
calculation
src
mixing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311839795.0A
Other languages
Chinese (zh)
Inventor
徐瑞
项天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Graphichina Electronic Technology Co ltd
Original Assignee
Suzhou Graphichina Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Graphichina Electronic Technology Co ltd filed Critical Suzhou Graphichina Electronic Technology Co ltd
Priority to CN202311839795.0A priority Critical patent/CN117893394A/en
Publication of CN117893394A publication Critical patent/CN117893394A/en
Pending legal-status Critical Current

Links

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Generation (AREA)

Abstract

The invention discloses a method for accelerating the color blending process in a graphics pipeline by using AI computing power. In the invention, the data structure is first tiled: fragments are stored directly in on-chip memory until a certain number of batches N is reached, and the whole color blending process is then completed by the AI computing unit in one pass. The whole screen is divided into a number of blocks; when fragments fall within a block, the fragment data of N batches for that block are stored in memory in the format of FIG. 2, and all color components of the regions of the block not covered by fragments are set to 0. The block size is an empirical value related to the application scenario. The AI computing unit is used to accelerate the color blending calculation, which reduces the related dedicated hardware modules and lowers the area cost of the GPU chip.

Description

Method for accelerating mixing process in graphics pipeline by adopting AI computing power
Technical Field
The invention belongs to the technical field of graphics processing, and particularly relates to a method for accelerating a blending process in a graphics pipeline by adopting AI computing power.
Background
A graphics processing unit (Graphics Processing Unit, GPU) is a component of a computer dedicated to graphics-related computing tasks. Compared with a CPU, a GPU allocates more computing resources to the computation part than to the control part and exploits the parallelism in graphics processing tasks as far as possible, so it offers far higher computing power on graphics processing tasks than a CPU and is therefore also widely used in artificial-intelligence-related computation. With the rise of AI, GPUs typically integrate, in addition to the original processing units for general-purpose computing and graphics processing, a neural processing unit (Neural Processing Unit, NPU), tensor cores (Tensor Core) or similar units dedicated to AI-related computation; because these are designed to accelerate AI-related computation, they are collectively referred to as AI computing units in the present invention.
At present, however, color blending is usually performed by a dedicated ROP unit, which inevitably adds chip area and data-transfer cost; and when only graphics-related computation is being performed, the AI computing unit sits idle, which is also a huge waste of performance.
Disclosure of Invention
The aim of the invention is as follows: in order to solve the above-mentioned problems, a method for accelerating the blending process in a graphics pipeline by using AI computing power is provided.
The technical scheme adopted by the invention is as follows: a method for accelerating a blending process in a graphics pipeline using AI computing power, the method for accelerating a blending process in a graphics pipeline using AI computing power comprising the steps of:
S1: tile the data structure according to steps S2-S4, and store the fragments directly in the on-chip memory until a certain number of batches N is reached;
S2: divide the whole screen space into a number of blocks; when fragments fall within a block, store the fragment data of N batches for that block in memory in the format of FIG. 2, and set all color components of the regions of the block not covered by fragments to 0;
S3: the block size is an empirical value related to the application scenario; blocks in which no fragment falls at all are not computed;
S4: if no fragment of a certain batch falls within a certain block, remove the layer corresponding to that batch from the data stored for that block (a control-flow sketch of steps S1 to S4 is given after step S10);
S5: calculate the weight of each element in each batch, i.e. the proportion that the corresponding element's color contributes to the blended color; the subtraction in the blending equation can be treated directly as part of the weight, which is equivalent to making the weight negative;
S6: use the AI computing unit to complete the corresponding computation; during the computation, the color component data of all fragments of each batch within each block are organized as matrices, with R_n, G_n, B_n and A_n denoting the four matrices of the n-th batch;
S7: first calculate the source and destination blending factors F of all batches; except for the blending function SRC_ALPHA_SATURATE, all blending functions can be completed with element-by-element addition operations on the AI computing unit;
S8: SRC_ALPHA_SATURATE can be completed with an element-by-element addition followed by an element-by-element comparison on the AI computing unit; then compute (1-F) from the blending factors of all layers except the layer with the minimum depth, using element-by-element addition on the AI computing unit; finally, use element-by-element multiplication on the AI computing unit to obtain the weight matrices used by all batches, taking the case where the source blending function is SRC_ALPHA and the destination blending function is ONE_MINUS_SRC_ALPHA as an example,
where W_n denotes the weight matrix, n denotes the corresponding batch, and A_n^src denotes the source opacity matrix of the corresponding batch, all operations being element-by-element operations;
S9: after the weight calculation is completed, perform the blending calculation;
S10: finally, clamp the result to the range 0 to 1 using an element-by-element comparison operation; this completes the color blending and ends the whole process of accelerating the blending stage of the graphics pipeline with AI computing power.
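The buffering described in steps S1 to S4 can be pictured with the following minimal Python sketch. It is an illustrative reading of the specification rather than an implementation: the class name TileBuffer, the default block size and batch count, and the fragment-tuple format are assumptions introduced here for clarity.

import numpy as np

class TileBuffer:
    """Per-block staging of N batches of fragment color data (steps S1-S4)."""

    def __init__(self, block_size=16, n_batches=8):
        self.h = self.w = block_size      # block size: an empirical, scene-dependent value (S3)
        self.n = n_batches                # number of batches N buffered before dispatch (S1)
        self.batches_seen = 0
        self.blocks = {}                  # (bx, by) -> list of per-batch {R, G, B, A} layers

    def add_batch(self, batch_fragments):
        """batch_fragments: {(bx, by): [(x, y, r, g, b, a), ...]} for one batch,
        with (x, y) given as block-local coordinates."""
        self.batches_seen += 1
        for key, frags in batch_fragments.items():
            # Regions of the block not covered by fragments stay 0 (S2).
            layer = {c: np.zeros((self.h, self.w), np.float32) for c in "RGBA"}
            for x, y, r, g, b, a in frags:
                layer["R"][y, x], layer["G"][y, x] = r, g
                layer["B"][y, x], layer["A"][y, x] = b, a
            self.blocks.setdefault(key, []).append(layer)
        # A batch contributing no fragment to a block simply adds no layer there (S4),
        # and blocks never touched by any fragment are never created and never computed (S3).

    def ready(self):
        """True once N batches are buffered; the AI computing unit then blends
        every stored block in one pass."""
        return self.batches_seen >= self.n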
In a preferred embodiment, in step S1, batching in this way gives the color blending process sufficient computational parallelism; although this requires more on-chip memory space, the aforementioned shared on-chip memory capacity fully satisfies the requirement.
In a preferred embodiment, in step S2, the data structure of the fragment data in the on-chip memory space is shown in fig. 2; the three indices of an element are, from left to right, the batch z (since the depth test has already been performed, the depth relation of fragments between batches is fixed; in the invention, the smaller the batch z, the smaller the depth of the corresponding fragment), the ordinate y in the screen coordinate system, and the abscissa x in the screen coordinate system; each element is one component of a fragment's color and may be one of red R, green G, blue B and opacity A.
In a preferred embodiment, in step S4, the same color component of all N batches of fragments is stored contiguously in memory as one array, ordered so that the coordinate x increases first, then y, and finally z; the arrays of different color components are not required to be contiguous with one another;
as shown in fig. 3, the primitive shapes within a batch may be irregular, so the resulting fragments are not necessarily stored as regularly as shown in fig. 2.
In a preferred embodiment, in step S5, for the cases where the blending function is ZERO, ONE, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA or ONE_MINUS_CONSTANT_ALPHA, the weights of all fragments in the same batch are identical, so the weights only need to be computed iteratively, once per batch; for example, in the case where the source blending factor is CONSTANT_ALPHA and the destination blending factor is ONE_MINUS_CONSTANT_ALPHA,
where w_n denotes the weight, n denotes the corresponding batch, and A_n^const denotes the constant opacity used by the corresponding batch, determined by the blending function;
in this case, since the amount of computation is small and can be determined beforehand, the calculation can be completed in advance.
In a preferred embodiment, in step S5, for the cases where the blending function is SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA or SRC_ALPHA_SATURATE, the weight of each fragment in the same batch is different and depends on the screen coordinates x and y of the fragment.
In a preferred embodiment, in step S9, the blending calculation is performed as follows: the color component matrices are taken as the feature tensor and the corresponding weight matrices as the weight tensor, and a two-dimensional convolution with a convolution kernel size equal to the block size is performed.
In a preferred embodiment, in step S9, the blending calculation is performed as follows: the AI computing unit multiplies the color component matrices and the weight matrices element by element, batch by batch, and the AI computing unit then adds the results element by element, layer by layer.
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
in the invention, a method for performing color mixing calculation by using an AI (advanced technology) computing unit is provided, so that the AI computing unit is used for realizing acceleration of color mixing calculation, reducing the area cost of a GPU (graphics processing Unit) chip and reducing the area cost of the GPU chip, and the AI computing unit is used for realizing acceleration of color mixing calculation, reducing the related special hardware module and reducing the area cost of the GPU chip.
Drawings
FIG. 1 is a schematic diagram of a system abstraction model according to the present invention;
FIG. 2 is a schematic diagram of a GPU architecture for accelerating discontinuous global memory accesses in accordance with the present invention;
FIG. 3 is a block diagram of a screen according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
With reference to figures 1-3 of the drawings,
examples:
The processing units used for general purpose computing in the GPU are collectively referred to herein as general purpose computing units.
The neural processor (Neural Processing Unit, NPU) or Tensor Core (Tensor Core) dedicated to AI-related computation is referred to as an AI computing unit in the present invention.
With the advent of AI, the integration of AI computing units dedicated to AI-related computation in GPUs has become a trend.
AI computing power units typically have the ability to accelerate general matrix multiplication (General Matrix Multiplication, GEMM) and element-wise multiplication, addition and comparison (element-wise Product/Sum/Max).
AI computing power units are generally capable of providing computing power far beyond that of general-purpose computing units.
The AI computing power unit is generally capable of supporting computation on data types such as 8-bit signed integers, 16-bit floating-point numbers and 32-bit floating-point numbers.
GPUs with AI computing power units typically have large amounts of on-chip memory for caching various types of computing data.
The computing power provided by the general-purpose computing unit is referred to as general computing power in the present invention.
The computing power provided by the AI computing power unit is referred to as AI computing power in the present invention.
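As a concrete reference for the element-wise operations relied upon below, the following minimal Python/NumPy sketch models the three element-wise primitives (product, sum, comparison) assumed to be offered by an AI computing unit; the function names ew_mul, ew_add and ew_max are illustrative placeholders and do not correspond to any real accelerator API.

import numpy as np

# Hypothetical stand-ins for the element-wise primitives of an AI computing unit;
# on real hardware these would be dispatched to the NPU / tensor cores.
def ew_mul(a, b):
    """Element-wise product of two equally shaped arrays."""
    return np.multiply(a, b)

def ew_add(a, b):
    """Element-wise sum of two equally shaped arrays."""
    return np.add(a, b)

def ew_max(a, b):
    """Element-wise comparison (maximum) of two equally shaped arrays."""
    return np.maximum(a, b)

# Example over a 16x16 block in 32-bit floating point.
x = np.full((16, 16), 0.25, dtype=np.float32)
y = np.full((16, 16), 0.50, dtype=np.float32)
print(ew_add(ew_mul(x, y), y)[0, 0])   # 0.625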
AI computing power units are generally capable of providing computing power far beyond general purpose computing power units.
The AI power-calculation unit is generally capable of supporting calculation of 8-bit signed integer, 16-bit floating point number, 32-bit floating point number and other data types.
GPUs with AI computing power units typically have large amounts of on-chip memory for caching various types of computing data.
In a graphics pipeline, color Blending (Blending) is the process of fusing two or more colors. The blending process typically occurs in the rendering stage, i.e., before the graphical elements are drawn onto the screen.
There are many ways of color blending, each with a different effect. According to the OpenGL ES 2.0 manual, the blending mode is determined by a blending equation and blending functions; the blending equations comprise the three types FUNC_ADD, FUNC_SUBTRACT and FUNC_REVERSE_SUBTRACT, and the blending functions include ZERO, ONE, SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA, ONE_MINUS_CONSTANT_ALPHA and SRC_ALPHA_SATURATE.
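For reference, the short Python sketch below evaluates the conventional per-pixel blend that a fixed-function ROP would perform, using the FUNC_ADD equation with SRC_ALPHA as the source blending function and ONE_MINUS_SRC_ALPHA as the destination blending function; it is a plain illustration of the standard OpenGL ES semantics, not code from the specification.

import numpy as np

def blend_func_add(c_src, a_src, c_dst):
    """FUNC_ADD: C_out = C_src * F_src + C_dst * F_dst, with
    F_src = SRC_ALPHA and F_dst = ONE_MINUS_SRC_ALPHA."""
    c_src = np.asarray(c_src, dtype=np.float32)   # source RGB in [0, 1]
    c_dst = np.asarray(c_dst, dtype=np.float32)   # destination RGB in [0, 1]
    f_src = a_src                                  # source blending factor
    f_dst = 1.0 - a_src                            # destination blending factor
    out = c_src * f_src + c_dst * f_dst            # blending equation
    return np.clip(out, 0.0, 1.0)                  # clamp to [0, 1]

# Example: a 60 % opaque red fragment over a green destination pixel.
print(blend_func_add([1.0, 0.0, 0.0], 0.6, [0.0, 1.0, 0.0]))   # -> [0.6, 0.4, 0.0]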
A method for accelerating a blending process in a graphics pipeline using AI computing power, the method for accelerating a blending process in a graphics pipeline using AI computing power comprising the steps of:
S1: tile the data structure according to steps S2-S4, and store the fragments directly in the on-chip memory until a certain number of batches N is reached;
S2: divide the whole screen space into a number of blocks; when fragments fall within a block, store the fragment data of N batches for that block in memory in the format of FIG. 2, and set all color components of the regions of the block not covered by fragments to 0;
S3: the block size is an empirical value related to the application scenario; blocks in which no fragment falls at all are not computed;
S4: if no fragment of a certain batch falls within a certain block, remove the layer corresponding to that batch from the data stored for that block;
S5: calculate the weight of each element in each batch, i.e. the proportion that the corresponding element's color contributes to the blended color; the subtraction in the blending equation can be treated directly as part of the weight, which is equivalent to making the weight negative;
S6: use the AI computing unit to complete the corresponding computation. During the computation, the color component data of all fragments of each batch within each block are organized as matrices, with R_n, G_n, B_n and A_n denoting the four matrices of the n-th batch;
S7: first calculate the source and destination blending factors F of all batches; except for the blending function SRC_ALPHA_SATURATE, all blending functions can be completed with element-by-element addition operations on the AI computing unit;
S8: SRC_ALPHA_SATURATE can be completed with an element-by-element addition followed by an element-by-element comparison on the AI computing unit. Then compute (1-F) from the blending factors of all layers except the layer with the minimum depth, using element-by-element addition on the AI computing unit; finally, use element-by-element multiplication on the AI computing unit to obtain the weight matrices used by all batches, taking the case where the source blending function is SRC_ALPHA and the destination blending function is ONE_MINUS_SRC_ALPHA as an example,
where W_n denotes the weight matrix, n denotes the corresponding batch, and A_n^src denotes the source opacity matrix of the corresponding batch, all operations being element-by-element operations (a sketch of this weight computation is given after step S10);
S9: after the weight calculation is completed, perform the blending calculation;
S10: finally, clamp the result to the range 0 to 1 using an element-by-element comparison operation; this completes the color blending and ends the whole process of accelerating the blending stage of the graphics pipeline with AI computing power.
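To make the weight computation of steps S7-S8 concrete, the Python sketch below builds the per-batch weight matrices for the SRC_ALPHA / ONE_MINUS_SRC_ALPHA example using only element-wise operations. It assumes, as one reading of the specification, that batch index 0 is the minimum-depth (front-most) layer and that the weight of each deeper layer accumulates the (1 - F) factors of the layers in front of it; both the ordering assumption and all names are illustrative only.

import numpy as np

def src_alpha_weights(alpha_src):
    """alpha_src: array of shape (N, H, W) holding the source opacity matrices
    A_src of N batches for one block, batch 0 assumed to be the front-most
    (minimum-depth) layer.

    Returns W of shape (N, H, W) with
        W[0] = A_src[0]
        W[n] = A_src[n] * (1 - A_src[0]) * ... * (1 - A_src[n - 1]),
    built purely from element-wise additions and multiplications (steps S7-S8)."""
    alpha_src = np.asarray(alpha_src, dtype=np.float32)
    one_minus = 1.0 - alpha_src                 # element-wise (1 - F) of every layer
    weights = np.empty_like(alpha_src)
    acc = np.ones_like(alpha_src[0])            # running product of (1 - F) of the layers in front
    for n in range(alpha_src.shape[0]):
        weights[n] = alpha_src[n] * acc         # element-wise multiplication
        acc = acc * one_minus[n]
    return weights

# Example: three batches over a 2x2 block.
a = np.array([[[0.5, 0.0], [1.0, 0.25]],
              [[0.5, 0.5], [0.0, 0.25]],
              [[1.0, 1.0], [1.0, 1.0]]], dtype=np.float32)
print(src_alpha_weights(a))

For SRC_ALPHA_SATURATE, the source factor would additionally be limited by an element-by-element comparison (a minimum of the source alpha and one minus the destination alpha), matching the extra comparison step described in S8.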
In step S1, batching in this way gives the color blending process sufficient computational parallelism; although this requires more on-chip memory space, the aforementioned shared on-chip memory capacity fully satisfies the requirement.
In step S2, the data structure of the fragment data in the on-chip memory space is shown in fig. 2. The three indices of an element are, from left to right, the batch z (since the depth test has already been performed, the depth relation of fragments between batches is fixed; in the invention, the smaller the batch z, the smaller the depth of the corresponding fragment), the ordinate y in the screen coordinate system, and the abscissa x in the screen coordinate system. Each element is one component of a fragment's color and may be one of red R, green G, blue B and opacity A.
In step S4, the same color component of all N batches of fragments is stored contiguously in memory as one array, ordered so that the coordinate x increases first, then y, and finally z. The arrays of different color components are not required to be contiguous with one another;
as shown in fig. 3, the primitive shapes within a batch may be irregular, so the resulting fragments are not necessarily stored as regularly as shown in fig. 2.
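The storage order just described can be illustrated with the following Python sketch, which packs one color component of N batches of a block into a single contiguous array with x varying fastest, then y, then z; the function name and the sparse-dictionary input format are assumptions made here for illustration.

import numpy as np

def flatten_component(component, n_batches, block_h, block_w):
    """component: dict {(z, y, x): value} holding one color component (e.g. R)
    of the fragments of one block, for batches z = 0 .. n_batches - 1.

    Returns a contiguous 1-D array ordered x-fastest, then y, then z (fig. 2).
    Positions not covered by any fragment stay 0, so irregular primitives
    (fig. 3) still produce a regular, dense layout."""
    flat = np.zeros(n_batches * block_h * block_w, dtype=np.float32)
    for (z, y, x), value in component.items():
        flat[(z * block_h + y) * block_w + x] = value
    return flat

# Example: two batches of a 4x4 block with only a few covered pixels;
# the other color components (G, B, A) each get their own, independent array.
red = {(0, 0, 0): 1.0, (0, 1, 2): 0.5, (1, 3, 3): 0.25}
print(flatten_component(red, n_batches=2, block_h=4, block_w=4).reshape(2, 4, 4))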
In step S5, for the cases where the blending function is ZERO, ONE, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA or ONE_MINUS_CONSTANT_ALPHA, the weights of all fragments in the same batch are identical, so the weights only need to be computed iteratively, once per batch. For example, in the case where the source blending factor is CONSTANT_ALPHA and the destination blending factor is ONE_MINUS_CONSTANT_ALPHA,
where w_n denotes the weight, n denotes the corresponding batch, and A_n^const denotes the constant opacity used by the corresponding batch, determined by the blending function.
In this case, since the amount of computation is small and can be determined beforehand, the calculation can be completed in advance.
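Under the assumption, mirroring the SRC_ALPHA example above, that each batch's constant alpha is composed with the ONE_MINUS_CONSTANT_ALPHA factors of the batches in front of it, these per-batch scalar weights can be precomputed with a short iteration such as the Python sketch below; this is an illustrative reading, not a reproduction of the specification's formula.

def constant_alpha_weights(const_alpha):
    """const_alpha: constant opacities of batches 0 .. N-1, batch 0 assumed front-most.
    Returns one scalar weight per batch:
        w[0] = c[0],  w[n] = c[n] * (1 - c[0]) * ... * (1 - c[n - 1]).
    Because the weights are scalars, they can be computed entirely in advance (step S5)."""
    weights, acc = [], 1.0
    for c in const_alpha:
        weights.append(c * acc)    # this batch's share of the blended color
        acc *= (1.0 - c)           # share left over for the batches behind it
    return weights

print(constant_alpha_weights([0.5, 0.5, 1.0]))   # -> [0.5, 0.25, 0.25]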
In step S5, for the cases where the blending function is SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA or SRC_ALPHA_SATURATE, the weight of each fragment in the same batch is different and depends on the screen coordinates x and y of the fragment.
In step S9, the blending calculation can be performed as follows: the color component matrices are taken as the feature tensor and the corresponding weight matrices as the weight tensor, and a two-dimensional convolution with a convolution kernel size equal to the block size is performed.
In step S9, the blending calculation can alternatively be performed as follows: the AI computing unit multiplies the color component matrices and the weight matrices element by element, batch by batch, and the AI computing unit then adds the results element by element, layer by layer.
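A Python sketch of this second variant, combined with the clamp of step S10, is given below for one color component of one block; NumPy stands in for the AI computing unit's element-wise operations, and the array shapes are assumptions made for illustration.

import numpy as np

def blend_block(colors, weights):
    """colors, weights: arrays of shape (N, H, W) holding one color component
    (e.g. R) and the corresponding weight matrices for the N batches of a block.

    Multiplies color and weight element by element for each batch, adds the
    layers element by element, then clamps the result to [0, 1] with
    element-wise comparisons (steps S9-S10)."""
    colors = np.asarray(colors, dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    acc = np.zeros_like(colors[0])
    for n in range(colors.shape[0]):
        acc = acc + colors[n] * weights[n]         # one layer at a time
    return np.minimum(np.maximum(acc, 0.0), 1.0)   # clamp via element-wise max / min

# Example: two layers over a 2x2 block; in practice this repeats per color component.
c = np.array([[[1.0, 0.2], [0.4, 0.9]],
              [[0.0, 1.0], [0.6, 0.1]]], dtype=np.float32)
w = np.array([[[0.7, 0.5], [0.5, 0.3]],
              [[0.3, 0.5], [0.5, 0.7]]], dtype=np.float32)
print(blend_block(c, w))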
In the invention, a method for performing the color blending calculation with an AI computing unit is provided, so that the AI computing unit is used to accelerate the color blending calculation, the related dedicated hardware modules are reduced, and the area cost of the GPU chip is lowered.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for accelerating a blending process in a graphics pipeline using AI computing power, characterized in that the method comprises the following steps:
S1: according to steps S2-S4, tiling the data structure and storing the fragments directly in the on-chip memory until a certain number of batches N is reached;
S2: dividing the whole screen space into a plurality of blocks; when fragments fall within a block, storing the fragment data of N batches for that block in memory in the format of FIG. 2, and setting all color components of the regions of the block not covered by fragments to 0;
S3: the block size is an empirical value related to the application scenario; for a block in which no fragment falls, no calculation is performed;
S4: if no fragment of a certain batch falls within a certain block, removing the layer corresponding to that batch from the data stored for that block;
S5: calculating the weight of each element in each batch, namely the proportion that the corresponding element's color contributes to the blended color; the subtraction in the blending equation can be regarded directly as part of the weight, which is equivalent to making the weight negative;
S6: adopting an AI computing unit to complete the corresponding computation; in the calculation process, the color component data of all fragments of each batch within each block are organized as matrices, with R_n, G_n, B_n and A_n denoting the four matrices of the n-th batch;
S7: firstly calculating the source and destination blending factors F of all batches, wherein all blending functions other than the blending function SRC_ALPHA_SATURATE can be completed by performing element-by-element addition operations with the AI computing unit;
S8: SRC_ALPHA_SATURATE can be completed by performing an element-by-element addition followed by an element-by-element comparison with the AI computing unit; then computing (1-F) from the blending factors of all layers except the layer with the minimum depth by element-by-element addition with the AI computing unit; finally performing element-by-element multiplication with the AI computing unit to obtain the weight matrices used by all batches, taking the case where the source blending function is SRC_ALPHA and the destination blending function is ONE_MINUS_SRC_ALPHA as an example,
wherein W_n denotes the weight matrix, n denotes the corresponding batch, and A_n^src denotes the source opacity matrix of the corresponding batch, all operations being element-by-element operations;
S9: after the weight calculation is completed, performing the blending calculation;
S10: finally, clamping the result to the range 0 to 1 using an element-by-element comparison operation, thereby completing the color blending and ending the whole process of accelerating the blending stage of the graphics pipeline with AI computing power.
2. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S1, the color mixing process can have sufficient computational parallelism.
3. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S2, the data structure of the fragment data in the on-chip memory space is shown in fig. 2; the three indices of an element are, from left to right, the batch z (since the depth test has already been performed, the depth relation of fragments between batches is fixed; in the invention, the smaller the batch z, the smaller the depth of the corresponding fragment), the ordinate y in the screen coordinate system, and the abscissa x in the screen coordinate system; each element is one component of a fragment's color and may be one of red R, green G, blue B and opacity A.
4. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S4, the same color component of all N batches of fragments is stored contiguously in memory as one array, ordered so that the coordinate x increases first, then y, and finally z; the arrays of different color components are not required to be contiguous with one another.
5. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S5, for the cases where the blending function is ZERO, ONE, CONSTANT_COLOR, ONE_MINUS_CONSTANT_COLOR, CONSTANT_ALPHA or ONE_MINUS_CONSTANT_ALPHA, the weights of all fragments in the same batch are identical, so the weights only need to be computed iteratively, once per batch; for example, in the case where the source blending factor is CONSTANT_ALPHA and the destination blending factor is ONE_MINUS_CONSTANT_ALPHA,
wherein w_n denotes the weight, n denotes the corresponding batch, and A_n^const denotes the constant opacity used by the corresponding batch, determined by the blending function;
in this case, since the amount of computation is small and can be determined beforehand, the calculation can be completed in advance.
6. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S5, for the cases where the blending function is SRC_COLOR, ONE_MINUS_SRC_COLOR, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA or SRC_ALPHA_SATURATE, the weight of each fragment in the same batch is different and depends on the screen coordinates x and y of the fragment.
7. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S9, the blending calculation is performed as follows: the color component matrices are taken as the feature tensor and the corresponding weight matrices as the weight tensor, and a two-dimensional convolution with a convolution kernel size equal to the block size is performed.
8. A method for accelerating a blending process in a graphics pipeline using AI computing power as recited in claim 1, wherein: in the step S9, the blending calculation is performed as follows: the AI computing unit multiplies the color component matrices and the weight matrices element by element, batch by batch, and the AI computing unit then adds the results element by element, layer by layer.
CN202311839795.0A 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power Pending CN117893394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311839795.0A CN117893394A (en) 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311839795.0A CN117893394A (en) 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power

Publications (1)

Publication Number Publication Date
CN117893394A true CN117893394A (en) 2024-04-16

Family

ID=90645148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311839795.0A Pending CN117893394A (en) 2023-12-29 2023-12-29 Method for accelerating mixing process in graphics pipeline by adopting AI computing power

Country Status (1)

Country Link
CN (1) CN117893394A (en)

Similar Documents

Publication Publication Date Title
US10884734B2 (en) Generalized acceleration of matrix multiply accumulate operations
TWI811291B (en) Deep learning accelerator and method for accelerating deep learning operations
US11106261B2 (en) Optimal operating point estimator for hardware operating under a shared power/thermal constraint
US9262797B2 (en) Multi-sample surface processing using one sample
US11816482B2 (en) Generalized acceleration of matrix multiply accumulate operations
US20190179635A1 (en) Method and apparatus for tensor and convolution operations
US9665958B2 (en) System, method, and computer program product for redistributing a multi-sample processing workload between threads
EP3594905B1 (en) Scalable parallel tessellation
US11645533B2 (en) IR drop prediction with maximum convolutional neural network
KR101609079B1 (en) Instruction culling in graphics processing unit
EP3678037A1 (en) Neural network generator
CN110807827A (en) System generation of stable barycentric coordinates and direct plane equation access
US10114755B2 (en) System, method, and computer program product for warming a cache for a task launch
CN113822975B (en) Techniques for efficient sampling of images
WO2021120577A1 (en) Method for data computation in neural network model, image processing method, and device
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN116795324A (en) Mixed precision floating-point multiplication device and mixed precision floating-point number processing method
CN117893394A (en) Method for accelerating mixing process in graphics pipeline by adopting AI computing power
CN112801276A (en) Data processing method, processor and electronic equipment
US20230334758A1 (en) Methods and hardware logic for writing ray tracing data from a shader processing unit of a graphics processing unit
TWI798591B (en) Convolutional neural network operation method and device
US20240160406A1 (en) Low-precision floating-point datapath in a computer processor
GB2625797A (en) Retrieving a block of data items in a processor
CN107527320A (en) A kind of method for accelerating bilinear interpolation to calculate
GB2614098A (en) Methods and hardware logic for writing ray tracing data from a shader processing unit of a graphics processing unit

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination