CN117546199A - Graphic processing method and device


Info

Publication number: CN117546199A
Application number: CN202180099645.3A
Authority: CN (China)
Prior art keywords: texture, instruction, target, instructions, processing
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 徐顾伟, 韩峰, 吴任初
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

A graphics processing method and apparatus, relating to the field of computer technology. The method includes the following steps: obtaining a texture instruction, where the texture instruction includes first indication information, and the first indication information is used to indicate a target operation that needs to be executed in a texture sampling process; and executing the texture sampling process according to the texture instruction, and executing the target operation in the texture sampling process to obtain a target sampling result. This solution reduces the power consumption of the GPU and improves the performance of the GPU.

Description

Graphic processing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a graphics processing method and apparatus.
Background
Graphics processors (graphics processing unit, GPU), also known as display cores, vision processors, or display chips, are microprocessors that perform image- and graphics-related operations on personal computers, workstations, game consoles, and some mobile devices (e.g., tablet computers, smartphones, etc.). An execution unit (EU) is an execution unit in the GPU that is responsible for executing instructions and in fact has the functions of both a controller and an arithmetic unit. Conventional arithmetic and logic instructions are executed directly in the arithmetic logic unit (arithmetic and logic unit, ALU) in the EU, and the operation result is written into a register; texture instructions and memory access instructions are completed by dedicated coprocessors (e.g., texture units (TUs) and memory access units (load store units), etc.).
When the ALU performs its arithmetic function, data generally needs to be loaded from memory into registers through the texture unit or the memory access unit before the operation can be performed. This increases the number of registers occupied and the number of register reads and writes, which greatly increases GPU power consumption and reduces GPU performance.
Disclosure of Invention
The application provides a graphics processing method and device, which are beneficial to reducing the power consumption of a GPU and improving the performance of the GPU.
In a first aspect, embodiments of the present application provide a graphics processing method, which may be implemented by a texture unit, and in particular, the method may include: obtaining a texture instruction, wherein the texture instruction comprises first indication information, and the first indication information is used for indicating target operation required to be executed in a texture sampling process; and executing the texture sampling process according to the texture instruction, and executing the target operation in the texture sampling process to obtain a target sampling result.
According to this solution, based on the first indication information in the texture instruction, the texture unit is used to complete both texture sampling and the target operation carried in the texture instruction. Using the idea of near-memory computing, the number of general-purpose registers occupied can be reduced, which greatly increases the number of concurrent warps and improves GPU performance; at the same time, the number of ALU instructions is greatly reduced and the ALU reads and writes the general-purpose registers fewer times, which reduces GPU power consumption and improves GPU performance. By way of example, the target operations may include, but are not limited to, texture-coordinate-related calculations, such as looping of texture instructions, weighted average operations on texture filtering results, maximum/minimum value operations, merging operations on multiple related texture instructions, and the like.
In one possible design, performing a texture sampling process according to the texture instruction, and performing the target operation in the texture sampling process to obtain a target sampling result, including: determining target instruction parameters corresponding to the target operation in a plurality of groups of stored instruction parameters corresponding to various operations according to the first indication information in the texture instruction, wherein the target instruction parameters are used for executing the target operation; and in the texture sampling process, executing the target operation according to the target instruction parameter to obtain the target sampling result.
In this scheme, parameters are stored and states are set in the texture unit in advance, so that the texture unit has the corresponding computing capability and can additionally complete the corresponding arithmetic processing after it later receives a texture instruction.
In one possible design, the texture instruction further includes first information, where the first information is used to obtain first texture data corresponding to the texture instruction, the target operation includes a first target operation that needs to be performed on the first information and/or a second target operation that needs to be performed on the first texture data, and performing the target operation in a texture sampling process to obtain a target sampling result, where the method includes: in the texture sampling process, running the stored instruction parameters corresponding to the first target operation, and executing the first target operation and texture sampling on the first information to obtain the first texture data; and/or running the saved instruction parameters corresponding to the second target operation to execute the second target operation on the first texture data to obtain second texture data; and obtaining the target sampling result according to the first texture data or the second texture data.
In the present application, the arithmetic processing implemented by the texture unit may be various, and the present application is not limited thereto.
For example, the first information includes first texture coordinates, and the first target operation includes at least one of the following: a preprocessing operation on the first texture coordinates, the preprocessing operation comprising: performing perspective transformation processing operation on the first texture coordinates and/or performing range conversion processing operation on the first texture coordinates; an offset processing operation on the first texture coordinate, wherein, in a case where the texture instruction involves a split processing, the offset processing operation on the first texture coordinate includes: and respectively carrying out offset processing operation on the plurality of sub-texture instructions obtained by splitting the texture instructions.
For example, the first texture data includes a first return value, and the second target operation includes: and a first post-processing operation on the first return value, wherein when the first texture data relates to merging processing, the first post-processing operation on the first return value comprises post-processing operations on first return values corresponding to a plurality of sub-texture instructions obtained by splitting the texture instructions, and the first post-processing operation comprises one or more of the following steps: weighted average operation, maximum value calculation, minimum value calculation, and component range conversion calculation.
Illustratively, at least two sub-texture instructions of the plurality of sub-texture instructions split by the texture instruction satisfy any one of: the at least two sub-texture instructions correspond to different textures, and the first texture coordinates are the same; the offset texture instructions corresponding to the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offset amounts are different; the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offsets are different.
In a second aspect, an embodiment of the present application provides a graphics processing apparatus, including: the communication unit is used for acquiring a texture instruction, wherein the texture instruction comprises first indication information, and the first indication information is used for indicating target operation which needs to be executed in a texture sampling process; and the processing unit is used for executing a texture sampling process according to the texture instruction and executing the target operation in the texture sampling process to obtain a target sampling result.
In one possible design, the processing unit is configured to: determining target instruction parameters corresponding to the target operation in a plurality of groups of stored instruction parameters corresponding to various operations according to the first indication information in the texture instruction, wherein the target instruction parameters are used for executing the target operation; and in the texture sampling process, executing the target operation according to the target instruction parameters to obtain the target sampling result.
In one possible design, the texture instruction further includes first information, the first information is used for acquiring first texture data corresponding to the texture instruction, the target operation includes a first target operation to be executed on the first information and/or a second target operation to be executed on the first texture data, and the processing unit is used for: executing the target operation in the texture sampling process to obtain a target sampling result, wherein the target operation comprises the following steps: in the texture sampling process, running the stored instruction parameters corresponding to the first target operation, and executing the first target operation and texture sampling on the first information to obtain the first texture data; and/or running the saved instruction parameters corresponding to the second target operation to execute the second target operation on the first texture data to obtain second texture data; and obtaining the target sampling result according to the first texture data or the second texture data.
In one possible design, the first information includes first texture coordinates therein, and the first target operation includes at least one of: a preprocessing operation on the first texture coordinates, the preprocessing operation comprising: performing perspective transformation processing operation on the first texture coordinates and/or performing range conversion processing operation on the first texture coordinates; an offset processing operation on the first texture coordinate, wherein, in a case where the texture instruction involves a split processing, the offset processing operation on the first texture coordinate includes: and respectively carrying out offset processing operation on the plurality of sub-texture instructions obtained by splitting the texture instructions.
In one possible design, the first texture data includes a first return value, and the second target operation includes: and a first post-processing operation on the first return value, wherein when the first texture data relates to merging processing, the first post-processing operation on the first return value comprises post-processing operations on first return values corresponding to a plurality of sub-texture instructions obtained by splitting the texture instructions, and the first post-processing operation comprises one or more of the following steps: weighted average operation, maximum value calculation, minimum value calculation, and component range conversion calculation.
In one possible design, at least two sub-texture instructions of the plurality of sub-texture instructions split by the texture instruction satisfy any one of: the at least two sub-texture instructions correspond to different textures, and the first texture coordinates are the same; the offset texture instructions corresponding to the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offset amounts are different; the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offsets are different.
In a third aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-readable instructions that, when read and executed by a computer, cause the computer to perform the method of any one of the possible designs described above.
In a fourth aspect, embodiments of the present application provide a computer program product which, when read and executed by a computer, causes the computer to perform the method of any one of the possible designs described above.
The embodiment of the application provides a chip for reading and executing a software program stored in a memory to realize the method in any one of the possible designs. The memory is connected with the chip or is built in the chip.
Drawings
FIG. 1 is a schematic diagram of a GPU system;
FIG. 2 is a schematic diagram of a texture unit;
FIG. 3 is a schematic diagram of bilinear filtering;
FIG. 4 is a schematic diagram of a texture unit suitable for use in embodiments of the present application;
FIG. 5 shows a flow diagram of a graphics processing method of an embodiment of the present application;
FIGS. 6a and 6b are flow diagrams illustrating a graphics processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram showing the configuration of a graphic processing apparatus according to an embodiment of the present application;
fig. 8 shows a schematic structural diagram of a communication device according to an embodiment of the present application.
Detailed Description
1. GPU (graphics processing unit)
The GPU, also called a display core, a graphics processor, and a display chip, is a microprocessor that performs image and graphics related operations on a personal computer, a workstation, a game machine, and some mobile devices (such as a tablet computer, a smart phone, etc.).
Referring to fig. 1, the system architecture of the GPU can be generally divided into three layers:
(1) The application layer, including GPU Applications (APP), is referred to as GPU applications for short.
(2) The driving layer comprises a GPU device driver development component (driver development kit, DDK) which is called GPU driving for short.
(3) Hardware layers, including GPU hardware, such as a job manager, N shader task creators (task builders), N graphics processing clusters (graphic process cluster, GPC), an XBAR bus, a secondary cache (L2 cache), memory (e.g., double data rate (DDR) storage space), and a system cache, where N may be 4, 8, 16, etc. Each GPC includes, for example, an instruction cache, a scheduler, a plurality of execution units (EUs), a register file (which may also be referred to herein simply as registers), and coprocessors such as texture units (TUs) and memory access units (LS units), etc.
Conventional GPU applications include games, graphics rendering programs, and the like. Because of the high concurrency characteristics of GPUs, GPUs are now also widely used for general-purpose parallel algorithms, such as image processing, audio processing, and other general-purpose concurrent programs. The programs executed by the GPU are shader programs, and a GPU application may include many different types of shader programs, such as vertex shader programs, geometry shader programs, fragment shader programs, compute shader programs, and so forth. A compiler will compile all of these shader programs in advance for subsequent execution by the GPU hardware.
The GPU application may drive the GPU by invoking standard interfaces. Typical GPU application standard interfaces are divided into desktop-domain standard interfaces, mobile-domain standard interfaces and hybrid standard interfaces. The standard interfaces commonly used in the desktop field are DirectX and OpenGL; the mainstream standard interfaces in the mobile field include OpenGL ES and Vulkan, where Vulkan can also be applied in the desktop field and therefore belongs to the hybrid standard interfaces. The GPU driver may be configured to parse these standard interface functions, translate them into task commands that the GPU hardware can recognize, and send them to the GPU hardware. Meanwhile, the GPU driver also manages and allocates resources for the work of the GPU.
In the GPU hardware, the job manager may be configured to receive task commands issued by the GPU driver, handle the dependency relationships between tasks, and then allocate the tasks to the corresponding GPCs according to the task load states of the N GPCs. The job manager may also issue tasks to multiple GPCs for concurrent execution. The main function of the shader task creator is to generate, according to the task commands issued by the job manager, the different shader tasks that can be executed on a GPC, where each shader task corresponds to a pre-compiled shader program in a GPU application. For example, a fragment shader task corresponds to a fragment shader program.
After each GPC receives a shader task, instructions are typically executed at warp granularity. A warp can generally be regarded as a group of threads executing instructions, and typically contains multiple threads, e.g., 16, 32, 64, or 128. Each thread executes instructions of the same program, with each instruction processing different data, i.e., a single instruction multiple data (single instruction multiple data, SIMD) or single instruction multiple thread (SIMT) processing unit architecture. Multiple warps are typically allowed to run concurrently in one GPC, e.g., a maximum of 32 or 64 warps may be supported.
In each GPC, the instruction cache unit is used to cache the relevant instructions. The scheduler is responsible for selecting, from multiple warps, the instruction corresponding to one warp for execution, where the operands of the warp's instruction are stored in the register file, or are immediates or the like (not shown). For N warps, register space corresponding to the N warps needs to be allocated to store operands and operation results. When the scheduler selects an instruction, the instruction is obtained from the instruction cache unit; if the corresponding instruction is not in the instruction cache unit, the instruction cache unit may request the instruction from memory (not shown in the figure) and return it to the scheduler. After obtaining the instruction, the scheduler sends it to an execution unit (EU) for execution.
Typically, a set of EUs can support multiple sets of thread data corresponding to an instruction to execute in parallel, e.g., 16 sets, 32 sets, etc. Conventional arithmetic, logical instructions are directly executed in an arithmetic logic unit (arithmetic and logic unit, ALU) in EUs to complete, and the result of the operation is written to a register file. Memory access instructions and texture instructions need to be completed by special coprocessors (e.g. LS units and TUs) which may be integrated in the EU or may be set independently of the EU, only the case of coprocessors independent of the EU being schematically shown in fig. 1. The LS unit has the main function of retrieving data from memory and then returning to the register file for operation of subsequent instructions. The LS unit also supports the storage of data into memory. And TU acquires texture data from the memory, then filters according to a preset mode, and returns to the register file after finishing the filtering. TU and LS units typically include a first level cache within them for temporarily storing data to increase hit rates. Texture instructions and memory access instructions in each GPC may also be coupled to the secondary cache via an XBar bus (also known as an interconnect) where data that misses in the primary cache may be read into the secondary cache. If the secondary cache is not hit, the data is read to the system cache or the memory.
In a GPU, the number of registers that a shader program uses is typically determined at compile time. The total capacity of the register file in a GPC is fixed, e.g., 32 KB. N warps are typically supported to execute concurrently in a GPC, but due to resource limitations such as register space, the N warps cannot always run concurrently. When a single shader occupies too much register space, the shaders of other warps are limited, so the number of concurrent warps is reduced. Increasing the size of the register file is one of the most direct mitigations, but it also means greater power consumption, which has a significant impact on mobile devices.
By way of example, the main function of the program code is to perform Gaussian filtering, as shown in the following program code:
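The original code listing is not reproduced in this text. A minimal sketch consistent with the description that follows (N texture reads at different offsets, each multiplied by a weight and accumulated), with hypothetical names u_offsets, u_weights and N, might look like:

#version 310 es
precision highp float;
#define N 9
uniform sampler2D texture_unit0;
uniform vec2 u_offsets[N];     // per-tap texture coordinate offsets (assumed)
uniform float u_weights[N];    // Gaussian weights (assumed)
in vec2 texcoord;
out vec4 out_color;
void main() {
    vec4 acc = vec4(0.0);
    for (int i = 0; i < N; ++i) {
        // each texture read returns a 4-channel RGBA value that occupies registers
        acc += texture(texture_unit0, texcoord + u_offsets[i]) * u_weights[i];
    }
    out_color = acc;
}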
The program code reads texture data from memory into registers according to the texture coordinates and multiplies the texture data by weights. The N pieces of texture data are obtained using different offsets, and the weighted average of the N pieces of data is output as the result. The return value of a single texture instruction typically contains a 4-channel RGBA color and occupies 4 registers, so N such instructions require 4*N registers. When N is large, the number of concurrent warps in the GPC is greatly limited.
2. Texture unit (TU)
The main functions of the TU include: receiving texture instructions and their operands (texture coordinate information, etc.) issued by the scheduler, and reading and filtering texture data according to the texture state set by software.
Referring to fig. 2, TUs may be conceptually divided into the following unit modules: a texture state unit (texture state unit, TSU), a texture pre-process unit (texture pre-process unit), a level of detail (LOD) calculation unit (calculation unit), a sample point generation and boundary crossing processing unit (sample generation and addressing unit), a texture cache control unit (texture cache control unit) and a texture filtering unit (texture filtering unit).
The main function of the TSU is, among other things, to store texture descriptors.
Game developers typically set specific states in the texture descriptors of a GPU application according to their requirements, indicating information such as the size, format, filtering mode, and out-of-range processing mode of the texture. The GPU driver applies for storage space in memory, stores the corresponding texture descriptor, and at the same time passes the address of the texture descriptor in memory to the GPU hardware. When the GPU issues a task, the texture descriptor used to indicate the texture state is sent to the TSU in the TU, where the texture state may include the size, format, memory address, out-of-range setting, filtering mode, and so on, so as to complete the state setting. After the TSU completes the state setting, the scheduler may issue texture instructions to the TU.
The texture preprocessing unit mainly includes range conversion logic (for example, converting coordinates from [-1.0,1.0] to the [0,1.0] space, etc.), request splitting loop control (request loop control) logic, and so on, and can be used to complete the coordinate conversion of cube textures, the preprocessing of perspective texture coordinates, the splitting of some texture instructions, and the like.
When tri-linear filtering is used in the texture mapping process, different texture picture resolutions are required depending on the distance between the object and the viewpoint. For example, the material details on the surface of an object gradually blur as the object moves away from the viewpoint, until finally only one pixel remains. Before filtering, a Mipmap texture algorithm is typically applied, that is, the texture is downsampled before tri-linear filtering is used: the clearest original-size texture is downsampled multiple times to obtain textures of sizes 64x64, 32x32, 16x16, 8x8, 4x4, 2x2 and 1x1, and the combination of textures of different sizes is called a Mipmap. The Mipmap may be pre-stored in memory so that an appropriate texture size can be selected for texture mapping based on the distance of the object. The appropriate texture size is referred to here as the LOD, i.e., the LOD represents the mip layer that needs to be selected. The LOD may also be a non-integer; for example, LOD=1.3 means that the two mip layers 1 and 2 need to be selected, and a weighted average is then performed according to the distance between the LOD value and the two mip layers to obtain the two-layer sampling result, which produces a linear interpolation in the third dimension (a linear interpolation between the two mip layers).
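As an illustration of the non-integer LOD case just described (a sketch that is not part of the original text; it assumes texture_unit0 is a sampler and texcoord the texture coordinate), the two bilinear results from mip layers 1 and 2 are blended by the fractional part of the LOD:

float lod = 1.3;
vec4 c1 = textureLod(texture_unit0, texcoord, 1.0);   // bilinear result from mip layer 1
vec4 c2 = textureLod(texture_unit0, texcoord, 2.0);   // bilinear result from mip layer 2
vec4 result = mix(c1, c2, fract(lod));                 // 0.7*c1 + 0.3*c2, the interpolation between the two mip layers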
After the texture preprocessing unit finishes preprocessing, the texture instruction enters the LOD calculation unit to calculate the LOD value corresponding to the texture instruction. The LOD calculation unit performs the LOD calculation using the texture coordinates, and includes processing modules for various special functions, such as reciprocal (inversion) logic. Typically the instructions of one 2x2 quad (4 instructions) are combined to calculate the LOD value. After the LOD calculation is completed, the texture instruction enters the sampling point generation and boundary crossing processing unit.
The sampling point generation and boundary crossing processing unit selects an appropriate Mipmap level size according to the LOD value corresponding to the texture instruction, then calculates the position of the texture coordinate within the texture, and performs out-of-range processing at the same time. For bilinear filtering, one texture instruction typically produces 4 texel sampling points, while for nearest-point filtering one texture instruction produces one sampling point. For out-of-range sampling points, the out-of-range processing method to be used is determined according to the descriptor preset by the user in the texture state unit; common methods include the repeat mode, mirror mode, and clamp mode. The sampling point generation and boundary crossing processing unit also includes split loop control (loop control) logic and offset adder logic for anisotropic filter coordinate points.
The texel coordinates of the sampling points then enter the texture cache control unit, which looks up whether the texels corresponding to the required sampling points exist in the texture cache. If so, it is considered a hit, and the texture cache control unit reads the texture data directly from the texture cache and provides it to the texture filtering unit for filtering. Otherwise, the texture data must first be returned from the external secondary cache or from memory and then provided to the texture filtering unit for filtering.
The texture filtering unit performs weighted average processing according to the weight of each texel, or minimum/maximum processing on a plurality of texels, or direct output processing for nearest filtering, and so on. For example, referring to fig. 3, when bilinear filtering is performed, the position of point T represents the sampling point of one texel of the target texture, and T0, T1, T2 and T3 represent the 4 texels around the sampling point T. To determine the color value of the sampling point T, the filtering result of texels T0 and T1 relative to the sampling point T and the filtering result of texels T2 and T3 relative to the sampling point T are determined by bilinear interpolation, and the final bilinear filtering result, i.e., the color value of the sampling point T, is determined from these two filtering results. The calculation process can be as follows:
The filtering result T01 of texel T0 and texel T1 with respect to the sampling point T can be expressed as:
T01=Lerp(s,t0,t1)=t0+s*(t1-t0);
where t0 represents the color value corresponding to texel T0, and t1 represents the color value corresponding to texel T1; s represents the weight of texel T0 or T2 relative to the sampling point T;
the filtering result T23 of texel T2 and texel T3 with respect to the sampling point T can be expressed as:
T23=Lerp(s,t2,t3)=t2+s*(t3-t2);
where t2 represents the color value corresponding to texel T2, and t3 represents the color value corresponding to texel T3;
thus, the color value corresponding to the sampling point T can be expressed as: Lerp(t,T01,T23), where t represents the weight value of T01 relative to the sampling point T.
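Expressed as shader-style code (an illustrative sketch; the function and variable names are assumptions), the same calculation is:

vec4 lerp(float w, vec4 a, vec4 b) { return a + w * (b - a); }
// t0..t3 are the color values of texels T0..T3; s and t are the interpolation weights
vec4 bilinear(vec4 t0, vec4 t1, vec4 t2, vec4 t3, float s, float t) {
    vec4 t01 = lerp(s, t0, t1);   // filtering result of T0 and T1
    vec4 t23 = lerp(s, t2, t3);   // filtering result of T2 and T3
    return lerp(t, t01, t23);     // color value of sampling point T
}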
The texture filtering unit finally sends the filtered result to the shader task execution unit.
Although various arithmetic and control units are already included in the TU, these units are used only by a small number of texture instructions, and conventional arithmetic is still mainly completed by the ALU in the EU. In this case, the ALU generally needs to load data from memory into registers through the texture unit or the memory access unit before performing the operation, which increases the number of registers occupied and the number of register reads and writes, greatly increasing GPU power consumption and reducing GPU performance.
The present application provides a graphics processing method and apparatus, which define texture instruction extension bits so that a texture instruction can make full use of the internal logic units of the texture unit and, while completing texture sampling, also complete related calculations including but not limited to texture-coordinate-related calculations, such as looping of texture instructions, weighted average operations on texture filtering results, maximum/minimum value calculations, and the like. The solution uses the idea of near-memory computing and can reduce the number of general-purpose registers occupied, thereby greatly increasing the number of concurrent warps and improving GPU performance, while greatly reducing the number of ALU instructions and the number of times the ALU reads and writes the general-purpose registers, which reduces GPU power consumption. The solution is also applicable to other memory access instructions and hardware units in the GPU, which is not limited in this application. The method and the apparatus are based on the same technical concept; because the principles by which they solve the problem are similar, the implementations of the apparatus and the method can refer to each other, and repeated parts are not described again.
In addition, it should be understood that, in the description of the present application, "a plurality of" means two or more, and "at least one" means one or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship. The words "first", "second" and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance or order.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
The graphics processing principles of the present application are first described below.
In general, the behavior patterns of algorithms such as graphics and parallel computing include the following modes:
(1) Mode 0:
in the graphics algorithm, some mathematical calculation is usually used on texture coordinates to dynamically adjust the sampling position of the texture, so as to achieve the corresponding effect. Typically, these operations are performed in an EU ALU, and mathematical calculations on texture coordinates include, for example:
perspective transformation of coordinates: texcoord * 1/w;
range conversion of coordinates: texcoord/2.0+0.5;
offset of coordinates: coordinate+offset;
where texcoord represents the texture coordinates, and coordinate represents the value of the texture coordinates.
(2) Mode 1:
in a graphics algorithm, texture data is typically obtained from the same location on multiple textures and then further processed. The corresponding texture instruction is as follows:
r0=texture(texture_unit0,texcoord,bias);
r1=texture(texture_unit1,texcoord,bias);
r2=texture(texture_unit2,texcoord,bias);
r3=texture(texture_unit3,texcoord,bias);
where r0-r3 respectively represent the different return register addresses, texture_unit0-texture_unit3 represent different textures, texcoord represents the same texture coordinate, and bias is the LOD offset.
(3) Mode 2:
and capturing texture data of different offsets nearby the same texture coordinate by the same texture through a texture instruction with offset. The coordinate offset is set at offset0, offset1, offset2, offset3 in the following instructions, and the corresponding texture instruction is as follows:
r0=textureOffset(texture_unit,texcoord,offset0,bias);
r1=textureOffset(texture_unit,texcoord,offset1,bias);
r2=textureOffset(texture_unit,texcoord,offset2,bias);
r3=textureOffset(texture_unit,texcoord,offset3,bias);
where r0-r3 respectively represent the different return register addresses, textureOffset represents an offset operation on a texture instruction, texture_unit represents the same texture, texcoord represents the same texture coordinates, offset0-offset3 represent different coordinate offsets, and bias is the LOD offset.
(4) Mode 3:
The texture data return value range is converted from [0,1.0] to [-1.0,1.0]:
vec3 r=texture(texture_unit0,texcoord,0).xyz;
r.xyz=r.xyz*2.0-1.0;
(5) Mode 4:
The x and y component ranges of the texture data return value are converted from [0,1.0] to [-1.0,1.0], and z=sqrt(1-dot(xy,xy)):
vec3 r=texture(texture_unit0,texcoord,0).xyz;
r.xy=r.xy*2.0-1.0;
r.z=sqrt(1.0-dot(r.xy,r.xy));
(6) Mode 5:
A block on the texture is sampled and the data is convolved; for example, if N=9, a 3x3 block is convolved. This involves adding offsets to the texture coordinates and multiplying the texture data return values by weights to perform an accumulation operation. The multiply-accumulate mode may be as follows:
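The original listing for this mode is not reproduced in this text. A sketch consistent with the description, with hypothetical u_offsets and u_kernel uniforms for a 3x3 kernel (N=9), might be:

vec3 conv = vec3(0.0);
for (int i = 0; i < 9; ++i) {
    // add an offset to the texture coordinate, multiply the return value by a weight, and accumulate
    conv += texture(texture_unit0, texcoord0 + u_offsets[i]).xyz * u_kernel[i];
}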
There may be variations of this multiply-accumulate mode, such as:
variant 1:
float[3]r0=texture(texture_unit0,texcoord0,0.000000).xyz;
float[3]r1=texture(texture_unit0,texcoord0,1.000000).xyz;
float[3]r2=texture(texture_unit0,texcoord0,2.000000).xyz;
float[3]r3=texture(texture_unit0,texcoord0,3.000000).xyz;
float[3]bloom=((((r0+r1)+r2)+r3)*0.250000);
variant 2:
float[3]env_color0=texture(envmap0,reflect_vector).xyz;
float[3]env_color1=texture(envmap1,reflect_vector).xyz;
float[3]env_color=mix(env_color0,env_color1,vmaps_interpolator_pad3.x);
variant 3:
neighboors=texture(lighting_texture,texcoord+vec2(inv_resolution.x,0.0)).xyz;
neighboors+=texture(lighting_texture,texcoord-vec2(inv_resolution.x,0.0)).xyz;
neighboors+=texture(lighting_texture,texcoord+vec2(0.0,inv_resolution.y)).xyz;
neighboors+=texture(lighting_texture,texcoord-vec2(0.0,inv_resolution.y)).xyz;
(7) Mode 6:
calculating the maximum value of the texture data return value:
highp float depth00=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,0)).r;
highp float depth01=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,1)).r;
highp float depth10=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,0)).r;
highp float depth11=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,1)).r;
highp float max_depth=max(max(depth00,depth01),max(depth11,depth10));
(8) Mode 7:
and calculating the minimum value of the texture data return value:
highp float depth00=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,0)).r;
highp float depth01=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,1)).r;
highp float depth10=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,0)).r;
highp float depth11=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,1)).r;
highp float min_depth=min(min(depth00,depth01),min(depth11,depth10));
Based on observation of the behavior patterns of algorithms such as graphics and parallel computing, the inventors found that, in order to reduce GPU power consumption and improve GPU performance, one possible implementation is to use the texture unit, or add a small amount of logic cost, to accelerate graphics rendering algorithms and games. Specifically, the texture unit can be used to execute the coordinate processing instructions of the different modes of texture instructions, to merge multiple related texture instructions of the different modes, to execute the processing instructions on the texture instruction return values of the different modes, to execute combined processing among the different modes, and so on.
It should be understood that, for convenience of description, this application uses the texture unit as an example; in a specific implementation, other memory read instructions and hardware units in the GPU may also be used to execute the instructions of this scheme or to perform memory reads similar to those of this scheme, which is not described again here.
In this application, in order to improve GPU performance and reduce GPU power consumption, the texture instruction may be optimized. Specifically, the format of a texture instruction input to the texture unit may be defined to include the original texture instruction (containing the texture descriptor, the texture coordinates, the LOD offset, and so on) and an extension bit added to the original texture instruction. The extension bit may carry a texture modifier, which can be used to implement an extended instruction for the processing associated with the original texture instruction (such as a conventional texture instruction), so that while responding to the texture instruction to complete texture sampling, the texture unit can additionally complete the related coordinate preprocessing operations, instruction merging operations, post-processing operations on texture data return values, and so on. Correspondingly, the texture unit performs sampling and the related calculation based on the input texture instruction, and finally filters the sampled texture data and returns it to the corresponding shader task execution unit.
Illustratively, the extended modified texture instruction is shown in the following expression (1):
r0 = texture.&lt;extension bit&gt;(texture_unit0, texcoord, bias)    (1)
Where r0 represents the return register address, and 0 may be replaced by other values to represent different register addresses. texture_unit0 is the texture descriptor, which includes the type, size, format, texture-coordinate out-of-range processing mode, filtering mode and so on of the texture; different values in place of 0 distinguish different texture descriptors. texcoord is the texture coordinate, i.e., the normalized texture coordinate of the texture sample, whose number of components varies with the type of texture: for a 2D texture the coordinate may be (s, t), containing two components; for a 3D texture the coordinate may be (s, t, r), containing three components. bias is the LOD offset. The "extension bit" is used to carry different texture modifiers. It should be noted that the foregoing is merely a schematic illustration of the format of the optimized texture instruction in this application and does not constitute any limitation; in other embodiments, the optimized texture instruction may have other formats, which are not described here again.
Referring to table 1 below, three extended modifiers may be defined for a texture instruction to identify preprocessing operations for texture coordinates contained in the above 8 modes, merging operations for texture instructions by sharing texture coordinates, post-processing operations for texture data return values, and the like.
TABLE 1 texture instruction extensions
The extensions schematically represent classifications of the operations performed on texture instructions. As an example, "PreOp" may be used to represent preprocessing operations on texture coordinates, "ReuseMode" may be used to represent merging operations on multiple identical texture instructions, and "PostOp" may be used to represent post-processing operations on the instruction results corresponding to texture instructions. The different extension values corresponding to each extension can be used as texture modifiers to indicate specific operations; the details are described in Table 1 above and are not repeated here.
Through the instruction extensions shown in Table 1 above, the compiler can optimize the original shader program to obtain optimized texture instructions, so that by executing an optimized texture instruction the texture unit completes texture sampling while also performing the related coordinate preprocessing operations, texture instruction merging operations, post-processing operations on texture data return values, and so on. Compared with using the texture unit only for the original texture sampling function, this optimization does not increase the processing time, but can greatly reduce the number of EU ALU instructions and the allocation, reading and writing of registers, thereby reducing GPU power consumption and improving GPU performance.
For example, as shown in table 2 below, the case of optimizing the texture instructions in the above 8 different modes based on the instruction extension of table 1 to obtain the optimized texture instructions may include:
TABLE 2 texture instruction optimization examples
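The body of Table 2 is not reproduced in this text. Purely as an illustration of the idea, and not as the patent's actual table, the four Mode 2 instructions above might collapse into a single extended instruction of the form of expression (1); the modifier names below are assumptions:

// before optimization (Mode 2): four instructions and four return registers
r0=textureOffset(texture_unit,texcoord,offset0,bias);
r1=textureOffset(texture_unit,texcoord,offset1,bias);
r2=textureOffset(texture_unit,texcoord,offset2,bias);
r3=textureOffset(texture_unit,texcoord,offset3,bias);
res=(r0+r1+r2+r3)*0.25;
// after optimization (hypothetical modifiers): the texture unit loops over the offsets
// and performs the weighted average internally, returning a single result
res=texture.ReuseMode.PostOp(texture_unit,texcoord,bias);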
After the instructions are optimized, the relevant instruction parameters can be stored directly in the texture state unit by software and read as part of the texture state.
In addition, as shown in fig. 4, advanced filter (ADF) mode logic may be added to the texture state unit to implement control over the multiple modes proposed in Table 1, and an ADF parameter buffer may be added to store the relevant coefficients and indexes of convolution processing and the like. And/or, with a small amount of logic cost added in the texture unit, execution of the optimized texture instructions is implemented using the internal logic units of the texture unit. For example, the range conversion logic, the processing modules for various special functions (for example, reciprocal logic) and the anisotropic offset adder logic are used to complete the coordinate preprocessing instructions; the request splitting loop control logic is used to complete the instruction merging operations; the anisotropic splitting loop control (Anisotropic Loop Control) logic is used to complete the accumulation loop; the multiply-and-add ALU and accumulator logic is used to complete the multiply-accumulate operations; the min/max logic is used to complete the minimum or maximum value operations; and post-processing modules, for example a range conversion (Range Conv) logic module and a square root (SQRT) calculation logic module, are added to the texture filtering unit to implement instruction extension functions such as BUMPMAP0 and BUMPMAP1.
Therefore, with the optimized texture unit shown in fig. 4 and the texture instruction shown in expression (1), only minor changes are made compared with the texture unit shown in fig. 2, and the extended support for optimized texture instructions can be completed using the internal logic units of the texture unit, so that by executing an optimized texture instruction the texture unit completes texture sampling while also completing the related coordinate preprocessing operations, texture instruction merging operations, post-processing operations on texture data return values, and so on. For the functions of the logic units that are the same in fig. 4 and fig. 2, reference may be made to the description of fig. 2 above, which is not repeated here.
Since texture instructions are widely used in games and graphics applications, this application defines modifiers for texture instructions and uses them to implement instruction extension, so that the preprocessing operations on texture coordinates, the merging operations on texture instructions, the post-processing operations on texture data return values, and combinations of these modes are implemented in the texture unit. This reduces the number of ALU instructions (including ALU operation instructions, loop instructions, and so on), the number of instruction reads from the instruction cache, the number of operand reads, and so on. In addition, because the sampled texture data can complete its operations and loops directly in the texture unit, multiple instructions can be prevented from occupying many registers, so the number of concurrent warps is increased, the ALU units are fully utilized, and GPU performance is improved. Further, because some ALU operations are instead completed in the texture unit, the ALU does not need to frequently write texture data results into registers and fetch them from registers through the texture unit, so the number of ALU instructions is reduced, the number of register reads and writes is reduced, and GPU power consumption is reduced.
It should be noted that the above scheme is only an example of near-memory computing implemented in a texture unit and does not constitute any limitation; the concepts described above are equally applicable to other memory access instructions and hardware units. Based on the idea of near-memory computing, the number of general-purpose registers occupied can be reduced, thereby greatly increasing the number of concurrent warps and improving GPU performance, while greatly reducing the number of ALU instructions, reducing the number of times the ALU reads and writes the general-purpose registers, and reducing GPU power consumption.
For ease of understanding, the graphics processing method implemented by the texture unit of FIG. 4 is illustrated below in conjunction with the flow chart of FIG. 5.
Referring to fig. 5, the graphic processing method may include the steps of:
S510: The task execution unit (e.g., a shader task execution unit) sends a texture instruction to the texture unit, and correspondingly the texture unit receives the texture instruction from the task execution unit.
In this application, the texture instruction may include first indication information, where the first indication information is used to indicate a target operation that needs to be performed in the texture sampling process.
For example, the texture instruction may have a format shown in the foregoing expression (1), in which an extension bit is used to carry the first indication information, and other bits (bit) than the extension bit may be used to carry other information than the first indication information in the texture instruction, for example, a texture descriptor, texture coordinates, offset information, and the like that may be included in a conventional texture instruction, which are not described herein.
Optionally, before S510 is implemented, the texture unit may further be configured to perform the following step: obtaining multiple sets of instruction parameters corresponding to multiple operations and saving the multiple sets of instruction parameters, where each set of instruction parameters is executed to implement one arithmetic processing operation.
For example, the plurality of instruction parameters may include instruction parameters corresponding to the optimized texture instruction shown in the above table 2, where the plurality of instruction parameters may be directly stored in a texture state unit of the texture unit through software, so as to be read as a state of the texture, and complete specific state setting. Subsequently, after the texture unit receives the texture instruction, the size, format, filtering mode, out-of-range processing mode, operation processing in different modes and the like of the texture adopted by executing the texture instruction can be decided according to the descriptors preset in the texture state unit and the multiple groups of instruction parameters.
S520: and executing a texture sampling process according to the texture instruction, and executing the target operation in the texture sampling process to obtain a target sampling result.
In the present application, based on the modifier shown in table 1 and the texture instruction in different modes shown in table 2, without losing generality, the texture unit may be used to execute coordinate preprocessing instructions in different modes of the texture instruction; combining a plurality of related texture instructions in different modes by using a texture unit; executing post-processing instructions of texture data return values of different modes by using a texture unit; and executing combination processing among the different modes by using a texture unit to obtain corresponding target sampling results.
In practice, S520 may include the steps of:
s521: determining target instruction parameters corresponding to the target operation in a plurality of groups of stored instruction parameters corresponding to various operations according to the first indication information in the texture instruction, wherein the target instruction parameters are used for executing the target operation;
s522: and in the texture sampling process, executing the target operation according to the target instruction parameters to obtain the target sampling result.
In this application, the operations that can be implemented by the texture unit may be various, and the specific implementation steps of S521-S522 may be different for different operations. In a possible implementation manner, the texture instruction may further include first information, where the first information may be used to obtain first texture data corresponding to the texture instruction, the target operation may include a first target operation that needs to be performed on the first information and/or a second target operation that needs to be performed on the first texture data, and performing the target operation in a texture sampling process to obtain a target sampling result, where the target sampling result includes: in the texture sampling process, running the stored instruction parameters corresponding to the first target operation, and executing the first target operation and texture sampling on the first information to obtain the first texture data; and/or running the saved instruction parameters corresponding to the second target operation to execute the second target operation on the first texture data to obtain second texture data; and obtaining the target sampling result according to the first texture data or the second texture data.
For example, the first information includes first texture coordinates, and the first target operation includes at least one of the following: a preprocessing operation on the first texture coordinates, the preprocessing operation comprising: performing perspective transformation processing operation on the first texture coordinates and/or performing range conversion processing operation on the first texture coordinates; an offset processing operation on the first texture coordinate, wherein, in a case where the texture instruction involves a split processing, the offset processing operation on the first texture coordinate includes: and respectively carrying out offset processing operation on the plurality of sub-texture instructions obtained by splitting the texture instructions.
For example, the first texture data includes a first return value, and the second target operation includes: and a first post-processing operation on the first return value, wherein when the first texture data relates to merging processing, the first post-processing operation on the first return value comprises post-processing operations on first return values corresponding to a plurality of sub-texture instructions obtained by splitting the texture instructions, and the first post-processing operation comprises one or more of the following steps: weighted average operation, maximum value calculation, minimum value calculation, and component range conversion calculation.
Illustratively, at least two sub-texture instructions of the plurality of sub-texture instructions split by the texture instruction satisfy any one of: the at least two sub-texture instructions correspond to different textures, and the first texture coordinates are the same; the offset texture instructions corresponding to the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offset amounts are different; the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offsets are different.
Since the operations involved in the execution of the texture instruction are different in different situations, the internal logic units implementing the corresponding operations are also different, and for ease of understanding, the following description will be given for different situations.
Case one: the first target operation includes at least one of: a preprocessing operation on the first texture coordinates, the preprocessing operation comprising: performing perspective transformation processing operation on the first texture coordinates and/or performing range conversion processing operation on the first texture coordinates; and performing offset processing operation on the first texture coordinates.
Example one: the texture unit is utilized to perform different modes of coordinate preprocessing operations and/or offset processing operations of the texture instruction.
In this example one, the texture unit may perform different modes of coordinate preprocessing operations and/or offset processing operations of the texture instruction by corresponding logic units in the texture preprocessing unit, including, but not limited to, the following modes:
(1) Perspective transformation processing on the coordinate: coordinate * 1/w;
(2) Coordinate range conversion processing: coordinate / 2.0 + 0.5;
(3) Coordinate range conversion processing and perspective transformation processing: (coordinate / 2.0 + 0.5) * 1/w;
(4) Coordinate offset processing: coordinate + offset;
(5) Coordinate offset processing and perspective transformation processing: (coordinate + offset) * 1/w;
where coordinate represents the first texture coordinate in the texture instruction, and offset represents the first offset (an illustrative shader-code sketch of these operations follows).
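For reference, the following GLSL-style sketch (illustrative only; coord4, tex_offset, and texture_unit0 are hypothetical names not taken from this application) shows the same preprocessing written as shader ALU code, which is the work the texture-unit modifiers are intended to absorb:
// Illustrative only: coordinate preprocessing written as shader ALU instructions.
// coord4 is a vec4 whose w component carries the perspective divisor (hypothetical).
vec2 uv_persp  = coord4.xy * (1.0 / coord4.w);     // (1) perspective transformation: coordinate * 1/w
vec2 uv_ranged = coord4.xy / 2.0 + 0.5;            // (2) range conversion: coordinate / 2.0 + 0.5
vec2 uv_offset = coord4.xy + tex_offset;           // (4) coordinate offset: coordinate + offset
vec4 color = texture(texture_unit0, uv_ranged);    // ordinary texture sampling after preprocessing
Each such ALU line consumes EU instructions and register reads and writes; carrying the same operation on the extension bits of the texture instruction removes that work from the EU.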
Taking the coordinate range conversion process as an example, as shown in fig. 6a, the graphics processing flow of the texture unit may include the following steps:
S611: the texture unit receives a texture instruction issued by the task execution unit, where the texture instruction carries first information, such as the texture coordinate (coordinate), and first indication information, such as a modifier on the extension bits: Pre=RC.
S612: the texture unit performs range conversion processing on the texture coordinate in the texture preprocessing unit: coordinate / 2.0 + 0.5.
S613: the texture unit enters its subsequent related logic units by using the range-converted coordinate as the texture coordinate, to complete texture sampling, and returns the target sampling result to the task execution unit after sampling ends. For a detailed implementation, refer to the foregoing description of the functions of the texture unit; details are not repeated herein.
And a second case: the first target operation includes: an offset processing operation on the first texture coordinate, wherein, in a case where the texture instruction involves a split processing, the offset processing operation on the first texture coordinate includes: and respectively carrying out offset processing operation on the plurality of sub-texture instructions obtained by splitting the texture instructions. At least two sub-texture instructions in the plurality of sub-texture instructions obtained by splitting the texture instruction meet any one of the following: the at least two sub-texture instructions correspond to different textures, and the first texture coordinates are the same; the offset texture instructions corresponding to the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offset amounts are different; the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offsets are different.
Example two: a texture unit is utilized to merge a plurality of related texture instructions in different modes.
Because the compiler optimizes the original shader program based on the instruction extension shown in Table 1, a plurality of related texture instructions in the shader program can be combined into one modifier-extended texture instruction, which is sent to the texture unit. The texture unit may split the received optimized texture instruction back into the corresponding plurality of related texture instructions based on the modifier; for ease of distinction, the plurality of related texture instructions may be referred to as a plurality of sub-texture instructions. The texture unit may then execute the corresponding operation for each of the plurality of sub-texture instructions until execution of the texture instruction is completed. Because the plurality of related texture instructions are carried as a plurality of sub-texture instructions within a single texture instruction, merging processing is realized, the number of ALU instructions and the number of register reads and writes can be greatly reduced, the power consumption of the GPU is reduced, and the performance of the GPU is improved.
In this example two, based on internal logic units within the texture unit, such as the texture state unit, the texture preprocessing unit, the LOD calculation unit, the sample point generation and out-of-bounds processing unit, the texture buffer control unit, and the texture filtering unit, the texture unit may cooperatively implement merge operations on multiple related texture instructions in different modes, including but not limited to the following modes (the number of instructions is not limited):
(1) Mode 0:
r0=texture(texture_unit0,texcoord,bias);
r1=texture(texture_unit1,texcoord,bias);
r2=texture(texture_unit2,texcoord,bias);
r3=texture(texture_unit3,texcoord,bias);
the instructions denoted r0-r3 are a plurality of sub-texture instructions corresponding to the same optimized texture instruction. In mode 0, the texture unit may execute the plurality of sub-texture instructions respectively, continuing texture sampling until the number of processed merged instructions reaches the corresponding merge count, and then end the processing flow of the texture instruction, thereby obtaining texture data from the same position on a plurality of textures.
That is, the merge operation on different texture instructions is realized through sharing of texture coordinates, so that texture data is acquired from the same position on a plurality of textures, the number of ALU instructions, the number of occupied registers, and the like are reduced, the power consumption of the GPU is reduced, and the performance of the GPU is improved.
(2) Mode 1:
r0=textureFetchOffset(texture_unit,texcoord,lod,offset0);
r1=textureFetchOffset(texture_unit,texcoord,lod,offset1);
r2=textureFetchOffset(texture_unit,texcoord,lod,offset2);
r3=textureFetchOffset(texture_unit,texcoord,lod,offset3);
the instructions denoted r0-r3 are a plurality of sub-texture instructions corresponding to the same optimized texture instruction. In mode 1, the texture unit may execute the plurality of sub-texture instructions respectively, continuing texture sampling until the number of processed merged instructions reaches the corresponding merge count, and then end the processing flow of the texture instruction, thereby obtaining texture data at different offsets near the same texture coordinate.
(3) Mode 2:
r0=textureOffset(texture_unit,texcoord,offset0,bias);
r1=textureOffset(texture_unit,texcoord,offset1,bias);
r2=textureOffset(texture_unit,texcoord,offset2,bias);
r3=textureOffset(texture_unit,texcoord,offset3,bias);
the instructions denoted r0-r3 are a plurality of sub-texture instructions corresponding to the same optimized texture instruction. In mode 2, the texture unit may execute the plurality of sub-texture instructions respectively, continuing texture sampling until the number of processed merged instructions reaches the corresponding merge count, and then end the processing flow of the texture instruction, thereby obtaining texture data at different offsets near the same texture coordinate.
That is, the merge operation on different texture instructions is realized through sharing of texture coordinates, so that texture data is acquired at different offsets near the same texture coordinate on the same texture, the number of ALU instructions, the number of occupied registers, and the like are reduced, the power consumption of the GPU is reduced, and the performance of the GPU is improved.
(4) Mode 3:
r0=gatherOffset(texture_unit,texcoord,offset0,component_index);
r1=gatherOffset(texture_unit,texcoord,offset1,component_index);
r2=gatherOffset(texture_unit,texcoord,offset2,component_index);
r3=gatherOffset(texture_unit,texcoord,offset3,component_index);
taking the above mode 2 as an example, as shown in fig. 6b, the graphics processing flow of the texture unit may include the following steps:
S621: the texture unit receives a texture instruction issued by the task execution unit, where the texture instruction carries first information, such as the texture coordinate (coordinate), and first indication information, such as a modifier on the extension bits: ReuseMode=polop.
S622: the texture unit reads texture descriptor information from the texture state unit.
S623: the texture unit reads preset coordinate offset (offset) information from the texture state unit.
S624: the texture unit completes the offset operation on the texture coordinate with the coordinate offset (offset) in the sample point generation and out-of-bounds processing unit.
S625: the texture unit sends the texture coordinate obtained after the offset operation (which may be referred to as a second texture coordinate for ease of distinction) to the subsequent related modules of the texture unit to perform the texture sampling operation.
S626: the texture unit checks the number of instruction merges; when not all merged instructions have been read and processed, the flow returns to S623 to continue processing the remaining sub-texture instructions; otherwise, the processing flow ends, and the target sampling result is returned to the task execution unit after sampling ends, which is not described herein again. The S623-S626 loop is summarized in the sketch below.
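A minimal GLSL-style pseudocode sketch of the S623-S626 loop follows (illustrative only; merge_count, preset_offset, and results are hypothetical names, and in practice this work is performed by the hardware logic units of the texture unit rather than by shader code):
// Illustrative pseudocode for the Fig. 6b loop (S623-S626), hypothetical names.
for (int i = 0; i < merge_count; ++i) {              // S626: repeat until every merged sub-instruction is handled
    vec2 coord2 = texcoord + preset_offset[i];       // S623-S624: read the preset offset and apply it (second texture coordinate)
    results[i] = texture(texture_unit, coord2);      // S625: texture sampling with the offset coordinate
}
// After the loop, the target sampling result is returned to the task execution unit.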
Case three: the first texture data includes a first return value, and the second target operation includes: and a first post-processing operation on the first return value, wherein when the first texture data relates to merging processing, the first post-processing operation on the first return value comprises post-processing operations on first return values corresponding to a plurality of sub-texture instructions obtained by splitting the texture instructions, and the first post-processing operation comprises one or more of the following steps: weighted average operation, maximum value calculation, minimum value calculation, and component range conversion calculation.
Example three: the texture unit is used for executing post-processing operation of texture data return values of different modes.
In this example three, the texture unit may perform the post-processing operations on the texture data return values through corresponding logic units (for example, multiply-accumulate logic, min/max logic, or a post-processing operation module) that already exist in, or are newly added to, the texture filtering unit, including but not limited to the following modes (a summary sketch follows the modes):
(1) Mode 1: range conversion mode, converting the component range to [-1.0, 1.0]
vec3 ts_normal=texture(texture_unit3,out_texcoord0).xyz;
ts_normal.xyz=ts_normal.xyz*2.0-1.0;//the component could be .x, .xy, or .xyz
(2) Mode 2: convert the x and y ranges to [-1.0, 1.0], and z=sqrt(1-dot(xy,xy))
vec3 ts_normal=texture(texture_unit3,out_texcoord0).xyz;
ts_normal.xy=ts_normal.xy*2.0-1.0;
ts_normal.z=sqrt(1.0-dot(ts_normal.xy,ts_normal.xy));
(3) Mode 3: mode 2 with a clamp added
vec3 ts_normal=texture(texture_unit3,out_texcoord0).xyz;
ts_normal.xy=ts_normal.xy*2.0-1.0;
ts_normal.z=sqrt(clamp((1.0-dot(ts_normal.xy,ts_normal.xy)),0.0,1.0));
(4) Mode 4: condition mode
(5) Mode 5: condition mode
float[3] b1=texture(texture_unit2,out_texcoord0,0.000000).xyz;
float[3] b2=texture(texture_unit2,out_texcoord0,1.000000).xyz;
float[3] b3=texture(texture_unit2,out_texcoord0,2.000000).xyz;
float[3] b4=texture(texture_unit2,out_texcoord0,3.000000).xyz;
float[3] bloom=((((b1+b2)+b3)+b4)*0.250000);
(6) Mode 6: condition mode
float[3] env_color0=texture(envmap0,reflect_vector).xyz;
float[3] env_color1=texture(envmap1,reflect_vector).xyz;
float[3] env_color=mix(env_color0,env_color1,vmaps_interpolator_pad3.x);
(7) Mode 7: condition mode
neighbors=texture(lighting_texture,texcoord+vec2(inv_resolution.x,0.0)).xyz;
neighbors+=texture(lighting_texture,texcoord-vec2(inv_resolution.x,0.0)).xyz;
neighbors+=texture(lighting_texture,texcoord+vec2(0.0,inv_resolution.y)).xyz;
neighbors+=texture(lighting_texture,texcoord-vec2(0.0,inv_resolution.y)).xyz;
(8) Mode 8: max mode
highp float depth00=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,0)).r;
highp float depth01=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,1)).r;
highp float depth10=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,0)).r;
highp float depth11=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,1)).r;
highp float max_depth=max(max(depth00,depth01),max(depth11,depth10));
(9) Mode 9: min mode
highp float depth00=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,0)).r;
highp float depth01=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](0,1)).r;
highp float depth10=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,0)).r;
highp float depth11=textureLodOffset(input_texture_lod,texcoord,lod_level,int[2](1,1)).r;
highp float min_depth=min(min(depth00,depth01),min(depth11,depth10));
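As a brief summary of case three, the following GLSL-style sketch (illustrative only; r0-r3 stand for vec3 first return values and d0-d3 for scalar depth return values of merged sub-texture instructions, all hypothetical names) shows the listed first post-processing operations applied to merged return values, which is the work the texture filtering unit performs instead of the EU ALU:
// Illustrative only: first post-processing operations on merged return values.
vec3 avg_val = (r0 + r1 + r2 + r3) * 0.25;            // weighted average operation (cf. mode 5)
float max_val = max(max(d0, d1), max(d2, d3));        // maximum value calculation (cf. mode 8)
float min_val = min(min(d0, d1), min(d2, d3));        // minimum value calculation (cf. mode 9)
vec3 ranged_val = r0 * 2.0 - 1.0;                     // component range conversion calculation (cf. mode 1)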
Case four: a texture unit is used to perform, in combination, the processing of the different modes involved in case one, case two, and case three above.
For example, referring to Table 1, the modifier in the texture instruction received by the texture unit may include any one or more of the extension values corresponding to "PreOp", "ReuseMode", or "PostOp", for example, "pst=0", "polop=2", and "bumppa0=4". According to the modifiers in texture instructions of different modes, the texture unit may respectively perform the preprocessing operation on the texture coordinates, the merge operation on the plurality of sub-texture instructions, and the range conversion operation on the x and y components of the instruction result, so that the corresponding target operations are additionally completed while complete texture sampling is performed, the number of ALU instructions and the number of occupied registers are reduced, the power consumption of the GPU is reduced, and the performance of the GPU is improved. For a detailed description of each operation, refer to the foregoing descriptions of case one, case two, and case three; details are not repeated herein.
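For illustration, the following GLSL-style shader fragment (hypothetical; normal_map0, normal_map1, and in_texcoord are assumed names, and the exact modifier encoding is defined by Table 1, which is not reproduced here) contains work belonging to all three modifier classes and is the kind of code to which case four applies:
// Illustrative only: a fragment where PreOp, ReuseMode, and PostOp could all apply.
vec2 uv = in_texcoord / 2.0 + 0.5;                    // candidate for coordinate preprocessing (range conversion)
vec3 n0 = texture(normal_map0, uv).xyz;               // candidates for merging: same coordinates,
vec3 n1 = texture(normal_map1, uv).xyz;               //   different textures
n0.xy = n0.xy * 2.0 - 1.0;                            // candidates for post-processing
n1.xy = n1.xy * 2.0 - 1.0;                            //   (x, y component range conversion)
With the combined modifiers, the coordinate conversion, the merging of the two samples, and the component range conversion would all be completed inside the texture unit while a single extended texture instruction is sampled, leaving no corresponding ALU instructions in the shader.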
S530: the texture unit returns the target sampling result to the task execution unit corresponding to the texture instruction, and correspondingly, the task execution unit receives the target sampling result.
Further, the task execution unit may continue to execute the subsequent task based on the target sampling result, which is not described herein.
Therefore, by using the texture unit to perform different operation processing in different cases, the texture unit can complete operations such as the coordinate preprocessing operation, the instruction merge operation, and the post-processing of the texture data return value while completing texture sampling. Compared with using the texture unit only for its original texture sampling function, the processing time is not increased, while the number of EU ALU instructions and the allocation, reading, and writing of registers can be greatly reduced, thereby reducing the power consumption of the GPU and improving the performance of the GPU.
It should be noted that the graphics processing scheme of this application may be further extended into a more general technique to implement input preprocessing of all texture instructions, merging of texture instructions, and post-processing of the results of texture instructions. Similarly, the scheme can be applied to other memory read instructions and to other hardware units in the GPU, so that the number of occupied general-purpose registers can be reduced through the idea of near-memory computing, the warp concurrency is greatly increased, and the performance of the GPU is improved; in addition, the number of ALU instructions is greatly reduced, the number of times the ALU reads and writes the general-purpose registers is reduced, and the power consumption of the GPU is reduced, which is not described herein again.
Based on the same technical concept, the embodiments of the present application also provide a graphic processing apparatus, as shown in fig. 7, the graphic processing apparatus 700 may include a communication unit 701 and a processing unit 702. The communication unit 701 is configured to obtain a texture instruction, where the texture instruction includes first indication information, and the first indication information is used to indicate a target operation that needs to be executed in a texture sampling process; the processing unit 702 is configured to execute the texture sampling process according to the texture instruction, and execute the target operation in the texture sampling process, so as to obtain a target sampling result.
In one possible implementation, the processing unit 702 is configured to: determining target instruction parameters corresponding to the target operation in a plurality of groups of stored instruction parameters corresponding to various operations according to the first indication information in the texture instruction, wherein the target instruction parameters are used for executing the target operation; and in the texture sampling process, executing the target operation according to the target instruction parameter to obtain the target sampling result.
In a possible implementation, the texture instruction further includes first information, where the first information is used to obtain first texture data corresponding to the texture instruction, and the target operation includes a first target operation that needs to be performed on the first information and/or a second target operation that needs to be performed on the first texture data. When executing the target operation in the texture sampling process to obtain the target sampling result, the processing unit is configured to: in the texture sampling process, run the stored instruction parameters corresponding to the first target operation, and perform the first target operation and texture sampling on the first information to obtain the first texture data; and/or run the saved instruction parameters corresponding to the second target operation to perform the second target operation on the first texture data to obtain second texture data; and obtain the target sampling result according to the first texture data or the second texture data.
In one possible implementation, the first information includes first texture coordinates, and the first target operation includes at least one of: a preprocessing operation on the first texture coordinates, the preprocessing operation comprising: performing perspective transformation processing operation on the first texture coordinates and/or performing range conversion processing operation on the first texture coordinates; an offset processing operation on the first texture coordinate, wherein, in a case where the texture instruction involves a split processing, the offset processing operation on the first texture coordinate includes: and respectively carrying out offset processing operation on the plurality of sub-texture instructions obtained by splitting the texture instructions.
In one possible implementation, the first texture data includes a first return value, and the second target operation includes: and a first post-processing operation on the first return value, wherein when the first texture data relates to merging processing, the first post-processing operation on the first return value comprises post-processing operations on first return values corresponding to a plurality of sub-texture instructions obtained by splitting the texture instructions, and the first post-processing operation comprises one or more of the following steps: weighted average operation, maximum value calculation, minimum value calculation, and component range conversion calculation.
In one possible implementation, at least two sub-texture instructions of the plurality of sub-texture instructions split by the texture instruction satisfy any one of: the at least two sub-texture instructions correspond to different textures, and the first texture coordinates are the same; the offset texture instructions corresponding to the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offset amounts are different; the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offsets are different.
Fig. 8 is a schematic diagram of a communication device according to an embodiment of the present application. The communication device has the structure shown in fig. 8, and includes a processor 801 and a memory 802. The memory stores one or more computer programs, and the one or more computer programs include instructions. When the processor invokes the instructions, the communication device is caused to perform the methods provided in the foregoing embodiments and implementations. The functions of the components of the communication device are described below.
The processor 801 and the memory 802 are connected to each other through a bus 803. The bus 803 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. The buses may be classified as address buses, data buses, control buses, and the like. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus.
The memory 802 has stored therein one or more computer programs, including instructions. The memory 802 may include random access memory (random access memory, RAM) and may also include non-volatile memory (non-volatile memory), such as at least one disk memory. The processor 801 executes program instructions in the memory 802 and uses the data stored in the memory 802 to implement the functions described above, thereby implementing the methods provided in the above embodiments.
It is to be appreciated that the memory 802 in fig. 8 of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should be noted that, in the above embodiments of the present application, the division of the modules is merely schematic, and there may be another division manner in actual implementation, and in addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or may exist separately and physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution, in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Based on the above embodiments, the present application further provides a computer program, which when run on a computer causes the computer to perform the method provided by the above embodiments.
Based on the above embodiments, the present application further provides a computer-readable storage medium having stored therein a computer program, which when executed by a computer, causes the computer to perform the method provided in the above embodiments.
Wherein a storage medium may be any available medium that can be accessed by a computer. Taking this as an example but not limited to: the computer readable medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Based on the above embodiments, the present application further provides a chip, where the chip is coupled to the memory, and the chip is configured to read the computer program stored in the memory, so as to implement the method provided in the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
The various illustrative logical blocks and circuits described in the embodiments of the present application may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in the embodiments of the present application may be embodied directly in hardware, in a software element executed by a processor, or in a combination of the two. The software elements may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may reside in a terminal device. In the alternative, the processor and the storage medium may reside in different components in a terminal device.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the invention has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are merely exemplary illustrations of the present invention as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (14)

  1. A method of graphics processing, comprising:
    obtaining a texture instruction, wherein the texture instruction comprises first indication information, and the first indication information is used for indicating target operation required to be executed in a texture sampling process;
    and executing the texture sampling process according to the texture instruction, and executing the target operation in the texture sampling process to obtain a target sampling result.
  2. The method of claim 1, wherein performing the texture sampling process according to the texture instruction and performing the target operation during the texture sampling process to obtain a target sampling result comprises:
    determining target instruction parameters corresponding to the target operation in a plurality of groups of stored instruction parameters corresponding to various operations according to the first indication information in the texture instruction, wherein the target instruction parameters are used for executing the target operation;
    and in the texture sampling process, executing the target operation according to the target instruction parameter to obtain the target sampling result.
  3. The method according to claim 2, wherein the texture instruction further includes first information, the first information is used for obtaining first texture data corresponding to the texture instruction, the target operation includes a first target operation that needs to be performed on the first information and/or a second target operation that needs to be performed on the first texture data, and performing the target operation in a texture sampling process to obtain a target sampling result includes:
    In the texture sampling process, running the stored instruction parameters corresponding to the first target operation, and executing the first target operation and texture sampling on the first information to obtain the first texture data; and/or running the saved instruction parameters corresponding to the second target operation to execute the second target operation on the first texture data to obtain second texture data;
    and obtaining the target sampling result according to the first texture data or the second texture data.
  4. The method of claim 3, wherein the first information includes first texture coordinates therein, and wherein the first target operation includes at least one of:
    a preprocessing operation on the first texture coordinates, the preprocessing operation comprising: performing perspective transformation processing operation on the first texture coordinates and/or performing range conversion processing operation on the first texture coordinates;
    an offset processing operation on the first texture coordinate, wherein, in a case where the texture instruction involves a split processing, the offset processing operation on the first texture coordinate includes: and respectively carrying out offset processing operation on the plurality of sub-texture instructions obtained by splitting the texture instructions.
  5. The method of claim 3 or 4, wherein the first texture data comprises a first return value, and the second target operation comprises:
    and a first post-processing operation on the first return value, wherein when the first texture data relates to merging processing, the first post-processing operation on the first return value comprises post-processing operations on first return values corresponding to a plurality of sub-texture instructions obtained by splitting the texture instructions, and the first post-processing operation comprises one or more of the following steps: weighted average operation, maximum value calculation, minimum value calculation, and component range conversion calculation.
  6. The method of claim 4 or 5, wherein at least two sub-texture instructions of the plurality of sub-texture instructions split by the texture instruction satisfy any one of:
    the at least two sub-texture instructions correspond to different textures, and the first texture coordinates are the same;
    the offset texture instructions corresponding to the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offset amounts are different;
    the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offsets are different.
  7. A graphics processing apparatus, comprising:
    the communication unit is used for acquiring a texture instruction, wherein the texture instruction comprises first indication information, and the first indication information is used for indicating target operation which needs to be executed in a texture sampling process;
    and the processing unit is used for executing a texture sampling process according to the texture instruction and executing the target operation in the texture sampling process to obtain a target sampling result.
  8. The apparatus of claim 7, wherein the processing unit is configured to:
    determining target instruction parameters corresponding to the target operation in a plurality of groups of stored instruction parameters corresponding to various operations according to the first indication information in the texture instruction, wherein the target instruction parameters are used for executing the target operation;
    and in the texture sampling process, executing the target operation according to the target instruction parameters to obtain the target sampling result.
  9. The apparatus according to claim 8, wherein the texture instruction further includes first information, the first information is used to obtain first texture data corresponding to the texture instruction, the target operation includes a first target operation that needs to be performed on the first information and/or a second target operation that needs to be performed on the first texture data, and the processing unit is configured to: executing the target operation in the texture sampling process to obtain a target sampling result, wherein the target operation comprises the following steps:
    In the texture sampling process, running the stored instruction parameters corresponding to the first target operation, and executing the first target operation and texture sampling on the first information to obtain the first texture data; and/or running the saved instruction parameters corresponding to the second target operation to execute the second target operation on the first texture data to obtain second texture data;
    and obtaining the target sampling result according to the first texture data or the second texture data.
  10. The apparatus of claim 9, wherein the first information includes first texture coordinates, and wherein the first target operation includes at least one of:
    a preprocessing operation on the first texture coordinates, the preprocessing operation comprising: performing perspective transformation processing operation on the first texture coordinates and/or performing range conversion processing operation on the first texture coordinates;
    an offset processing operation on the first texture coordinate, wherein, in a case where the texture instruction involves a split processing, the offset processing operation on the first texture coordinate includes: and respectively carrying out offset processing operation on the plurality of sub-texture instructions obtained by splitting the texture instructions.
  11. The apparatus of claim 9 or 10, wherein the first texture data comprises a first return value, and the second target operation comprises:
    and a first post-processing operation on the first return value, wherein when the first texture data relates to merging processing, the first post-processing operation on the first return value comprises post-processing operations on first return values corresponding to a plurality of sub-texture instructions obtained by splitting the texture instructions, and the first post-processing operation comprises one or more of the following steps: weighted average operation, maximum value calculation, minimum value calculation, and component range conversion calculation.
  12. The apparatus of claim 10 or 11, wherein at least two sub-texture instructions of the plurality of sub-texture instructions split by the texture instruction satisfy any one of:
    the at least two sub-texture instructions correspond to different textures, and the first texture coordinates are the same;
    the offset texture instructions corresponding to the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offset amounts are different;
    the at least two sub-texture instructions correspond to the same texture, the first texture coordinates are the same, and the first offsets are different.
  13. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on a computer, causes the computer to perform the method of any of claims 1-6.
  14. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the method of any of claims 1-6.
CN202180099645.3A 2021-06-22 2021-06-22 Graphic processing method and device Pending CN117546199A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/101602 WO2022266851A1 (en) 2021-06-22 2021-06-22 Graphics processing method and apparatus

Publications (1)

Publication Number Publication Date
CN117546199A true CN117546199A (en) 2024-02-09

Family

ID=84543840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180099645.3A Pending CN117546199A (en) 2021-06-22 2021-06-22 Graphic processing method and device

Country Status (2)

Country Link
CN (1) CN117546199A (en)
WO (1) WO2022266851A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0818279D0 (en) * 2008-10-06 2008-11-12 Advanced Risc Mach Ltd Graphics processing systems
US9659341B2 (en) * 2014-06-25 2017-05-23 Qualcomm Incorporated Texture pipe as an image processing engine
US9905040B2 (en) * 2016-02-08 2018-02-27 Apple Inc. Texture sampling techniques
GB2566468B (en) * 2017-09-13 2020-09-09 Advanced Risc Mach Ltd Graphics processing

Also Published As

Publication number Publication date
WO2022266851A1 (en) 2022-12-29

Similar Documents

Publication Publication Date Title
US9947084B2 (en) Multiresolution consistent rasterization
US9286647B2 (en) Pixel shader bypass for low power graphics rendering
JP7253488B2 (en) Composite world-space pipeline shader stage
JP6918919B2 (en) Primitive culling with an automatically compiled compute shader
CN106575430B (en) Method and apparatus for pixel hashing
JP7282675B2 (en) Out of order cash return
KR102266962B1 (en) Compiler-assisted technologies to reduce memory usage in the graphics pipeline
US9558573B2 (en) Optimizing triangle topology for path rendering
US9659402B2 (en) Filtering multi-sample surfaces
JP2023525725A (en) Data compression method and apparatus
CN108292426B (en) Partial span based rasterization
CN117546199A (en) Graphic processing method and device
US20220309606A1 (en) Dynamically reconfigurable register file
US20210304488A1 (en) Sampling for partially resident textures
US11656877B2 (en) Wavefront selection and execution
US20240202862A1 (en) Graphics and compute api extension for cache auto tiling
KR20230162023A (en) Synchronized free cross-pass binning with sub-pass interleaving

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination