CN116467227A - TMU system and operation optimization method thereof - Google Patents
- Publication number
- CN116467227A (application CN202310723241.8A)
- Authority
- CN
- China
- Prior art keywords
- texture
- data
- memory
- access
- mapping unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiment of the invention discloses a TMU system, comprising a processor core, a data memory, a cache memory, and a texture mapping unit. The processor core configures the texture mapping unit according to first configuration information; the texture mapping unit parses the first configuration information, generates a first access request, and sends it to the cache memory; the cache memory accesses texture data in the data memory according to the first access request, generates an access result, and returns it to the texture mapping unit; the texture mapping unit generates an end signal according to the access result and sends it to the processor core to end data extraction. Because the uncompressed texture data corresponding to the texture pixels is obtained directly from the data memory through the cache memory, no decompression operation or extra data storage operation is needed, so the TMU system is more efficient, memory occupancy is reduced, and the goal of performance optimization is achieved.
Description
Technical Field
The present invention relates to the field of graphics processing technologies, and in particular, to a TMU system and an operation optimization method for the TMU system.
Background
A texture mapping unit (Texture Mapping Unit, TMU) is a component of modern graphics processing units (Graphics Processing Unit, GPU) that can rotate, resize, and warp a bitmap image for placement as a texture onto an arbitrary plane of a given 3D model, a process known as texture mapping. The texture mapping unit arose from the computational requirements of sampling and transforming a planar image (used as a texture map) to the correct angle and perspective needed in 3D space. The texture mapping unit is part of the shader and is separate from the render output unit (ROP).
In a TMU system, the texture mapping unit calculates, according to information configured by the processor core, the storage address in the memory corresponding to the texture value data of a texture pixel, and stores the texture value data in the data memory after the texture data of the texture pixel is read. Under this workflow, the TMU system suffers from repeated computation and high storage space occupancy.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a TMU system to solve the problems of repeated computation and high storage space occupancy in the existing TMU system.
To achieve the above object, a first aspect of the present application provides a TMU system, including: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory is in communication with the data memory and the texture mapping unit, respectively;
the processor core is used for acquiring first configuration information according to the data extraction instruction when receiving the data extraction instruction, and configuring the texture mapping unit according to the first configuration information, wherein the first configuration information at least comprises texture coordinates of texture pixels to be processed;
the texture mapping unit is used for parsing the first configuration information, generating a first access request, and sending the first access request to the cache memory;
the cache memory is used for, if it determines from the first access request that the data memory is to be accessed, accessing the texture data corresponding to the texture coordinates in the data memory, generating an access result, and returning the access result to the texture mapping unit;
The data memory is used for storing uncompressed texture data;
the texture mapping unit is further configured to generate an end signal when the received access result is that texture data corresponding to the texture coordinates is accessed, and send the end signal to the processor core to end data extraction.
Further, the texture mapping unit is specifically configured to:
calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request including at least the first storage address and the first flag signal, wherein the first flag signal is used by the cache memory to determine that the access object is the data memory.
Further, the cache memory is specifically configured to:
determining that an access object is the data memory based on the first flag signal;
and accessing the first storage address in the data memory and generating an access result.
Further, the accessing the first storage address in the data memory and generating an access result specifically includes:
Determining, in the data memory, whether the first storage address contains uncompressed first texture data;
if the first storage address contains the first texture data, generating a first access result, wherein the first access result indicates that uncompressed texture data corresponding to the texture coordinates are accessed in the data memory;
and if the first storage address does not contain the first texture data, generating a second access result, wherein the second access result indicates that uncompressed texture data corresponding to the texture coordinates are not accessed in the data memory.
Further, the TMU system further includes a memory, where the memory is in communication with the cache memory and is used for storing compressed texture data; the access object further includes the memory, and the cache memory is further configured to send the second access result to the texture mapping unit;
the texture mapping unit is further configured to obtain a second access request after receiving the second access result, and send the second access request to the cache memory, where the second access request includes at least a second storage address and a second flag signal, the second storage address is an address of compressed texture data corresponding to the texture coordinate, and the second flag signal is used by the cache memory to determine that an access object is the memory;
When the second flag signal indicates that the access object is the memory, the cache memory is further configured to access the second storage address in the memory after receiving a second access request, extract compressed second texture data corresponding to the second storage address, and send the second texture data to the texture mapping unit;
the texture mapping unit is further configured to decompress the second texture data to obtain decompressed third texture data, write the third texture data into the data memory for storage, and generate a third access result, where the third access result indicates that extraction of texture data corresponding to the texture coordinates is completed.
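The miss path described above (fetch compressed data from the memory, decompress it, write the result back into the data memory) can be sketched as follows. This is a hedged illustration, not the patented implementation: `zlib` merely stands in for the real texture codec (e.g. DXT1/DXT3/DXT5), and all names and the dict-based memories are assumptions.

```python
import zlib

def fetch_on_miss(memory: dict, data_memory: dict,
                  second_addr: int, first_addr: int) -> bytes:
    """Sketch of the miss path: read the compressed second texture data at the
    second storage address, decompress it into the third texture data, and
    store it at the first storage address so later fetches hit directly."""
    compressed = memory[second_addr]      # second texture data (compressed)
    third = zlib.decompress(compressed)   # third texture data (decompressed)
    data_memory[first_addr] = third       # write back for future hits
    return third
```

A subsequent fetch of the same texel would then find the decompressed data already in the data memory, skipping both the memory access and the decompression.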
Further, the first configuration information further includes an ID of the thread, and the end signal includes an operation end signal and an ID number;
the processor core is further configured to create a target thread when receiving a data extraction instruction, obtain first configuration information through the target thread, and configure the texture mapping unit according to the first configuration information;
the texture mapping unit is specifically further configured to generate the ID number according to the ID of the target thread, generate the operation end signal according to the received first access result or third access result, and send the ID number and the operation end signal to the processor core;
The processor core is further configured to send the operation end signal to a target thread corresponding to the ID number, and end a data extraction operation of the target thread.
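The thread-routing behavior above can be sketched in a few lines; the dict-based thread table, the field names, and the string result codes are illustrative assumptions, not the patent's encoding.

```python
def make_end_signal(thread_id: int, access_result: str) -> dict:
    """Texture mapping unit pairs the operation end signal with the target
    thread's ID number, given a first or third access result."""
    assert access_result in ("first_access_result", "third_access_result")
    return {"id": thread_id, "op_end": True}

def dispatch_end_signal(threads: dict, signal: dict) -> None:
    """Processor core routes the end signal to the thread matching the ID
    number, ending that thread's data extraction only."""
    threads[signal["id"]]["extracting"] = False
```

Keying the signal by thread ID lets several outstanding extraction threads complete independently.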
To achieve the above object, a second aspect of the present application provides an operation optimization method of a TMU system, where the method is applied to the TMU system, and the TMU system includes: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory is in communication with the data memory and the texture mapping unit, respectively; the method includes:
analyzing first configuration information, generating a first access request, and sending the first access request to the cache memory, wherein the first configuration information is acquired and configured by the processor core according to a data extraction instruction when the processor core receives the data extraction instruction, the first configuration information at least comprises texture coordinates of texture pixels to be processed, and the cache memory is used for accessing uncompressed texture data corresponding to the texture coordinates in the data memory and generating an access result if the data memory is determined to be accessed according to the first access request, and returning the access result to the texture mapping unit;
And when the received access result is that the texture data corresponding to the texture coordinates are accessed, generating an end signal, and sending the end signal to the processor core to end data extraction.
Further, the parsing the first configuration information to generate a first access request specifically includes:
calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request including at least the first storage address and the first flag signal, wherein the first flag signal is used by the cache memory to determine that the access object is the data memory.
Further, the TMU system further includes a memory in communication with the cache memory for storing compressed texture data, the access object further including the memory;
after the received access result is that the texture data is not accessed in the data memory, the method further comprises:
Acquiring a second access request and sending the second access request to the cache memory, wherein the second access request includes at least a second storage address and a second flag signal, the second storage address is the address of the compressed texture data corresponding to the texture coordinates, the second flag signal is used by the cache memory to determine that the access object is the memory, and the cache memory is triggered to access the second storage address in the memory, extract the compressed second texture data corresponding to the second storage address, and send the second texture data to the texture mapping unit;
acquiring the second texture data, and performing decompression operation on the second texture data to obtain decompressed third texture data;
and writing the third texture data into the data memory for storage, and generating a third access result, wherein the third access result indicates that the extraction of the texture data corresponding to the texture coordinates is completed.
Further, the first configuration information further includes an ID of the thread, and the end signal includes an operation end signal and an ID number;
and when the received access result is that the texture data corresponding to the texture coordinates is accessed, generating an end signal, and sending the end signal to the processor core to end data extraction, wherein the method specifically comprises the following steps of:
Generating the ID number according to the ID of a target thread, and generating the operation end signal according to the access result, wherein the target thread is created by the processor core when receiving a data extraction instruction;
and sending the ID number and the operation end signal to the processor core, wherein the processor core is used for sending the operation end signal to the target thread corresponding to the ID number and ending the data extraction of the target thread.
The embodiment of the invention has the following beneficial effects:
The invention discloses a TMU system, which includes: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory is in communication with the data memory and the texture mapping unit, respectively; the processor core is used for acquiring first configuration information according to a data extraction instruction when the data extraction instruction is received, and configuring the texture mapping unit according to the first configuration information, wherein the first configuration information includes at least the texture coordinates of the texture pixels to be processed; the texture mapping unit is used for parsing the first configuration information, generating a first access request, and sending the first access request to the cache memory; the cache memory is used for accessing the texture data corresponding to the texture coordinates in the data memory if it determines from the first access request that the data memory is to be accessed, generating an access result, and returning the access result to the texture mapping unit; the data memory is used for storing uncompressed texture data; the texture mapping unit is further configured to generate an end signal when the received access result is that the texture data corresponding to the texture coordinates is accessed, and send the end signal to the processor core to end data extraction.
According to the method, uncompressed texture data corresponding to the texture pixels to be processed are obtained from the data memory through the cache memory, if the data memory contains the uncompressed texture data corresponding to the texture pixels to be processed, the uncompressed texture data can be directly called, decompression operation and data storage operation are not needed, so that the efficiency of the TMU system is higher, the occupancy rate of the memory is reduced, and the purpose of optimizing the performance is achieved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1a is a block diagram illustrating the coordinate positions of texels corresponding to texture data to be extracted during a first data extraction operation initiated by a processor core;
FIG. 1b is a block diagram illustrating the coordinate positions of texels corresponding to texture data to be extracted during a second data extraction operation initiated by a processor core;
FIG. 1c is a superposition of two data fetch operations initiated by a processor core;
FIG. 2 is a block diagram of a TMU system of an embodiment of the invention;
FIG. 3 is a block diagram of a refined TMU system of an embodiment of the invention;
fig. 4 is a flow chart of an operation optimization method of a TMU system according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In a TMU system, the flow of data extraction is roughly as follows: the processor core configures the texture mapping unit with information; the texture mapping unit parses the configured information to generate an access address, extracts compressed data from the memory, decompresses the extracted compressed data to obtain decompressed data, and stores the decompressed data. If the data to be extracted by multiple operations is identical or partially identical, repeated computations and repeated storage operations occur. For example, in fig. 1a, the coordinate positions of the texels corresponding to the texture data to be extracted in the first data extraction operation initiated by the processor core are the 4×4 texels of block0; in fig. 1b, the coordinate positions for the second data extraction operation are 1×4 texels of block0 and 3×4 texels of block1. The repeated portion of the two extraction operations is shown in fig. 1c. During the two extractions, the overlapping 1×4 texels undergo two extractions of compressed data from the memory, two data decompression calculations, and two storage operations, so the TMU system has the problems of repeated computation and high storage space occupancy.
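The overlap in the fig. 1 example can be reproduced with a small sketch; the concrete coordinates (block0 starting at the origin, the second fetch shifted three texels to the right) are assumptions chosen to match the described 4×4 / 1×4 + 3×4 footprints.

```python
def footprint(x0: int, y0: int, w: int, h: int) -> set:
    """Set of texel coordinates covered by a w-by-h fetch starting at (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}

first = footprint(0, 0, 4, 4)   # first operation: 4x4 texels of block0
second = footprint(3, 0, 4, 4)  # second operation: 1x4 of block0 + 3x4 of block1
overlap = first & second        # the 1x4 column extracted, decompressed, stored twice
```

Without the data-memory cache, each texel in `overlap` pays for two memory extractions, two decompressions, and two stores.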
Based on this, the TMU system provided in the embodiment of the present invention can solve the above problems. Referring to fig. 2, fig. 2 is a block diagram of a TMU system according to an embodiment of the present invention, where the TMU system includes: a processor core 110, a data memory 140, a cache memory 130, and a texture mapping unit 120; the processor core 110 is in communication with the data memory 140, the cache memory 130, and the texture mapping unit 120, respectively; the cache memory 130 is in communication with the data memory 140 and the texture mapping unit 120, respectively. Specifically, with the processor core 110 as the master device and the data memory 140, the cache memory 130, and the texture mapping unit 120 as slave devices, communication between the master and the slaves may optionally be performed via an AXI bus.
The processor core 110 is configured to, when receiving the data extraction instruction, obtain first configuration information according to the data extraction instruction, and configure the texture mapping unit 120 according to the first configuration information, where the first configuration information at least includes texture coordinates of a texture pixel to be processed.
Specifically, upon receiving the data extraction instruction, the processor core 110 reads the texture coordinates of the texels to be processed according to the instruction. Typically, the texture coordinates of a texel are represented by a two-dimensional variable (u, v), where u is the horizontal coordinate and v is the vertical coordinate, so texture coordinates are also called uv coordinates. After reading the texture coordinates, the processor core 110 configures the texture mapping unit 120 with the texture coordinates as the first configuration information; it can be understood that the texture mapping unit 120 contains internal registers, so the processor core 110 writes the first configuration information into the internal registers of the texture mapping unit 120.
After the processor core 110 configures the internal registers of the texture mapping unit 120, the texture mapping unit 120 parses the first configuration information, generates a first access request, and sends the first access request to the cache memory 130. Specifically, the texture mapping unit 120 parses the texture coordinates in the internal registers to obtain the first access request.
When the cache memory 130 receives the first access request, and if it determines from the first access request that the data memory 140 needs to be accessed, the cache memory 130 accesses the texture data corresponding to the texture coordinates in the data memory 140, generates an access result, and returns the access result to the texture mapping unit 120.
In the embodiment of the present invention, the data memory 140 is used for storing uncompressed texture data, and the first access request indicates that the cache memory 130 should access the data memory 140. Therefore, when the cache memory 130 receives the first access request, it determines that the data memory 140 needs to be accessed, and after the access is completed, one of two access results is generated: the uncompressed texture data is accessed, or the uncompressed texture data is not accessed. After the access result is generated, it is returned to the texture mapping unit 120. Optionally, the data memory 140 may be a tightly coupled data memory (Tightly Coupled Data Memory, TCDM).
In the TMU system of the present invention, when the texture data corresponding to the texture coordinates needs to be acquired, the decompressed texture data already stored in the data memory 140 is accessed through the cache memory 130; it can be understood that this data is either texture data that was never compressed or decompressed texture data obtained by decompressing compressed texture data. When the cache memory 130 accesses the texture data corresponding to the texture coordinates to be extracted, the extraction and decompression of compressed data are not required, which effectively reduces the computation in the data extraction process, improves efficiency, avoids re-storing data due to repeated computation, and effectively reduces memory occupancy.
After the texture mapping unit 120 receives the access result returned by the cache 130, the texture mapping unit 120 is further configured to generate an end signal when the received access result is that texture data corresponding to texture coordinates is accessed, and send the end signal to the processor core 110 to end data extraction.
In the embodiment of the present invention, when uncompressed texture data corresponding to texture coordinates is accessed in the data memory 140, the texture mapping unit 120 may generate an end signal and send the end signal back to the processor core 110, and after receiving the end signal, the processor core 110 may directly extract data in the data memory 140, and after obtaining the texture data, may end the data extraction.
According to the TMU system provided by the embodiment of the invention, because the cache memory 130 maintains the addresses of the data memory 140, the data memory 140 can be accessed directly through the first access request, which effectively reduces decompression calculations and post-decompression storage operations, and thus saves storage space.
In one possible embodiment of the present invention, to allow the cache memory 130 to determine the access object, the first access request sent by the texture mapping unit 120 to the cache memory 130 includes at least a storage address and a flag signal. The texture mapping unit 120 is then specifically configured to: calculate a data storage address according to the texture coordinates to obtain a first storage address, where the first storage address is the address of the uncompressed texture data corresponding to the texture coordinates; and generate a first flag signal based on the first storage address, and generate a first access request including at least the first storage address and the first flag signal, where the first flag signal is used by the cache memory 130 to determine that the access object is the data memory 140.
Specifically, so that the cache memory 130 can first check whether the data memory 140 contains uncompressed texture data corresponding to the texture coordinates, the texture mapping unit 120 calculates, according to the texture coordinates in the first configuration information, the first storage address of the uncompressed texture data, and generates the first flag signal according to the first storage address, where the first flag signal instructs the cache memory 130 that the access object is the data memory 140. A first access request containing the first flag signal and the first storage address is then sent to the cache memory 130, so that the cache memory 130 accesses the access object according to the first access request.
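A minimal sketch of this address calculation, under assumptions not stated in the patent: a row-major uncompressed layout, 4 bytes per texel (e.g. RGBA8), integer texel coordinates, and a flag value of 0 for the data memory.

```python
DATA_MEMORY_FLAG = 0   # assumed encoding: access object is the data memory
BYTES_PER_TEXEL = 4    # e.g. RGBA8; an assumption, not from the patent

def first_access_request(u: int, v: int, base: int, row_pitch_texels: int) -> dict:
    """Derive the first storage address from the (u, v) texture coordinates
    and attach the first flag signal, forming the first access request."""
    addr = base + (v * row_pitch_texels + u) * BYTES_PER_TEXEL
    return {"storage_address": addr, "flag": DATA_MEMORY_FLAG}
```

For example, texel (3, 2) in a 64-texel-wide texture based at address 4096 maps to byte address 4096 + (2·64 + 3)·4.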
Based on this, after the cache memory 130 receives the first access request containing the first flag signal and the first storage address, the cache memory 130 is specifically configured to: determine that the access object is the data memory 140 based on the first flag signal; and access the first storage address in the data memory 140 and generate an access result. Two situations may occur when the cache memory 130 accesses the data memory 140: the data is accessed, or it is not, and the access result is generated accordingly. Specifically, the cache memory 130 determines, in the data memory 140, whether the first storage address contains uncompressed first texture data. If the first storage address contains the first texture data, a first access result is generated, indicating that uncompressed texture data corresponding to the texture coordinates is accessed in the data memory 140; if the first storage address does not contain the first texture data, a second access result is generated, indicating that uncompressed texture data corresponding to the texture coordinates is not accessed in the data memory 140. When the access result returned by the cache memory 130 is the first access result, the data memory 140 already stores the uncompressed texture data corresponding to the texture pixel, so no data extraction, decompression, or storage operation is needed, saving both computation time and storage space.
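The two-outcome lookup can be sketched as follows; the dict standing in for the data memory 140 and the string result codes are illustrative assumptions.

```python
FIRST_ACCESS_RESULT = "hit"    # uncompressed texture data accessed
SECOND_ACCESS_RESULT = "miss"  # uncompressed texture data not accessed

def access_data_memory(data_memory: dict, storage_address: int):
    """Look up the first storage address in the data memory; return the
    access result and, on a hit, the uncompressed texture data."""
    if storage_address in data_memory:
        return FIRST_ACCESS_RESULT, data_memory[storage_address]
    return SECOND_ACCESS_RESULT, None
```

Only the miss outcome triggers the second access request to the memory; a hit ends the extraction immediately.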
Referring to fig. 3, fig. 3 is a block diagram of the refined TMU system according to the embodiment of the present invention. As shown in fig. 3, the refined TMU system further includes a memory 150, where the memory 150 is in communication with the cache memory 130 and is used for storing compressed texture data; the access object further includes the memory 150, and the cache memory 130 is further used for sending the second access result to the texture mapping unit 120. Specifically, when the second access result is generated, the uncompressed texture data corresponding to the texel is not yet stored in the data memory 140.
After receiving the second access result, the texture mapping unit 120 is further configured to obtain a second access request and send the second access request to the cache memory 130, where the second access request includes at least a second storage address and a second flag signal, the second storage address is an address of the compressed texture data corresponding to the texture coordinates, and the second flag signal is used by the cache memory 130 to determine that the access object is the memory 150.
Since the 2D/3D texture image is usually compressed in a preset texture compression mode and stored in the memory, when the cache memory 130 does not access the texture data in the data memory 140, the texture mapping unit 120 may perform a calculation according to the texture coordinates and the preset compression mode to obtain a second storage address of the compressed texture data corresponding to the texture coordinates, and generate the second flag signal according to the second storage address; an optional texture compression algorithm is DXT1/DXT3/DXT5, etc. The second storage address and the second flag signal constitute a second access request, which is then sent to the cache memory 130.
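For DXT1, texels are grouped into 4x4 blocks of 8 bytes each, so the second storage address can be derived from the texture coordinates as sketched below. The row-major block ordering and base address are assumptions of this sketch; the patent only names DXT1/DXT3/DXT5 as candidate algorithms:

```python
# Illustrative second-address computation under a DXT1-style layout:
# 4x4 texel blocks, 8 bytes per compressed block, blocks stored row-major.

def second_storage_address(u, v, tex_width, base=0x0,
                           block_dim=4, block_bytes=8):
    blocks_per_row = tex_width // block_dim
    block_index = (v // block_dim) * blocks_per_row + (u // block_dim)
    return base + block_index * block_bytes

# texel (9, 5) in a 64-texel-wide DXT1 texture lies in block (2, 1):
# index = 1*16 + 2 = 18, address = 18 * 8 = 144
```

DXT3/DXT5 would use 16 bytes per block (`block_bytes=16`) with the same block indexing.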
When the second flag signal indicates that the access object is the memory 150, the cache memory 130 is further configured to, after receiving the second access request, access the second storage address in the memory 150, extract compressed second texture data corresponding to the second storage address, and send the second texture data to the texture mapping unit 120. The texture mapping unit 120 is further configured to decompress the second texture data to obtain decompressed third texture data, write the third texture data into the data memory 140 for storage, and generate a third access result, where the third access result indicates that extraction of texture data corresponding to texture coordinates is completed.
In particular, since the texture data corresponding to the texture coordinates to be extracted is not accessed in the data memory 140, meaning that the texture data has not been extracted before, the cache memory 130 performs data extraction on the memory 150, and the texture mapping unit 120 decompresses the compressed data and stores the result in the data memory 140 to provide data for subsequent extraction operations. It is understood that the same cache memory may be used to maintain both the first access address and the second access address, or, for clearer address management, two caches may be used to maintain the first access address and the second access address respectively.
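The miss path described above (fetch from the memory 150, decompress, write back into the data memory 140) can be sketched as one function. The XOR "decode" is only a placeholder for a real DXT decoder, and the dict-backed memories are assumptions of this sketch:

```python
# Sketch of the miss path: the cache memory fetches compressed data from
# the memory 150, the texture mapping unit decompresses it and writes the
# result into the data memory 140 so later requests hit there.

def handle_miss(memory, data_memory, second_addr, first_addr):
    compressed = memory[second_addr]                    # cache reads memory 150
    decompressed = bytes(b ^ 0xFF for b in compressed)  # placeholder "decode"
    data_memory[first_addr] = decompressed              # store for later hits
    return "third_access_result"                        # extraction complete
```

After this write-back, the next first access request for the same texel resolves in the data memory 140 without touching the memory 150.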
Since the processor core generates a thread to configure the texture mapping unit 120 when receiving the data extraction instruction, in order to enable the processor core to better recognize the end signal, the first configuration information further includes an ID of the thread, and the end signal includes an operation end signal and an ID number.
Based on this, the processor core 110 is further configured to create a target thread when receiving the data extraction instruction, obtain the first configuration information through the target thread, and configure the texture mapping unit 120 according to the first configuration information; the texture mapping unit 120 is specifically further configured to generate an ID number according to the ID of the target thread, generate an operation end signal according to the received first access result or the received third access result, and send the ID number and the operation end signal to the processor core 110; the processor core 110 is further configured to send an operation end signal to the target thread corresponding to the ID number, and end the data extraction operation of the target thread.
Specifically, an ID is assigned while the target thread is created; it is understood that each thread is independent of the others, the threads do not communicate with each other, and each ID is unique. After the texture mapping unit 120 finishes the data extraction, the ID number and the operation end signal are sent to the processor core 110, and the processor core 110 sends the operation end signal to the target thread corresponding to the ID number, so that the data extraction operation of the target thread is ended; the operation end signal may be a single-cycle pulse signal.
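The handshake between the texture mapping unit and the processor core can be sketched as below; the bookkeeping class and signal token are illustrative assumptions (the patent's end signal is a hardware pulse, modeled here as a string):

```python
# Sketch of thread creation and end-signal delivery by the processor core 110.
import itertools

class ProcessorCore:
    def __init__(self):
        self._ids = itertools.count(1)   # each target thread gets a unique ID
        self.threads = {}                # ID number -> thread state

    def create_target_thread(self):
        tid = next(self._ids)
        self.threads[tid] = "running"
        return tid

    def on_end_signal(self, tid, signal):
        # the operation end signal ends only the thread matching the ID number
        if signal == "operation_end":
            self.threads[tid] = "finished"

core = ProcessorCore()
t1, t2 = core.create_target_thread(), core.create_target_thread()
core.on_end_signal(t1, "operation_end")   # t2 keeps running independently
```

Routing by ID number is what lets several independent extraction threads share one texture mapping unit without confusing their completions.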
In the TMU system provided in the embodiment of the present invention, by adjusting the connection structure of the TMU system, the cache memory 130 is respectively connected in communication with the texture mapping unit 120, the data memory 140, and the memory 150. The cache memory 130 recognizes the flag signal to confirm the access object, and the data memory 140 is accessed before data extraction is performed on the memory 150. If the texture data is accessed in the data memory 140, there is no need to perform data extraction on the memory 150 and then decompress and store the data, thereby effectively reducing the amount of computation and the storage space occupancy.
The embodiment of the invention also provides an operation optimization method of the TMU system. The method is applied to the TMU system, and the TMU system comprises: a processor core 110, a data memory 140, a cache memory 130, and a texture mapping unit 120; the processor core 110 has a communication connection with the data memory 140, the cache memory 130, and the texture mapping unit 120, respectively; the cache memory 130 is in communication with the data memory 140 and the texture mapping unit 120, respectively.
Referring specifically to fig. 4, fig. 4 is a flow chart of an operation optimization method of a TMU system according to an embodiment of the invention, and as shown in fig. 4, the method includes:
In step 410, the first configuration information is parsed to generate a first access request, and the first access request is sent to the cache memory 130. The first configuration information is acquired by the processor core 110 according to a data extraction instruction when the data extraction instruction is received, and at least includes texture coordinates of a texture pixel to be processed. The cache memory 130 is configured to, if it determines according to the first access request that the data memory 140 is to be accessed, access uncompressed texture data corresponding to the texture coordinates in the data memory 140, generate an access result, and return the access result to the texture mapping unit 120.
In step 420, when the received access result is that the texture data corresponding to the texture coordinates is accessed, an end signal is generated, and the end signal is sent to the processor core 110, so as to end the data extraction.
Further, in step 410, the first configuration information is parsed to generate a first access request, which specifically includes: calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates; and generating a first flag signal based on the first storage address, and generating a first access request that includes at least the first storage address and the first flag signal, where the first flag signal is used by the cache memory 130 to determine that the access object is the data memory 140.
The TMU system to which the optimization method of the embodiment of the present invention is applied further includes a memory 150, where the memory 150 is in communication with the cache memory 130 and is used for storing compressed texture data; the access object further includes the memory 150, and the access result includes two cases: texture data is accessed in the data memory 140, or texture data is not accessed in the data memory 140. When the received access result is that texture data is not accessed in the data memory 140, the method further comprises:
Step 1, acquiring a second access request and sending the second access request to the cache memory 130, where the second access request includes at least a second storage address and a second flag signal, the second storage address is an address of the compressed texture data corresponding to the texture coordinates, and the second flag signal is used by the cache memory 130 to determine that the access object is the memory 150 and triggers the cache memory 130 to access the second storage address in the memory 150, extract the compressed second texture data corresponding to the second storage address, and send the compressed second texture data to the texture mapping unit 120.
Step 2, obtaining the second texture data, and performing a decompression operation on the second texture data to obtain decompressed third texture data.
Step 3, writing the third texture data into the data memory 140 for storage, and generating a third access result, wherein the third access result indicates that the extraction of the texture data corresponding to the texture coordinates is completed.
Since the processor core generates a thread to configure the texture mapping unit 120 when receiving the data extraction instruction, in order to enable the processor core to better identify the end signal, the first configuration information further includes the ID of the thread, and the end signal includes an operation end signal and an ID number. Step 420, in which an end signal is generated when the received access result is that the texture data corresponding to the texture coordinates is accessed and the end signal is sent to the processor core 110 to end the data extraction, then specifically includes: generating an ID number according to the ID of a target thread, where the target thread is created by the processor core 110 when receiving the data extraction instruction, and generating an operation end signal according to the access result; and sending the ID number and the operation end signal to the processor core 110, where the processor core 110 is configured to send the operation end signal to the target thread corresponding to the ID number and end the data extraction of the target thread.
According to the operation optimization method of the TMU system, the cache memory 130 preferentially accesses the data memory 140; if the texture data to be extracted is not accessed in the data memory 140, the texture data is extracted from the memory 150; if the texture data to be extracted is accessed in the data memory 140, no extraction from the memory 150 is required, thereby reducing the calculation process and avoiding repeated storage.
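The whole hit-first-then-miss policy can be condensed into one sketch; the dict-backed memories and the injected `decode` callable are assumptions of this illustration:

```python
# Sketch of the overall optimization: try the data memory 140 first; only
# on a miss fetch from the memory 150, decompress, and store the result so
# the next request for the same texel skips decompression entirely.

def extract_texel(data_memory, memory, first_addr, second_addr, decode):
    if first_addr in data_memory:          # hit: first access result
        return data_memory[first_addr]     # no fetch or decompression needed
    compressed = memory[second_addr]       # miss: go to the memory 150
    texel = decode(compressed)             # texture mapping unit decompresses
    data_memory[first_addr] = texel        # store so the next request hits
    return texel
```

Calling this twice for the same texel invokes `decode` only once, which is exactly the repeated-computation and repeated-storage saving claimed above.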
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples represent only a few embodiments of the present application, which are described in more detail and are not thereby to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.
Claims (10)
1. A TMU system, the TMU system comprising: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory is in communication with the data memory and the texture mapping unit, respectively;
The processor core is used for acquiring first configuration information according to the data extraction instruction when receiving the data extraction instruction, and configuring the texture mapping unit according to the first configuration information, wherein the first configuration information at least comprises texture coordinates of texture pixels to be processed;
the texture mapping unit is used for analyzing the first configuration information, generating a first access request and sending the first access request to the cache;
the cache memory is used for accessing texture data corresponding to the texture coordinates in the data memory if the data memory is determined to be accessed according to the first access request, generating an access result, and returning the access result to the texture mapping unit;
the data memory is used for storing uncompressed texture data;
the texture mapping unit is further configured to generate an end signal when the received access result is that texture data corresponding to the texture coordinates is accessed, and send the end signal to the processor core to end data extraction.
2. The TMU system of claim 1, wherein the texture mapping unit is specifically configured to:
Calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request at least comprising the first storage address and the first flag signal, wherein the first flag signal is used for the cache memory to determine that an access object is the data memory.
3. The TMU system of claim 2, wherein the cache is specifically configured to:
determining that an access object is the data memory based on the first flag signal;
and accessing the first storage address in the data memory and generating an access result.
4. The TMU system of claim 3, wherein said accessing said first storage address in said data memory and generating an access result specifically comprises:
determining, in the data memory, whether the first storage address contains uncompressed first texture data;
if the first storage address contains the first texture data, generating a first access result, wherein the first access result indicates that uncompressed texture data corresponding to the texture coordinates are accessed in the data memory;
And if the first storage address does not contain the first texture data, generating a second access result, wherein the second access result indicates that uncompressed texture data corresponding to the texture coordinates are not accessed in the data memory.
5. The TMU system of claim 4, further comprising a memory in communication with said cache memory for storing compressed texture data, said access object further comprising said memory, said cache memory further for sending said second access result to said texture mapping unit;
the texture mapping unit is further configured to obtain a second access request after receiving the second access result, and send the second access request to the cache memory, where the second access request includes at least a second storage address and a second flag signal, the second storage address is an address of compressed texture data corresponding to the texture coordinate, and the second flag signal is used by the cache memory to determine that an access object is the memory;
When the second flag signal indicates that the access object is the memory, the cache memory is further configured to access the second storage address in the memory after receiving a second access request, extract compressed second texture data corresponding to the second storage address, and send the second texture data to the texture mapping unit;
the texture mapping unit is further configured to decompress the second texture data to obtain decompressed third texture data, write the third texture data into the data memory for storage, and generate a third access result, where the third access result indicates that extraction of texture data corresponding to the texture coordinates is completed.
6. The TMU system of claim 5, wherein said first configuration information further comprises an ID of a thread, said end signal comprising an operation end signal and an ID number;
the processor core is further configured to create a target thread when receiving a data extraction instruction, obtain first configuration information through the target thread, and configure the texture mapping unit according to the first configuration information;
The texture mapping unit is specifically further configured to generate the ID number according to the ID of the target thread, generate the operation end signal according to the received first access result or third access result, and send the ID number and the operation end signal to the processor core;
the processor core is further configured to send the operation end signal to a target thread corresponding to the ID number, and end a data extraction operation of the target thread.
7. An operation optimization method of a TMU system, wherein the method is applied to the TMU system, and the TMU system comprises: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data store, the cache memory, and the texture mapping unit, respectively; the cache memory having a communication connection with the data memory and the texture mapping unit, respectively, the method comprising:
analyzing first configuration information, generating a first access request, and sending the first access request to the cache memory, wherein the first configuration information is acquired and configured by the processor core according to a data extraction instruction when the processor core receives the data extraction instruction, the first configuration information at least comprises texture coordinates of texture pixels to be processed, and the cache memory is used for accessing uncompressed texture data corresponding to the texture coordinates in the data memory and generating an access result if the data memory is determined to be accessed according to the first access request, and returning the access result to the texture mapping unit;
And when the received access result is that the texture data corresponding to the texture coordinates are accessed, generating an end signal, and sending the end signal to the processor core to end data extraction.
8. The method of claim 7, wherein the parsing the first configuration information to generate the first access request specifically includes:
calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request at least comprising the first storage address and the first flag signal, wherein the first flag signal is used for the cache memory to determine that an access object is the data memory.
9. The method of claim 7, wherein the TMU system further comprises a memory in communication with the cache memory for storing compressed texture data, the access object further comprising the memory;
after the received access result is that the texture data is not accessed in the data memory, the method further comprises:
acquiring a second access request and sending the second access request to the cache memory, wherein the second access request at least comprises a second storage address and a second flag signal, the second storage address is an address of compressed texture data corresponding to the texture coordinates, the second flag signal is used for the cache memory to determine that an access object is the memory, and the cache memory is triggered to access the second storage address in the memory, extract compressed second texture data corresponding to the second storage address, and send the compressed second texture data to the texture mapping unit;
acquiring the second texture data, and performing decompression operation on the second texture data to obtain decompressed third texture data;
and writing the third texture data into the data memory for storage, and generating a third access result, wherein the third access result indicates that the extraction of the texture data corresponding to the texture coordinates is completed.
10. The method of claim 7, wherein the first configuration information further comprises an ID of a thread, and wherein the end signal comprises an operation end signal and an ID number;
And when the received access result is that the texture data corresponding to the texture coordinates is accessed, generating an end signal, and sending the end signal to the processor core to end data extraction, wherein the method specifically comprises the following steps of:
generating the ID number according to the ID of a target thread, and generating the operation ending signal according to the access result, wherein the target thread is created by the processor core when receiving a data extraction instruction;
and sending the ID number and the operation ending signal to the processor core, wherein the processor core is used for sending the operation ending signal to a target thread corresponding to the ID number and ending the data extraction of the target thread.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310723241.8A CN116467227B (en) | 2023-06-19 | 2023-06-19 | TMU system and operation optimization method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116467227A true CN116467227A (en) | 2023-07-21 |
CN116467227B CN116467227B (en) | 2023-08-25 |
Family
ID=87179261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310723241.8A Active CN116467227B (en) | 2023-06-19 | 2023-06-19 | TMU system and operation optimization method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116467227B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103608848A (en) * | 2011-06-17 | 2014-02-26 | 超威半导体公司 | Real time on-chip texture decompression using shader processors |
CN105550126A (en) * | 2014-10-22 | 2016-05-04 | 三星电子株式会社 | Cache memory system and method of operating the same |
CN106683158A (en) * | 2016-12-12 | 2017-05-17 | 中国航空工业集团公司西安航空计算技术研究所 | Modeling structure of GPU texture mapping non-blocking memory Cache |
CN107153617A (en) * | 2016-03-04 | 2017-09-12 | 三星电子株式会社 | For the cache architecture using buffer efficient access data texturing |
US20180096515A1 (en) * | 2016-10-05 | 2018-04-05 | Samsung Electronics Co., Ltd. | Method and apparatus for processing texture |
CN108022269A (en) * | 2017-11-24 | 2018-05-11 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of modeling structure of GPU compressed textures storage Cache |
US20180182155A1 (en) * | 2016-12-22 | 2018-06-28 | Advanced Micro Devices, Inc. | Shader writes to compressed resources |
US20190096027A1 (en) * | 2017-09-25 | 2019-03-28 | Arm Limited | Cache arrangement for graphics processing systems |
US10706607B1 (en) * | 2019-02-20 | 2020-07-07 | Arm Limited | Graphics texture mapping |
US20220206950A1 (en) * | 2020-12-28 | 2022-06-30 | Advanced Micro Devices, Inc. | Selective generation of miss requests for cache lines |
CN115345769A (en) * | 2021-05-14 | 2022-11-15 | 辉达公司 | Accelerated processing via physics-based rendering engine |
CN115409882A (en) * | 2022-09-02 | 2022-11-29 | 中国船舶集团有限公司第七一六研究所 | Device and method for realizing texture sampling in GPU |
CN115617499A (en) * | 2022-12-20 | 2023-01-17 | 深流微智能科技(深圳)有限公司 | System and method for GPU multi-core hyper-threading technology |
Non-Patent Citations (1)
Title |
---|
SHAO Xuqiang; NIE Xiao; WANG Baoyi: "Real-time visual hull 3D reconstruction accelerated by GPU parallel computing and its virtual-real interaction", Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), no. 01, pages 52-54 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |