CN116467227A - TMU system and operation optimization method thereof - Google Patents


Info

Publication number
CN116467227A
CN116467467A — wait, CN116467227A (application number CN202310723241.8A)
Authority
CN
China
Prior art keywords
texture
data
memory
access
mapping unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310723241.8A
Other languages
Chinese (zh)
Other versions
CN116467227B (en)
Inventor
江靖华
韩会莲
张坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenliu Micro Intelligent Technology Shenzhen Co ltd
Original Assignee
Shenliu Micro Intelligent Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenliu Micro Intelligent Technology Shenzhen Co ltd filed Critical Shenliu Micro Intelligent Technology Shenzhen Co ltd
Priority to CN202310723241.8A
Publication of CN116467227A
Application granted
Publication of CN116467227B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 3/0602 — Interfaces specially adapted for storage systems, specifically adapted to achieve a particular effect
    • G06F 3/061 — Improving I/O performance
    • G06F 3/0638 — Organizing or formatting or addressing of data
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Image Generation (AREA)

Abstract

An embodiment of the invention discloses a TMU system comprising a processor core, a data memory, a cache memory, and a texture mapping unit. The processor core configures the texture mapping unit according to first configuration information; the texture mapping unit parses the first configuration information, generates a first access request, and sends it to the cache memory. The cache memory accesses texture data in the data memory according to the first access request, generates an access result, and returns it to the texture mapping unit. The texture mapping unit then generates an end signal from the access result and sends it to the processor core to end the data extraction. Because the uncompressed texture data corresponding to the texels is obtained directly from the data memory through the cache memory, no decompression or additional storage operation is needed; the TMU system therefore runs more efficiently, memory occupancy is reduced, and the goal of performance optimization is achieved.

Description

TMU system and operation optimization method thereof
Technical Field
The present invention relates to the field of graphics processing technologies, and in particular, to a TMU system and an operation optimization method for the TMU system.
Background
A texture mapping unit (Texture Mapping Unit, TMU) is a component of modern graphics processing units (Graphics Processing Unit, GPU) that can rotate, resize, and distort a bitmap image for placement as a texture onto any plane of a given 3D model, a process known as texture mapping. The texture mapping unit arose from the computational cost of sampling a planar image (the texture map) and transforming it to the correct angle and perspective in 3D space. The texture mapping unit is part of the shader and is separate from the render output unit (ROP).
In a conventional TMU system, the texture mapping unit calculates, from information configured by the processor core, the storage address in memory of the texture data of each texel, reads that texture data, and then stores it in the data memory. As a result, the TMU system suffers from repeated computation and high storage-space occupancy.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a TMU system that solves the problems of repeated computation and high storage-space occupancy in existing TMU systems.
To achieve the above object, a first aspect of the present application provides a TMU system, including: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory is in communication with the data memory and the texture mapping unit, respectively;
the processor core is used for acquiring first configuration information according to the data extraction instruction when receiving the data extraction instruction, and configuring the texture mapping unit according to the first configuration information, wherein the first configuration information at least comprises texture coordinates of texture pixels to be processed;
the texture mapping unit is used for parsing the first configuration information, generating a first access request, and sending the first access request to the cache memory;
the cache memory is used for accessing, if it is determined from the first access request that the data memory is to be accessed, the texture data corresponding to the texture coordinates in the data memory, generating an access result, and returning the access result to the texture mapping unit;
The data memory is used for storing uncompressed texture data;
the texture mapping unit is further configured to generate an end signal when the received access result is that texture data corresponding to the texture coordinates is accessed, and send the end signal to the processor core to end data extraction.
Further, the texture mapping unit is specifically configured to:
calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request comprising at least the first storage address and the first flag signal, wherein the first flag signal is used by the cache memory to determine that the access object is the data memory.
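The address calculation and flag-signal generation described above can be sketched as a minimal Python model. The row-major address arithmetic, texture width, texel size, and flag encoding below are illustrative assumptions, not the patent's actual hardware logic:

```python
# Hypothetical model of building the first access request.
# All constants below are assumptions for illustration only.
FLAG_DATA_MEMORY = 0  # flag value telling the cache to target the data memory
TEXTURE_WIDTH = 256   # assumed texture width in texels
BYTES_PER_TEXEL = 4   # assumed RGBA8 layout

def first_storage_address(u, v):
    """Address of the uncompressed texel at texture coordinates (u, v),
    assuming a simple row-major layout."""
    return (v * TEXTURE_WIDTH + u) * BYTES_PER_TEXEL

def make_first_access_request(u, v):
    """Build a first access request: a storage address plus a flag signal."""
    addr = first_storage_address(u, v)
    return {"address": addr, "flag": FLAG_DATA_MEMORY}
```

For example, `make_first_access_request(2, 1)` yields the address of the texel one row down and two texels across under the assumed layout.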
Further, the cache memory is specifically configured to:
determining that an access object is the data memory based on the first flag signal;
and accessing the first storage address in the data memory and generating an access result.
Further, the accessing the first storage address in the data memory and generating an access result specifically includes:
determining, in the data memory, whether the first storage address contains uncompressed first texture data;
if the first storage address contains the first texture data, generating a first access result, wherein the first access result indicates that uncompressed texture data corresponding to the texture coordinates is accessed in the data memory;
and if the first storage address does not contain the first texture data, generating a second access result, wherein the second access result indicates that uncompressed texture data corresponding to the texture coordinates is not accessed in the data memory.
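The hit/miss distinction above can be modeled as follows. This is a sketch: representing the data memory as a Python dict keyed by storage address is an assumption for illustration:

```python
def access_data_memory(data_memory, first_storage_address):
    """Check whether the data memory holds uncompressed texture data at the
    first storage address; return a first access result (hit) or a second
    access result (miss)."""
    if first_storage_address in data_memory:
        # First access result: uncompressed texture data was accessed.
        return {"result": "first", "data": data_memory[first_storage_address]}
    # Second access result: no uncompressed data at this address.
    return {"result": "second", "data": None}
```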
Further, the TMU system further includes a memory, where the memory is in communication with the cache memory and is used for storing compressed texture data; the access object further includes the memory, and the cache memory is further configured to send the second access result to the texture mapping unit;
the texture mapping unit is further configured to obtain a second access request after receiving the second access result, and send the second access request to the cache memory, where the second access request includes at least a second storage address and a second flag signal, the second storage address is an address of compressed texture data corresponding to the texture coordinate, and the second flag signal is used by the cache memory to determine that an access object is the memory;
When the second flag signal indicates that the access object is the memory, the cache memory is further configured to access the second storage address in the memory after receiving a second access request, extract compressed second texture data corresponding to the second storage address, and send the second texture data to the texture mapping unit;
the texture mapping unit is further configured to decompress the second texture data to obtain decompressed third texture data, write the third texture data into the data memory for storage, and generate a third access result, where the third access result indicates that extraction of texture data corresponding to the texture coordinates is completed.
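The miss path above (fetch compressed second texture data, decompress it into third texture data, write it back) can be sketched as follows. zlib stands in for a real texture codec such as DXT1/DXT3/DXT5, and the dict-based memories are illustrative assumptions:

```python
import zlib

def handle_miss(memory, data_memory, second_addr, first_addr):
    """Miss path sketch: extract compressed second texture data from memory
    at the second storage address, decompress it into third texture data,
    and write it to the first storage address in the data memory so a
    repeated fetch becomes a hit."""
    compressed = memory[second_addr]        # second texture data (compressed)
    third = zlib.decompress(compressed)     # third texture data (decompressed)
    data_memory[first_addr] = third         # store uncompressed for reuse
    return {"result": "third", "data": third}
```

The write-back is what turns a repeated extraction of the same texel into a cheap data-memory hit.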
Further, the first configuration information further includes an ID of the thread, and the end signal includes an operation end signal and an ID number;
the processor core is further configured to create a target thread when receiving a data extraction instruction, obtain first configuration information through the target thread, and configure the texture mapping unit according to the first configuration information;
the texture mapping unit is specifically further configured to generate the ID number according to the ID of the target thread, generate the operation end signal according to the received first access result or third access result, and send the ID number and the operation end signal to the processor core;
The processor core is further configured to send the operation end signal to a target thread corresponding to the ID number, and end a data extraction operation of the target thread.
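The thread-ID bookkeeping above can be sketched with a small dict-based model; the field names and thread representation are hypothetical:

```python
def make_end_signal(thread_id, access_result):
    """Texture mapping unit side: build an end signal (ID number plus an
    operation-end flag) once the access result reports the data is ready."""
    assert access_result["result"] in ("first", "third")
    return {"id_number": thread_id, "op_end": True}

def dispatch_end_signal(threads, end_signal):
    """Processor core side: route the end signal to the target thread
    matching the ID number, ending that thread's data extraction."""
    threads[end_signal["id_number"]]["done"] = end_signal["op_end"]
```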
To achieve the above object, a second aspect of the present application provides an operation optimization method of a TMU system, where the method is applied to the TMU system, and the TMU system includes: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data store, the cache memory, and the texture mapping unit, respectively; the cache memory having a communication connection with the data memory and the texture mapping unit, respectively, the method comprising:
parsing first configuration information, generating a first access request, and sending the first access request to the cache memory, wherein the first configuration information is acquired by the processor core according to a data extraction instruction when the data extraction instruction is received and at least comprises texture coordinates of texture pixels to be processed; the cache memory is used for accessing, if it is determined from the first access request that the data memory is to be accessed, the uncompressed texture data corresponding to the texture coordinates in the data memory, generating an access result, and returning the access result to the texture mapping unit;
And when the received access result is that the texture data corresponding to the texture coordinates are accessed, generating an end signal, and sending the end signal to the processor core to end data extraction.
Further, the parsing the first configuration information to generate a first access request specifically includes:
calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request comprising at least the first storage address and the first flag signal, wherein the first flag signal is used by the cache memory to determine that the access object is the data memory.
Further, the TMU system further includes a memory in communication with the cache memory for storing compressed texture data, the access object further including the memory;
after the received access result is that the texture data is not accessed in the data memory, the method further comprises:
Acquiring a second access request and sending the second access request to the cache memory, wherein the second access request comprises at least a second storage address and a second flag signal, the second storage address is an address of compressed texture data corresponding to the texture coordinates, and the second flag signal is used by the cache memory to determine that the access object is the memory; the cache memory is thereby triggered to access the second storage address in the memory, extract the compressed second texture data corresponding to the second storage address, and send it to the texture mapping unit;
acquiring the second texture data, and performing decompression operation on the second texture data to obtain decompressed third texture data;
and writing the third texture data into the data memory for storage, and generating a third access result, wherein the third access result indicates that the extraction of the texture data corresponding to the texture coordinates is completed.
Further, the first configuration information further includes an ID of the thread, and the end signal includes an operation end signal and an ID number;
and when the received access result is that the texture data corresponding to the texture coordinates is accessed, generating an end signal, and sending the end signal to the processor core to end data extraction, wherein the method specifically comprises the following steps of:
Generating the ID number according to the ID of a target thread, and generating the operation ending signal according to the access result, wherein the target thread is created by the processor core when receiving a data extraction instruction;
and sending the ID number and the operation ending signal to the processor core, wherein the processor core is used for sending the operation ending signal to a target thread corresponding to the ID number and ending the data extraction of the target thread.
The embodiment of the invention has the following beneficial effects:
the invention discloses a TMU system, which comprises: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory is in communication with the data memory and the texture mapping unit, respectively; the processor core is used for acquiring first configuration information according to a data extraction instruction when the data extraction instruction is received, and configuring the texture mapping unit according to the first configuration information, wherein the first configuration information at least comprises texture coordinates of texture pixels to be processed; the texture mapping unit is used for parsing the first configuration information, generating a first access request, and sending the first access request to the cache memory; the cache memory is used for accessing, if it is determined from the first access request that the data memory is to be accessed, the texture data corresponding to the texture coordinates in the data memory, generating an access result, and returning the access result to the texture mapping unit; the data memory is used for storing uncompressed texture data; the texture mapping unit is further configured to generate an end signal when the received access result is that texture data corresponding to the texture coordinates is accessed, and send the end signal to the processor core to end data extraction.
According to the above scheme, uncompressed texture data corresponding to the texels to be processed is obtained from the data memory through the cache memory. If the data memory already contains the uncompressed texture data for the texels to be processed, it can be used directly, with no decompression or additional storage operation, so the TMU system runs more efficiently, memory occupancy is reduced, and the goal of performance optimization is achieved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1a is a block diagram illustrating the coordinate positions of texels corresponding to texture data to be extracted during a first data extraction operation initiated by a processor core;
FIG. 1b is a block diagram illustrating the coordinate positions of texels corresponding to texture data to be extracted during a second data extraction operation initiated by a processor core;
FIG. 1c is a superposition of two data fetch operations initiated by a processor core;
FIG. 2 is a block diagram of a TMU system of an embodiment of the invention;
FIG. 3 is a block diagram of a refined TMU system of an embodiment of the invention;
fig. 4 is a flow chart of an operation optimization method of a TMU system according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In a conventional TMU system, the data extraction flow is roughly as follows: the processor core configures the texture mapping unit with information; the texture mapping unit parses the configured information to generate an access address, extracts compressed data from memory, decompresses the compressed data, and stores the decompressed data. If the data to be extracted by multiple operations is identical or partially identical, repeated computation and repeated storage occur. For example, suppose FIG. 1a shows a first data extraction operation initiated by the processor core, in which the texels corresponding to the texture data to be extracted are the 4×4 texels of block0, and FIG. 1b shows a second data extraction operation, in which the texels to be extracted are 1×4 texels of block0 and 3×4 texels of block1. The overlapping portion of the two extraction operations is shown in FIG. 1c: across the two extractions, the overlapping 1×4 texels undergo two extractions of compressed data from memory, two decompression computations, and two storage operations. The conventional TMU system therefore suffers from repeated computation and high storage-space occupancy.
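The overlap in the example above can be quantified with a short sketch. The exact block coordinates are illustrative assumptions; only the 1×4 overlap mirrors FIG. 1c:

```python
def texel_set(x0, y0, w, h):
    """Set of texel coordinates covered by a w x h extraction at (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}

# First fetch: the 4x4 texels of block0 (placed at the origin, illustrative).
first_fetch = texel_set(0, 0, 4, 4)
# Second fetch: shifted right so its leftmost 1x4 column overlaps block0.
second_fetch = texel_set(3, 0, 4, 4)
# The 1x4 texels that would be extracted, decompressed, and stored twice.
overlap = first_fetch & second_fetch
```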
Based on this, the TMU system provided in the embodiment of the present invention solves the problems described above. Referring to FIG. 2, FIG. 2 is a block diagram of a TMU system according to an embodiment of the present invention, where the TMU system includes: a processor core 110, a data memory 140, a cache memory 130, and a texture mapping unit 120; the processor core 110 is in communication with the data memory 140, the cache memory 130, and the texture mapping unit 120, respectively; the cache memory 130 is in communication with the data memory 140 and the texture mapping unit 120, respectively. Specifically, with the processor core 110 as the master and the data memory 140, the cache memory 130, and the texture mapping unit 120 as slaves, communication between master and slaves may optionally be carried out over an AXI bus.
The processor core 110 is configured to, when receiving the data extraction instruction, obtain first configuration information according to the data extraction instruction, and configure the texture mapping unit 120 according to the first configuration information, where the first configuration information at least includes texture coordinates of a texture pixel to be processed.
Specifically, when the processor core 110 receives the data extraction instruction, it reads the texture coordinates of the texels to be processed according to the instruction. Texture coordinates are typically represented as a two-dimensional pair (u, v), where u is the horizontal coordinate and v is the vertical coordinate; texture coordinates are therefore also called uv coordinates. After reading the texture coordinates, the processor core 110 configures the texture mapping unit 120 with the texture coordinates as the first configuration information. It will be understood that the texture mapping unit 120 contains internal registers, so the processor core 110 writes the first configuration information into the internal registers of the texture mapping unit 120.
After the processor core 110 has configured the internal registers of the texture mapping unit 120, the texture mapping unit 120 parses the first configuration information, generates a first access request, and sends the first access request to the cache memory 130. Specifically, the texture mapping unit 120 parses the texture coordinates in the internal registers to obtain the first access request.
When the cache memory 130 receives the first access request and determines from it that the data memory 140 needs to be accessed, the cache memory 130 accesses the texture data corresponding to the texture coordinates in the data memory 140, generates an access result, and returns the access result to the texture mapping unit 120.
In the embodiment of the present invention, the data memory 140 stores uncompressed texture data, and the first access request indicates that the cache memory 130 should access the data memory 140. When the cache memory 130 receives the first access request, it determines that the data memory 140 needs to be accessed; after the access completes, one of two access results is generated: the uncompressed texture data was accessed, or it was not accessed. The access result is then returned to the texture mapping unit 120. Optionally, the data memory 140 may be a tightly coupled data memory (Tightly Coupled Data Memory, TCDM).
In the TMU system of the present invention, when texture data for given texture coordinates needs to be acquired, the decompressed texture data already stored in the data memory 140 is accessed through the cache memory 130; this may be texture data that was never compressed or texture data obtained by decompressing compressed texture data. When the cache memory 130 succeeds in accessing the texture data corresponding to the texture coordinates to be extracted, no extraction or decompression of compressed data is required, which effectively reduces computation during data extraction, improves efficiency, avoids re-storing data due to repeated computation, and reduces memory occupancy.
After the texture mapping unit 120 receives the access result returned by the cache memory 130, the texture mapping unit 120 is further configured to generate an end signal when the received access result is that texture data corresponding to the texture coordinates is accessed, and send the end signal to the processor core 110 to end data extraction.
In the embodiment of the present invention, when uncompressed texture data corresponding to texture coordinates is accessed in the data memory 140, the texture mapping unit 120 may generate an end signal and send the end signal back to the processor core 110, and after receiving the end signal, the processor core 110 may directly extract data in the data memory 140, and after obtaining the texture data, may end the data extraction.
According to the TMU system provided by the embodiment of the invention, by extending the addresses maintained by the cache memory 130 to cover the data memory 140, the data memory 140 can be accessed directly through the first access request, which effectively reduces decompression computation and post-decompression storage operations, and further saves storage space.
In one possible embodiment of the present invention, to allow the cache memory 130 to determine the access object, the first access request sent by the texture mapping unit 120 to the cache memory 130 comprises at least a storage address and a flag signal. The texture mapping unit 120 is then specifically configured to: calculate a data storage address from the texture coordinates to obtain a first storage address, where the first storage address is the address of the uncompressed texture data corresponding to the texture coordinates; and generate a first flag signal based on the first storage address, together with a first access request comprising at least the first storage address and the first flag signal, where the first flag signal is used by the cache memory 130 to determine that the access object is the data memory 140.
Specifically, so that the cache memory 130 first checks whether the data memory 140 holds uncompressed texture data corresponding to the texture coordinates, the texture mapping unit 120 computes the first storage address of the uncompressed texture data from the texture coordinates in the first configuration information and generates a first flag signal from the first storage address. The first flag signal instructs the cache memory 130 that the access object is the data memory 140; a first access request containing the first flag signal and the first storage address is therefore sent to the cache memory 130, and the cache memory 130 accesses the access object according to the first access request.
Based on this, after the cache memory 130 receives the first access request containing the first flag signal and the first storage address, the cache memory 130 is specifically configured to: determine that the access object is the data memory 140 based on the first flag signal; and access the first storage address in the data memory 140 and generate an access result. Two situations may occur when the cache memory 130 accesses the data memory 140: the data is accessed, or it is not, and the access result is generated accordingly. Specifically, the cache memory 130 determines whether the first storage address in the data memory 140 contains uncompressed first texture data. If so, a first access result is generated, indicating that uncompressed texture data corresponding to the texture coordinates has been accessed in the data memory 140. If not, a second access result is generated, indicating that uncompressed texture data corresponding to the texture coordinates has not been accessed in the data memory 140. When the access result returned by the cache memory 130 is the first access result, the data memory 140 already stores the uncompressed texture data for the texel, so no data extraction, decompression, or storage operations are needed, saving both computation time and storage space.
Referring to fig. 3, fig. 3 is a block diagram of the refined TMU system according to an embodiment of the present invention. As shown in fig. 3, the refined TMU system further includes a memory 150, which is in communication with the cache memory 130 and is used for storing compressed texture data; the access object further includes the memory 150, and the cache memory 130 is further used for sending the second access result to the texture mapping unit 120. Specifically, the second access result means that uncompressed texture data corresponding to the texel is not stored in the data memory 140.
After receiving the second access result, the texture mapping unit 120 is further configured to obtain a second access request and send it to the cache memory 130, where the second access request includes at least a second storage address and a second flag signal, the second storage address being the address of the compressed texture data corresponding to the texture coordinates, and the second flag signal being used by the cache memory 130 to determine that the access object is the memory 150.
Since a 2D/3D texture image is usually stored in the memory after compression with a preset texture compression scheme, when the cache memory 130 does not access the texture data in the data memory 140, the texture mapping unit 120 may calculate, from the texture coordinates and the preset compression mode, the second storage address of the compressed texture data corresponding to the texture coordinates, and generate the second flag signal according to the second storage address; an optional texture compression algorithm is DXT1/DXT3/DXT5, etc. The second storage address and the second flag signal constitute the second access request, which is then sent to the cache memory 130.
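For the DXT family named above, the second storage address can be derived from the texture coordinates because DXT formats store the image as row-major 4x4 texel blocks, with 8 bytes per block for DXT1 and 16 bytes per block for DXT3/DXT5. The sketch below assumes this standard layout and a linear block arrangement; the patent does not fix a particular address formula.

```python
# Bytes per 4x4 texel block for the DXT formats named in the text.
DXT_BLOCK_BYTES = {"DXT1": 8, "DXT3": 16, "DXT5": 16}

def second_storage_address(u, v, tex_width, fmt, base=0):
    """Address of the compressed block containing texel (u, v), assuming the
    standard DXT layout: 4x4 blocks stored row-major from a base address."""
    blocks_per_row = (tex_width + 3) // 4
    block_index = (v // 4) * blocks_per_row + (u // 4)
    return base + block_index * DXT_BLOCK_BYTES[fmt]
```

For a 64-texel-wide DXT1 texture, texel (5, 9) lies in block (1, 2), i.e. block index 33, giving address 33 * 8 = 264; the same texel in DXT5 gives 33 * 16 = 528.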
When the second flag signal indicates that the access object is the memory 150, the cache memory 130 is further configured to, after receiving the second access request, access the second storage address in the memory 150, extract compressed second texture data corresponding to the second storage address, and send the second texture data to the texture mapping unit 120. The texture mapping unit 120 is further configured to decompress the second texture data to obtain decompressed third texture data, write the third texture data into the data memory 140 for storage, and generate a third access result, where the third access result indicates that extraction of texture data corresponding to texture coordinates is completed.
In particular, since the texture data corresponding to the texture coordinates to be extracted is not accessed in the data memory 140, meaning that it has not been extracted before, the cache memory 130 performs data extraction on the memory 150 after the miss in the data memory 140, and the texture mapping unit 120 decompresses the compressed data and stores the result in the data memory 140 to provide data for subsequent extraction operations. It is understood that a single cache memory may maintain both the first storage address and the second storage address, or, for clearer address management, two caches may maintain the first storage address and the second storage address respectively.
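The whole miss path can be sketched as a small model. This is a toy illustration under simplifying assumptions: the same key is used for the compressed and uncompressed stores, and the "decompression" is a trivial placeholder transform, not a DXT decoder.

```python
class TextureFetchModel:
    """Toy model of the miss path: a data-memory miss triggers an extraction
    from the memory, decompression in the texture mapping unit, and a
    write-back so later extractions of the same texel hit the data memory."""

    def __init__(self, memory):
        self.memory = memory       # address -> compressed block (memory 150)
        self.data_memory = {}      # address -> texel data (data memory 140)
        self.memory_fetches = 0

    def fetch(self, address):
        if address in self.data_memory:          # first access result: hit
            return self.data_memory[address]
        self.memory_fetches += 1                 # second access: memory 150
        second_texture_data = self.memory[address]
        third_texture_data = self._decompress(second_texture_data)
        self.data_memory[address] = third_texture_data   # store for reuse
        return third_texture_data

    @staticmethod
    def _decompress(block):
        return tuple(b * 2 for b in block)       # placeholder decompression
```

Fetching the same address twice performs only one memory extraction and one decompression; the second fetch is served from the data memory, which is the saving the design targets.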
Since the processor core generates a thread to configure the texture mapping unit 120 when receiving a data extraction instruction, and in order for the processor core to better recognize the end signal, the first configuration information further includes the ID of the thread, and the end signal includes an operation end signal and an ID number.
Based on this, the processor core 110 is further configured to create a target thread when receiving the data extraction instruction, obtain the first configuration information through the target thread, and configure the texture mapping unit 120 according to the first configuration information; the texture mapping unit 120 is specifically further configured to generate an ID number according to the ID of the target thread, generate an operation end signal according to the received first access result or the received third access result, and send the ID number and the operation end signal to the processor core 110; the processor core 110 is further configured to send an operation end signal to the target thread corresponding to the ID number, and end the data extraction operation of the target thread.
Specifically, an ID is generated when the target thread is created; it is understood that the threads are independent of one another, do not communicate with each other, and each ID is unique. After the texture mapping unit 120 finishes the data extraction, it sends the ID number and the operation end signal to the processor core 110, and the processor core 110 sends the operation end signal to the target thread corresponding to the ID number, ending the data extraction operation of that thread; the operation end signal may be a single-cycle pulse signal.
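The ID bookkeeping above can be sketched as follows. The class and method names are illustrative; the single-cycle pulse of the hardware is modeled here simply as a method call carrying the ID number.

```python
import itertools

class CoreThreadTable:
    """Sketch of the thread/ID bookkeeping: each data-extraction instruction
    creates a target thread with a unique ID; the texture mapping unit later
    echoes that ID together with the operation end signal so the core can
    end exactly the right thread."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.threads = {}                  # ID -> extraction finished?

    def create_target_thread(self):
        thread_id = next(self._next_id)    # IDs are unique per thread
        self.threads[thread_id] = False
        return thread_id

    def on_end_signal(self, thread_id):
        # Route the operation end signal to the matching target thread.
        self.threads[thread_id] = True
```

Because the IDs are unique and the threads are independent, an end signal for one extraction cannot terminate another in-flight extraction.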
In the TMU system provided in the embodiment of the present invention, by adjusting the connection structure of the TMU system, the cache memory 130 is communicatively connected with the texture mapping unit 120, the data memory 140, and the memory 150, respectively, and the cache memory 130 recognizes the flag signal to determine the access object. Before data extraction is performed on the memory 150, the data memory 140 is accessed; if the texture data is accessed in the data memory 140, there is no need to perform data extraction on the memory 150 or to further decompress and store the data, thereby effectively reducing the operation process and the storage space occupation rate.
The embodiment of the invention also provides an operation optimization method of the TMU system, the method is applied to the TMU system, and the TMU system comprises: a processor core 110, a data memory 140, a cache memory 130, and a texture mapping unit 120; processor core 110 has a communication connection with data store 140, cache 130, and texture mapping unit 120, respectively; the cache memory 130 is in communication with the data memory 140 and the texture mapping unit 120, respectively.
Referring specifically to fig. 4, fig. 4 is a flow chart of an operation optimization method of a TMU system according to an embodiment of the invention, and as shown in fig. 4, the method includes:
In step 410, the first configuration information is parsed to generate a first access request, and the first access request is sent to the cache memory 130. The first configuration information is acquired and configured by the processor core 110 according to a data extraction instruction when the instruction is received, and includes at least the texture coordinates of a texture pixel to be processed. If the cache memory 130 determines from the first access request that the data memory 140 is to be accessed, it accesses the uncompressed texture data corresponding to the texture coordinates in the data memory 140, generates an access result, and returns the access result to the texture mapping unit 120.
In step 420, when the received access result is that the texture data corresponding to the texture coordinates is accessed, an end signal is generated, and the end signal is sent to the processor core 110, so as to end the data extraction.
Further, in step 410, parsing the first configuration information to generate the first access request specifically includes: calculating a data storage address according to the texture coordinates to obtain a first storage address, where the first storage address is the address of the uncompressed texture data corresponding to the texture coordinates; and generating a first flag signal based on the first storage address, and generating a first access request including at least the first storage address and the first flag signal, where the first flag signal is used by the cache memory 130 to determine that the access object is the data memory 140.
The TMU system to which the optimization method of the embodiment of the present invention is applied further includes a memory 150, which is in communication with the cache memory 130 and is used for storing compressed texture data; the access object further includes the memory 150, and the access result includes two cases: the texture data is accessed in the data memory 140, or it is not accessed in the data memory 140. When the received access result is that the texture data is not accessed in the data memory 140, the method further includes:
Step 1: acquiring a second access request and sending it to the cache memory 130, where the second access request includes at least a second storage address and a second flag signal, the second storage address being the address of the compressed texture data corresponding to the texture coordinates, and the second flag signal being used by the cache memory 130 to determine that the access object is the memory 150 and triggering the cache memory 130 to access the second storage address in the memory 150, extract the compressed second texture data corresponding to the second storage address, and send it to the texture mapping unit 120.
Step 2: obtaining the second texture data, and performing a decompression operation on the second texture data to obtain decompressed third texture data.
Step 3: writing the third texture data into the data memory 140 for storage, and generating a third access result, where the third access result indicates that the extraction of the texture data corresponding to the texture coordinates is completed.
Since the processor core generates a thread to configure the texture mapping unit 120 when receiving the data extraction instruction, and in order for the processor core to better identify the end signal, the first configuration information further includes the ID of the thread, and the end signal includes an operation end signal and an ID number. Step 420, generating an end signal when the received access result is that the texture data corresponding to the texture coordinates is accessed and sending the end signal to the processor core 110 to end the data extraction, then specifically includes: generating the ID number according to the ID of a target thread, which is created by the processor core 110 upon receiving the data extraction instruction, and generating the operation end signal according to the access result; and sending the ID number and the operation end signal to the processor core 110, where the processor core 110 sends the operation end signal to the target thread corresponding to the ID number and ends the data extraction of that thread.
According to the operation optimization method of the TMU system, the cache memory 130 preferentially accesses the data memory 140; if the texture data to be extracted is not accessed in the data memory 140, it is extracted from the memory, while if it is accessed in the data memory 140, no extraction from the memory is required, thereby reducing the computational process and the repeated storage.
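The saving can be made concrete by counting memory extractions over an access trace with and without the data-memory check. This is an illustrative model, not a measurement of the patented hardware; coordinates stand in for storage addresses.

```python
def memory_fetch_count(access_trace, check_data_memory_first=True):
    """Count how many accesses reach the memory for a trace of texel
    coordinates, with and without the data-memory check the method
    performs before extracting from the memory."""
    data_memory = set()
    fetches = 0
    for coords in access_trace:
        if check_data_memory_first and coords in data_memory:
            continue               # hit: no extraction, decompression, store
        fetches += 1               # extract from memory, decompress, store
        data_memory.add(coords)
    return fetches

trace = [(0, 0), (1, 0), (0, 0), (1, 0), (0, 0)]
```

On this trace of five accesses to two texels, the check reduces the memory extractions (and the accompanying decompression and storage operations) from five to two.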
The technical features of the above embodiments may be arbitrarily combined. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should be considered within the scope of this description.
The foregoing examples represent only a few embodiments of the present application, which are described in relative detail but are not thereby to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (10)

1. A TMU system, the TMU system comprising: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory is in communication with the data memory and the texture mapping unit, respectively;
The processor core is used for acquiring first configuration information according to the data extraction instruction when receiving the data extraction instruction, and configuring the texture mapping unit according to the first configuration information, wherein the first configuration information at least comprises texture coordinates of texture pixels to be processed;
the texture mapping unit is used for analyzing the first configuration information, generating a first access request and sending the first access request to the cache;
the cache memory is used for accessing texture data corresponding to the texture coordinates in the data memory if the data memory is determined to be accessed according to the first access request, generating an access result, and returning the access result to the texture mapping unit;
the data memory is used for storing uncompressed texture data;
the texture mapping unit is further configured to generate an end signal when the received access result is that texture data corresponding to the texture coordinates is accessed, and send the end signal to the processor core to end data extraction.
2. The TMU system of claim 1, wherein the texture mapping unit is specifically configured to:
Calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request at least comprising the first storage address and the first flag signal, wherein the first flag signal is used by the cache memory to determine that the access object is the data memory.
3. The TMU system of claim 2, wherein the cache is specifically configured to:
determining that an access object is the data memory based on the first flag signal;
and accessing the first storage address in the data memory and generating an access result.
4. The TMU system of claim 3, wherein said accessing said first storage address in said data memory and generating an access result specifically comprises:
determining, in the data memory, whether the first storage address contains uncompressed first texture data;
if the first storage address contains the first texture data, generating a first access result, wherein the first access result indicates that uncompressed texture data corresponding to the texture coordinates are accessed in the data memory;
And if the first storage address does not contain the first texture data, generating a second access result, wherein the second access result indicates that uncompressed texture data corresponding to the texture coordinates are not accessed in the data memory.
5. The TMU system of claim 4, further comprising a memory in communication with said cache memory for storing compressed texture data, said access object further comprising said memory, said cache memory further for sending said second access result to said texture mapping unit;
the texture mapping unit is further configured to obtain a second access request after receiving the second access result, and send the second access request to the cache memory, where the second access request includes at least a second storage address and a second flag signal, the second storage address is an address of compressed texture data corresponding to the texture coordinate, and the second flag signal is used by the cache memory to determine that an access object is the memory;
When the second flag signal indicates that the access object is the memory, the cache memory is further configured to access the second storage address in the memory after receiving a second access request, extract compressed second texture data corresponding to the second storage address, and send the second texture data to the texture mapping unit;
the texture mapping unit is further configured to decompress the second texture data to obtain decompressed third texture data, write the third texture data into the data memory for storage, and generate a third access result, where the third access result indicates that extraction of texture data corresponding to the texture coordinates is completed.
6. The TMU system of claim 5, wherein said first configuration information further comprises an ID of a thread, said end signal comprising an operation end signal and an ID number;
the processor core is further configured to create a target thread when receiving a data extraction instruction, obtain first configuration information through the target thread, and configure the texture mapping unit according to the first configuration information;
The texture mapping unit is specifically further configured to generate the ID number according to the ID of the target thread, generate the operation end signal according to the received first access result or third access result, and send the ID number and the operation end signal to the processor core;
the processor core is further configured to send the operation end signal to a target thread corresponding to the ID number, and end a data extraction operation of the target thread.
7. An operation optimization method of a TMU system, wherein the method is applied to the TMU system, and the TMU system comprises: a processor core, a data memory, a cache memory, and a texture mapping unit; the processor core is in communication with the data memory, the cache memory, and the texture mapping unit, respectively; the cache memory having a communication connection with the data memory and the texture mapping unit, respectively, the method comprising:
analyzing first configuration information, generating a first access request, and sending the first access request to the cache memory, wherein the first configuration information is acquired and configured by the processor core according to a data extraction instruction when the processor core receives the data extraction instruction, the first configuration information at least comprises texture coordinates of texture pixels to be processed, and the cache memory is used for accessing uncompressed texture data corresponding to the texture coordinates in the data memory and generating an access result if the data memory is determined to be accessed according to the first access request, and returning the access result to the texture mapping unit;
And when the received access result is that the texture data corresponding to the texture coordinates are accessed, generating an end signal, and sending the end signal to the processor core to end data extraction.
8. The method of claim 7, wherein the parsing the first configuration information to generate the first access request specifically includes:
calculating a data storage address according to the texture coordinates to obtain a first storage address, wherein the first storage address is an address of uncompressed texture data corresponding to the texture coordinates;
generating a first flag signal based on the first storage address, and generating the first access request at least comprising the first storage address and the first flag signal, wherein the first flag signal is used by the cache memory to determine that the access object is the data memory.
9. The method of claim 7, wherein the TMU system further comprises a memory in communication with the cache memory for storing compressed texture data, the access object further comprising the memory;
after the received access result is that the texture data is not accessed in the data memory, the method further comprises:
Acquiring a second access request and sending the second access request to the cache memory, wherein the second access request at least comprises a second storage address and a second flag signal, the second storage address is the address of the compressed texture data corresponding to the texture coordinates, the second flag signal is used by the cache memory to determine that the access object is the memory, and the cache memory is triggered to access the second storage address in the memory, extract the compressed second texture data corresponding to the second storage address, and send it to the texture mapping unit;
acquiring the second texture data, and performing decompression operation on the second texture data to obtain decompressed third texture data;
and writing the third texture data into the data memory for storage, and generating a third access result, wherein the third access result indicates that the extraction of the texture data corresponding to the texture coordinates is completed.
10. The method of claim 7, wherein the first configuration information further comprises an ID of a thread, and wherein the end signal comprises an operation end signal and an ID number;
And when the received access result is that the texture data corresponding to the texture coordinates is accessed, generating an end signal, and sending the end signal to the processor core to end data extraction, wherein the method specifically comprises the following steps of:
generating the ID number according to the ID of a target thread, and generating the operation ending signal according to the access result, wherein the target thread is created by the processor core when receiving a data extraction instruction;
and sending the ID number and the operation ending signal to the processor core, wherein the processor core is used for sending the operation ending signal to a target thread corresponding to the ID number and ending the data extraction of the target thread.
CN202310723241.8A 2023-06-19 2023-06-19 TMU system and operation optimization method thereof Active CN116467227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310723241.8A CN116467227B (en) 2023-06-19 2023-06-19 TMU system and operation optimization method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310723241.8A CN116467227B (en) 2023-06-19 2023-06-19 TMU system and operation optimization method thereof

Publications (2)

Publication Number Publication Date
CN116467227A true CN116467227A (en) 2023-07-21
CN116467227B CN116467227B (en) 2023-08-25

Family

ID=87179261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310723241.8A Active CN116467227B (en) 2023-06-19 2023-06-19 TMU system and operation optimization method thereof

Country Status (1)

Country Link
CN (1) CN116467227B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103608848A (en) * 2011-06-17 2014-02-26 超威半导体公司 Real time on-chip texture decompression using shader processors
CN105550126A (en) * 2014-10-22 2016-05-04 三星电子株式会社 Cache memory system and method of operating the same
CN106683158A (en) * 2016-12-12 2017-05-17 中国航空工业集团公司西安航空计算技术研究所 Modeling structure of GPU texture mapping non-blocking memory Cache
CN107153617A (en) * 2016-03-04 2017-09-12 三星电子株式会社 For the cache architecture using buffer efficient access data texturing
US20180096515A1 (en) * 2016-10-05 2018-04-05 Samsung Electronics Co., Ltd. Method and apparatus for processing texture
CN108022269A (en) * 2017-11-24 2018-05-11 中国航空工业集团公司西安航空计算技术研究所 A kind of modeling structure of GPU compressed textures storage Cache
US20180182155A1 (en) * 2016-12-22 2018-06-28 Advanced Micro Devices, Inc. Shader writes to compressed resources
US20190096027A1 (en) * 2017-09-25 2019-03-28 Arm Limited Cache arrangement for graphics processing systems
US10706607B1 (en) * 2019-02-20 2020-07-07 Arm Limited Graphics texture mapping
US20220206950A1 (en) * 2020-12-28 2022-06-30 Advanced Micro Devices, Inc. Selective generation of miss requests for cache lines
CN115345769A (en) * 2021-05-14 2022-11-15 辉达公司 Accelerated processing via physics-based rendering engine
CN115409882A (en) * 2022-09-02 2022-11-29 中国船舶集团有限公司第七一六研究所 Device and method for realizing texture sampling in GPU
CN115617499A (en) * 2022-12-20 2023-01-17 深流微智能科技(深圳)有限公司 System and method for GPU multi-core hyper-threading technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shao Xuqiang; Nie Xiao; Wang Baoyi: "Real-time visual hull 3D reconstruction accelerated by GPU parallel computing and its virtual-real interaction", Journal of Computer-Aided Design & Computer Graphics, no. 01, pages 52-54 *

Also Published As

Publication number Publication date
CN116467227B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
US9779536B2 (en) Graphics processing
US9406149B2 (en) Selecting and representing multiple compression methods
JP3453088B2 (en) Compressed texture data structure
KR102258100B1 (en) Method and apparatus for processing texture
US7880745B2 (en) Systems and methods for border color handling in a graphics processing unit
KR20040069500A (en) Pixel cache, 3D graphic accelerator using it, and method therefor
JP2000057369A (en) Method for taking out texture data
US8243086B1 (en) Variable length data compression using a geometry shading unit
US8254701B1 (en) Data compression using a geometry shading unit
KR20060116916A (en) Texture cache and 3-dimensional graphics system including the same, and control method thereof
US20210358174A1 (en) Method and apparatus of data compression
CN116467227B (en) TMU system and operation optimization method thereof
EP3355275B1 (en) Out of order pixel shader exports
US10726607B1 (en) Data processing systems
US11954038B2 (en) Efficient evict for cache block memory
CN112419463B (en) Model data processing method, device, equipment and readable storage medium
US10706607B1 (en) Graphics texture mapping
US20220207644A1 (en) Data compression support for accelerated processor
US11205243B2 (en) Data processing systems
US10956338B2 (en) Low latency dirty RAM for cache invalidation speed improvement
CN112416489A (en) Engineering drawing display method and related device
US10395424B2 (en) Method and apparatus of copying data to remote memory
CN116758175B (en) Primitive block compression device and method, graphic processor and electronic equipment
CN116263981B (en) Graphics processor, system, apparatus, device, and method
US20230186523A1 (en) Method and system for integrating compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant