CN113051194B - Buffer memory, GPU, processing system and buffer access method - Google Patents


Info

Publication number
CN113051194B
CN113051194B (application CN202110228263.8A)
Authority
CN
China
Prior art keywords
access
data
address
block
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110228263.8A
Other languages
Chinese (zh)
Other versions
CN113051194A
Inventor
龙斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Original Assignee
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Jingmei Integrated Circuit Design Co ltd, Changsha Jingjia Microelectronics Co ltd filed Critical Changsha Jingmei Integrated Circuit Design Co ltd
Priority to CN202110228263.8A priority Critical patent/CN113051194B/en
Priority to PCT/CN2021/087350 priority patent/WO2022183571A1/en
Publication of CN113051194A publication Critical patent/CN113051194A/en
Application granted granted Critical
Publication of CN113051194B publication Critical patent/CN113051194B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • G06F13/1673Details of memory controller using buffers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the present application provide a buffer memory, a GPU, a processing system, and a buffer access method. The buffer memory includes: a plurality of access address comparison units, configured to receive access addresses sent by a plurality of access sources and compare them with the block addresses stored in an address storage unit to generate comparison results, the access address comparison units being mutually independent; and a data access management unit, configured to receive the comparison results sent by the access address comparison units and, when a comparison result is a hit, gate the corresponding data channel in a data channel management unit, so that the corresponding access source accesses the buffered data block in the data storage unit through that data channel. The data channel management unit includes a plurality of mutually independent data channels. The buffer memory, GPU, processing system, and buffer access method can improve access efficiency.

Description

Buffer memory, GPU, processing system and buffer access method
Technical Field
The present disclosure relates to memory access technologies, and in particular, to a buffer memory, a GPU, a processing system, and a method for accessing a buffer.
Background
A graphics processing unit (GPU) is a microprocessor dedicated to processing images and graphics. Used in the display system of an electronic terminal, it relieves the central processing unit (CPU) of the pressure of image and graphics processing.
A GPU contains an internal cache (Cache) and a plurality of arithmetic units, and each arithmetic unit can access the Cache as an access source. When multiple access sources access the Cache, they must first be arbitrated and ordered, and then access it one at a time. This access mode is inefficient: arithmetic units are often stalled waiting, or become a bottleneck and cannot work normally, which reduces the processing efficiency of the GPU. Moreover, on a Cache miss, all subsequent read and write accesses are suspended while a data block in the Cache is replaced, further prolonging the waiting time of the arithmetic units.
Disclosure of Invention
The embodiment of the application provides a buffer memory, a GPU, a processing system and a buffer access method, which are used for solving the problem that the efficiency of an access source for accessing the buffer memory in the traditional scheme is low.
An embodiment of a first aspect of the present application provides a buffer memory, including:
the multiple access address comparison units are used for receiving access addresses sent by multiple access sources and comparing the access addresses with block addresses stored in the address storage unit to generate comparison results; wherein the plurality of access address comparison units are mutually independent;
the data access management unit is used for receiving the comparison result sent by the access address comparison unit, and controlling the gating of the corresponding data channel in the data channel management unit when the comparison result is hit, so that the corresponding access source accesses the buffer data block in the data storage unit through the data channel;
a data storage unit for storing a plurality of buffered data blocks;
the address storage unit is used for storing block addresses corresponding to the buffer data blocks;
the data channel management unit comprises a plurality of data channels, and the data channels are mutually independent.
Embodiments of a second aspect of the present application provide a graphics processor GPU comprising: a plurality of arithmetic units and a buffer memory as described above.
Embodiments of a third aspect of the present application provide a processing system comprising: a graphics processor GPU as described above.
An embodiment of a fourth aspect of the present application provides a cache access method using the above buffer memory, including:
the multiple access address comparison units receive access addresses sent by multiple access sources and compare the access addresses with block addresses stored in the address storage unit to generate comparison results; the access address comparison units are mutually independent;
the data access management unit receives the comparison result sent by each access address comparison unit, and controls the gating of the corresponding data channel in the data channel management unit when the comparison result is hit, so that the corresponding access source accesses the buffer data block in the data storage unit through the data channel;
the data storage unit is used for storing a plurality of buffer data blocks; the address storage unit is used for storing block addresses corresponding to the buffer data blocks; the data channel management unit comprises a plurality of data channels, and all the data channels are mutually independent.
According to the technical scheme provided by the embodiment of the application, a plurality of access address comparison units are used for receiving access addresses sent by a plurality of access sources and comparing the access addresses with block addresses stored in an address storage unit to generate comparison results; the access address comparison units are mutually independent; the data access management unit is used for receiving the comparison result sent by the access address comparison unit, and controlling the gating of the corresponding data channel in the data channel management unit when the comparison result is hit so that the corresponding access source accesses the buffer data block in the data storage unit through the data channel; a data storage unit for storing a plurality of buffered data blocks; the address storage unit is used for storing block addresses corresponding to the buffer data blocks; the data channel management unit comprises a plurality of data channels, and the data channels are mutually independent. According to the technical scheme, the multiple access address comparison units work independently, the access addresses sent by the multiple access sources are received and compared with the block addresses of the buffer data blocks, and when the comparison result is hit, the corresponding data channel gating is controlled through the data access management unit, so that the access sources access the buffer data blocks in the data storage unit through the corresponding data channels independently, and the access efficiency is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a block diagram of a buffer memory according to an embodiment of the present disclosure;
fig. 2 is a block diagram of a GPU according to a fifth embodiment of the present application;
FIG. 3 is a block diagram of a processing system according to a fifth embodiment of the present application;
fig. 4 is a flowchart of a cache access method provided in a sixth embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is given with reference to the accompanying drawings, and it is apparent that the described embodiments are only some of the embodiments of the present application and not exhaustive of all the embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
Example 1
This embodiment provides a buffer memory that can be applied to a processor; the processor may be a central processing unit (CPU), a graphics processing unit (GPU), or another processor. Besides the buffer memory, the processor includes a plurality of arithmetic units, each of which can access the buffer memory as an access source. The buffer memory provided in this embodiment offers interfaces for multiple access sources, so that multiple access sources can access it at the same time.
The above processor may be applied to a processing system that includes the processor, a read-only memory (ROM), a random access memory (RAM), and the like. The processor contains the above buffer memory, and from the perspective of the buffer memory the RAM may be regarded as an external memory.
Fig. 1 is a block diagram of a buffer memory according to an embodiment of the present application. As shown in fig. 1, the buffer memory provided in this embodiment includes: an address storage unit 1, an access address comparison unit 2, a data access management unit 3, a data channel management unit 4, and a data storage unit 5.
Wherein data acquired from an external memory is divided into a plurality of buffered data blocks, and stored in the data storage unit 5. The access source may read out the buffered data blocks from the data storage unit 5 and may also modify the content of the buffered data blocks. The block address corresponding to each buffered data block is stored in the address storage unit 1.
The data channel management unit 4 has a plurality of data channels, and the plurality of data channels are mutually independent and work in parallel. At least one data channel may be gated with the data storage unit 5 such that at least one access source may read from and write to the buffered data blocks in the data storage unit 5 via the corresponding data channel.
There are a plurality of access address comparison units 2, configured to receive the access addresses sent by a plurality of access sources and compare them with the block addresses in the address storage unit 1. Specifically, each access address comparison unit 2 receives an access address sent by one access source, compares it with the block addresses stored in the address storage unit 1, and generates a comparison result. The access address comparison units 2 are mutually independent and can work in parallel. If the comparison finds the access address identical to a block address stored in the address storage unit 1, the access address comparison unit 2 generates a hit comparison result; if the access address differs from every block address stored in the address storage unit 1, the access address comparison unit 2 generates a miss comparison result, which may also be called a failed comparison result.
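The hit/miss behavior of one comparison unit can be sketched as follows. This is a minimal Python model for illustration only; the function and variable names are assumptions, not taken from the patent:

```python
def compare_access_address(access_address, block_addresses):
    """Model of one access address comparison unit: check an incoming
    access address against the stored block addresses and report
    (hit, block_number), where block_number is None on a miss."""
    for block_number, block_address in enumerate(block_addresses):
        if access_address == block_address:
            return True, block_number   # hit: a matching block address exists
    return False, None                  # miss ("failed" comparison result)

# Several independent units would each run this check in parallel,
# one per access source.
stored = [0x1000, 0x2000, 0x3000]
print(compare_access_address(0x2000, stored))  # (True, 1)
print(compare_access_address(0x9000, stored))  # (False, None)
```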
The data access management unit 3 is configured to receive the comparison result sent by the access address comparison unit 2. When the comparison result is hit, the data access management unit 3 controls the corresponding data channel strobe so that the corresponding access source can access the buffered data block in the data storage unit 5 through the data channel.
When an access source accesses the buffer memory, one access address comparison unit 2 receives the access address of the access source, compares the access address with the block address stored in the address storage unit 1, generates a comparison result, and sends the comparison result to the data access management unit 3. When the comparison result is hit, the data access management unit 3 controls one data channel strobe in the data channel management unit 4 so that the access source accesses the corresponding buffered data block in the data storage unit 5 through the data channel.
When two or more access sources access the buffer memory, an equal number of access address comparison units 2 each receive the access address of one source, compare it with the block addresses stored in the address storage unit 1, generate a comparison result, and send the comparison result to the data access management unit 3. For the comparison results that hit, the data access management unit 3 gates as many data channels as there are hits, so that each hitting access source accesses its corresponding buffered data block in the data storage unit 5 through one data channel.
Illustrating: when three access sources A, B, C access the buffer memory, the three access sources A, B, C respectively transmit access addresses to the buffer memory. The three access address comparison units 2 respectively receive an access address of an access source, compare the received access address with the block address stored in the address storage unit 1, generate a comparison result, and send the comparison result to the data access management unit 3.
Assuming all three comparison results are hits, the data access management unit 3 gates three data channels, and the three access sources A, B and C each access their corresponding buffered data block in the data storage unit 5 through one data channel; the three access sources can access in parallel without interfering with each other.
Assuming only the two comparison results corresponding to access sources A and B are hits, the data access management unit 3 gates two data channels, and the two access sources A and B each access their corresponding buffered data block in the data storage unit 5 through one data channel; the two access sources can access in parallel without interference.
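The channel-gating decision in the examples above can be sketched as follows. This is an illustrative Python model; the channel naming is an assumption made for the sketch:

```python
def gate_channels(comparison_results):
    """Model of the data access management unit's gating decision:
    comparison_results maps each access source to its hit/miss result.
    Each hitting source is given an independent data channel; missing
    sources get no channel and do not block the others."""
    return {source: f"channel_{source}"
            for source, hit in comparison_results.items() if hit}

# Sources A and B hit, C misses: A and B proceed in parallel, C waits.
print(gate_channels({"A": True, "B": True, "C": False}))
```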
The technical scheme provided by the embodiment is that a plurality of access address comparison units are used for receiving access addresses sent by a plurality of access sources and comparing the access addresses with block addresses stored in an address storage unit to generate comparison results; the access address comparison units are mutually independent; the data access management unit is used for receiving the comparison result sent by the access address comparison unit, and controlling the gating of the corresponding data channel in the data channel management unit when the comparison result is hit so that the corresponding access source accesses the buffer data block in the data storage unit through the data channel; a data storage unit for storing a plurality of buffered data blocks; the address storage unit is used for storing block addresses corresponding to the buffer data blocks; the data channel management unit comprises a plurality of data channels, and the data channels are mutually independent. According to the technical scheme, the multiple access address comparison units work independently, the access addresses sent by the multiple access sources are received and compared with the block addresses of the buffer data blocks, and when the comparison result is hit, the corresponding data channel gating is controlled through the data access management unit, so that the access sources access the buffer data blocks in the data storage unit through the corresponding data channels independently, and the access efficiency is improved.
Example 2
The present embodiment optimizes the buffer memory based on the above embodiment:
As shown in fig. 1, the buffer memory further includes: an external memory access unit 6. When the comparison result of an access address comparison unit 2 is a miss, the data access management unit 3 controls the external memory access unit 6 to read the data block corresponding to the access address sent by the access source from the external memory, write the data block into the data storage unit 5, and update the address storage unit 1 with the address of the read data block. The access address comparison unit 2 then compares the access address with the block addresses again; if the address now hits, the corresponding data channel is gated and the access source accesses the buffered data block in the data storage unit 5.
Specifically, the external memory access unit 6 may include: an external memory access module and a load data buffer module. The external memory access module pre-stores the data blocks read from the external memory in the load data buffer module, and then writes them into the data storage unit 5.
When multiple access sources access the buffer memory, only the access source whose address comparison misses is suspended, and its data is then read from the external memory through the external memory access unit 6. Meanwhile, the access sources whose addresses hit can access the buffered data blocks in parallel through their data channels. Suspending one access source does not affect the normal access operations of the other, hitting access sources; this improves the utilization of the data storage unit 5 and solves the problem in the traditional scheme that, once one access source's operation is suspended, all other access sources must wait.
In the above process, a data block to be replaced must be selected to hold the data read from the external memory. The data block to be replaced may be the least-used block; for example, it may be determined by a least recently used (LRU) algorithm.
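A least-recently-used victim selection, as one possible policy mentioned above, can be sketched as follows. The timestamp representation is an assumption for illustration:

```python
def select_victim(last_used):
    """Pick the data block to be replaced by an LRU policy:
    last_used maps block number -> time of last access; the block
    with the oldest (smallest) timestamp is the victim."""
    return min(last_used, key=last_used.get)

# Block 1 was touched longest ago, so it is chosen for replacement.
print(select_victim({0: 50, 1: 12, 2: 33}))  # 1
```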
When the selected data block to be replaced has not been modified during operation, i.e., its content is the same as in the external memory, the data read from the external memory can directly overwrite it.
However, when the selected data block to be replaced has been modified during operation, i.e., its content differs from the external memory, it must first be written back to the external memory before being replaced by the data read from the external memory. Specifically, the external memory access unit 6 further includes a write-back data buffer module. On receiving the control instruction sent by the data access management unit 3 after a miss, the external memory access module pre-stores the selected data block to be replaced in the write-back data buffer module, and then writes it to the external memory. After that, the data block read from the external memory is pre-stored in the load data buffer module and written into the data storage unit 5, replacing the data block to be replaced. The data access management unit 3 then sends the new block address to the address storage unit 1 so that the address storage unit 1 can update the corresponding entry.
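The write-back-then-replace flow described above can be sketched as follows. This is a minimal Python model under stated assumptions: dictionaries stand in for the data storage unit and the external memory, and the staging buffers are shown as local variables; none of these names come from the patent:

```python
def replace_block(data_storage, external_memory, victim_addr, new_addr, dirty):
    """Model of the replacement flow: if the victim block was modified
    ("dirty"), stage it in the write-back data buffer and flush it to
    external memory first; then stage the requested block in the load
    data buffer and install it in the data storage unit."""
    if dirty:
        write_back_buffer = data_storage[victim_addr]   # stage victim
        external_memory[victim_addr] = write_back_buffer  # write back
    load_buffer = external_memory[new_addr]             # stage new block
    del data_storage[victim_addr]                       # evict victim
    data_storage[new_addr] = load_buffer                # install new block
    return data_storage

storage = {0x10: "X"}          # victim block holds modified data X
external = {0x20: "Y"}         # external memory holds requested data Y
replace_block(storage, external, 0x10, 0x20, dirty=True)
print(storage, external)       # X written back, Y now cached
```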
Example 3
The present embodiment is based on the above embodiment, and optimizes the buffer memory, in particular, optimizes the address storage unit 1 and the access address comparing unit 2:
the address storage unit 1 may store therein block addresses corresponding to all buffered data blocks. The access address comparison unit 2 may compare the access address received from the access source with all block addresses in the address storage unit 1.
Alternatively, the block addresses may be grouped, with the block addresses of the buffered data blocks associated with each access source divided into one group. When an access address comparison unit 2 receives the access address of an access source, it then only needs to compare against the group of block addresses associated with that source, which reduces the number of comparisons, shortens the comparison time, and helps improve access efficiency.
A specific implementation is as follows: the address storage unit 1 adopts a set-associative scheme, in which each entry of the address storage unit 1 stores the addresses of a group of buffered data blocks, and all the block addresses in one entry are associated with one access source.
Further, the address storage unit 1 includes a plurality of register sets, each storing a plurality of block addresses — specifically, the block addresses of the buffered data blocks corresponding to one access source. The register sets can provide multiple read-address ports, which makes it convenient for the access address comparison units 2 to obtain block addresses and for multiple access sources to be addressed and compared. In practice, the configuration of the register sets can be determined according to the capacity and area of the address storage unit 1.
In a specific implementation, the access address comparison unit 2 includes an address acquisition module and an address comparison module. The address acquisition module obtains the access address sent by an access source. The address comparison module extracts the high-order part of the access address, compares it with the block addresses stored in the address storage unit 1, and outputs the comparison result. On a hit, the comparison result contains the address hit information and the hit block address number.
Further, when the address storage unit 1 is implemented with multiple register sets, the address acquisition module in the access address comparison unit 2 is also configured to determine a target register set according to the access address sent by the access source and obtain the block addresses in that target register set, so that the address comparison module compares the high-order part of the access address only with the block addresses in the target register set. Because the access address comparison unit 2 compares only against the target register set associated with the access source, the number of comparisons is reduced, the comparison time is shortened, and access efficiency is improved.
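The address decomposition implied above — a high-order tag compared against stored block addresses, plus a field selecting which register set to search — can be sketched as follows. The field widths and the bit layout are assumptions chosen for illustration; the patent does not specify them:

```python
OFFSET_BITS = 6   # assumed: bytes within one buffered data block
SET_BITS = 3      # assumed: selects one of 8 register sets (groups)

def split_address(addr):
    """Split an access address into (tag, set_index, offset):
    the tag is the high-order part used for comparison, the set
    index picks the target register set."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset

def lookup(addr, register_sets):
    """Compare the high-order part only against the selected group;
    register_sets maps set_index -> set of stored tags."""
    tag, set_index, _ = split_address(addr)
    return tag in register_sets.get(set_index, set())

addr = (5 << 9) | (2 << 6) | 7      # tag 5, set 2, offset 7
print(split_address(addr))          # (5, 2, 7)
print(lookup(addr, {2: {5, 9}}))    # True: tag 5 found in set 2
```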
Example 4
The embodiment provides a buffer memory implementation manner based on the above embodiment:
as shown in fig. 1, the buffer memory provided in this embodiment includes: an address storage unit 1, an access address comparison unit 2, a data access management unit 3, a data channel management unit 4, and a data storage unit 5.
The address storage unit 1 is called the Cache Tag, and there is one address storage unit 1. The data read from the external memory is divided into a plurality of data blocks, each called a Cache Line. A number of data blocks are stored in the data storage unit 5, and their block addresses are stored in the address storage unit 1.
The address storage unit 1 adopts a set-associative scheme and is implemented with register sets, so multiple access sources can address and read it at the same time, avoiding conflicts among their accesses.
An arrangement of 8 access address comparison units 2 is adopted, so the access addresses of 8 access sources can be compared at the same time. The access address comparison units 2 may be activated and allocated in order of arrival, that is: the first comparison unit 2 is allocated to the first access source that initiates an access, the second comparison unit 2 to the second, and so on. Alternatively, the 8 access address comparison units 2 may correspond one-to-one with the access sources, that is: each access address comparison unit 2 receives the access address sent by its fixed corresponding access source and compares it with the block addresses.
Specifically, as shown in fig. 1, the 8 access address comparison units 2 are access address comparison units No. 0 through No. 7. Comparison unit No. 0 receives and compares the access address sent by access source No. 0, comparison unit No. 1 receives and compares the access address sent by access source No. 1, and so on.
Each access address comparison unit 2 compares the high-order part of the access address, in parallel, with the multiple block addresses read from the address storage unit 1, and outputs the hit information and the hit block address number when the comparison result is a hit. The access address comparison units 2 are mutually independent and can operate simultaneously. The 8 access address comparison units 2 each send their comparison result to the data access management unit 3.
The data storage unit 5 stores the buffered data blocks. In this embodiment it is spliced together from multiple random access memory (RAM) banks; each RAM bank stores several buffered data blocks, and the RAM banks are mutually independent. Each RAM bank provides its own access interface, so multiple access sources can access different RAM banks simultaneously, and the data storage unit 5 can respond to several access requests at once. While a hit RAM bank performs a read or write operation, the RAM bank holding the data block to be replaced can perform the data replacement operation synchronously, without affecting the hit bank. The splicing structure of the RAM banks can be determined according to factors such as the area, power consumption, and efficiency of the buffer memory.
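The bank-level parallelism described above can be sketched as follows. This is an illustrative Python model: the bank count and the interleaved (modulo) block-to-bank mapping are assumptions for the sketch, not specified in the patent:

```python
NUM_BANKS = 4  # assumed bank count for illustration

def bank_of(block_number):
    """Assumed interleaved mapping of buffered data blocks to RAM banks."""
    return block_number % NUM_BANKS

def conflict_free(block_numbers):
    """True if all requested blocks fall in distinct banks, so the
    accesses can all be served simultaneously; same-bank requests
    conflict and must be serialized."""
    banks = [bank_of(b) for b in block_numbers]
    return len(set(banks)) == len(banks)

print(conflict_free([0, 1, 2]))  # True: three different banks
print(conflict_free([0, 4]))     # False: both blocks map to bank 0
```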
When multiple access sources hit the same buffered data block, the accesses may be performed sequentially, in access order or by priority.
The number of data channels in the data channel management unit 4 may be the same as the number of access sources, and the data channels may be allocated according to the access order of the access sources, for example: a first data channel is assigned to the access source of a first access, a second data channel is assigned to the access source of a second access, and so on.
Alternatively, the data channels correspond one-to-one with the access sources, that is: each access source accesses the data storage unit 5 through its own dedicated data channel. As shown in fig. 1, 8 data channels are used, numbered 0 through 7; data channel No. 0 corresponds to access source No. 0, data channel No. 1 to access source No. 1, and so on. The data channel management unit 4 is further provided with a data strobe module that can gate at least two data channels to the data storage unit at the same time. Under the control of the data access management unit 3, the data strobe module gates the corresponding data channel when the address comparison result is a hit.
Each data channel includes a read data channel and a write-data temporary buffer. When the address comparison hits, each data channel completes its access operation independently, reading from or writing to a RAM bank.
When the address comparison misses, the external memory access unit 6 sends read and write commands to the external memory according to the external protocol: it fetches from the RAM bank the buffered data block that needs to be written back, writes it back to the external memory, temporarily receives and stores the data block required by the access source as it arrives from the external memory, updates the corresponding data block to be replaced when the data storage unit 5 permits, and updates the address storage unit 1 through the data access management unit 3. For example: if the data of the data block to be replaced is X, X is written back to the external memory, the required data Y is read from the external memory, and the data block to be replaced is overwritten with Y when the data storage unit 5 permits.
Here, "when the data storage unit 5 allows" may be understood as: the data block to be replaced may be updated once it is determined that the storage area corresponding to that block is not being hit by another access source.
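The miss-handling sequence above (write back X, read Y, install Y when the data storage unit allows) might be modeled roughly as follows; the function name, the dict-based memories, and the `area_busy` callback are all illustrative assumptions:

```python
# Illustrative model of the miss flow: write the victim block X back to
# external memory, fetch the requested block Y, and install Y once the
# victim's storage area is no longer hit by another access source.

def handle_miss(data_store, addr_store, ext_mem, victim, miss_addr, area_busy):
    ext_mem[addr_store[victim]] = data_store[victim]  # write back X
    y = ext_mem[miss_addr]                            # read Y from external memory
    while area_busy():                                # wait until the data storage unit allows
        pass
    data_store[victim] = y                            # replace the victim block with Y
    addr_store[victim] = miss_addr                    # update the address storage unit
    return y
```

In the real device the address-storage update goes through the data access management unit 3 rather than a direct write as sketched here.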
The data access management unit 3 serves as the global management module of the buffer memory and coordinates the above units so that they operate cooperatively.
The number 8 in the present embodiment is merely illustrative; the number of access address comparison units 2 and the number of data channels are not limited to 8, and may be fewer or more.
In addition, when the address comparisons of several access sources miss, only the access sources whose comparisons missed need to be suspended, and the external memory access unit 6 performs the buffered-data-block replacement operations in access order or in priority order. The access sources whose comparisons hit can execute their access operations normally, which improves the utilization of the data storage unit 5.
The address storage unit 1, the access address comparison unit 2, the data access management unit 3, the data channel management unit 4, and the data storage unit 5 in the above embodiments may each be implemented as hardware structures; this embodiment does not limit the specific hardware structure of each unit, as long as the functions described above can be realized.
Embodiment Five
Fig. 2 is a block diagram of a GPU according to a fifth embodiment of the present application. As shown in fig. 2, the present embodiment provides a graphics processor (GPU), comprising: a plurality of arithmetic units 21 and a buffer memory 22, the plurality of arithmetic units 21 accessing the buffer memory 22 as access sources. The buffer memory 22 may employ any of the above embodiments.
Fig. 3 is a block diagram of a processing system according to a fifth embodiment of the present application. As shown in fig. 3, the present embodiment provides a processing system comprising a graphics processor GPU 31 configured to perform graphics processing tasks. In addition, the processing system may also include a central processing unit CPU 32 and a random access memory RAM 33. The CPU 32 may issue graphics processing tasks to the GPU 31, which the GPU 31 performs. Both the CPU 32 and the GPU 31 may access the random access memory 33 during operation.
The GPU, the processing system, and the electronic terminal provided in this embodiment have the same technical effects as the buffer memory described above.
Embodiment Six
Based on the above embodiments, the present embodiment provides a cache access method that can be executed by the buffer memory described above.
Fig. 4 is a flowchart of a cache access method provided in a sixth embodiment of the present application. As shown in fig. 4, the cache access method provided in this embodiment includes:
In step 401, the access address comparison units receive the access addresses sent by the multiple access sources and compare them with the block addresses stored in the address storage unit to generate comparison results.
The number of access address comparison units 2 is plural, and each unit 2 receives the access address sent by its corresponding access source. The access address comparison units 2 are independent of each other and can work in parallel.
A plurality of buffered data blocks are stored in the data storage unit, and block addresses corresponding to the buffered data blocks are stored in the address storage unit.
Each access address comparison unit 2, after receiving the access address from its access source, compares it with the block addresses stored in the address storage unit 1. A block address stored in the address storage unit 1 is the address of a buffered data block in the data storage unit 5.
When the access address is the same as some block address, the comparison result is a hit; when the access address differs from every block address, the comparison result is a miss.
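The hit/miss rule that each comparison unit applies can be written as a minimal sketch (the function name and the tuple return encoding are hypothetical, not part of the method):

```python
# Each access address comparison unit applies this rule independently
# for its own access source.

def compare_access(access_addr, block_addrs):
    """Return ('hit', i) when access_addr equals the i-th stored block
    address, otherwise ('miss', None)."""
    for i, blk in enumerate(block_addrs):
        if access_addr == blk:
            return ('hit', i)
    return ('miss', None)
```

In hardware all stored block addresses would be compared in parallel rather than in a loop; the sequential scan here only models the result.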
In step 402, the data access management unit receives the comparison result sent by each access address comparison unit, and when a comparison result is a hit, gates the corresponding data channel in the data channel management unit so that the corresponding access source accesses the buffered data block in the data storage unit through that channel.
There are multiple data channels, and they are independent of each other. Each data channel can be gated to the data storage unit. The data access management unit 3 receives the comparison result sent by an access address comparison unit 2 and, when the result is a hit, gates the corresponding data channel to the data storage unit 5 so that the access source can access the buffered data block in the data storage unit 5 through that channel.
In the above steps, the access address comparison units 2 work independently, each acquiring and comparing access addresses on its own. The data channels are likewise independent, and gating one channel does not affect another, so multiple access sources can access the memory at the same time.
In the technical solution provided by this embodiment, a plurality of mutually independent access address comparison units receive the access addresses sent by a plurality of access sources and compare them with the block addresses stored in the address storage unit to generate comparison results; the data access management unit receives these comparison results and, on a hit, gates the corresponding data channel in the data channel management unit so that the corresponding access source accesses the buffered data block in the data storage unit through that channel; the data storage unit stores a plurality of buffered data blocks, the address storage unit stores the corresponding block addresses, and the data channel management unit comprises a plurality of mutually independent data channels. Because the comparison units work independently and the data channels are gated independently, each access source can access its buffered data block in the data storage unit on its own, improving access efficiency.
On the basis of the above technical solution, when the comparison result is a miss, the data access management unit 3 controls the external memory access unit 6 to read the data block corresponding to the access address sent by the access source from the external memory, write it into the data storage unit, and then update the block addresses stored in the address storage unit 1.
In one implementation, one buffered data block in the data storage unit is selected in advance as the data block to be replaced, and it is overwritten with the data block read from the external memory. The data block to be replaced may be selected by a least-recently-used algorithm.
When the selected data block to be replaced has not been modified during operation, i.e., its content is identical to the copy in the external memory, the data read from the external memory can directly overwrite it.
When the selected data block to be replaced has been modified during operation, i.e., its content differs from the copy in the external memory, it must first be written back to the external memory, after which the data block read from the external memory overwrites it.
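The two replacement cases above — overwrite a clean victim directly, write a modified victim back first — can be sketched with an assumed dirty flag (the flag name and the dict structure are illustrative, not from the text):

```python
# A victim that was never modified is identical to its copy in external
# memory, so it is simply overwritten; a modified (dirty) victim is
# written back first.

def replace_block(ext_mem, victim, new_addr):
    if victim['dirty']:
        ext_mem[victim['addr']] = victim['data']  # write the modified block back
    victim['addr'] = new_addr
    victim['data'] = ext_mem[new_addr]            # load the new block
    victim['dirty'] = False                       # fresh copy matches external memory
    return victim
```

The dirty flag is a common way to record "modified during operation"; the patent text itself only states the condition, not how it is tracked.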
After the external memory access unit 6 has fetched the data block from the external memory and replaced the selected data block in the data storage unit 5, the address storage unit 1 is updated through the data access management unit 3 so that the access address comparison unit 2 can perform the address comparison again.
In addition, during address comparison the access address comparison unit 2 may compare only the high-order part of the access address with the block address. This reduces the number of address bits to be compared, shortens the comparison time, and improves access efficiency.
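As a sketch of this high-order comparison, assume (purely for illustration — the text fixes no block size) 64-byte blocks, so the low 6 offset bits are dropped before comparing, and take the stored block addresses to hold only the high-order part:

```python
BLOCK_OFFSET_BITS = 6  # assumption: 64-byte blocks; the text fixes no size

def high_order(addr):
    """Drop the in-block offset so only the high-order part is compared."""
    return addr >> BLOCK_OFFSET_BITS

def compare(access_addr, stored_high_parts):
    # Fewer bits compared -> shorter comparison time, as the text notes.
    return high_order(access_addr) in stored_high_parts
```

Two addresses within the same block differ only in their offset bits, so they produce the same high-order part and therefore the same comparison result.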
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (13)

1. A buffer memory, comprising:
the multiple access address comparison units are used for receiving access addresses sent by multiple access sources and comparing the access addresses with block addresses stored in the address storage unit to generate comparison results; wherein the plurality of access address comparison units are mutually independent;
the data access management unit is used for receiving the comparison result sent by the access address comparison unit, and controlling the gating of the corresponding data channel in the data channel management unit when the comparison result is hit, so that the corresponding access source accesses the buffer data block in the data storage unit through the data channel;
a data storage unit for storing a plurality of buffered data blocks;
the address storage unit is used for storing block addresses corresponding to the buffer data blocks;
the data channel management unit comprises a plurality of data channels, and the data channels are mutually independent;
the external memory access unit is used for reading the data block corresponding to the access address sent by the access source from the external memory and writing the data block into the data storage unit when the comparison result received by the data access management unit is invalid; the external memory access unit includes: an external memory access module, a load data cache module, and a write-back data cache module;
the external memory access module is used for reading the data block corresponding to the access address sent by the access source from the external memory when the comparison result received by the data access management unit is invalid, and writing the data block into the data storage unit after the data block is cached by the loading data cache module;
the external memory access module is further configured to, before reading the data block corresponding to the access address sent by the access source from the external memory, cache the selected data block to be replaced in the write-back data cache module and write the cached data block into the external memory;
the external memory access module is further configured to, after reading the data block corresponding to the access address sent by the access source from the external memory, replace the data block to be replaced with the data block read from the external memory.
2. The buffer memory of claim 1, wherein the address storage unit comprises: a plurality of register sets, each for storing a block address of a buffered data block corresponding to one access source.
3. The buffer memory of claim 2, wherein the access address comparison unit comprises:
the access address acquisition module is used for acquiring an access address sent by an access source;
and the address comparison module is used for comparing the high-order part in the access address with the block address stored in the address storage unit and generating a comparison result.
4. The buffer memory of claim 3, wherein the access address comparison unit further comprises:
the block address acquisition module is used for determining a target register set according to the access address sent by the access source and acquiring a block address in the target register set, wherein the block address is used for comparing high-order parts in the access address by the address comparison module.
5. The buffer memory of claim 1, wherein the access address comparing unit is in one-to-one correspondence with the access sources, and an access address comparing unit is configured to receive an access address sent from the corresponding access source and compare the access address with the block address.
6. The buffer memory of claim 1, wherein the data storage unit is formed by combining a plurality of random access memory banks, each providing an access interface.
7. The buffer memory of claim 1, wherein the data channels in the data channel management unit correspond one-to-one to the access sources, and the access sources access the data storage units through the data channels corresponding thereto.
8. A graphics processor GPU, comprising: a plurality of arithmetic units and a buffer memory as claimed in any one of claims 1 to 7.
9. A processing system, comprising: a graphics processor GPU according to claim 8.
10. A cache access method using the buffer memory of any one of claims 1-7, comprising:
the multiple access address comparison units receive access addresses sent by multiple access sources and compare the access addresses with block addresses stored in the address storage unit to generate comparison results; the access address comparison units are mutually independent;
the data access management unit receives the comparison result sent by each access address comparison unit, and controls the gating of the corresponding data channel in the data channel management unit when the comparison result is hit, so that the corresponding access source accesses the buffer data block in the data storage unit through the data channel;
the data storage unit is used for storing a plurality of buffer data blocks; the address storage unit is used for storing block addresses corresponding to the buffer data blocks; the data channel management unit comprises a plurality of data channels, and all the data channels are mutually independent.
11. The cache access method of claim 10, further comprising:
and when the comparison result is invalid, the data access management unit controls the external memory access unit to read the data block corresponding to the access address sent by the access source from the external memory and write the data block into the data storage unit.
12. The cache access method according to claim 11, further comprising, before the data access management unit controls the external memory access unit to read, from the external memory, a data block corresponding to an access address transmitted from the access source:
the data access management unit controls the external memory access unit to write the selected data block to be replaced back to the external memory.
13. The cache access method according to claim 10, wherein the access address comparing unit compares the access address with the block address stored in the address storage unit, specifically:
the access address comparing unit compares a higher order part in the access address with the block address stored in the address storing unit.
CN202110228263.8A 2021-03-02 2021-03-02 Buffer memory, GPU, processing system and buffer access method Active CN113051194B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110228263.8A CN113051194B (en) 2021-03-02 2021-03-02 Buffer memory, GPU, processing system and buffer access method
PCT/CN2021/087350 WO2022183571A1 (en) 2021-03-02 2021-04-15 Buffer memory, gpu, processing system and cache access method

Publications (2)

Publication Number Publication Date
CN113051194A CN113051194A (en) 2021-06-29
CN113051194B true CN113051194B (en) 2023-06-09

Family

ID=76509714



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298561A (en) * 2011-08-10 2011-12-28 北京百度网讯科技有限公司 Method for conducting multi-channel data processing to storage device and system and device
JP2013174997A (en) * 2012-02-24 2013-09-05 Mitsubishi Electric Corp Cache control device and cache control method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4895262B2 (en) * 2005-12-09 2012-03-14 株式会社メガチップス Information processing apparatus, controller, and file reading method
CN100472494C (en) * 2007-02-05 2009-03-25 北京中星微电子有限公司 System and method for implementing memory mediation of supporting multi-bus multi-type memory device
TWI366094B (en) * 2007-12-28 2012-06-11 Asmedia Technology Inc Method and system of integrating data assessing commands and data accessing device thereof
CN102147757B (en) * 2010-02-08 2013-07-31 安凯(广州)微电子技术有限公司 Test device and method
CN102012872B (en) * 2010-11-24 2012-05-02 烽火通信科技股份有限公司 Level two cache control method and device for embedded system
US9916253B2 (en) * 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
CN106569727B (en) * 2015-10-08 2019-04-16 福州瑞芯微电子股份有限公司 Multi-memory shares parallel data read-write equipment and its write-in, read method between a kind of multi-controller
CN111209232B (en) * 2018-11-21 2022-04-22 昆仑芯(北京)科技有限公司 Method, apparatus, device and storage medium for accessing static random access memory
CN111881068A (en) * 2020-06-30 2020-11-03 北京思朗科技有限责任公司 Multi-entry fully associative cache memory and data management method
CN112231254B (en) * 2020-09-22 2022-04-26 深圳云天励飞技术股份有限公司 Memory arbitration method and memory controller
CN112214427B (en) * 2020-10-10 2022-02-11 中科声龙科技发展(北京)有限公司 Cache structure, workload proving operation chip circuit and data calling method thereof


Also Published As

Publication number Publication date
WO2022183571A1 (en) 2022-09-09
CN113051194A (en) 2021-06-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant