CN111651376B - Data reading and writing method, processor chip and computer equipment - Google Patents

Data reading and writing method, processor chip and computer equipment

Info

Publication number
CN111651376B
Authority
CN
China
Prior art keywords
cache
processor
cache block
block
write
Prior art date
Legal status
Active
Application number
CN202010642465.2A
Other languages
Chinese (zh)
Other versions
CN111651376A (en)
Inventor
刘君
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010642465.2A priority Critical patent/CN111651376B/en
Publication of CN111651376A publication Critical patent/CN111651376A/en
Application granted granted Critical
Publication of CN111651376B publication Critical patent/CN111651376B/en

Classifications

    • G06F12/0833 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means, in combination with broadcast means (e.g. for invalidation or updating)
    • G06F12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F12/0842 Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using pseudo-associative means, e.g. set-associative or hashing
    • G06F2212/1016 Indexing scheme relating to memory systems: providing a specific technical effect: performance improvement
    • G06F2212/62 Indexing scheme relating to memory systems: details of cache specific to multiprocessor cache arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The embodiment of the application discloses a data reading and writing method, a processor chip and computer equipment, belonging to the technical field of chips. The processor chip includes: at least two processor cores, a processor cache corresponding to each processor core, a victim cache, an interconnection bus, and an off-chip interface. The off-chip interface is electrically connected to the interconnection bus; the interconnection bus is electrically connected to each processor cache and to the victim cache; each processor cache is electrically connected to its corresponding processor core and to the victim cache; each processor core is electrically connected to the victim cache; and the victim cache is used to store cache blocks in the shared state. The processor chip provided by the embodiment of the application avoids the cache false sharing problem caused when different processor cores perform write operations on different addresses within the same shared cache block, improving the processing performance of the processor chip.

Description

Data reading and writing method, processor chip and computer equipment
Technical Field
The embodiment of the application relates to the technical field of chips, in particular to a data reading and writing method, a processor chip and computer equipment.
Background
The MESI (Modified, Exclusive, Shared, Invalid) protocol is widely used in processors as a protocol supporting a write-back policy, and guarantees cache coherence among processor cores.
The MESI protocol specifies that a cache block (cache line) in a cache has four states, namely the Modified (M) state, the Exclusive (E) state, the Shared (S) state, and the Invalid (I) state. When a cache block is cached by multiple processor cores, it is in the shared state; when one processor core modifies data in the cache block, the modification triggers a snoop operation flow that invalidates the copies of the cache block in the other processor cores, i.e., those copies enter the invalid state.
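As an illustration of the state machine described above, the following minimal C++ sketch models the four MESI states and the invalidation rule; the type and function names are assumptions for this example, not part of the patent.

```cpp
// A minimal sketch (names assumed, not from the patent) of the four MESI
// states and the invalidation rule described above: when another core writes
// a block, any local copy must drop to the Invalid state.
#include <cstdio>

enum class MesiState { Modified, Exclusive, Shared, Invalid };

// Snoop rule: a remote write invalidates whatever copy we hold.
MesiState on_remote_write(MesiState local) {
    return (local == MesiState::Invalid) ? local : MesiState::Invalid;
}

int main() {
    MesiState s = MesiState::Shared;      // block cached by several cores
    s = on_remote_write(s);               // another core modifies the block
    std::printf("state = %d (3 == Invalid)\n", static_cast<int>(s));
    return 0;
}
```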
Disclosure of Invention
The embodiment of the application provides a data reading and writing method, a processor chip and computer equipment. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a processor chip, including: at least two processor cores, processor caches corresponding to each processor core, a victim cache, an interconnection bus, and an off-chip interface;
the off-chip interface is electrically connected with the interconnection bus;
the interconnection bus is electrically connected to each processor cache, and the interconnection bus is electrically connected to the victim cache;
each processor cache is electrically connected to the corresponding processor core, and each processor cache is electrically connected to the victim cache;
each processor core is electrically connected to the victim cache;
the victim cache is used for storing cache blocks in the shared state.
In another aspect, an embodiment of the present application provides a data read-write method, where the method is applied to a processor chip as described in the foregoing aspect, and the method includes:
when the processor core initiates a write operation to a target cache block, querying for the target cache block in the victim cache and in the processor cache corresponding to the processor core;
and if the target cache block is found in the victim cache, performing, by the processor core, the write operation on the target cache block in the victim cache.
In another aspect, embodiments of the present application provide a computer device comprising a processor and a memory, the processor comprising a processor chip as described in the foregoing aspect.
The technical scheme provided by the embodiment of the application at least comprises the following beneficial effects:
in the embodiment of the application, a victim cache is added to the processor chip and is electrically connected to each processor core and to each processor cache, and cache blocks in the shared state are stored in the victim cache. When a processor core needs to write to a shared cache block, the write operation is performed directly on the cache block in the victim cache, without exclusively occupying the interconnection bus. This avoids the cache false sharing problem caused when different processor cores perform write operations on different addresses within the same shared cache block, and improves the processing performance of the processor chip.
Drawings
FIG. 1 shows a block diagram of a processor chip with multiple cores;
FIG. 2 is a schematic diagram illustrating an implementation of the processor chip of FIG. 1 when writing to a shared cache block;
FIG. 3 illustrates a block diagram of a processor chip provided in accordance with an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an implementation of the processor chip of FIG. 3 when writing to a shared cache block;
FIG. 5 is a flow chart illustrating a data read-write method according to an exemplary embodiment of the present application;
FIG. 6 is a flow chart illustrating a data read-write method according to another exemplary embodiment of the present application;
FIG. 7 shows a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
References herein to "a plurality" mean two or more. "And/or" describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.
For ease of understanding, terms involved in embodiments of the present application are described below.
Cache (cache): refers to a high-speed memory that has faster access than random access memory (Random Access Memory, RAM), and cache typically employs static random access memory (Static Random Access Memory, SRAM) technology, as opposed to dynamic random access memory (Dynamic Random Access Memory, DRAM) technology for main memory (i.e., RAM).
The cache in the embodiment of the application refers to a processor cache and is used to accelerate the data access speed of a processor core. Typically, each processor core corresponds to multiple levels of cache, and the different levels differ in storage space and access speed: the closer a cache is to the processor core, the smaller its storage space and the faster its access speed.
Cache block (cache line): the unit of data transfer between the cache and main memory; that is, data in the cache is stored in the form of cache blocks.
Typically, the size of a cache block is between 32 and 256 bytes (e.g., 64 bytes). A single cache block therefore spans multiple addresses, and a processor core may write data at different addresses within the same cache block.
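As a concrete illustration, the sketch below (assuming 64-byte blocks; the names are invented for this example) shows how two distinct addresses map to the same cache block, which is the situation that makes per-block coherence visible to unrelated writes.

```cpp
// Illustrative sketch (assumed 64-byte blocks): the cache block an address
// maps to is addr / block_size, so distinct nearby addresses share a block.
#include <cstdint>
#include <cstdio>

constexpr std::uint64_t kBlockSize = 64;

std::uint64_t block_index(std::uint64_t addr) { return addr / kBlockSize; }

int main() {
    // 0x1000 and 0x1020 are different addresses but the same 64-byte block.
    std::printf("block(0x1000)=%llu block(0x1020)=%llu\n",
                static_cast<unsigned long long>(block_index(0x1000)),
                static_cast<unsigned long long>(block_index(0x1020)));
    return 0;
}
```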
Cache block states: the MESI protocol specifies that a cache block has four states, namely the modified state, the exclusive state, the shared state, and the invalid state. In the exclusive state, the cache block is cached only in the cache corresponding to the current processor core, and its data has not been modified and is consistent with the data in main memory. In the shared state, the cache block is cached in the caches corresponding to at least two processor cores, and its data has not been modified and is consistent with the data in main memory. In the modified state, the cache block is cached only in the cache corresponding to the current processor core, and its data has been modified and is inconsistent with the data in main memory. If a processor core modifies the data of a cache block in the shared state, the copies of the cache block in the caches of the other processor cores become invalid.
Cache false sharing: in a processor chip with multiple processor cores, false sharing occurs when multiple processor cores or hardware threads each access different addresses within the same cache block. Because false sharing generates a large number of exclusive requests on the interconnection bus, it increases the running time of the processor and degrades the processing performance of the processor chip.
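The following hedged C++ sketch reproduces the scenario in software: two threads write different fields that happen to share one cache block. All names are illustrative; on real hardware the adjacent fields cause the coherence traffic described above, and padding each field to its own block (e.g. with alignas(64)) removes the effect.

```cpp
// A hedged demonstration (all names assumed) of cache false sharing: two
// threads write *different* variables that sit in the same cache block, so
// each write invalidates the block in the other core's cache. Declaring the
// fields alignas(64) would place them in separate blocks and remove the
// effect; this sketch only illustrates the problematic layout.
#include <cstdio>
#include <thread>

struct Counters {
    long a = 0;  // written only by t1
    long b = 0;  // written only by t2, but shares a's 64-byte block
};

int main() {
    Counters c;
    std::thread t1([&c] { for (int i = 0; i < 10000000; ++i) ++c.a; });
    std::thread t2([&c] { for (int i = 0; i < 10000000; ++i) ++c.b; });
    t1.join();
    t2.join();
    std::printf("a=%ld b=%ld\n", c.a, c.b);
    return 0;
}
```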
Victim cache: the victim cache in the embodiment of the application is independent of the processor caches and is used to store cache blocks in the shared state. In the processor chip, read and write operations on shared cache blocks by different processor cores are performed in the victim cache. In some embodiments, the storage space of the victim cache is smaller than that of a processor cache.
Referring to FIG. 1, a block diagram of a processor chip with multiple cores is shown. The processor chip 100 includes: at least two processor cores, processor caches corresponding to each processor core, an interconnection bus, and an off-chip interface.
In fig. 1, a processor chip 100 is schematically illustrated as including four processor cores, namely, a processor core 101, a processor core 102, a processor core 103, and a processor core 104.
Each processor core is electrically connected with a corresponding processor cache. As shown in fig. 1, processor core 101 is electrically connected to processor cache 105, processor core 102 is electrically connected to processor cache 106, processor core 103 is electrically connected to processor cache 107, and processor core 104 is electrically connected to processor cache 108. The processor cache may be at least one of an L1 cache, an L2 cache, an L3 cache, or an L4 cache, which is not limited in this embodiment.
Each processor cache is electrically connected to an Interconnect (Interconnect) bus 109, so that data transmission with the main memory or data transmission between each processor cache is achieved through the Interconnect bus 109. As shown in fig. 1, processor cache 105, processor cache 106, processor cache 107, and processor cache 108 are each electrically coupled to interconnect bus 109.
In some embodiments, a processor core acquires bus control authority over the interconnect bus 109 when performing a write operation.
The interconnect bus 109 is electrically connected to an off-chip interface 110. The off-chip interface 110 is used to connect external devices, and may include: at least one of a high-speed serial port, an optical module interface, a camera acquisition interface, a high-speed data interface, a high-speed serial computer expansion bus standard (Peripheral Component Interconnect express, PCIe) interface, an Ethernet interface and a Bluetooth interface.
As shown in fig. 1, the processor chip 100 may be electrically connected to the memory 120 through the off-chip interface 110, so as to perform read/write operations on the memory 120.
In some embodiments, a read-write path and a snoop path exist between each processor cache and the interconnection bus; during data reads and writes, the processor caches maintain cache coherence through these read-write and snoop paths.
On the basis of fig. 1, as shown in fig. 2, an implementation schematic diagram of the processor chip 100 when performing a write operation on a cache block in a shared state is shown.
Processor cache 105 of processor core 101 includes cache block 0, processor cache 106 of processor core 102 includes cache block 1, processor cache 107 of processor core 103 includes cache block 2, and processor cache 108 of processor core 104 includes cache block 0. When processor core 101 initiates a write operation to cache block 0 in processor cache 105, since cache block 0 is in the shared state, cache coherence during the write operation must be ensured through the following steps; a software sketch of this flow follows the step list.
1. The processor core 101 initiates a write operation to cache block 0. Wherein, cache block 0 is in a shared state.
2. The processor core 101 acquires the operation right of the interconnect bus 109 and sends a read/write operation signal to the interconnect bus 109 through the read/write path 211 between the processor cache 105 and the interconnect bus 109.
3. After receiving the read/write operation signal, the interconnect bus 109 sends the read/write operation signal to the processor core 104 through the snoop path 221 with the processor cache 108.
4. The processor core 104 invalidates the cache block 0 in the processor cache 108 upon receipt of the read/write operation signal.
5. The processor core 104 sends an invalidation signal to the interconnect bus 109 via snoop path 221. The invalidation signal is used to indicate that a cache block in the local cache has been invalidated.
6. The interconnect bus 109 sends an invalidation signal to the processor core 101 via the read/write path 211.
7. The processor core 101 evicts the cache block based on the invalidation signal.
8. The processor core 101 writes data to the processor cache 105.
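The following minimal simulation (names assumed, not from the patent) condenses steps 1 to 8 above into software: a write to a Shared block first invalidates every remote copy over the bus, then the writer completes the write.

```cpp
// A software sketch (assumed names) of steps 1 to 8 above: a write to a
// Shared block first invalidates every remote copy over the bus, then the
// writer performs the write, ending with the block in the Modified state.
#include <cstdio>
#include <map>
#include <vector>

enum class State { M, E, S, I };

struct Cache { std::map<int, State> blocks; };  // block id -> MESI state

void write_to_shared_block(std::vector<Cache>& caches, int writer, int blk) {
    // Steps 2-6: broadcast the write signal; every other cache holding the
    // block invalidates its copy (S -> I) and acknowledges over the bus.
    for (std::size_t i = 0; i < caches.size(); ++i)
        if (static_cast<int>(i) != writer && caches[i].blocks.count(blk))
            caches[i].blocks[blk] = State::I;
    // Steps 7-8: the writer now performs the write and holds the block in M.
    caches[writer].blocks[blk] = State::M;
}

int main() {
    std::vector<Cache> caches(2);
    caches[0].blocks[0] = State::S;  // cache block 0 shared by both cores
    caches[1].blocks[0] = State::S;
    write_to_shared_block(caches, 0, 0);
    std::printf("core0=%d core1=%d (0 == M, 3 == I)\n",
                static_cast<int>(caches[0].blocks[0]),
                static_cast<int>(caches[1].blocks[0]));
    return 0;
}
```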
Therefore, when performing a write operation on a cache block in the shared state, a processor core must acquire the operation right of the interconnection bus and then carry out a series of snoop operation flows through the interconnection bus.
However, because a cache block spans multiple addresses, a snoop operation flow is triggered even when different processor cores write data at different addresses within the same cache block. At that point there is no actual data-sharing requirement between the cores, and the result is the cache false sharing problem.
As a simple example: when processor core 0 writes data at address 0 and processor core 1 writes data at address 1, a snoop operation flow must be executed because address 0 and address 1 are both located in cache block 0. When processor core 2 then writes data at address 2, another snoop operation flow must be executed because address 2 is also located in cache block 0; likewise when processor core 3 writes data at address 3, and again when processor core 0 writes data at address 4, since those addresses are all located in cache block 0.
Obviously, when cache false sharing occurs, a snoop operation flow must be executed for every write operation a processor core performs, so the interconnection bus is occupied for a long time, the running time of the processor chip is prolonged, and the processing performance of the processor chip suffers.
To solve the cache false sharing problem without modifying memory addresses, the embodiment of the application adds a victim cache to the processor chip and relies on it to eliminate false sharing.
Referring to fig. 3, a block diagram of a processor chip according to an exemplary embodiment of the application is shown. The processor chip comprises at least two processor cores, processor caches corresponding to the processor cores, a victim cache, an interconnection bus and an off-chip interface.
The processor chip 300 may be any one of a central processing unit (Central Processing Unit, CPU), a Field programmable gate array (Field-Programmable Gate Array, FPGA), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processor (Graphics Processing Unit, GPU) or an artificial intelligence (Artificial Intelligence, AI) chip, which is not limited in this embodiment of the application.
In fig. 3, a processor chip 300 including 2 processor cores, processor core 301 and processor core 302, respectively, is schematically illustrated. In other possible implementations, the number of processor cores in the processor chip may be more than 2, which is not limited by the present embodiment.
Each processor core is electrically connected with a corresponding processor cache. As shown in FIG. 3, processor core 301 is electrically coupled to processor cache 303, and processor core 302 is electrically coupled to processor cache 304. The processor cache may be at least one of an L1 cache, an L2 cache, an L3 cache, or an L4 cache, which is not limited in this embodiment.
In the embodiment of the present application, the processor chip 300 further includes a victim cache 305 in addition to the processor caches. The victim cache 305 is electrically connected to each processor core, and the victim cache 305 is electrically connected to each processor cache.
As shown in FIG. 3, victim cache 305 is electrically coupled to processor core 301 and processor core 302, respectively, and victim cache 305 is electrically coupled to processor cache 303 and processor cache 304, respectively.
The victim cache in the embodiment of the application is used to store cache blocks in the shared state. Accordingly, when a processor core needs to perform a write operation on a shared cache block, the data write operation is performed directly in the victim cache, without acquiring the operation right of the interconnection bus or executing the snoop operation flow. This avoids the occupation of the interconnection bus caused by cache false sharing, reduces the running time of the processor chip, and improves the performance of the processor chip.
In some embodiments, the victim cache uses the same random access memory technology as the processor caches, to ensure fast access to the data in the victim cache. Moreover, since the victim cache stores only shared cache blocks while cache blocks in non-shared states remain in the processor caches, the cache space of the victim cache is smaller than that of a processor cache.
Similar to the processor cache, the victim cache is also electrically coupled to the interconnect bus for data write-back operations via the interconnect bus. As shown in FIG. 3, processor cache 303, processor cache 304, and victim cache 305 are all electrically coupled to interconnect bus 306.
The interconnect bus 306 is electrically connected to an off-chip interface 307. The off-chip interface 307 is used to connect external devices, which may include: at least one of a high-speed serial port, an optical module interface, a camera acquisition interface, a high-speed data interface, a PCIe interface, an Ethernet interface and a Bluetooth interface.
In summary, in the embodiment of the present application, a victim cache is added to the processor chip, electrically connected to each processor core, and used to store cache blocks in the shared state. When a processor core needs to write to a shared cache block, the write operation is performed directly on the cache block in the victim cache, without monopolizing the interconnection bus. This avoids the cache false sharing problem caused when different processor cores write to different addresses within the same shared cache block, and improves the processing performance of the processor chip.
The data write operation procedure of the processor chip 300 shown in FIG. 3 is described below by way of an exemplary embodiment.
Referring to FIG. 4, a schematic diagram of an implementation of a data write process performed by a processor chip according to an exemplary embodiment of the application is shown. In this embodiment, the description takes as an example the case where processor core 301 first initiates a write operation to cache block 0.
The processor core is configured to, when initiating a write operation to a target cache block, query for the target cache block in the victim cache and in the processor cache corresponding to the processor core.
This differs from the related art, in which a processor core initiating a write operation to a cache block queries for the cache block only in its own processor cache.
In some embodiments, the processor core queries for the target cache block based on the cache block address of the target cache block.
Illustratively, as shown in FIG. 4, when the processor core 301 initiates a write operation to cache block 0 (i.e., the target cache block), it queries whether cache block 0 is contained in the processor cache 303 and in the victim cache 305.
Further, the processor core is configured to write the data into the cache block of the victim cache through the processor cache if the target cache block is found in the processor cache and the target cache block is in the shared state.
In some embodiments, if the target cache block is found in the processor cache but not in the victim cache, the processor core further obtains the cache block state of the target cache block in the processor cache.
In the embodiment of the present application, if the cache block state is the shared state, the processor core writes the data into the cache block of the victim cache, so that the write operation on the shared cache block is executed in the victim cache. After the write operation is completed, the victim cache contains the target cache block with the modified data.
If the cache block state is not the shared state (e.g., the E state), the processor core performs the data write operation on the target cache block in the processor cache and modifies the cache block state (e.g., to the M state), without writing the data into the victim cache.
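Putting the above decision logic together, the following hedged sketch (all names assumed) shows the write path: query the victim cache and the local processor cache, migrate a Shared block into the victim cache, and write a non-shared block in place.

```cpp
// A hedged sketch (assumed names) of the write path just described: the core
// queries both the victim cache and its own cache; a Shared block migrates
// into the victim cache, while a non-shared block is written in place.
#include <cstdio>
#include <map>

enum class State { M, E, S, I };
using Cache = std::map<int, State>;  // block id -> state

void handle_write(Cache& local, Cache& victim, int blk) {
    if (victim.count(blk)) {
        std::printf("block %d: write performed in the victim cache\n", blk);
    } else if (local.count(blk) && local[blk] == State::S) {
        // Shared: write goes to the victim cache via the processor cache;
        // the local copy is invalidated and evicted (snoop flow omitted here).
        victim[blk] = State::S;
        local.erase(blk);
        std::printf("block %d: migrated to the victim cache\n", blk);
    } else if (local.count(blk)) {
        local[blk] = State::M;  // e.g. E -> M, no victim-cache involvement
        std::printf("block %d: written in the processor cache\n", blk);
    }
}

int main() {
    Cache local{{0, State::S}}, victim;
    handle_write(local, victim, 0);  // first write migrates block 0
    handle_write(local, victim, 0);  // later writes hit the victim cache
    return 0;
}
```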
Although the processor core 301 is electrically connected to the victim cache 305, the processor core 301 uses this connection only to query the victim cache for cache blocks; when writing data into the victim cache 305, the data must be written through the processor cache 303.
Illustratively, as shown in FIG. 4, the processor core 301 finds that the processor cache 303 includes cache block 0, that the victim cache 305 does not include cache block 0, and that cache block 0 is in the shared state, so the processor core 301 writes the data of the write operation into the victim cache 305 through the processor cache 303.
When the target cache block is in the shared state, other processor caches also contain the target cache block. To prevent a subsequent processor core from finding the cache block in both the victim cache and a processor cache at the same time (with differing data), the processor core must also notify the other processor cores to invalidate the target cache block in their local processor caches.
The processor core is configured to send a write operation signal to the interconnection bus through the read-write path if the target cache block is found in the processor cache and the target cache block is in the shared state.
The interconnection bus is configured to send the write operation signal to the other processor cores through the snoop paths corresponding to the other processor cores; to receive invalidation signals through those snoop paths, the invalidation signals being sent after the other processor cores invalidate the target cache block; and to send the invalidation signal to the processor core through the read-write path corresponding to the processor core.
The processor cache is further configured to invalidate the target cache block.
The processor cache is further configured to evict the target cache block.
Illustratively, as shown in FIG. 4, when the processor core 301 finds cache block 0 in the processor cache 303 and cache block 0 is in the shared state, the processor core 301 acquires the operation right of the interconnection bus 306 and sends a write operation signal to the interconnection bus 306 through the read-write path 411 between it and the interconnection bus 306, so as to instruct the other processor cores to invalidate cache block 0 in their local caches.
After the interconnect bus 306 receives the write operation signal, it sends the write operation signal to the processor core 302 through the snoop path 421 between it and the processor core 302. After receiving the write operation signal, the processor core 302 invalidates cache block 0 in the processor cache 304 and sends an invalidation signal to the interconnect bus 306 through the snoop path 421, the invalidation signal indicating that the target cache block has been invalidated.
Interconnect bus 306 further sends an invalidation signal to the processor core 301 through read-write path 411. After the processor core 301 receives the invalidation signal, the processor cache 303 invalidates cache block 0 and evicts cache block 0 via the read-write path 411 and the interconnect bus 306. The evicted cache block is returned to memory via the off-chip interface 307.
In some embodiments, the processor core may notify the other processor cores to invalidate the target cache block through the interconnection bus, and after receiving the invalidation signal fed back by the other processor cores, invalidate the target cache block in the local processor cache and write data into the victim cache.
In the embodiment of the application, the victim cache is positioned in a write-back path between the processor cache and the interconnection bus, namely, the victim cache can write data back into a memory electrically connected with the off-chip interface through the write-back path.
After the write operation, the data stored in the victim cache differs from the data stored in the memory. In one possible implementation, when the processor chip supports dirty state (dirty) transfer, the victim cache is further configured to write cache blocks carrying a dirty state identifier back to the memory through the write-back path, where such a cache block is written to the memory through the interconnect bus and the off-chip interface, and the dirty state identifier indicates that the data in the cache block has been modified.
The dirty state identifier (dirty bit) is written into the cache block; after the victim cache identifies a cache block as dirty through the dirty state identifier, it writes the cache block back into the memory through the write-back path, thereby keeping the data in the memory consistent with the data in the victim cache.
Illustratively, as shown in FIG. 4, after the processor core 301 writes data into the victim cache 305 through the processor cache 303, the victim cache 305, according to the dirty state identifier, writes the data of the dirty cache block back into the memory through the write-back path 308 between it and the interconnect bus 306.
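A minimal software sketch of this dirty-bit write-back follows; the structure and function names are assumptions for illustration only.

```cpp
// A minimal sketch (assumed names) of dirty-bit write-back: blocks modified
// in the victim cache carry a dirty bit, and the write-back path flushes
// them to memory so memory stays consistent with the victim cache.
#include <cstdio>
#include <map>

struct Block { long data; bool dirty; };

void write_back_dirty(std::map<int, Block>& victim, std::map<int, long>& mem) {
    for (auto& [blk, b] : victim) {
        if (b.dirty) {
            mem[blk] = b.data;  // via interconnect bus and off-chip interface
            b.dirty = false;
        }
    }
}

int main() {
    std::map<int, Block> victim{{0, {42, true}}};
    std::map<int, long> mem{{0, 0}};
    write_back_dirty(victim, mem);
    std::printf("mem[0] = %ld\n", mem[0]);  // now 42
    return 0;
}
```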
In other possible implementations, when the processor chip does not support dirty state transfer, the processor core writes the data only to the victim cache and does not invalidate the cache blocks in the local processor cache, and the victim cache does not write the data back to memory.
In some embodiments, the victim cache has a much smaller cache space than a processor cache; therefore, to increase the space utilization of the victim cache, the victim cache is organized as a fully associative cache.
Correspondingly, the victim cache is further configured to evict a cache block according to a cache block replacement policy when a data write operation is received and no spare cache block exists, so as to make room for the new cache block.
In one possible implementation, the victim cache may evict cache blocks on a first-in first-out basis, or it may preferentially evict cache blocks with a low reuse rate according to the reuse rate of each cache block and retain cache blocks with a high reuse rate. This embodiment is not limited in this respect.
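As one concrete possibility, the following sketch (names assumed) implements the first-in first-out variant of the replacement policy for a small fully associative victim cache.

```cpp
// A hedged sketch (assumed names) of first-in first-out replacement for the
// fully associative victim cache: when full, the oldest resident block is
// evicted to make room for the incoming one.
#include <cstdio>
#include <deque>

struct VictimCache {
    std::deque<int> blocks;          // resident block ids, front = oldest
    std::size_t capacity = 4;        // illustrative capacity

    void insert(int blk) {
        if (blocks.size() == capacity) {
            std::printf("evicting block %d\n", blocks.front());  // FIFO victim
            blocks.pop_front();
        }
        blocks.push_back(blk);
    }
};

int main() {
    VictimCache vc;
    for (int blk = 0; blk < 6; ++blk) vc.insert(blk);  // evicts blocks 0 and 1
    return 0;
}
```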
Through the above process, the target cache block is removed from the processor cache and written into the victim cache. When a processor core subsequently needs to read or write the target cache block, it reads or writes the data directly in the victim cache, without executing the snoop operation flow.
The processor core is further configured to perform the write operation on the target cache block in the victim cache if the target cache block is found in the victim cache.
Illustratively, as shown in FIG. 4, when the processor core 302 initiates a write operation to cache block 0, the processor core 302 first queries the processor cache 304 and the victim cache 305 for cache block 0. Since cache block 0 in the processor cache 304 has been invalidated and written into the victim cache 305, the processor core 302 finds cache block 0 in the victim cache 305 and thus writes the data directly into the victim cache 305.
In this embodiment, when a write operation is initiated to a target cache block, the target cache block is queried in both the local processor cache and the victim cache. If the local processor cache contains the target cache block and the target cache block is in the shared state, the data is written into the victim cache. When other processor cores subsequently modify the target cache block (whether or not they modify the same address), they only need to modify the data directly in the victim cache, without executing the snoop operation flow. This reduces the occupation of the interconnection bus during write operations on shared cache blocks and improves the processing performance of the processor chip.
It should be noted that the foregoing embodiments take a processor core initiating the write operation as an example. In other possible embodiments, when the same processor core includes at least two hardware threads and a hardware thread initiates a write operation to a cache block, the above manner may likewise be adopted to avoid the cache false sharing problem; details are not repeated here.
Referring to fig. 5, a flowchart of a data read-write method according to an exemplary embodiment of the present application is shown, where the method is used for the processor chip shown in fig. 3 or 4 as an example, and the method includes the following steps:
in step 501, when the processor core initiates a write operation to the target cache block, the target cache block is queried in the victim cache and the processor cache corresponding to the processor core.
This differs from the related art, in which a processor core initiating a write operation to a cache block queries for the cache block only in its own processor cache.
In some embodiments, the processor core queries the target cache block based on the cache block address of the target cache block.
In step 502, if the target cache block is queried in the victim cache, the processor core performs a write operation on the target cache block in the victim cache.
If the target cache block is found in the victim cache, the target cache block is in the shared state. Unlike the related art, where writing to a shared cache block requires a series of snoop operation flows, in this embodiment the processor core can directly perform the write operation on the target cache block in the victim cache.
In this way, the processor cores can write data directly into the victim cache, so the cache false sharing problem does not occur even if different processor cores modify data at different addresses within the same cache block, which improves the processing performance of the processor chip.
When the target cache block is not found in the victim cache, the processor chip can write the cache block data into the victim cache in the following manner, so that subsequent processor cores can perform data write operations directly on the victim cache.
Referring to fig. 6, a flowchart of a data read-write method according to another exemplary embodiment of the present application is shown, where the method is used for the processor chip shown in fig. 3 or 4 as an example, and the method includes the following steps:
in step 601, when the processor core initiates a write operation to the target cache block, the target cache block is queried in the victim cache and the processor cache corresponding to the processor core.
In step 602, if the target cache block is found in the processor cache and the target cache block is in the shared state, the processor core sends a write operation signal to the interconnect bus through the read-write path.
In step 603, the interconnection bus sends write operation signals to the other processor cores through snoop paths corresponding to the other processor cores.
In step 604, the interconnect bus receives an invalidation signal through a snoop path corresponding to the other processor core, where the invalidation signal is sent after the other processor core invalidates the target cache block.
In step 605, the interconnect bus sends an invalidation signal to the processor core through a read-write path corresponding to the processor core.
At step 606, the processor cache invalidates the target cache block.
At step 607, the processor cache evicts the target cache block.
At step 608, the processor core writes the data into the cache block of the victim cache via the processor cache.
In step 609, when a data write operation is received and there are no spare cache blocks, the victim cache evicts the cache blocks according to the cache block replacement policy.
In step 610, if dirty state transfer is supported, the victim cache writes cache blocks carrying the dirty state identifier back to the memory through the write-back path, where such cache blocks are written to the memory through the interconnect bus and the off-chip interface, and the dirty state identifier indicates that the data in the cache block has been modified.
For the implementation of the above steps, reference may be made to the embodiment shown in FIG. 4; details are not repeated here.
Referring to FIG. 7, a schematic structural diagram of a computer device according to an exemplary embodiment of the present application is shown. The computer device includes: a processor 701, a memory 702, and a bus 703. The processor 701 and the memory 702 are electrically connected through the bus 703.
The processor 701 may be any one of a CPU, a GPU, or an AI chip, and executes various functional applications and information processing by running software programs and modules. The processor 701 in this embodiment includes a processor chip as shown in FIG. 3 or FIG. 4.
The memory 702 may be used for storing at least one instruction, and the processor 701 is used for executing the at least one instruction to implement various application functions and information processing.
Further, the memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, including but not limited to: a magnetic or optical disk, electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), static random-access memory (SRAM), read-only memory (ROM), magnetic memory, flash memory, or programmable read-only memory (PROM). In this embodiment, the memory 702 may be the memory (also called main memory) of the computer device.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the present disclosure is provided for the purpose of illustration only and is not intended to limit the disclosure; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the disclosure.

Claims (9)

1. A processor chip, the processor chip comprising: at least two processor cores, processor caches corresponding to each processor core, a victim cache, an interconnection bus, and an off-chip interface;
the off-chip interface is electrically connected with the interconnection bus;
the interconnection bus is electrically connected to each processor cache, and the interconnection bus is electrically connected to the victim cache;
each processor cache is electrically connected to the corresponding processor core, and each processor cache is electrically connected to the victim cache;
each processor core is electrically connected to the victim cache, and a read-write path and a snoop path exist between each processor core and the interconnection bus;
the victim cache is configured to store cache blocks in a shared state;
the processor core is configured to, when initiating a write operation to a target cache block, query for the target cache block in the victim cache and in the processor cache corresponding to the processor core; and if the target cache block is found in the victim cache, perform the write operation on the target cache block in the victim cache;
the processor core is further configured to write data into the cache block of the victim cache through the processor cache if the target cache block is found in the processor cache and the target cache block is in the shared state;
the processor cache is further configured to evict the target cache block;
the processor core is configured to send a write operation signal to the interconnection bus through the read-write path if the target cache block is found in the processor cache and the target cache block is in the shared state;
the interconnection bus is configured to send the write operation signal to other processor cores through snoop paths corresponding to the other processor cores; to receive invalidation signals through the snoop paths corresponding to the other processor cores, the invalidation signals being sent after the other processor cores invalidate the target cache block; and to send the invalidation signal to the processor core through the read-write path corresponding to the processor core;
the processor cache is further configured to invalidate the target cache block.
2. The processor chip of claim 1, wherein the victim cache is located in a write-back path between the processor cache and the interconnect bus, and the off-chip interface is electrically coupled to a memory;
the victim cache is further configured to, if dirty state transfer is supported, write a cache block provided with a dirty state identifier back to the memory through the write-back path, wherein the cache block provided with the dirty state identifier is written into the memory through the interconnection bus and the off-chip interface, and the dirty state identifier indicates that data in the cache block has been modified.
3. The processor chip of any one of claims 1 to 2, wherein the victim cache is a fully associative cache.
4. The processor chip of claim 3, wherein the victim cache is further configured to evict a cache block according to a cache block replacement policy when a data write operation is received and no spare cache block exists.
5. The processor chip of any one of claims 1 to 2, wherein the processor cache is at least one of an L1 cache, an L2 cache, an L3 cache, or an L4 cache.
6. A data read-write method, applied to the processor chip of claim 1, the method comprising:
when the processor core initiates a write operation to a target cache block, querying for the target cache block in the victim cache and in the processor cache corresponding to the processor core;
if the target cache block is found in the victim cache, performing, by the processor core, the write operation on the target cache block in the victim cache;
if the target cache block is found in the processor cache and is in a shared state, the processor core writes data into the cache block of the victim cache through the processor cache;
the processor cache evicts the target cache block;
a read-write path and a snoop path exist between the processor core and the interconnection bus;
the method further comprises the steps of:
if the target cache block is found in the processor cache and is in the shared state, the processor core sends a write operation signal to the interconnection bus through the read-write path;
the interconnection bus sends the write operation signal to other processor cores through snoop paths corresponding to the other processor cores;
the interconnection bus receives invalidation signals through the snoop paths corresponding to the other processor cores, the invalidation signals being sent after the other processor cores invalidate the target cache block;
the interconnection bus sends the invalidation signal to the processor core through the read-write path corresponding to the processor core;
the processor cache invalidates the target cache block.
7. The method of claim 6, wherein the victim cache is located in a write-back path between the processor cache and the interconnect bus, and the off-chip interface is electrically coupled to a memory;
the method further comprises the steps of:
if dirty state transfer is supported, the victim cache writes the cache block provided with the dirty state identifier back to the memory through the write-back path, wherein the cache block provided with the dirty state identifier is written into the memory through the interconnection bus and the off-chip interface, and the dirty state identifier indicates that data in the cache block has been modified.
8. The method of any of claims 6 to 7, wherein the victim cache is a fully associative cache;
the method further comprises the steps of:
when a data write operation is received and no spare cache block exists, the victim cache evicts a cache block according to a cache block replacement policy.
9. A computer device comprising a processor and a memory, the processor comprising a processor chip as claimed in any one of claims 1 to 5.
CN202010642465.2A 2020-07-06 2020-07-06 Data reading and writing method, processor chip and computer equipment Active CN111651376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010642465.2A CN111651376B (en) 2020-07-06 2020-07-06 Data reading and writing method, processor chip and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010642465.2A CN111651376B (en) 2020-07-06 2020-07-06 Data reading and writing method, processor chip and computer equipment

Publications (2)

Publication Number Publication Date
CN111651376A CN111651376A (en) 2020-09-11
CN111651376B true CN111651376B (en) 2023-09-19

Family

ID=72352537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010642465.2A Active CN111651376B (en) 2020-07-06 2020-07-06 Data reading and writing method, processor chip and computer equipment

Country Status (1)

Country Link
CN (1) CN111651376B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463650A (en) * 2020-11-27 2021-03-09 苏州浪潮智能科技有限公司 Method, device and medium for managing L2P table under multi-core CPU
CN115061972B (en) * 2022-07-05 2023-10-13 摩尔线程智能科技(北京)有限责任公司 Processor, data read-write method, device and storage medium
CN116167310A (en) * 2023-04-25 2023-05-26 上海芯联芯智能科技有限公司 Method and device for verifying cache consistency of multi-core processor

Citations (4)

Publication number Priority date Publication date Assignee Title
US6101420A (en) * 1997-10-24 2000-08-08 Compaq Computer Corporation Method and apparatus for disambiguating change-to-dirty commands in a switch based multi-processing system with coarse directories
EP1612683A2 * 2004-06-30 2006-01-04 Intel Corporation An apparatus and method for partitioning a shared cache of a chip multi-processor
CN1848095A (en) * 2004-12-29 2006-10-18 英特尔公司 Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
CN101135993A (en) * 2007-09-20 2008-03-05 华为技术有限公司 Embedded system chip and data read-write processing method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7287126B2 (en) * 2003-07-30 2007-10-23 Intel Corporation Methods and apparatus for maintaining cache coherency
US8347037B2 (en) * 2008-10-22 2013-01-01 International Business Machines Corporation Victim cache replacement
US9274960B2 (en) * 2012-03-20 2016-03-01 Stefanos Kaxiras System and method for simplifying cache coherence using multiple write policies

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US6101420A (en) * 1997-10-24 2000-08-08 Compaq Computer Corporation Method and apparatus for disambiguating change-to-dirty commands in a switch based multi-processing system with coarse directories
EP1612683A2 * 2004-06-30 2006-01-04 Intel Corporation An apparatus and method for partitioning a shared cache of a chip multi-processor
CN1728112A (en) * 2004-06-30 2006-02-01 英特尔公司 An apparatus and method for partitioning a shared cache of a chip multi-processor
CN1848095A (en) * 2004-12-29 2006-10-18 英特尔公司 Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
CN101135993A (en) * 2007-09-20 2008-03-05 华为技术有限公司 Embedded system chip and data read-write processing method

Non-Patent Citations (1)

Title
Cache coherence solution for heterogeneous multiprocessor systems; Tian Fang et al.; Microcomputer Information; 2008-10-15 (Issue 29); full text *

Also Published As

Publication number Publication date
CN111651376A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN111651376B (en) Data reading and writing method, processor chip and computer equipment
US7814279B2 (en) Low-cost cache coherency for accelerators
KR101497002B1 (en) Snoop filtering mechanism
US7305523B2 (en) Cache memory direct intervention
US7434007B2 (en) Management of cache memories in a data processing apparatus
KR20050070013A Computer system with processor cache that stores remote cache presence information
GB2439650A (en) Snoop filter that maintains data coherency information for caches in a multi-processor system by storing the exclusive ownership state of the data
CN111143244B (en) Memory access method of computer equipment and computer equipment
US20090006668A1 (en) Performing direct data transactions with a cache memory
JPH10320283A (en) Method and device for providing cache coherent protocol for maintaining cache coherence in multiprocessor/data processing system
TW201832095A (en) Read-with overridable-invalidate transaction
US6587922B2 (en) Multiprocessor system
US7117312B1 (en) Mechanism and method employing a plurality of hash functions for cache snoop filtering
US7325102B1 (en) Mechanism and method for cache snoop filtering
US6918009B1 (en) Cache device and control method for controlling cache memories in a multiprocessor system
KR100505695B1 (en) Cache memory device having dynamically-allocated or deallocated buffers, digital data processing system comprising it and method thereof
US7779205B2 (en) Coherent caching of local memory data
US9442856B2 (en) Data processing apparatus and method for handling performance of a cache maintenance operation
CN110737407A (en) data buffer memory realizing method supporting mixed writing strategy
US20200081844A1 (en) Accelerating accesses to private regions in a region-based cache directory scheme
US20180276125A1 (en) Processor
CN110221985B (en) Device and method for maintaining cache consistency strategy across chips
US20230100746A1 (en) Multi-level partitioned snoop filter
US11954033B1 (en) Page rinsing scheme to keep a directory page in an exclusive state in a single complex
US9983995B2 (en) Delayed write through cache (DWTC) and method for operating the DWTC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant