CN111651376A - Data reading and writing method, processor chip and computer equipment - Google Patents


Info

Publication number
CN111651376A
CN111651376A
Authority
CN
China
Prior art keywords
cache
processor
cache block
block
chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010642465.2A
Other languages
Chinese (zh)
Other versions
CN111651376B (en)
Inventor
刘君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010642465.2A
Publication of CN111651376A
Application granted
Publication of CN111651376B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/08 Addressing or allocation; relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F 12/0833 Cache consistency protocols using a bus scheme in combination with broadcast means, e.g. for invalidation or updating
    • G06F 12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F 12/0842 Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G06F 12/0864 Caches using pseudo-associative means, e.g. set-associative or hashing
    • G06F 2212/1016 Indexing scheme: providing a specific technical effect; performance improvement
    • G06F 2212/62 Indexing scheme: details of cache specific to multiprocessor cache arrangements


Abstract

The embodiment of the application discloses a data reading and writing method, a processor chip and computer equipment, belonging to the technical field of chips. The processor chip includes: at least two processor cores, a processor cache corresponding to each processor core, a victim cache, an interconnect bus and an off-chip interface. The off-chip interface is electrically connected with the interconnect bus; the interconnect bus is electrically connected with each processor cache and with the victim cache; each processor cache is electrically connected with its corresponding processor core and with the victim cache; and each processor core is electrically connected with the victim cache. The victim cache is used for storing cache blocks in the shared state. The processor chip provided by the embodiment of the application avoids the cache false sharing problem caused when different processor cores write different addresses in the same shared cache block, and improves the processing performance of the processor chip.

Description

Data reading and writing method, processor chip and computer equipment
Technical Field
The embodiment of the application relates to the technical field of chips, in particular to a data reading and writing method, a processor chip and computer equipment.
Background
The MESI (Modified, Exclusive, Shared, Invalid) protocol is widely used in processors as a cache coherence protocol supporting the write-back policy; it ensures the consistency of the caches between processor cores.
The MESI protocol specifies that a cache block (cache line) in a cache has 4 states: the Modified (M) state, the Exclusive (E) state, the Shared (S) state and the Invalid (I) state. When a cache block is cached by multiple processor cores, the cache block is in the shared state; when one processor core modifies data in the cache block, the modification triggers a snoop operation flow that invalidates the copies of the cache block in the other processor cores' caches.
Disclosure of Invention
The embodiment of the application provides a data reading and writing method, a processor chip and computer equipment. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a processor chip, where the processor chip includes: at least two processor cores, a processor cache corresponding to each processor core, a victim cache, an interconnect bus and an off-chip interface;
the off-chip interface is electrically connected with the interconnect bus;
the interconnect bus is electrically connected with each processor cache, and the interconnect bus is electrically connected with the victim cache;
each processor cache is electrically connected with the corresponding processor core, and each processor cache is electrically connected with the victim cache;
each processor core is electrically connected with the victim cache;
wherein the victim cache is used for storing the cache blocks in the shared state.
In another aspect, an embodiment of the present application provides a data reading and writing method, where the method is applied to a processor chip as described in the above aspect, and the method includes:
when the processor core initiates a write operation on a target cache block, querying the target cache block in the victim cache and in the processor cache corresponding to the processor core;
and if the target cache block is inquired in the victim cache, the processor core performs write operation on the target cache block in the victim cache.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, and the processor includes the processor chip described in the above aspect.
The technical scheme provided by the embodiment of the application at least comprises the following beneficial effects:
in the embodiment of the present application, a victim cache is added to the processor chip; the victim cache is electrically connected to each processor core and to each processor cache, and cache blocks in the shared state are stored in the victim cache. When a processor core needs to write a shared cache block, it writes directly to the cache block in the victim cache without monopolizing the interconnect bus, which avoids the cache false sharing problem caused when different processor cores write different addresses in the same shared cache block and improves the processing performance of the processor chip.
Drawings
FIG. 1 shows a block diagram of a processor chip having multiple cores;
FIG. 2 is a diagram illustrating an implementation of the processor chip of FIG. 1 in writing to a cache block in a shared state;
FIG. 3 illustrates a block diagram of a processor chip provided in an exemplary embodiment of the present application;
FIG. 4 is a diagram illustrating an implementation of the processor chip of FIG. 3 in writing to a cache block in a shared state;
FIG. 5 is a flow chart illustrating a method for reading and writing data provided by an exemplary embodiment of the present application;
FIG. 6 is a flow chart illustrating a method for reading and writing data provided by another exemplary embodiment of the present application;
fig. 7 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference herein to "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the former and latter associated objects.
For convenience of understanding, terms referred to in the embodiments of the present application will be described below.
Cache (cache): a high-speed memory whose access speed is faster than that of Random Access Memory (RAM). Unlike main memory (i.e., RAM), which uses Dynamic Random Access Memory (DRAM) technology, a cache usually uses Static Random Access Memory (SRAM) technology.
The cache in the embodiment of the present application refers to the processor cache, which is used to accelerate the data access speed of the processor core. Generally, a processor core corresponds to a multi-level cache, and the storage space and access speed differ between cache levels: the closer a cache level is to the processor core, the smaller its storage space and the faster its access speed.
Cache block (cache line): the basic unit of data transfer in the processor memory hierarchy, also called a cache line. Data is transferred between the cache and main memory in units of cache blocks, and data in the cache is stored in the form of cache blocks.
Typically, the size of a cache block is between 32 and 256 bytes (e.g., 64 bytes). The same cache block therefore covers multiple different addresses, and a processor core may write data at different addresses within the same cache block.
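As an illustration (not part of the patent), the mapping from a byte address to its cache block can be sketched as follows, assuming 64-byte blocks; two addresses with the same block address fall in the same cache block:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: with power-of-two cache blocks, an address splits
// into a block address and a byte offset within the block. The block size
// is an assumption for the example.
constexpr uint64_t kBlockSize = 64;

uint64_t block_address(uint64_t addr) { return addr / kBlockSize; }
uint64_t block_offset(uint64_t addr)  { return addr % kBlockSize; }
```

For example, addresses 0x1000 and 0x1030 share one block, while 0x1040 begins the next block.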
Cache block states: the MESI protocol specifies that a cache block has 4 states: modified, exclusive, shared and invalid. In the exclusive state, the cache block is cached only in the cache of the current processor core, and its data is unmodified and consistent with the data in main memory. In the shared state, the cache block is cached in the caches of at least two processor cores, and its data is unmodified and consistent with the data in main memory. In the modified state, the cache block is cached only in the cache of the current processor core, and its data has been modified and is inconsistent with the data in main memory. If a processor core modifies the data of a cache block in the shared state, the copies of that cache block in the caches of the other processor cores become invalid.
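The write rules above can be sketched as state transitions; this is a simplified illustration of MESI on a write, not the patent's implementation, and all names are illustrative:

```cpp
#include <cassert>

// Simplified sketch of the MESI write rules described above.
enum class MesiState { Modified, Exclusive, Shared, Invalid };

// New state of the writing core's own copy after a local write.
MesiState on_local_write(MesiState s) {
    if (s == MesiState::Invalid)
        return MesiState::Invalid;  // the block must be fetched before writing
    return MesiState::Modified;     // M, E or S becomes M
}

// New state of another core's copy after it snoops that write.
MesiState on_remote_write(MesiState) {
    return MesiState::Invalid;      // any valid peer copy is invalidated
}
```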
Cache false sharing: in a processor chip with multiple processor cores, false sharing occurs when several processor cores or hardware threads each access different addresses in the same cache block. Because false sharing generates a large number of exclusive requests on the interconnect bus, it increases the running time of the processor and degrades the processing performance of the processor chip.
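On the software side, the classic illustration of false sharing is two per-core counters landing in one cache block. A minimal layout sketch (not from the patent; 64-byte blocks and the struct names are assumptions) shows how padding separates the counters into distinct blocks:

```cpp
#include <cassert>
#include <cstddef>

// Both counters fit in one 64-byte cache block: writes by two cores to
// a and b falsely share the block.
struct Packed {
    long a;
    long b;
};

// Each counter is aligned to its own 64-byte cache block, so the two
// cores no longer contend for the same block.
struct Padded {
    alignas(64) long a;
    alignas(64) long b;
};
```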
Victim cache: in the embodiments of the present application, the victim cache is independent of the processor caches and is used to store cache blocks in the shared state. In the processor chip, read and write operations by different processor cores on shared cache blocks are performed in the victim cache. In some embodiments, the storage space of the victim cache is smaller than the storage space of a processor cache.
Referring to FIG. 1, a block diagram of a processor chip having multiple cores is shown. The processor chip 100 includes: at least two processor cores, a processor cache corresponding to each processor core, an interconnection bus and an off-chip interface.
In fig. 1, a processor chip 100 includes four processor cores, i.e., a processor core 101, a processor core 102, a processor core 103, and a processor core 104.
Each processor core is electrically connected with the corresponding processor cache. As shown in fig. 1, processor core 101 is electrically coupled to processor cache 105, processor core 102 is electrically coupled to processor cache 106, processor core 103 is electrically coupled to processor cache 107, and processor core 104 is electrically coupled to processor cache 108. The processor cache may be at least one of an L1 cache, an L2 cache, an L3 cache, or an L4 cache, which is not limited in this embodiment.
Each processor cache is electrically connected to an Interconnect (Interconnect) bus 109, so that data transmission with a main memory is realized through the Interconnect bus 109, or data transmission between each processor cache is realized. As shown in FIG. 1, processor cache 105, processor cache 106, processor cache 107, and processor cache 108 are each electrically coupled to interconnect bus 109.
In some embodiments, a processor core needs to obtain bus control authority over the interconnect bus 109 when performing a write operation.
The interconnect bus 109 is electrically connected to the off-chip interface 110. The off-chip interface 110 is used to connect external devices, and may include: at least one of a high-speed serial port, an optical module interface, a camera acquisition interface, a high-speed data interface, a Peripheral Component Interconnect express (PCIe) interface, an ethernet interface, and a bluetooth interface.
As shown in fig. 1, the processor chip 100 may be electrically connected to the memory 120 through the off-chip interface 110, so as to perform read/write operations on the memory 120.
In some embodiments, a read/write path and a snoop path exist between each processor cache and the interconnect bus; during data reading and writing, the processor caches maintain cache coherence through these read/write and snoop paths.
On the basis of fig. 1, as shown in fig. 2, an implementation diagram of the processor chip 100 when writing to a cache block in a shared state is shown.
The processor cache 105 of the processor core 101 includes cache block 0, the processor cache 106 of the processor core 102 includes cache block 1, the processor cache 107 of the processor core 103 includes cache block 2, and the processor cache 108 of the processor core 104 includes cache block 0. When the processor core 101 initiates a write operation on cache block 0 in the processor cache 105, since cache block 0 is in the shared state, cache coherence during the write operation needs to be ensured through the following steps.
1. The processor core 101 initiates a write operation to cache block 0. Where cache block 0 is in the shared state.
2. The processor core 101 obtains the operation right of the interconnect bus 109 and sends a read/write operation signal to the interconnect bus 109 through the read/write path 211 between the processor cache 105 and the interconnect bus 109.
3. After receiving the read/write operation signal, the interconnect bus 109 sends the read/write operation signal to the processor core 104 through the snoop path 221 between the processor cache 108 and the interconnect bus.
4. The processor core 104 invalidates cache block 0 in the processor cache 108 in response to receiving the read and write operation signal.
5. Processor core 104 sends an invalidation signal to interconnect bus 109 via snoop path 221. The invalidate signal is used to indicate that a cache block in the local cache has been invalidated.
6. Interconnect bus 109 sends an invalidation signal to processor core 101 through read/write path 211.
7. The processor core 101 evicts a cache block based on the invalidation signal.
8. The processor core 101 writes data to the processor cache 105.
Therefore, when a write operation is performed on a cache block in the shared state, the processor core needs to acquire the operation right of the interconnect bus and then carry out a series of snoop operations through the interconnect bus.
However, since a cache block contains different addresses, when different processor cores write data at different addresses in the same cache block, the snoop operation flow is still initiated even though there is no actual data-sharing requirement between the cores, which results in the cache false sharing problem.
For a simple example: when processor core 0 writes data at address 0 and processor core 1 then writes data at address 1, a snoop operation flow must be executed because address 0 and address 1 are both located in cache block 0; when processor core 2 writes data at address 2, another snoop operation flow must be executed because address 2 and address 1 are both located in cache block 0; likewise when processor core 3 writes data at address 3, and again when processor core 0 writes data at address 4.
Obviously, when cache false sharing occurs, every write operation executed by a processor core requires one snoop operation flow, so the interconnect bus is occupied for a long time, the running time of the processor chip increases, and the processing performance of the processor chip is affected.
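The cost in the worked example above can be illustrated with a toy model (not from the patent): one snoop operation flow is counted whenever the writing core differs from the core that last wrote the block.

```cpp
#include <cassert>

// Toy model of false sharing on one cache block: several cores take turns
// writing different addresses that all map to the same block. A snoop flow
// is needed whenever the writer is not the core currently holding the block.
int snoop_flows_for(const int writers[], int n) {
    int owner = -1;  // core currently holding the block; -1 means none yet
    int flows = 0;
    for (int i = 0; i < n; ++i) {
        if (owner != -1 && owner != writers[i])
            ++flows;              // block held by another core: snoop flow
        owner = writers[i];
    }
    return flows;
}
```

For the sequence in the example (cores 0, 1, 2, 3, then 0 again), four snoop flows are triggered.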
In order to solve the cache false sharing problem without modifying memory addresses, in the embodiments of the present application a victim cache is additionally added to the processor chip, and the false sharing problem is solved by means of the victim cache.
Referring to fig. 3, a block diagram of a processor chip provided in an exemplary embodiment of the present application is shown. The processor chip comprises at least two processor cores, a processor cache corresponding to each processor core, a sacrifice cache, an interconnection bus and an off-chip interface.
The processor chip 300 may be any one of a Central Processing Unit (CPU), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or an Artificial Intelligence (AI) chip, which is not limited in the embodiments of the present application.
In fig. 3, a processor chip 300 is schematically illustrated as including 2 processor cores, namely a processor core 301 and a processor core 302. In other possible implementations, the number of processor cores in the processor chip may be more than 2, and this embodiment does not limit this.
Each processor core is electrically connected with the corresponding processor cache. As shown in FIG. 3, processor core 301 is electrically coupled to processor cache 303, and processor core 302 is electrically coupled to processor cache 304. The processor cache may be at least one of an L1 cache, an L2 cache, an L3 cache, or an L4 cache, which is not limited in this embodiment.
In the embodiment of the present application, the processor chip 300 further includes a victim cache 305 in addition to the processor caches. The victim cache 305 is electrically connected to each processor core, and the victim cache 305 is electrically connected to each processor cache.
As shown in FIG. 3, victim cache 305 is electrically coupled to processor core 301 and processor core 302, respectively, and victim cache 305 is electrically coupled to processor cache 303 and processor cache 304, respectively.
The victim cache in the embodiment of the present application is used to store cache blocks in the shared state. Accordingly, when a processor core needs to write a shared cache block, the data write operation is executed directly in the victim cache, without acquiring the operation right of the interconnect bus or executing the snoop operation flow. This avoids the occupation of the interconnect bus caused by cache false sharing, shortens the running time of the processor chip, and improves the performance of the processor chip.
In some embodiments, the victim cache uses the same memory technology (e.g., SRAM) as the processor caches, so as to ensure the access speed of data in the victim cache. Moreover, since the victim cache only stores shared cache blocks while cache blocks in non-shared states remain in the processor caches, the cache space of the victim cache is smaller than the cache space of a processor cache.
The victim cache, similar to the processor cache, is also electrically coupled to the interconnect bus for data write back operations via the interconnect bus. As shown in FIG. 3, processor cache 303, processor cache 304, and victim cache 305 are all electrically coupled to interconnect bus 306.
The interconnect bus 306 is electrically connected to the off-chip interface 307. The off-chip interface 307 is used for connecting an external device, and may include: the device comprises at least one of a high-speed serial port, an optical module interface, a camera acquisition interface, a high-speed data interface, a PCIe interface, an Ethernet interface and a Bluetooth interface.
To sum up, in the embodiment of the present application, a victim cache is added to the processor chip, the victim cache is electrically connected to each processor core and to each processor cache, and cache blocks in the shared state are stored in the victim cache, so that when a processor core needs to write a shared cache block, it writes directly to the cache block in the victim cache without occupying the interconnect bus, which avoids the cache false sharing problem and improves the processing performance of the processor chip.
The following describes a data write operation of the processor chip 300 shown in fig. 3 by way of an exemplary embodiment.
Referring to fig. 4, a schematic diagram of an implementation of a data writing process performed by a processor chip according to an exemplary embodiment of the present application is shown. In this embodiment, the description will be given taking an example in which the processor core 301 first initiates a write operation to the cache block 0.
The processor core is configured to, when initiating a write operation on a target cache block, query both the victim cache and the processor cache corresponding to the processor core for the target cache block.
This differs from the related art, in which a cache block is queried only in the processor cache corresponding to the initiating processor core.
In some embodiments, the processor core queries the target cache block according to its cache block address.
Illustratively, as shown in FIG. 4, when the processor core 301 initiates a write operation on cache block 0 (i.e., the target cache block), it queries whether cache block 0 is contained in the processor cache 303 and in the victim cache 305.
Further, the processor core is configured to, if the target cache block is found in the processor cache and the target cache block is in the shared state, write the data into the cache block of the victim cache through the processor cache.
In some embodiments, the processor core further obtains a cache block status of the target cache block in the processor cache if the target cache block is queried in the processor cache and the target cache block is not queried in the victim cache.
Different from the related art, in the embodiment of the present application, if the cache block state is the shared state, the processor core writes the data into the cache block in the victim cache, so that subsequent write operations on the shared cache block are performed in the victim cache. After the write operation is completed, the victim cache contains the target cache block with the modified data.
If the cache block state is a non-shared state (e.g., an E state), the processor core performs a data write operation to a target cache block in the processor cache and modifies the cache block state (e.g., to an M state) without writing data to the victim cache.
It should be noted that although the processor core 301 and the victim cache 305 are electrically connected, the processor core 301 only performs cache block lookup in the victim cache through this path; when data is written into the victim cache 305, it is written through the processor cache 303.
Illustratively, as shown in fig. 4, the processor core 301 finds that the processor cache 303 contains cache block 0, that the victim cache 305 does not contain cache block 0, and that cache block 0 is in the shared state; therefore the processor core 301 writes the data into the victim cache 305 through the processor cache 303 according to the write operation.
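The query-and-route behavior described above can be sketched as a small decision function. This is an illustrative model of the described flow, not the patent's hardware logic; all names are assumptions:

```cpp
#include <cassert>
#include <string>

// Sketch of where a write lands: the core queries both the victim cache
// and its own processor cache, then routes the write accordingly.
enum class State { Modified, Exclusive, Shared, Invalid };

std::string choose_write_target(bool hit_victim, bool hit_local, State local_state) {
    if (hit_victim)
        return "victim cache";     // shared block already lives in the victim cache
    if (hit_local && local_state == State::Shared)
        return "victim cache";     // shared block is written there via the processor cache
    if (hit_local)
        return "processor cache";  // E or M state: write locally, mark Modified
    return "miss";                 // neither cache holds it: fetch over the interconnect bus
}
```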
When the target cache block is in the shared state, it indicates that the target cache block is also contained in other processor caches. To avoid other processor cores finding the cache block in both the victim cache and their own processor caches (with differing data), the processor core needs to notify the other processor cores to invalidate the target cache block in their local processor caches.
The processor core is configured to send a write operation signal to the interconnect bus through the read/write path if the target cache block is found in the processor cache and the target cache block is in the shared state.
The interconnect bus is configured to send the write operation signal to the other processor cores through the snoop paths corresponding to the other processor cores; to receive an invalidation signal through those snoop paths, the invalidation signal being sent after the other processor cores invalidate the target cache block; and to send the invalidation signal to the initiating processor core through the read/write path corresponding to that processor core.
The processor cache is further configured to invalidate the target cache block.
The processor cache is further configured to evict the target cache block.
Illustratively, as shown in fig. 4, when the processor core 301 queries the cache block 0 in the processor cache 303 and the cache block 0 is in the shared state, the processor core 301 obtains the operation right of the interconnect bus 306 and sends a write operation signal to the interconnect bus 306 through the read/write path 411 with the interconnect bus 306, so as to instruct other processor cores to invalidate the cache block 0 in the local cache.
Upon receiving the write operation signal, interconnect bus 306 sends the write operation signal to processor core 302 via snoop path 421 to processor core 302. Upon receiving the write operation signal, processor core 302 invalidates cache block 0 in processor cache 304 and sends an invalidation signal to interconnect bus 306 via snoop path 421 indicating that the target cache block has been invalidated.
The interconnect bus 306 further sends the invalidation signal to the processor core 301 through the read/write path 411. Upon receipt of the invalidation signal by the processor core 301, the processor cache 303 invalidates cache block 0 and evicts cache block 0 via the read/write path 411 and the interconnect bus 306. The evicted cache block is written back to memory through the off-chip interface 307.
In some embodiments, the processor core may first notify the other processor cores through the interconnect bus to invalidate the target cache block, and only after receiving the invalidation signals fed back by the other processor cores does it invalidate the target cache block in the local processor cache and write the data into the victim cache; this embodiment does not limit the ordering.
In this embodiment, the victim cache is located in a write-back path between the processor cache and the interconnect bus, that is, the victim cache can write data back to the memory electrically connected to the off-chip interface through the write-back path.
After a write operation, the data stored in the victim cache differs from the data in memory. In one possible embodiment, when the processor chip supports dirty-state (dirty) transfer, the victim cache is further configured to write cache blocks carrying a dirty-state identifier back to memory through the write-back path, the interconnect bus and the off-chip interface, where the dirty-state identifier indicates that the data in the cache block has been modified.
The dirty-state identifier (dirty bit) may be written into the cache block. After the victim cache determines, via the dirty-state identifier, that a cache block is in the dirty state, it writes that cache block back to memory through the write-back path, thereby maintaining consistency between the data in memory and the data in the victim cache.
Illustratively, as shown in fig. 4, after processor core 301 writes data into victim cache 305 via processor cache 303, victim cache 305, according to the dirty-state identifier, writes the data of the dirty cache block back into memory through write-back path 308 and interconnect bus 306.
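The dirty-bit mechanism described above can be sketched as follows. The Memory and VictimCache classes and their method names are illustrative assumptions of this sketch, not taken from the embodiment:

```python
# Sketch of the dirty-state write-back described above (names are ours).

class Memory:
    def __init__(self):
        self.data = {}

class VictimCache:
    def __init__(self, memory):
        self.memory = memory
        self.blocks = {}      # addr -> [data, dirty]

    def write(self, addr, data):
        self.blocks[addr] = [data, True]   # a write sets the dirty identifier

    def write_back_dirty(self):
        """Write every block marked dirty back to memory over the write-back path."""
        for addr, entry in self.blocks.items():
            data, dirty = entry
            if dirty:
                self.memory.data[addr] = data
                entry[1] = False           # clean again after write-back

mem = Memory()
vc = VictimCache(mem)
vc.write(0, "modified")
vc.write_back_dirty()
print(mem.data[0])   # memory now matches the victim cache
```

When dirty-state transfer is not supported, the `write_back_dirty` step would simply be omitted, matching the alternative embodiment below.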
In other possible embodiments, when the processor chip does not support dirty state transfer, the processor core only writes data to the victim cache and invalidates cache blocks in the local processor cache, and the victim cache does not write data back to memory.
In some embodiments, the cache space of the victim cache is much smaller than that of the processor cache; therefore, to improve the space utilization of the victim cache, the victim cache employs a fully associative organization (fully associative cache).
Correspondingly, the victim cache is further configured to evict a cache block according to a cache block replacement policy when a data write operation is received and no free cache block exists, so as to make room for the new cache block.
In one possible embodiment, the victim cache may evict cache blocks on a first-in first-out basis; alternatively, based on the reuse rate of each cache block, the victim cache may preferentially evict cache blocks with a low reuse rate and retain cache blocks with a high reuse rate. The replacement policy is not limited in this embodiment.
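The two replacement policies mentioned above (first-in first-out, and preferring to evict the block with the lowest reuse rate) can be sketched for a small fully associative victim cache. All names below are illustrative assumptions:

```python
# Sketch of the two replacement policies described above (our own names).
from collections import OrderedDict

class VictimCache:
    def __init__(self, capacity, policy="fifo"):
        self.capacity = capacity
        self.policy = policy
        self.blocks = OrderedDict()   # addr -> data (insertion order = FIFO order)
        self.reuse = {}               # addr -> number of hits (reuse rate proxy)

    def access(self, addr):
        if addr in self.blocks:
            self.reuse[addr] += 1
            return self.blocks[addr]
        return None

    def write(self, addr, data):
        if addr not in self.blocks and len(self.blocks) >= self.capacity:
            if self.policy == "fifo":
                victim, _ = self.blocks.popitem(last=False)   # oldest block
            else:                                             # lowest reuse count
                victim = min(self.blocks, key=lambda a: self.reuse[a])
                del self.blocks[victim]
            del self.reuse[victim]
        self.blocks[addr] = data
        self.reuse.setdefault(addr, 0)

vc = VictimCache(capacity=2, policy="reuse")
vc.write(0, "a"); vc.write(1, "b")
vc.access(0)                  # block 0 is reused, block 1 is not
vc.write(2, "c")              # evicts block 1 (lowest reuse count)
print(sorted(vc.blocks))      # [0, 2]
```

A fully associative organization is what makes either policy possible: any block can occupy any entry, so the eviction choice is unconstrained by set indexing.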
Through the above process, the target cache block in the processor cache is invalidated and its data is written into the victim cache. When the processor cache subsequently needs to read or write the target cache block, the data in the victim cache is read or written directly, without executing the snoop operation flow.
The processor core is further configured to perform the write operation on the target cache block in the victim cache if the target cache block is found in the victim cache.
Illustratively, as shown in FIG. 4, when processor core 302 initiates a write operation on cache block 0, processor core 302 first queries processor cache 304 and victim cache 305 for cache block 0. Since cache block 0 in processor cache 304 has been invalidated and its data written into victim cache 305, processor core 302 finds cache block 0 in victim cache 305 and therefore writes the data directly into victim cache 305.
In this embodiment, when a write operation on a target cache block is initiated, the target cache block is queried in both the local processor cache and the victim cache. When the target cache block is found in the local processor cache and is in the shared state, the data is written into the victim cache. As a result, when other processor cores subsequently modify data in the target cache block (whether or not at the same address), they only need to modify the data in the victim cache directly, without executing the snoop operation flow. This reduces the occupation of the interconnect bus during write operations on shared cache blocks and improves the processing performance of the processor chip.
It should be noted that the above embodiment takes a processor core initiating the write operation as an example. In other possible embodiments, when a single processor core contains at least two hardware threads and a hardware thread initiates a write operation on a cache block, the above approach may also be adopted to avoid the cache false-sharing problem, which is not repeated here.
Referring to fig. 5, which shows a flowchart of a data read/write method according to an exemplary embodiment of the present application. The method is described as applied to the processor chip shown in fig. 3 or fig. 4, and includes the following steps:
step 501, when a processor core initiates a write operation to a target cache block, the processor core queries the target cache block in a victim cache and a processor cache corresponding to the processor core.
This differs from the related art, in which, when a write operation on a cache block is initiated, the cache block is queried only in the processor cache corresponding to the processor core.
In some embodiments, the processor core queries the target cache block according to the cache block address of the target cache block.
Step 502, if the target cache block is found in the victim cache, the processor core performs a write operation on the target cache block in the victim cache.
If the target cache block is found in the victim cache, the target cache block is in the shared state. In the related art, by contrast, writing to a cache block in the shared state requires a series of snoop operations to be performed.
In this way, the processor core can write data directly into the victim cache, so that even if different processor cores modify data at different addresses within the same cache block, the cache false-sharing problem does not arise, which improves the processing performance of the processor chip.
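The benefit stated above can be illustrated with a toy model of two cores taking turns writing one shared block: without the victim cache, every ownership change triggers an invalidation broadcast; with it, only the first write migrates the block and later writes generate no bus traffic. The model below is our own simplification, not part of the embodiment:

```python
# Toy comparison of bus traffic with and without the victim cache (our own
# modelling). The block starts out shared by both cores.

def run(writes, use_victim_cache):
    broadcasts, in_victim, shared = 0, False, True
    owner = None
    for core in writes:
        if use_victim_cache and in_victim:
            continue                      # direct write into the victim cache
        if shared or (owner is not None and owner != core):
            broadcasts += 1               # snoop-invalidate the other copy
            shared = False
        owner = core
        if use_victim_cache:
            in_victim = True              # first write migrates the block
    return broadcasts

writes = ["core301", "core302"] * 4       # cores ping-pong on one block
print(run(writes, use_victim_cache=False),  # 8 broadcasts without the victim cache
      run(writes, use_victim_cache=True))   # 1 broadcast with it
```

The same reduction applies to false sharing, since the two cores need not even touch the same address within the block for the ping-pong to occur.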
When the target cache block is not found in the victim cache, the processor chip can write the cache block data into the victim cache in the following way, so that other processor cores can subsequently write data directly into the victim cache.
Referring to fig. 6, which shows a flowchart of a data read/write method according to another exemplary embodiment of the present application. The method is described as applied to the processor chip shown in fig. 3 or fig. 4, and includes the following steps:
Step 601, when the processor core initiates a write operation on the target cache block, the processor core queries the target cache block in the victim cache and in the processor cache corresponding to the processor core.
Step 602, if the target cache block is found in the processor cache and the target cache block is in a shared state, the processor core sends a write operation signal to the interconnect bus through the read-write path.
Step 603, the interconnect bus sends the write operation signal to the other processor cores through the snoop paths corresponding to the other processor cores.
Step 604, the interconnect bus receives invalidation signals through the snoop paths corresponding to the other processor cores; an invalidation signal is sent after the corresponding processor core invalidates the target cache block.
Step 605, the interconnect bus sends the invalidation signal to the processor core through the read-write path corresponding to the processor core.
Step 606, the processor cache invalidates the target cache block.
Step 607, the processor cache evicts the target cache block.
Step 608, the processor core writes the data into a cache block of the victim cache via the processor cache.
Step 609, when a data write operation is received and no free cache block exists, the victim cache evicts a cache block according to the cache block replacement policy.
Step 610, if dirty-state transfer is supported, the victim cache writes the cache block with the dirty-state identifier back to memory through the write-back path; the cache block with the dirty-state identifier is written into memory through the interconnect bus and the off-chip interface, and the dirty-state identifier indicates that the data in the cache block has been modified.
For the implementation of the above steps, reference may be made to the embodiment shown in fig. 4; details are not repeated here.
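Steps 601 to 610 can be strung together in one illustrative sketch (our own modelling; the embodiment does not prescribe code): the other core's copy is invalidated via snooping, the local copy is evicted, the data is written into the victim cache with the dirty-state identifier set, and the dirty block is finally written back to memory:

```python
# End-to-end sketch of steps 601-610 above; all names are our own.
log = []

def snoop_invalidate(other_caches, addr):          # steps 602-605
    for c in other_caches:
        c.pop(addr, None)
    log.append("invalidation-acknowledged")

def migrate_to_victim(local_cache, victim, addr, data, capacity=1):
    local_cache.pop(addr, None)                    # steps 606-607: invalidate + evict
    if addr not in victim and len(victim) >= capacity:
        victim.pop(next(iter(victim)))             # step 609: FIFO-style eviction
    victim[addr] = [data, True]                    # step 608: write, mark dirty

def write_back(victim, memory):                    # step 610
    for addr, (data, dirty) in list(victim.items()):
        if dirty:
            memory[addr] = data
            victim[addr][1] = False

cache_301, cache_302, victim, memory = {0: "old"}, {0: "old"}, {}, {0: "old"}
snoop_invalidate([cache_302], 0)
migrate_to_victim(cache_301, victim, 0, "new")
write_back(victim, memory)
print(memory[0], 0 in cache_301, 0 in cache_302)
```

After the sequence, memory and the victim cache agree on the new value and no processor cache holds a stale copy of block 0.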
Referring to fig. 7, a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application is shown, where the computer device includes: a processor 701, a memory 702, and a bus 703. The processor 701 and the memory 702 are electrically connected by a bus.
The processor 701 may be any one of a CPU, a GPU, or an AI chip; the processor 701 performs various functional applications and information processing by running software programs and modules. In this embodiment, the processor 701 includes a processor chip as shown in fig. 3 or fig. 4.
The memory 702 may be used to store at least one instruction that the processor 701 uses to execute in order to implement various application functions and information processing.
Further, the memory 702 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, including but not limited to: magnetic or optical disk, electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), static random-access memory (SRAM), read-only memory (ROM), magnetic memory, flash memory, and programmable read-only memory (PROM). In this embodiment, the memory 702 may be the memory (also called main memory) of the computer device.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The above description is intended to be exemplary only and not to limit the present disclosure; any modification, equivalent replacement, or improvement made without departing from the spirit and principles of the present disclosure shall fall within its protection scope.

Claims (14)

1. A processor chip, comprising: at least two processor cores, a processor cache corresponding to each processor core, a victim cache, an interconnect bus, and an off-chip interface;
the off-chip interface is electrically connected to the interconnect bus;
the interconnect bus is electrically connected to each processor cache, and the interconnect bus is electrically connected to the victim cache;
each processor cache is electrically connected to the corresponding processor core, and each processor cache is electrically connected to the victim cache;
each processor core is electrically connected to the victim cache;
wherein the victim cache is configured to store cache blocks in a shared state.
2. The processor chip of claim 1,
the processor core is configured to, when a write operation is initiated on a target cache block, query the target cache block in the victim cache and in the processor cache corresponding to the processor core;
and if the target cache block is found in the victim cache, perform the write operation on the target cache block in the victim cache.
3. The processor chip of claim 2,
the processor core is further configured to, if the target cache block is found in the processor cache and the target cache block is in a shared state, write data into a cache block of the victim cache through the processor cache;
the processor cache is further configured to evict the target cache block.
4. The processor chip of claim 3, wherein the victim cache is located in a write-back path between the processor cache and the interconnect bus, and wherein the off-chip interface is electrically coupled to a memory;
the victim cache is further configured to write back, through the write-back path, the cache block with the dirty state identifier to the memory if dirty state transfer is supported, where the cache block with the dirty state identifier is written into the memory through the interconnect bus and the off-chip interface, and the dirty state identifier is used to indicate that data in the cache block is modified.
5. The processor chip of claim 3, wherein a read-write path and a snoop path exist between the processor core and the interconnect bus;
the processor core is configured to send a write operation signal to the interconnect bus through the read-write path if the target cache block is found in the processor cache and the target cache block is in a shared state;
the interconnect bus is configured to send the write operation signal to the other processor cores through the snoop paths corresponding to the other processor cores; receive invalidation signals through the snoop paths corresponding to the other processor cores, wherein an invalidation signal is sent after the corresponding processor core invalidates the target cache block; and send the invalidation signal to the processor core through the read-write path corresponding to the processor core;
the processor cache is further configured to invalidate the target cache block.
6. The processor chip of any one of claims 1 to 5, wherein the victim cache is a fully associative cache.
7. The processor chip of claim 6,
the victim cache is further configured to evict a cache block according to a cache block replacement policy when a data write operation is received and no free cache block exists.
8. The processor chip of any one of claims 1 to 5, wherein the processor cache is at least one of an L1 cache, an L2 cache, an L3 cache, or an L4 cache.
9. A method for reading and writing data, applied to the processor chip according to claim 1, the method comprising:
when the processor core initiates a write operation on a target cache block, querying the target cache block in the victim cache and in the processor cache corresponding to the processor core;
and if the target cache block is inquired in the victim cache, the processor core performs write operation on the target cache block in the victim cache.
10. The method of claim 9, wherein after querying the target cache block in the victim cache and a processor cache corresponding to the processor core, the method further comprises:
if the target cache block is found in the processor cache and the target cache block is in a shared state, the processor core writes data into a cache block of the victim cache through the processor cache;
the processor cache evicts the target cache block.
11. The method of claim 10, wherein the victim cache is located in a write back path between the processor cache and the interconnect bus, and wherein the off-chip interface is electrically coupled to a memory;
the method further comprises the following steps:
if the victim cache supports dirty-state transfer, the victim cache writes the cache block with the dirty-state identifier back to the memory through the write-back path, wherein the cache block with the dirty-state identifier is written into the memory through the interconnect bus and the off-chip interface, and the dirty-state identifier is used to indicate that data in the cache block has been modified.
12. The method of claim 10, wherein there are a read/write path and a snoop path between the processor core and an interconnect bus;
the method further comprises the following steps:
if the target cache block is found in the processor cache and the target cache block is in a shared state, the processor core sends a write operation signal to the interconnection bus through a read-write path;
the interconnect bus sends the write operation signal to the other processor cores through the snoop paths corresponding to the other processor cores;
the interconnect bus receives invalidation signals through the snoop paths corresponding to the other processor cores, wherein an invalidation signal is sent after the corresponding processor core invalidates the target cache block;
the interconnect bus sends the invalidation signal to the processor core through the read-write path corresponding to the processor core;
the processor cache invalidates the target cache block.
13. The method of any of claims 9 to 12, wherein the victim cache is a fully associative cache;
the method further comprises the following steps:
when a data write operation is received and no free cache block exists, the victim cache evicts a cache block according to a cache block replacement policy.
14. A computer device, characterized in that the computer device comprises a processor and a memory, the processor comprising a processor chip according to any of claims 1 to 8.
CN202010642465.2A 2020-07-06 2020-07-06 Data reading and writing method, processor chip and computer equipment Active CN111651376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010642465.2A CN111651376B (en) 2020-07-06 2020-07-06 Data reading and writing method, processor chip and computer equipment


Publications (2)

Publication Number Publication Date
CN111651376A true CN111651376A (en) 2020-09-11
CN111651376B CN111651376B (en) 2023-09-19

Family

ID=72352537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010642465.2A Active CN111651376B (en) 2020-07-06 2020-07-06 Data reading and writing method, processor chip and computer equipment

Country Status (1)

Country Link
CN (1) CN111651376B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463650A (en) * 2020-11-27 2021-03-09 苏州浪潮智能科技有限公司 Method, device and medium for managing L2P table under multi-core CPU
CN115061972A (en) * 2022-07-05 2022-09-16 摩尔线程智能科技(北京)有限责任公司 Processor, data read-write method, device and storage medium
CN116167310A (en) * 2023-04-25 2023-05-26 上海芯联芯智能科技有限公司 Method and device for verifying cache consistency of multi-core processor

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6101420A (en) * 1997-10-24 2000-08-08 Compaq Computer Corporation Method and apparatus for disambiguating change-to-dirty commands in a switch based multi-processing system with coarse directories
US20050027945A1 (en) * 2003-07-30 2005-02-03 Desai Kiran R. Methods and apparatus for maintaining cache coherency
EP1612683A2 (en) * 2004-06-30 2006-01-04 Intel Coporation An apparatus and method for partitioning a shared cache of a chip multi-processor
CN1848095A (en) * 2004-12-29 2006-10-18 英特尔公司 Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
CN101135993A (en) * 2007-09-20 2008-03-05 华为技术有限公司 Embedded system chip and data read-write processing method
US20100100682A1 (en) * 2008-10-22 2010-04-22 International Business Machines Corporation Victim Cache Replacement
US20130254488A1 (en) * 2012-03-20 2013-09-26 Stefanos Kaxiras System and method for simplifying cache coherence using multiple write policies

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6101420A (en) * 1997-10-24 2000-08-08 Compaq Computer Corporation Method and apparatus for disambiguating change-to-dirty commands in a switch based multi-processing system with coarse directories
US20050027945A1 (en) * 2003-07-30 2005-02-03 Desai Kiran R. Methods and apparatus for maintaining cache coherency
EP1612683A2 (en) * 2004-06-30 2006-01-04 Intel Coporation An apparatus and method for partitioning a shared cache of a chip multi-processor
CN1728112A (en) * 2004-06-30 2006-02-01 英特尔公司 An apparatus and method for partitioning a shared cache of a chip multi-processor
CN1848095A (en) * 2004-12-29 2006-10-18 英特尔公司 Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
CN101135993A (en) * 2007-09-20 2008-03-05 华为技术有限公司 Embedded system chip and data read-write processing method
US20100100682A1 (en) * 2008-10-22 2010-04-22 International Business Machines Corporation Victim Cache Replacement
US20130254488A1 (en) * 2012-03-20 2013-09-26 Stefanos Kaxiras System and method for simplifying cache coherence using multiple write policies

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN Fang et al., "Cache coherence solution for heterogeneous multiprocessor systems", Microcomputer Information (《微计算机信息》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463650A (en) * 2020-11-27 2021-03-09 苏州浪潮智能科技有限公司 Method, device and medium for managing L2P table under multi-core CPU
CN115061972A (en) * 2022-07-05 2022-09-16 摩尔线程智能科技(北京)有限责任公司 Processor, data read-write method, device and storage medium
CN115061972B (en) * 2022-07-05 2023-10-13 摩尔线程智能科技(北京)有限责任公司 Processor, data read-write method, device and storage medium
CN116167310A (en) * 2023-04-25 2023-05-26 上海芯联芯智能科技有限公司 Method and device for verifying cache consistency of multi-core processor

Also Published As

Publication number Publication date
CN111651376B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
US7305523B2 (en) Cache memory direct intervention
US7305522B2 (en) Victim cache using direct intervention
TWI391821B (en) Processor unit, data processing system and method for issuing a request on an interconnect fabric without reference to a lower level cache based upon a tagged cache state
CN111651376B (en) Data reading and writing method, processor chip and computer equipment
US6295582B1 (en) System and method for managing data in an asynchronous I/O cache memory to maintain a predetermined amount of storage space that is readily available
US10402327B2 (en) Network-aware cache coherence protocol enhancement
US5317738A (en) Process affinity scheduling method and apparatus
US7434007B2 (en) Management of cache memories in a data processing apparatus
KR102531264B1 (en) Read-with overridable-invalidate transaction
US11789868B2 (en) Hardware coherence signaling protocol
US7117312B1 (en) Mechanism and method employing a plurality of hash functions for cache snoop filtering
US7325102B1 (en) Mechanism and method for cache snoop filtering
KR100505695B1 (en) Cache memory device having dynamically-allocated or deallocated buffers, digital data processing system comprising it and method thereof
US11507517B2 (en) Scalable region-based directory
CN112673358B (en) Accelerating access to private areas in a region-based cache directory scheme
US9442856B2 (en) Data processing apparatus and method for handling performance of a cache maintenance operation
CN110221985B (en) Device and method for maintaining cache consistency strategy across chips
US10565111B2 (en) Processor
CN110737407A (en) data buffer memory realizing method supporting mixed writing strategy
US10949360B2 (en) Information processing apparatus
US20230100746A1 (en) Multi-level partitioned snoop filter
US9983995B2 (en) Delayed write through cache (DWTC) and method for operating the DWTC
JP2023552722A (en) Method and apparatus for transferring data within a hierarchical cache circuit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant