CN112463652B - Data processing method and device based on cache consistency, processing chip and server

Data processing method and device based on cache consistency, processing chip and server

Info

Publication number
CN112463652B
CN112463652B (application CN202011316533.2A)
Authority
CN
China
Prior art keywords
transaction request
cache
probe filter
group
memory
Prior art date
Legal status
Active
Application number
CN202011316533.2A
Other languages
Chinese (zh)
Other versions
CN112463652A (en)
Inventor
杨凯歌
林江
曹俊
Current Assignee
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN202011316533.2A
Publication of CN112463652A
Application granted
Publication of CN112463652B

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F 12/00 Accessing, addressing or allocating within memory systems or architectures; G06F 12/02 Addressing or allocation, relocation; G06F 12/08 in hierarchically structured memory systems, e.g. virtual memory systems; G06F 12/0802 addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0817 Multiuser, multiprocessor or multiprocessing cache systems: cache consistency protocols using directory methods
    • G06F 12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F 12/0873 For peripheral storage systems, e.g. disk cache: mapping of cache memory to specific storage devices or parts thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

One or more embodiments of the invention disclose a data processing method and apparatus based on cache coherence, a processing chip, and a server. The data processing method based on cache coherence includes the following steps: acquiring a transaction request, sent to a probe filter, for storing cache state information, wherein the cache state information stored in the probe filter is stored in a set-associative mode and an additional memory has been added to the probe filter in advance; and, in response to there being no free storage location in the set to which the transaction request maps, allocating a target address in the additional memory for the cache state information corresponding to the transaction request. The method can reduce the probability of destage operations during cache-coherence maintenance in a multiprocessor system.

Description

Data processing method and device based on cache consistency, processing chip and server
Technical Field
The invention relates to the technical field of processors, in particular to a data processing method and device based on cache consistency, a processing chip and a server.
Background
In today's multiprocessor, multi-cache computer systems, a cache may hold a large amount of data, and copies of the same address may be maintained in different caches. Coherency problems arise when different caches operate on data at the same address. Numerous techniques have emerged to maintain cache coherency; for example, when multiple caches hold copies of the same data and one processor needs to modify its copy, the copies in the other caches are marked invalid to avoid coherency errors. To improve the efficiency of cache-coherency maintenance in multiprocessor systems, techniques such as probe filtering have been developed. A probe filter helps track the state of data across caches, e.g., whether a single copy exists in one cache, whether multiple copies exist, or whether the data resides only in main memory. Some directory-based probe filters track the state of the processor caches and store the tracked state information in the probe filter; when a coherency agent snooping the bus discovers a coherency transaction, it queries the information tracked in the probe filter and sends a response probe to complete the coherency maintenance. At present, however, the probability of destage operations occurring during cache-coherency maintenance in a multiprocessor system is relatively high.
Disclosure of Invention
In view of this, one or more embodiments of the present invention provide a data processing method and apparatus based on cache coherence, a processing chip, and a server, which can reduce the probability of destage operations during cache-coherence maintenance in a multiprocessor system.
One or more embodiments of the present invention provide a data processing method based on cache coherence, including: acquiring a transaction request, sent to a probe filter, for storing cache state information, wherein the cache state information stored in the probe filter is stored in a set-associative mode and an additional memory has been added to the probe filter in advance; and, in response to there being no free storage location in the set to which the transaction request maps, allocating a target address in the additional memory for the cache state information corresponding to the transaction request.
Optionally, each set in the probe filter has a data pointer added in advance, and the method further includes: after allocating a target address in the additional memory for the cache state information corresponding to the transaction request, storing the target address into the data pointer corresponding to the set to which the transaction request maps, so as to form a linked list.
Optionally, each set in the probe filter has a field, added in advance, for indicating whether the data is valid.
Optionally, each set of the probe filter corresponds to an additional storage location in the additional memory.
Optionally, the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, and the method further includes: triggering a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location among the additional storage locations corresponding to that set.
Optionally, the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, and the method further includes: triggering a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location in the additional memory.
Optionally, the method further includes: in response to a destage operation initiated by the probe filter, or a cache-initiated eviction, occurring at a target storage location of the additional memory, reclaiming the target storage address at which the destage or eviction occurred as a free storage address through a pointer queue added to the probe filter in advance.
One or more embodiments of the present invention further provide a data processing apparatus based on cache coherence, including: an acquisition module configured to acquire a transaction request, sent to a probe filter, for storing cache state information, wherein the cache state information stored in the probe filter is stored in a set-associative mode and an additional memory has been added to the probe filter in advance; and an allocation module configured to allocate a target address in the additional memory for the cache state information corresponding to the transaction request in response to there being no free storage location in the set to which the transaction request maps.
Optionally, each set in the probe filter has a data pointer added in advance, and the apparatus further includes: a storing module configured to store the target address into the data pointer corresponding to the set to which the transaction request maps, so as to form a linked list, after the target address in the additional memory is allocated for the cache state information corresponding to the transaction request.
Optionally, each set in the probe filter has a field, added in advance, for indicating whether the data is valid.
Optionally, each set of the probe filter corresponds to an additional storage location in the additional memory.
Optionally, the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, and the apparatus further includes: a first triggering module configured to trigger a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location among the additional storage locations corresponding to that set.
Optionally, the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, and the apparatus further includes: a second triggering module configured to trigger a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location in the additional memory.
Optionally, the apparatus further includes: a reclamation module configured to reclaim, as a free storage address, the target storage address at which a destage operation or a cache-initiated eviction occurred, through a pointer queue added to the probe filter in advance, in response to a destage operation initiated by the probe filter or a cache-initiated eviction occurring at a target storage location of the additional memory.
One or more embodiments of the present invention also provide a processor chip, including at least one processor core and a cache; the processor core is configured to execute any one of the above data processing methods based on cache coherence.
One or more embodiments of the present invention also provide a server, including: a housing, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged inside a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power circuit is used to supply power to each circuit or device of the server; the memory is used to store executable program code; and the processor runs a program corresponding to the executable program code, by reading the executable program code stored in the memory, so as to execute any one of the above data processing methods based on cache coherence.
The data processing method based on cache coherence provided by one or more embodiments of the present invention expands the storage capacity of the memory in the probe filter so as to reduce the probability of destage operations during cache-coherence maintenance in a multiprocessor system, thereby reducing the interconnect-bus bandwidth consumption and access latency caused by destage operations.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram illustrating a multiprocessor multi-cache system in accordance with one or more embodiments of the present invention;
FIG. 2 is a flow diagram illustrating a data processing method based on cache coherency in accordance with one or more embodiments of the invention;
FIG. 3 is a schematic diagram illustrating the set structure in a probe filter according to one or more embodiments of the invention;
FIG. 4 is a block diagram illustrating a probe filter in accordance with one or more embodiments of the invention;
FIG. 5 is a block diagram illustrating a data processing apparatus based on cache coherency in accordance with one or more embodiments of the present invention;
FIG. 6 is a schematic diagram illustrating a processor chip in accordance with one or more embodiments of the invention;
FIG. 7 is a schematic diagram illustrating a server in accordance with one or more embodiments of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of a multiprocessor, multi-cache system according to one or more embodiments of the invention, in which a probe filter is used to maintain cache coherency. The probe filter is illustrated as a directory-type probe filter, in which the tracking information is stored in a set-associative mode, each set containing multiple ways of data. When a new transaction needs to store cache state information into the probe filter, the probe filter places the information into the corresponding set according to the mapping relation. If all the ways of that set are already occupied by cache state information, one piece of cache state information must be evicted from the probe filter; the probe filter then sends a probe to the corresponding cache, and the corresponding cache line is evicted from the cache and written back to memory (this process is called a destage). However, a destage is not actively initiated by the processor, so if the processor later needs to access the evicted cache line, the data must be read from memory again and its state information must be written into the probe filter once more. Such a process increases access latency, occupies bus resources, and may trigger a further destage; the sketch below illustrates this baseline behavior.
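To make the baseline concrete, the following is a minimal sketch in C of a plain set-associative probe filter whose insert path is forced into a destage when the mapped set is full. All names, sizes, and the way-0 eviction policy are illustrative assumptions, not the patented design:

    /* Baseline set-associative probe filter: a full set forces a destage. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_SETS 1024   /* M sets, chosen arbitrarily for the example */
    #define NUM_WAYS 4      /* ways per set */

    typedef struct {
        bool     valid;
        uint64_t tag;       /* identifies the tracked cache line */
        uint8_t  state;     /* coherence-state summary for that line */
    } PFEntry;

    static PFEntry filter[NUM_SETS][NUM_WAYS];

    /* Insert tracking info for `addr`; returns true if a destage was forced. */
    static bool pf_insert(uint64_t addr, uint8_t state) {
        uint32_t set = (uint32_t)(addr % NUM_SETS);
        uint64_t tag = addr / NUM_SETS;
        for (int w = 0; w < NUM_WAYS; w++) {
            if (!filter[set][w].valid) {          /* free way: no destage */
                filter[set][w] = (PFEntry){true, tag, state};
                return false;
            }
        }
        /* Set is full: evict way 0 (placeholder policy) and send a probe
         * forcing the corresponding cache line back to memory, i.e. the
         * destage this patent aims to make less frequent. */
        filter[set][0] = (PFEntry){true, tag, state};
        return true;
    }

    int main(void) {
        /* Five inserts that all map to set 0 overflow a 4-way set. */
        for (uint64_t i = 0; i < 5; i++)
            printf("insert %llu -> destage: %d\n",
                   (unsigned long long)i,
                   (int)pf_insert(i * NUM_SETS, 0));
        return 0;
    }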
Since the storage capacity of each set in the probe filter's set-associative storage mode is limited, in the related art the destage problem easily occurs when the stored information is unevenly distributed across sets. The present invention therefore proposes a data processing method based on cache coherence that extends the storage capacity of the probe filter with an additional storage structure, thereby reducing the destage probability.
FIG. 2 is a flow diagram illustrating a data processing method based on cache coherency according to one or more embodiments of the invention; the method includes:
Step 201: acquiring a transaction request, sent to a probe filter, for storing cache state information, wherein the cache state information stored in the probe filter is stored in a set-associative mode and an additional memory has been added to the probe filter in advance;
the probe filter in one or more embodiments of the present invention may be, for example, a directory-type probe filter, based on which the memory in the probe filter may also be referred to as a target.
In one example, assume the probe filter includes M sets. Adding an additional storage space of capacity a to any one set can significantly reduce the probability of a destage, and adding such a space to every set of the probe filter increases the maximum capacity of each set.
In another example, assume again that the probe filter includes M sets and that adding an additional storage space of capacity a to any one set significantly reduces the destage probability. Because destages occur randomly rather than in all sets of the probe filter at once, the storage capacity of the additional memory can be set to less than M × a so as to raise its utilization; for example, with M = 1024 sets and a = 4, the additional memory would hold fewer than 1024 × 4 = 4096 entries. In other words, in one or more embodiments of the present invention the additional memory need not grow linearly with the number of sets, which keeps the circuit area it occupies small.
Step 202: in response to there being no free storage location in the set to which the transaction request maps, allocating a target address in the additional memory for the cache state information corresponding to the transaction request.
For example, when there is no free location in the set to which a new transaction request maps, a free address may be allocated in the additional memory to store the cache state information corresponding to that transaction request.
The data processing method based on cache coherence provided by one or more embodiments of the present invention expands the storage capacity of the memory in the probe filter so as to reduce the probability of destage operations during cache-coherence maintenance in a multiprocessor system, thereby reducing the interconnect-bus bandwidth consumption and access latency caused by destages. In scenarios where destages are frequent, this can improve the overall performance of the processor.
In one or more embodiments of the invention, each set in the probe filter has a data pointer added in advance, which may be referred to, for example, as an extra-data pointer. Taking FIG. 3 as an example, in the probe filter, set 0 corresponds to the extra-data pointer of set 0, set 1 corresponds to the extra-data pointer of set 1, ..., and set M-1 corresponds to the extra-data pointer of set M-1. On this basis, the method may further include:
after allocating a target address in the additional memory for the cache state information corresponding to the transaction request, storing the target address into the data pointer corresponding to the set to which the transaction request maps, so as to form a linked list. A linked list is a storage structure that is neither contiguous nor sequential on the physical storage units; the logical order of its data elements is realized through the order of the pointer links. Each node comprises two parts: a data field storing the data element, and a pointer field storing the address of the next node. Because the amount of data overflowing from each set of the probe filter is small and random, the linked-list storage structure lets multiple sets of the probe filter be extended at the same time until the additional memory reaches its capacity, so that multiple sets share the same additional memory; this copes well with the randomness of destages. Moreover, the read/write performance of a set is not affected by how much data the additional memory holds overall; it depends only on the length of that set's own linked list. A minimal code sketch of this structure follows.
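The following is a minimal sketch in C of this extended structure, under assumed names and sizes: each set keeps one pre-added extra-data pointer heading a singly linked list of overflow entries in the shared additional memory. For brevity the sketch uses a simple bump allocator and inserts new nodes at the head of a set's list, whereas the embodiment above appends by pointing the previous node at the newly allocated address; it is an illustration, not the patented circuit:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS  1024
    #define EXTRA_CAP 512      /* deliberately < NUM_SETS * a (a = 4 here),
                                  per the sizing consideration above */
    #define NIL       (-1)

    typedef struct {
        bool     valid;
        uint64_t tag;          /* identifies the tracked cache line */
        uint8_t  state;        /* coherence-state summary */
        int      next;         /* next overflow entry of the same set, or NIL */
    } ExtraEntry;

    static ExtraEntry extra[EXTRA_CAP];     /* shared additional memory      */
    static int extra_head[NUM_SETS];        /* per-set extra-data pointers   */
    static int next_free = 0;               /* naive allocator for the sketch */

    static void pf_init(void) {
        for (int s = 0; s < NUM_SETS; s++)
            extra_head[s] = NIL;            /* every overflow list starts empty */
    }

    /* Called when the set mapped by a transaction request has no free way:
     * allocate a target address in the additional memory, store the cache
     * state information there, and link it into the set's list. Returns
     * false when the additional memory is exhausted (destage fallback). */
    static bool overflow_insert(uint32_t set, uint64_t tag, uint8_t state) {
        if (next_free >= EXTRA_CAP)
            return false;
        int slot = next_free++;
        extra[slot] = (ExtraEntry){ true, tag, state, extra_head[set] };
        extra_head[set] = slot;
        return true;
    }

    int main(void) {
        pf_init();
        /* Two overflows from set 0 and one from set 1 share the same memory. */
        overflow_insert(0, 0x10, 0);
        overflow_insert(0, 0x11, 0);
        overflow_insert(1, 0x20, 0);
        return 0;
    }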
In one or more embodiments of the present invention, each set in the probe filter may also have a field added in advance for indicating whether the extra data is valid. Still taking FIG. 3 as an example, the fields shown in the dashed box are the pre-added fields, which indicate whether the extra data is valid and also hold the pointer to that data.
In one or more embodiments of the invention, each set of the probe filter may instead correspond to one additional storage location in the additional memory. Still taking a probe filter with M sets as an example, each of the M sets is assigned a corresponding additional storage location in advance, so that every set has one fixed additional storage location. The transaction data of a request that maps to a set should be stored in that set's additional storage location; e.g., the transaction data of a request that maps to set M-1 should be stored in the additional storage location corresponding to set M-1, as in the sketch below.
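A minimal sketch in C of this fixed-location variant, assuming k dedicated slots per set (k = 1 in the simplest case described above; all names and sizes are illustrative): the overflow data of a set may only occupy that set's own slots.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 1024
    #define K_SLOTS  1               /* dedicated extra slots per set */

    typedef struct { bool valid; uint64_t tag; uint8_t state; } ExtraEntry;

    /* Additional memory partitioned so each set owns a fixed region. */
    static ExtraEntry extra[NUM_SETS][K_SLOTS];

    /* Try to place overflow data into the fixed location(s) owned by `set`;
     * returns false when they are all occupied (a destage would follow). */
    static bool dedicated_insert(uint32_t set, uint64_t tag, uint8_t state) {
        for (int k = 0; k < K_SLOTS; k++) {
            if (!extra[set][k].valid) {
                extra[set][k] = (ExtraEntry){true, tag, state};
                return true;
            }
        }
        return false;
    }

    int main(void) {
        return dedicated_insert(3, 0xABC, 0) ? 0 : 1;  /* set 3's slot is free */
    }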
In one or more embodiments of the invention, the cache state information corresponding to the transaction request is stored in the additional memory as a linked list. On this basis, in the case where each set in the probe filter corresponds to one additional storage location in the additional memory, the method may further include: triggering a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location among the additional storage locations corresponding to that set. A new transaction can thus trigger a destage only when a set's linked list has reached its preset upper limit or no free location remains among the additional locations of the set to which the request maps; the data processing method based on cache coherence of this embodiment therefore reduces the probability of triggering a destage.
In one or more embodiments of the present invention, the cache state information corresponding to the transaction request is stored in the additional memory as a linked list. On this basis, in the case where the sets in the probe filter share the storage locations of the additional memory, the method may further include: triggering a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location in the additional memory. A new transaction can thus trigger a destage only after a set's linked list has reached its preset upper limit or the additional memory is full; the data processing method based on cache coherence therefore reduces the probability of triggering a destage. A sketch of this admission check follows.
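As a minimal sketch, again with assumed names and limits, the admission check for the shared-additional-memory variant might look as follows; the per-set-location variant of the preceding paragraph differs only in checking the mapped set's own additional locations instead of a global free count:

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_LIST_LEN 8           /* preset upper limit on list length */
    #define NIL          (-1)

    /* Walk the overflow list headed by `head` using per-entry next pointers. */
    static int list_len(int head, const int *next) {
        int n = 0;
        for (int i = head; i != NIL; i = next[i])
            n++;
        return n;
    }

    /* true => the probe filter must destage an entry to admit this request. */
    static bool must_destage(int extra_head, const int *next, int free_slots) {
        return list_len(extra_head, next) >= MAX_LIST_LEN || free_slots == 0;
    }

    int main(void) {
        int next[4] = { 1, 2, NIL, NIL };      /* a 3-node list: 0 -> 1 -> 2 */
        printf("destage needed: %d\n", (int)must_destage(0, next, 5)); /* 0 */
        printf("destage needed: %d\n", (int)must_destage(0, next, 0)); /* 1 */
        return 0;
    }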
In one or more embodiments of the invention, a pointer queue may be added to the probe filter, as shown in FIG. 4 (in FIG. 4, the probe filter memory denotes the memory native to the probe filter). On this basis, the above method may further include: in response to a destage operation initiated by the probe filter, or a cache-initiated eviction, occurring at a target storage location of the additional memory, reclaiming the target storage address at which the destage or eviction occurred as a free storage address through the pointer queue added to the probe filter in advance.
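A minimal sketch of such a pointer queue, with assumed names and a ring-buffer layout chosen purely for illustration: freed additional-memory addresses are pushed when a destage or cache-initiated eviction invalidates an entry, and allocation pops from the queue before falling back to never-used slots.

    #include <stdio.h>

    #define EXTRA_CAP 512

    static int queue[EXTRA_CAP];
    static int q_head = 0, q_tail = 0, q_count = 0;

    /* Recycle the address of an invalidated extra-memory entry. */
    static void ptrq_push(int addr) {
        queue[q_tail] = addr;
        q_tail = (q_tail + 1) % EXTRA_CAP;
        q_count++;
    }

    /* Hand out a recycled free address, or -1 if the queue is empty. */
    static int ptrq_pop(void) {
        if (q_count == 0)
            return -1;
        int addr = queue[q_head];
        q_head = (q_head + 1) % EXTRA_CAP;
        q_count--;
        return addr;
    }

    int main(void) {
        ptrq_push(7);   /* slot 7 was freed by a destage or eviction */
        printf("reallocated slot: %d\n", ptrq_pop());   /* prints 7 */
        return 0;
    }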
Still taking FIG. 3 as an example, a data processing method based on cache coherence according to one or more embodiments of the present invention is described below. As shown in FIG. 3, a field indicating whether the extra data (i.e., the data stored in the additional memory) is valid and a pointer to the extra data are added to each set of the original probe filter, and an additional memory is added, in which the cache state information is stored in the data format of a linked list. When set 0, to which a new transaction request maps, has no free location, a free address is allocated in the additional memory to store the transaction's data, and the allocated address is stored in the extra-data pointer, forming a linked list. If set 0 subsequently receives another new transaction that needs a storage location, another address is allocated in the additional memory, and the pointer of the previous linked-list node is set to point to the new data's address.
Similarly, when there is no free location in the native storage space of set 1, a free address is allocated from the additional memory for set 1's new transaction data, and a linked list is formed in the same manner. The other sets in the probe filter can be extended in the same way.
In one or more embodiments of the present invention, whether the additional memory is enabled may be configurable, for example through the content of a command. When the command indicates that the additional memory is not enabled, the probe filter filters data in its original manner and its original function is unaffected; when the command indicates that the additional memory is enabled, transaction data is stored using the additional memory as described above.
In one or more embodiments of the invention, the linked list of a given set in the probe filter can keep growing until the additional storage space is used up, which improves the probe filter's tolerance of non-uniformly distributed storage in extreme cases.
In one or more embodiments of the present invention, a maximum linked-list length may be preset according to the per-way data capacity and the access latency: the per-way data capacity is positively correlated with the maximum list length, and the access-latency requirement is negatively correlated with it.
FIG. 5 is a block diagram illustrating a data processing apparatus based on cache coherency according to one or more embodiments of the invention. As shown in FIG. 5, the apparatus 50 includes:
an acquisition module 51, configured to acquire a transaction request, sent to a probe filter, for storing cache state information, wherein the cache state information stored in the probe filter is stored in a set-associative mode and an additional memory has been added to the probe filter in advance; and
an allocation module 52, configured to allocate a target address in the additional memory for the cache state information corresponding to the transaction request in response to there being no free storage location in the set to which the transaction request maps.
In one or more embodiments of the present invention, each set in the probe filter has a data pointer added in advance, and the apparatus may further include:
a storing module, configured to store the target address into the data pointer corresponding to the set to which the transaction request maps, so as to form a linked list, after the target address in the additional memory is allocated for the cache state information corresponding to the transaction request.
In one or more embodiments of the invention, each set in the probe filter may have a field added in advance for indicating whether the data is valid.
In one or more embodiments of the invention, each set of the probe filter may correspond to an additional storage location in the additional memory.
In one or more embodiments of the present invention, the cache state information corresponding to the transaction request may be stored in the additional memory in the form of a linked list, and the apparatus may further include: a first triggering module configured to trigger a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location among the additional storage locations corresponding to that set.
In one or more embodiments of the present invention, the cache state information corresponding to the transaction request may be stored in the additional memory in the form of a linked list, and the apparatus may further include: a second triggering module configured to trigger a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location in the additional memory.
In one or more embodiments of the invention, the apparatus may further include: a reclamation module configured to reclaim, as a free storage address, the target storage address at which a destage operation or a cache-initiated eviction occurred, through a pointer queue added to the probe filter in advance, in response to a destage operation initiated by the probe filter or a cache-initiated eviction occurring at a target storage location of the additional memory.
One or more embodiments of the present invention further provide a processor chip. FIG. 6 is a schematic diagram of a processor chip according to one or more embodiments of the present invention. As shown in FIG. 6, the processor chip 60 includes at least one processor core 61 and a cache 62; the processor core 61 is configured to execute any one of the above data processing methods based on cache coherency.
One or more embodiments of the present invention also provide a server, including: a housing, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged inside a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power circuit is used to supply power to each circuit or device of the server; the memory is used to store executable program code; and the processor runs a program corresponding to the executable program code, by reading the executable program code stored in the memory, so as to execute any one of the above data processing methods based on cache coherence.
Accordingly, as shown in FIG. 7, a server provided by one or more embodiments of the present invention may include: a housing 71, a processor 72, a memory 73, a circuit board 74 and a power circuit 75, wherein the circuit board 74 is arranged inside the space enclosed by the housing 71, and the processor 72 and the memory 73 are arranged on the circuit board 74; the power circuit 75 supplies power to each circuit or device of the server; the memory 73 stores executable program code; and the processor 72 runs a program corresponding to the executable program code, by reading the executable program code stored in the memory 73, so as to execute any one of the cache coherency-based data processing methods provided by the foregoing embodiments.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments.
In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
For convenience of description, the above apparatus is described in terms of its functions as divided into various units/modules. Of course, when implementing the present invention, the functions of the units/modules may be implemented in one or more pieces of software and/or hardware.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A data processing method based on cache coherence, characterized by comprising the following steps:
acquiring a transaction request, sent to a probe filter, for storing cache state information, wherein the cache state information stored in the probe filter is stored in a set-associative mode and an additional memory has been added to the probe filter in advance;
in response to there being no free storage location in the set to which the transaction request maps, allocating a target address in the additional memory for the cache state information corresponding to the transaction request;
wherein each set in the probe filter has a data pointer added in advance, the method further comprising:
after allocating a target address in the additional memory for the cache state information corresponding to the transaction request, storing the target address into the data pointer corresponding to the set to which the transaction request maps, so as to form a linked list;
wherein each set of the probe filter corresponds to an additional storage location in the additional memory; the probe filter comprises M sets, an additional storage space of storage capacity a is added for any one set in the probe filter, and the storage capacity of the additional memory is set to be less than M × a.
2. The method of claim 1, wherein each set in the probe filter has a field, added in advance, for indicating whether data is valid.
3. The method of claim 1, wherein the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, the method further comprising:
triggering a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location among the additional storage locations corresponding to that set.
4. The method of claim 1, wherein the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, the method further comprising:
triggering a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location in the additional memory.
5. The method according to any one of claims 1 to 4, further comprising:
in response to a destage operation initiated by the probe filter or a cache-initiated eviction occurring at a target storage location of the additional memory, reclaiming the target storage address at which the destage operation or eviction occurred as a free storage address through a pointer queue added to the probe filter in advance.
6. A data processing apparatus based on cache coherence, comprising:
an acquisition module configured to acquire a transaction request, sent to a probe filter, for storing cache state information, wherein the cache state information stored in the probe filter is stored in a set-associative mode and an additional memory has been added to the probe filter in advance; and
an allocation module configured to allocate a target address in the additional memory for the cache state information corresponding to the transaction request in response to there being no free storage location in the set to which the transaction request maps;
wherein each set in the probe filter has a data pointer added in advance, the apparatus further comprising:
a storing module configured to store the target address into the data pointer corresponding to the set to which the transaction request maps, so as to form a linked list, after the target address in the additional memory is allocated for the cache state information corresponding to the transaction request;
wherein each set of the probe filter corresponds to an additional storage location in the additional memory; the probe filter comprises M sets, an additional storage space of storage capacity a is added for any one set in the probe filter, and the storage capacity of the additional memory is set to be less than M × a.
7. The apparatus of claim 6, wherein each set in the probe filter has a field, added in advance, for indicating whether data is valid.
8. The apparatus of claim 6, wherein the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, the apparatus further comprising:
a first triggering module configured to trigger a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location among the additional storage locations corresponding to that set.
9. The apparatus of claim 6, wherein the cache state information corresponding to the transaction request is stored in the additional memory in the form of a linked list, the apparatus further comprising:
a second triggering module configured to trigger a destage operation based on the transaction request in response to the length of the linked list of the set to which the transaction request maps reaching a preset length and/or there being no free storage location in the additional memory.
10. The apparatus of any one of claims 6 to 9, further comprising:
a reclamation module configured to reclaim, as a free storage address, the target storage address at which a destage operation or a cache-initiated eviction occurred, through a pointer queue added to the probe filter in advance, in response to a destage operation initiated by the probe filter or a cache-initiated eviction occurring at a target storage location of the additional memory.
11. A processor chip, comprising: at least one processor core and a cache;
wherein the processor core is configured to execute the data processing method based on cache coherence of any one of claims 1 to 5.
12. A server, comprising:
a housing, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged inside a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power circuit is used to supply power to each circuit or device of the server; the memory is used to store executable program code; and the processor runs a program corresponding to the executable program code, by reading the executable program code stored in the memory, so as to execute the data processing method based on cache coherence of any one of claims 1 to 5.
CN202011316533.2A 2020-11-20 2020-11-20 Data processing method and device based on cache consistency, processing chip and server Active CN112463652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011316533.2A CN112463652B (en) 2020-11-20 2020-11-20 Data processing method and device based on cache consistency, processing chip and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011316533.2A CN112463652B (en) 2020-11-20 2020-11-20 Data processing method and device based on cache consistency, processing chip and server

Publications (2)

Publication Number Publication Date
CN112463652A CN112463652A (en) 2021-03-09
CN112463652B (en) 2022-09-27

Family

ID=74798334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011316533.2A Active CN112463652B (en) 2020-11-20 2020-11-20 Data processing method and device based on cache consistency, processing chip and server

Country Status (1)

Country Link
CN (1) CN112463652B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238170A (en) * 2021-12-21 2022-03-25 海光信息技术股份有限公司 Data processing method, data processing apparatus, and storage medium


Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7305524B2 (en) * 2004-10-08 2007-12-04 International Business Machines Corporation Snoop filter directory mechanism in coherency shared memory system
US7500049B2 (en) * 2005-10-31 2009-03-03 Intel Corporation Providing a backing store in user-level memory
US8015365B2 (en) * 2008-05-30 2011-09-06 Intel Corporation Reducing back invalidation transactions from a snoop filter
US8185695B2 (en) * 2008-06-30 2012-05-22 Advanced Micro Devices, Inc. Snoop filtering mechanism
US9639470B2 (en) * 2014-08-26 2017-05-02 Arm Limited Coherency checking of invalidate transactions caused by snoop filter eviction in an integrated circuit
US11237965B2 (en) * 2014-12-31 2022-02-01 Arteris, Inc. Configurable snoop filters for cache coherent systems
US10157133B2 (en) * 2015-12-10 2018-12-18 Arm Limited Snoop filter for cache coherency in a data processing system
US20170300427A1 (en) * 2016-04-18 2017-10-19 Mediatek Inc. Multi-processor system with cache sharing and associated cache sharing method
GB2557254B (en) * 2016-12-02 2020-02-12 Advanced Risc Mach Ltd Filtering coherency protocol transactions

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6014690A (en) * 1997-10-24 2000-01-11 Digital Equipment Corporation Employing multiple channels for deadlock avoidance in a cache coherency protocol
US6108752A (en) * 1997-10-24 2000-08-22 Compaq Computer Corporation Method and apparatus for delaying victim writes in a switch-based multi-processor system to maintain data coherency
CN109388585A (en) * 2017-08-07 2019-02-26 英特尔公司 For providing the technology of cache coherence based on cache types
CN108337172A (en) * 2018-01-30 2018-07-27 长沙理工大学 Extensive OpenFlow flow table classification storage architecture and acceleration lookup method
CN110362504A (en) * 2018-04-09 2019-10-22 英特尔公司 Management to consistency link and multi-level store
CN111936979A (en) * 2018-04-12 2020-11-13 Arm有限公司 Cache control in the presence of speculative read operations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Key-Value NoSQL local storage systems; Ma Wenlong et al.; Chinese Journal of Computers (计算机学报); 2017-06-01 (Issue 08); full text *
A multi-level cache coherence protocol based on a shared-forwarding state; Chen Jicheng et al.; Journal of Computer Research and Development (计算机研究与发展); 2017-04-15 (Issue 04); full text *

Also Published As

Publication number Publication date
CN112463652A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN105740164B (en) Multi-core processor supporting cache consistency, reading and writing method, device and equipment
CN101593160B (en) Reducing back invalidation transactions from snoop filter
US6826651B2 (en) State-based allocation and replacement for improved hit ratio in directory caches
JP4447580B2 (en) Partitioned sparse directory for distributed shared memory multiprocessor systems
US10402327B2 (en) Network-aware cache coherence protocol enhancement
KR20170098187A (en) Associative and atomic write-back caching system and method for storage subsystem
CN111143244B (en) Memory access method of computer equipment and computer equipment
CN107341114B (en) Directory management method, node controller and system
US7117312B1 (en) Mechanism and method employing a plurality of hash functions for cache snoop filtering
US7325102B1 (en) Mechanism and method for cache snoop filtering
US8694732B2 (en) Enhanced coherency tracking with implementation of region victim hash for region coherence arrays
CN112463652B (en) Data processing method and device based on cache consistency, processing chip and server
CN113656212B (en) System and method for cache directory TCAM error detection and correction
CN106164874B (en) Method and device for accessing data visitor directory in multi-core system
CN113138851B (en) Data management method, related device and system
CN114238171B (en) Electronic equipment, data processing method and device and computer system
US11755485B2 (en) Snoop filter device
CN112612726B (en) Data storage method and device based on cache consistency, processing chip and server
CN114238165B (en) Data processing method, data processing apparatus, and storage medium
JP6272011B2 (en) Cache device, computer including cache device, and cache control method
US20230195643A1 (en) Re-fetching data for l3 cache data evictions into a last-level cache
CN116049031A (en) Data processing method, device, electronic equipment and storage medium
CN114238173A (en) Method and system for realizing CRQ and CWQ quick deallocate in L2
CN114238170A (en) Data processing method, data processing apparatus, and storage medium
CN116401183A (en) Storage control system and storage server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant