CN116126517A - Access request processing method, multi-core processor system, chip and electronic device - Google Patents

Access request processing method, multi-core processor system, chip and electronic device

Info

Publication number
CN116126517A
Authority
CN
China
Prior art keywords
target
cache line
consistency
access request
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211593341.5A
Other languages
Chinese (zh)
Inventor
冯寅翀
林江
贾琳黎
程永波
曹俊
杨凯歌
李洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202211593341.5A
Publication of CN116126517A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022 Mechanisms to release resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An embodiment of the application provides an access request processing method, a multi-core processor system, a chip and an electronic device. The method is applied to a home agent and comprises: detecting the sending frequency of return coherence invalidations; when the sending frequency is higher than a preset frequency threshold, acquiring, for a data access request sent by a processor core, target data according to the target address indicated by the request, the coherence state of the target cache line corresponding to the target address being recorded in a target entry of a coherence directory; setting an identifier carrying a temporary storage state in the response packet of the data access request, and feeding the response packet back to the processor core, the identifier causing the processor core to load the target data into a target cache line in a non-last-level cache and to set the temporary storage state for the target cache line in the last-level cache, so that when the non-last-level cache evicts the target cache line, the line is written back directly to memory without passing through the last-level cache; and acquiring a write-back request for the target cache line and releasing the target entry. The technical solution provided by the embodiment can improve the performance of the multi-core processor system.

Description

Access request processing method, multi-core processor system, chip and electronic device
Technical Field
The embodiments of the application relate to the technical field of computers, and in particular to an access request processing method, a multi-core processor system, a chip and an electronic device.
Background
In modern computer architectures, a processor core can improve data access efficiency by using a cache. A cache stores data in units of cache lines and temporarily holds data from memory that the processor core has previously accessed, or data that is accessed frequently.
In a multi-core processor system, each processor core has its own cache, so a cache line for the same address may be mapped into the caches of several processor cores. If one processor core modifies the data in that cache line, the corresponding cache lines in the other processor cores become inconsistent with it, which is the cache coherence problem.
To guarantee cache coherence, a cache coherence protocol may be used to manage the coherence state of cache lines. Under such a protocol, a coherence directory uses entries to record the coherence state of cache lines. However, the number of entries in the coherence directory is limited, so how to improve the performance of the coherence directory, and thereby the performance of the multi-core processor system, is a technical problem that those skilled in the art need to solve.
Disclosure of Invention
The technical problem addressed by the embodiments of the application is how to improve the performance of the coherence directory and thereby the performance of the multi-core processor system.
To solve the above problem, embodiments of the application provide an access request processing method, a multi-core processor system, a chip, and an electronic device.
In a first aspect, an embodiment of the present application provides an access request processing method, applied to a home agent, including:
detecting the sending frequency of return coherence invalidations;
when the sending frequency of return coherence invalidations is higher than a preset frequency threshold, acquiring, for a data access request sent by a processor core, target data according to the target address indicated by the data access request, wherein the coherence state of the target cache line corresponding to the target address is recorded in a target entry of a coherence directory;
setting an identifier carrying a temporary storage state in the response packet corresponding to the data access request, and feeding the response packet back to the processor core, wherein the identifier is used to make the processor core load the target data into a target cache line in a non-last-level cache and set the temporary storage state for the target cache line in the last-level cache, so that when the non-last-level cache evicts the target cache line, the target cache line is written back directly to memory without passing through the last-level cache;
and acquiring a write-back request for the target cache line, and releasing the target entry in the coherence directory.
In a second aspect, an embodiment of the present application provides an access request processing method, applied to a processor core, including:
sending a data access request to a home agent;
acquiring target data and a response packet corresponding to the data access request;
if the response packet is provided with an identifier carrying a temporary storage state, loading the target data into a target cache line of a non-last-level cache, and storing the temporary storage state of the target cache line in the last-level cache, wherein the home agent sets the identifier in the response packet responding to the data access request when its sending frequency of return coherence invalidations is higher than a preset frequency threshold, and the temporary storage state indicates that, when the target cache line is evicted from the cache of the processor core, it is written back directly to memory without passing through the last-level cache;
and when the target cache line is evicted from the cache, sending a write-back request for the target cache line, wherein the write-back request is used to evict the target cache line to memory and to instruct the home agent to release the target entry corresponding to the target cache line in the coherence directory.
In a third aspect, embodiments of the present application further provide a multi-core processor system, including:
a home agent, configured to: detect the sending frequency of return coherence invalidations; when the sending frequency of return coherence invalidations is higher than a preset frequency threshold, acquire, for a data access request sent by a processor core, target data according to the target address indicated by the data access request, wherein the coherence state of the target cache line corresponding to the target address is recorded in a target entry of a coherence directory; set an identifier carrying a temporary storage state in the response packet corresponding to the data access request, and feed the response packet back to the processor core; and acquire a write-back request for the target cache line and release the target entry in the coherence directory;
a processor core, configured to: send a data access request to the home agent; acquire target data and a response packet corresponding to the data access request; if the response packet is provided with an identifier carrying a temporary storage state, load the target data into a target cache line of a non-last-level cache and store the temporary storage state of the target cache line in the last-level cache, wherein the temporary storage state indicates that, when the target cache line is evicted from the cache of the processor core, it is written back directly to memory without passing through the last-level cache; and send a write-back request for the target cache line when the target cache line is evicted from the cache.
In a fourth aspect, embodiments of the present application further provide a chip including the multi-core processor system according to the third aspect.
In a fifth aspect, embodiments of the present application further provide an electronic device including a chip as described in the fourth aspect.
It should be noted that, when the home agent processes a data access request from a processor core, it queries the connected coherence directory, based on the address indicated by the request, for the coherence state of the cache line corresponding to that address. When the coherence directory does not record a coherence state for that cache line, the data required by the request resides in memory rather than in a cache of the multi-core processor and must be read from memory. For the data read from memory, the home agent then needs to record the coherence state of the new cache line, i.e. the cache line corresponding to the address of the data, in the coherence directory. If the coherence directory has no usable entry for this purpose, the home agent must release a used entry to obtain a usable one, and send a return coherence invalidation to the processor core to notify it to evict the cache line corresponding to the used entry from the cache to memory. Therefore, for data access requests with a low hit rate in the caches of the multi-core processor, the home agent frequently sends return coherence invalidations to the processor cores, and each time it must wait for the processor core to evict the cache line of the released entry from the cache to memory before the coherence state of the new cache line can be recorded in the usable entry. This reduces the overall efficiency with which the multi-core processor system processes data access requests and degrades its performance.
Based on this, in the access request processing method provided by the embodiments of the application, the home agent detects the sending frequency of return coherence invalidations. A sending frequency higher than the preset frequency threshold indicates that occupied entries in the coherence directory are being released frequently and that data in the caches of the processor cores is being written back to memory frequently. In this situation, to prevent the coherence directory from frequently sending return coherence invalidations to the processor core, which lengthens the time spent waiting for the processor core to evict the released cache lines from the cache to memory, the home agent, for a data access request sent by the processor core, not only acquires the target data according to the target address indicated by the request but also sets an identifier carrying a temporary storage state in the corresponding response packet and feeds the response packet back to the processor core. The identifier is used to make the processor core load the target data into a target cache line in a non-last-level cache and set the temporary storage state for the target cache line in the last-level cache, so that when the non-last-level cache evicts the target cache line, the line is written back directly to memory without passing through the last-level cache.
That is, because of the identifier carrying the temporary storage state in the response packet, the target cache line is held only temporarily in the cache of the processor core, and when it is evicted, the target data goes directly from the non-last-level cache of the processor core to memory without passing through the last-level cache. This saves the time otherwise spent evicting the target cache line from the non-last-level cache to the last-level cache and then from the last-level cache to memory, and improves the overall access request processing efficiency of the processor core. Furthermore, when the home agent acquires the write-back request for the target cache line, it can release the target entry in the coherence directory, i.e. the entry that records the coherence state of the target cache line corresponding to the target address.
It can be seen that the access request processing method provided by the embodiments of the application detects the sending frequency of the return coherence invalidations fed back by the coherence directory. When the sending frequency reaches the frequency threshold, it can be determined that the coherence directory is in a state of frequently sending return coherence invalidations. To avoid the low processing efficiency that return coherence invalidations cause in the multi-core processor system, the home agent sets an identifier carrying a temporary storage state in the response packet to the processor core's data access request. Based on the identifier, the processor core loads the target data into a target cache line in a non-last-level cache and sets the temporary storage state for the target cache line in the last-level cache, so that the target cache line is written back directly to memory, without passing through the last-level cache, when it is evicted from the non-last-level cache.
This saves the time the processor core needs to evict the target cache line, so the home agent does not have to wait long for the processor core to evict the target cache line from the cache to memory. Meanwhile, in accordance with the identifier of the temporary storage state, the processor core sends the write-back request for the target cache line to the home agent and the line is written back directly to memory. The home agent can then look up the target entry that records the target cache line in the coherence directory and, according to the received write-back request, release the target cache line recorded in that entry as the line is written back to memory. Used entries in the coherence directory are thereby released in advance, which improves the performance of the coherence directory and, in turn, the performance of the multi-core processor system.
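As a rough illustration of the scheme summarized above, the following Python sketch models the interaction between a home agent and one processor core. It is a simplified model under stated assumptions, not the claimed implementation: the class and field names (`HomeAgent`, `ProcessorCore`, `Response`, `staging`) are hypothetical, the sending frequency is assumed to be measured as a count over a sliding window of recent requests (the text does not specify how it is measured), and each cache is modeled as a plain dictionary.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Response:
    data: str
    staging: bool  # hypothetical name for the identifier carrying the temporary storage state


class HomeAgent:
    def __init__(self, memory, threshold, window=8):
        self.memory = memory                # address -> data
        self.threshold = threshold          # preset frequency threshold (count per window, assumed)
        self.recent = deque(maxlen=window)  # 1 = a return coherence invalidation was sent
        self.directory = {}                 # address -> id of the core holding the line

    def record_invalidation(self, happened):
        # Called once per request to track how often invalidations are sent.
        self.recent.append(1 if happened else 0)

    def handle_request(self, core_id, addr):
        # Record the target cache line in a target entry of the directory.
        self.directory[addr] = core_id
        # Set the identifier only while invalidations are being sent too often.
        staging = sum(self.recent) > self.threshold
        return Response(self.memory[addr], staging)

    def handle_write_back(self, addr, data):
        self.memory[addr] = data        # the line goes straight to memory
        self.directory.pop(addr, None)  # release the target entry


class ProcessorCore:
    def __init__(self, core_id, agent):
        self.core_id = core_id
        self.agent = agent
        self.non_llc = {}           # non-last-level cache: address -> data
        self.llc = {}               # last-level cache: address -> data
        self.llc_staging = set()    # addresses whose temporary storage state is kept in the LLC

    def load(self, addr):
        resp = self.agent.handle_request(self.core_id, addr)
        self.non_llc[addr] = resp.data
        if resp.staging:
            self.llc_staging.add(addr)
        return resp.data

    def evict(self, addr):
        data = self.non_llc.pop(addr)
        if addr in self.llc_staging:
            # Temporary storage state: bypass the LLC and write back directly.
            self.llc_staging.discard(addr)
            self.agent.handle_write_back(addr, data)
            return "memory"
        # Ordinary eviction path: the line moves down into the LLC.
        self.llc[addr] = data
        return "llc"
```

In this model a line loaded while the invalidation count is below the threshold is evicted into the last-level cache and keeps its directory entry, whereas a line loaded after the threshold is exceeded is written back directly and its directory entry is released at once.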
Drawings
In order to illustrate the embodiments of the application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 illustrates an example block diagram of a multi-core processor system.
Fig. 2 illustrates a flow diagram of a home agent processing a data access request.
Fig. 3 is a flow chart of an access request processing method according to an embodiment of the present application.
Fig. 4 is another flow chart of an access request processing method provided in an embodiment of the present application.
Fig. 5 is a schematic flow chart of another method for processing an access request according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the application without inventive effort fall within the scope of protection of the application.
In modern computer architectures, when accessing data, a processor core may first look in its own cache. If the data is already held in the cache because of an earlier data access request of the processor core, the processor core can read the data directly from the cache without accessing memory or other caches, which improves its data access efficiency. In other words, the cache serves as temporary storage between the processor core and memory and holds data that the processor core has accessed before or accesses frequently. It should be noted that, if the processor core does not find the data in its own cache, it may read the data from memory or from other processor cores and move the data into its cache.
With the continuous development of multi-core technology, a chip may contain a multi-core processor system with a plurality of processor cores, each of which has its own cache. For ease of understanding, Fig. 1 illustrates an example block diagram of a multi-core processor system.
As shown in the figure, the multi-core processor system includes a plurality of core clusters 10, a network-on-chip (fabric) 11, a plurality of memory controllers 12 with a corresponding plurality of memories (DDR) 13, input/output interfaces (IO) 14, and so on. Each core cluster includes a processor core and a multi-level cache, for example a first-level cache L1, a second-level cache L2 and a last-level cache (LLC). The input/output interface 14 is a bus interface for connecting input/output devices 15, such as a Peripheral Component Interconnect (PCI) bus, a Peripheral Component Interconnect Express (PCIe) bus, or a Universal Serial Bus (USB). The memory controllers 12 control access to the corresponding memories 13. The network-on-chip 11 connects the other components in the system.
The network-on-chip 11 includes a plurality of core request ports 110 and a plurality of home agents 111, and the devices in the network-on-chip 11 are connected through its routing network. One core request port 110 is connected to one core cluster 10, so, as shown in Fig. 1, the number of core request ports 110 corresponds to the number of core clusters 10. One home agent connects one memory controller and one coherence directory 112. Similarly, as shown in Fig. 1, the numbers of home agents 111, memory controllers 12 and coherence directories 112 are the same.
Whenever a processor core in a core cluster needs to access data and does not find the data in its own cache, it may issue a data access request to the connected core request port in order to access memory or the caches of other processor cores. The core request port sends the data access request to the home agent, and the home agent then queries the coherence directory, based on the address indicated by the request, for the coherence state of the cache line corresponding to that address. If the coherence directory records a coherence state for that cache line, the data needed by the requesting processor core is held in other processor cores, and it can be acquired from them according to the coherence state recorded in the directory entry. If the coherence directory records no coherence state for that cache line, the required data is not stored in the cache of any processor core of the core clusters and must be obtained from memory. It should be noted that a cache line corresponds to an address in memory, and one cache line may be mapped into the caches of several processor cores, so a cache line may have copies in several core clusters. The coherence state of a cache line refers to the overall state of its copies across the core clusters, and this state is recorded in the coherence directory.
The coherence directory is used to track information such as the coherence state of the cache lines corresponding to the addresses of the memory to which it is coupled. For example, if the address range of a first memory is [0-7] and that of a second memory is [8-15], the coherence directory coupled to the first memory tracks cache lines in the address range [0-7], the coherence directory coupled to the second memory tracks cache lines in the address range [8-15], and in each case the tracked addresses are those of cache lines that already exist in one or more core clusters.
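The address-range example above can be sketched as a simple lookup. This is a minimal illustration only: the names `CoherenceDirectory`, `DIRECTORIES`, `directory_for`, `directory_0` and `directory_1` are hypothetical, and the linear range check is an assumption made for readability; the text does not say how a target address is mapped to its coherence directory in hardware.

```python
from dataclasses import dataclass, field


@dataclass
class CoherenceDirectory:
    name: str
    first_addr: int  # lowest memory address covered (inclusive)
    last_addr: int   # highest memory address covered (inclusive)
    # Only lines that currently exist in some core cluster get an entry.
    entries: dict = field(default_factory=dict)  # address -> coherence state

    def covers(self, addr: int) -> bool:
        return self.first_addr <= addr <= self.last_addr


# One directory per memory, using the address ranges from the example.
DIRECTORIES = [
    CoherenceDirectory("directory_0", 0, 7),    # coupled to the first memory
    CoherenceDirectory("directory_1", 8, 15),   # coupled to the second memory
]


def directory_for(addr: int) -> CoherenceDirectory:
    """Return the coherence directory responsible for a memory address."""
    for directory in DIRECTORIES:
        if directory.covers(addr):
            return directory
    raise ValueError(f"address {addr} is not backed by any memory")
```

A request for address 3 would be looked up in `directory_0`, and a request for address 8 in `directory_1`.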
To aid understanding of how the home agent processes a data access request, refer to Fig. 2, which is a schematic flow chart of the home agent processing a data access request.
As shown in the figure, the process may include the following steps:
Step S001, receiving a data access request sent by a processor core.
It should be noted that a data access request is routed to the home agent when it misses in the cache of the processor core. Specifically, the processor core sends the request to a core request port, from which it reaches the home agent through the network-on-chip.
Step S002, querying whether an entry in the coherence directory records the coherence state of the target cache line; if not, executing step S003, and if so, executing step S008.
It is easy to understand that the target cache line corresponds to the target address indicated by the data access request.
As described above, the coherence directory tracks the coherence state of the cache lines corresponding to the addresses of the memory to which it is coupled. For example, if the target address indicated by the data access request sent by the processor core lies in [0-7], the relevant coherence directory is the one connected to the first memory; the memory addresses tracked in its entries are addresses of the first memory, and the coherence state of the corresponding target cache line is obtained from it. Therefore, when the home agent receives the data access request, it first determines whether the coherence state of the target cache line corresponding to the target address is already recorded in an entry of the coherence directory. If it is, the target data (i.e. the data at the target address indicated by the data access request) is stored in the caches of other processor cores and is acquired from the cache of the corresponding processor core according to the coherence state recorded in the entry.
If it is not, the target data is not stored in other processor cores. Because the coherence states tracked by a coherence directory correspond one-to-one to the cache lines of the addresses of the memory connected to it, the coherence directory to be accessed can be determined from the target address, and the memory connected to that coherence directory is then accessed to obtain the target data.
For convenience of description, the address indicated by the data access request is referred to herein as the target address, which may be the memory address of the data to be accessed, and the cache line corresponding to the target address is referred to as the target cache line. Based on the target address, the home agent searches the corresponding coherence directory. If no entry records the coherence state of the target cache line, the data at the target address is not stored in the caches of other core clusters and must be read from memory and moved into the cache of the processor core. Since the coherence directory must then be able to locate the target cache line from the target address, it needs to determine whether it has a usable entry for recording the coherence state of the target cache line. If an entry does record the coherence state of the target cache line, the corresponding data is stored in the caches of other core clusters, and the requesting processor core can acquire the target data from the corresponding processor core according to the recorded coherence state.
Step S003, determining whether there is a usable entry in the coherence directory; if not, executing step S004, and if so, executing step S007.
If no entry in the coherence directory records the target cache line, the target data required by the data access request must be fetched from memory. Under the coherence protocol that the multi-core processor system follows when accessing data, after the home agent acquires the target data from memory, the target cache line corresponding to the target address must be recorded in the coherence directory so that the system works correctly. It is therefore necessary to further determine whether the coherence directory has a usable entry for recording the coherence state of the target cache line.
Step S004, the coherence directory invalidates the cache line in a used entry to obtain a usable entry, and a return coherence invalidation is sent to the processor core.
If no entry is available in the coherence directory, the coherence directory invalidates a used entry, i.e. invalidates the cache line recorded in that entry, and sends a return coherence invalidation to the processor core.
After receiving the return coherence invalidation, the processor core evicts the cache line that was invalidated in the coherence directory and sends the home agent a write-back signal for writing that cache line back to memory.
Step S005, receiving the write-back signal sent by the processor core, the write-back signal indicating that the processor core evicts to memory the cache line that is stored in its cache and was invalidated in the coherence directory, and allocating the usable entry to the target cache line.
The coherence directory tracks the coherence state of the cache lines corresponding to the addresses of the memory to which it is coupled. When a cache line in a used entry is invalidated in the coherence directory, the copy of that cache line stored in the cache of the processor core must likewise be evicted or invalidated in order to keep the coherence state consistent. After the home agent receives notice that the processor core has evicted the invalidated cache line to memory or invalidated it, it can allocate the freed usable entry to the target cache line.
Step S006, acquiring the data of the data access request from memory according to the coherence state of the target cache line.
After the home agent, on receiving the processor core's cache line write-back signal, has allocated to the target cache line a usable entry in which its coherence state can be recorded, the data access request can proceed normally: the target data is fetched from memory according to the target address, and processing of the data access request is complete.
Step S007, allocating the usable entry to the target cache line.
If a usable entry exists in the coherence directory, it is allocated directly to the target cache line, that is, the cache line corresponding to the address of the data in the data access request. Step S006 is then executed, and the data of the data access request is obtained from memory according to the coherence state of the target cache line.
Step S008, according to the consistency status of the target cache line, acquiring the data of the data access request from caches of other processor cores.
Since the coherence state of the target cache line is already recorded in the coherence directory, the data of the data access request is stored in the cache of another processor core, and it can be obtained directly from the corresponding cache line of that processor core according to the coherence state of the target cache line.
It can be seen that when a usable entry exists in the coherence directory, it is allocated directly to the target cache line in order to record and track the coherence state of the target cache line corresponding to the target address.
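The baseline flow of steps S004 to S008 can be sketched as follows. This is a minimal illustration in Python, not the implementation described by the application: the class `CoherenceDirectory`, its `capacity` field, the `handle_miss_request` method, and the choice of the oldest entry as the one to invalidate are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CoherenceDirectory:
    """Toy coherence directory with a fixed number of entries (hypothetical model)."""
    capacity: int
    entries: dict = field(default_factory=dict)  # address -> id of the core caching it
    back_invalidations: int = 0                  # return coherence invalidations sent

    def handle_miss_request(self, address, requester):
        """Handle a request that missed in the requester's cache.

        Returns 'peer-cache' or 'memory' to indicate where the data comes from.
        """
        if address in self.entries:
            # S008: an entry already records the line, so another core holds the data.
            return "peer-cache"
        if len(self.entries) >= self.capacity:
            # S004/S005: no usable entry. Invalidate a used entry (here simply the
            # oldest one, an assumption) and count the return coherence invalidation.
            victim = next(iter(self.entries))
            del self.entries[victim]
            self.back_invalidations += 1
        # S007: allocate the usable entry to the target cache line.
        self.entries[address] = requester
        # S006: the data itself is fetched from memory.
        return "memory"
```

With a capacity of two entries, a third distinct address forces one invalidation, which is the situation the rest of the description tries to make less frequent.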
Ideally the coherence directory covers all caches, but an overly large directory increases the area of the whole chip, so the chip area occupied by the coherence directory must be weighed against its capacity requirement. Because the area of the coherence directory is limited, its recording space is limited as well. For certain data access patterns, for example many index conflicts, sparse memory accesses, or poor spatial locality, the cache hit rate of data access requests is low, so data in memory is accessed frequently and the coherence directory needs enough space to record the coherence state of the cache lines corresponding to the data newly fetched from memory. To free usable entries for recording the coherence state of new cache lines, the coherence directory may then frequently invalidate the cache lines recorded in used entries, which degrades overall system performance.
To address this problem, a common approach is to avoid using the coherence directory altogether; however, this wastes a coherence directory that could otherwise be used normally and thereby reduces overall system performance.
In order to solve the problems of the coherence directory in the use process, the embodiment of the application provides an access request processing method, so as to optimize the working process of the coherence directory and improve the overall performance of a system. Specifically, referring to fig. 3, fig. 3 is a flow chart of an access request processing method according to an embodiment of the present application.
Based on the foregoing description, when the multi-core processor system processes a data access request and the required data is not found in the cache of the processor core, the processor core sends the data access request to the home agent; the home agent then processes the request and returns the processing result, that is, a response packet, to the processor core.
As shown in fig. 3, the process of the home agent may include the steps of:
Step S100, the home agent detects the sending frequency of return coherence invalidations.
A return coherence invalidation is sent to the processor core when the home agent, after receiving a data access request from the processor core, queries the coherence directory to which it is connected and determines that no entry records the coherence state of the cache line corresponding to the data access request and that no usable entry exists in the coherence directory. That is, when the coherence directory has no free entry in which to record the coherence state of a new cache line, the home agent releases an occupied entry in the coherence directory and sends a return coherence invalidation to notify the cache of the processor core to evict the cache line corresponding to the occupied entry. Here, the new cache line is the cache line corresponding to the address of data fetched from memory into the cache.
Based on the foregoing, when the home agent frequently sends return coherence invalidations, that is, when the coherence directory frequently invalidates the cache lines recorded in used entries, the home agent has to wait a long time for the processing completion signal of the processor core, which reduces overall system performance.
The technical solution therefore determines the current working state of the coherence directory by detecting the frequency at which the home agent sends return coherence invalidations, and can thus mitigate the loss of overall processing efficiency of the multi-core processor system caused by the coherence directory frequently invalidating the cache lines in used entries.
In order to detect the sending frequency of return coherence invalidations in a timely manner, in one embodiment, step S100 may include:
detecting the count value of a counter provided in the home agent, where the counter counts the number of times the home agent sends a return coherence invalidation;
when the coherence directory has no free entry in which to record the coherence state of a new cache line, the home agent releases an occupied entry in the coherence directory and sends a return coherence invalidation to notify the cache of the processor core to evict the cache line corresponding to the occupied entry; the new cache line is the cache line corresponding to the address of data fetched from memory into the cache.
In the embodiment of the present application, a counter may be provided in the home agent. Each time the home agent is detected to send a return coherence invalidation to the processor core, the count value of the counter is incremented by one, so that the count value indicates the current usage state of the coherence directory, namely whether return coherence invalidations are being sent frequently, that is, whether the coherence directory is frequently invalidating the cache lines in used entries.
By providing a counter in the home agent to count return coherence invalidations, their sending frequency can be obtained accurately, and whether to execute the technical solution can be determined in a timely manner.
When a data access request sent by a processor core is received, no entry of the coherence directory records the cache line corresponding to the address of the data, and no free entry is available to record the coherence state of a new cache line, the home agent releases an occupied entry in the coherence directory and notifies the cache of the processor core to evict the cache line corresponding to that entry. The home agent may select the occupied entry to release according to a policy, for example the occupied entry that has been in the coherence directory the longest or the occupied entry with the fewest access hits. The selection policy may be set according to the actual situation and is not limited by the embodiments of the present application.
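The two example release policies mentioned above (longest residency and fewest access hits) can be illustrated with a short sketch. The `DirectoryEntry` record and its `allocated_at` and `hits` fields are hypothetical bookkeeping introduced only for this example; the application does not specify how such information is stored.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical bookkeeping for one occupied coherence directory entry."""
    address: int
    allocated_at: int  # cycle at which the entry was allocated (assumed field)
    hits: int          # number of access hits on this entry (assumed field)


def pick_longest_resident(entries, now):
    """Select the occupied entry that has been in the directory the longest."""
    return max(entries, key=lambda e: now - e.allocated_at)


def pick_fewest_hits(entries):
    """Select the occupied entry with the fewest access hits."""
    return min(entries, key=lambda e: e.hits)
```

Either function returns the entry to release; which policy suits a given chip is left open by the description.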
In step S101, the processor core sends a data access request to the home agent.
When the processor core needs to access the target data of the target address, the processor core can search the target data in the cache of the processor core, and if the target data does not hit in the cache of the processor core, the processor core can send a data access request to the home agent.
Step S102, when the home agent detects that the sending frequency of return coherence invalidations is higher than a preset frequency threshold, the home agent acquires, for a data access request sent by a processor core, the target data according to the target address indicated by the data access request.
The coherence state of the target cache line corresponding to the target address is recorded in a target entry of the coherence directory.
The preset frequency threshold can be obtained through performance analysis of the chip according to actual needs, or by trying various candidate threshold values on a performance model and running various benchmarks to find the best value, which is then used as the preset frequency threshold.
From the detected sending frequency of return coherence invalidations, the current usage state of the coherence directory can be judged and it can be determined whether the coherence directory is frequently invalidating the cache lines in used entries, so that such frequent invalidation can be stopped in time.
Step S103, the home agent sets an identifier carrying a temporary storage state in the response packet corresponding to the data access request.
The identifier causes the processor core to load the target data into a target cache line in a non-last-level cache and to set a temporary storage state for the target cache line in the last-level cache, so that when the non-last-level cache evicts the target cache line, the target cache line is written back directly to memory without passing through the last-level cache.
In the access request processing method provided by the embodiment of the present application, if the home agent detects that the sending frequency of return coherence invalidations is higher than the preset frequency threshold, then, as an improvement, in addition to obtaining the target data according to the target address indicated by the data access request, the home agent sets an identifier carrying a temporary storage state in the response packet generated for the data access request. After the processor core receives the response packet and stores the target data in its non-last-level cache, it can further set, according to the identifier, the temporary storage state for the target cache line corresponding to the target data, so that when the non-last-level cache evicts the target cache line, the line is written back directly to memory without passing through the last-level cache. This saves the time the home agent would otherwise spend waiting for the processor core to evict the target cache line from the non-last-level cache to the last-level cache and then from the last-level cache to memory, which improves the processing efficiency of the multi-core processor system. Note that "higher than the preset frequency threshold" here means greater than or equal to the preset frequency threshold, so that the current usage state of the coherence directory can be determined accurately.
So that a target cache line evicted from the non-last-level cache can be written back directly to memory without passing through the last-level cache, the corresponding target cache line in the last-level cache is also evicted, so that the system as a whole operates correctly.
In the embodiment of the present application, the target cache line of the non-last-level cache of the processor core is loaded with the target data, and the last-level cache stores the temporary storage state of the target cache line. When the target cache line is evicted from the non-last-level cache, the cache line that actually holds the target data can therefore be evicted directly from the non-last-level cache to memory, without passing through the last-level cache, based on the temporary storage state set in the last-level cache. This saves the time the home agent spends waiting for the processor core to evict the target cache line from the cache to memory.
In one embodiment, step S102 may include:
If the count value of the counter exceeds a count threshold within a predetermined time, it is determined that the sending frequency of return coherence invalidations is higher than the preset frequency threshold.
For the count value to accurately reflect that the coherence directory is frequently invalidating used entries, both the count threshold and the time range over which frequent release of used entries is measured must be designed; the embodiment of the present application therefore sets a predetermined time. That is, within the predetermined time the data access requests of the processor core frequently miss in the cache and the coherence directory frequently has no usable entry, so the coherence directory must frequently release used entries to free usable entries for recording the coherence state of the cache lines corresponding to data read from memory. Accordingly, the access request processing method provided by the embodiment of the present application can be started when the count value of the counter exceeds the count threshold within the predetermined time, so as to relieve the situation in which the coherence directory frequently has no usable entry and thereby optimize the performance of the coherence directory.
As an optional implementation, in order to adapt to the actual performance of various chips, in one embodiment the predetermined time comprises a predetermined number of clock cycles.
A predetermined time suited to the product performance target, such as n clock cycles, is obtained by analyzing the product performance target. For example, the number of clock cycles may be 3 or 4; the number is not particularly limited as long as the actual use requirement is satisfied.
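The counter-and-window check can be sketched as follows. This is only an illustration under stated assumptions: the class name `BackInvalidationMonitor`, the use of non-overlapping (tumbling) windows aligned to multiples of `window_cycles`, and the reset of the counter at each window boundary are choices made for the example and are not specified by the application. The comparison uses "greater than or equal to", following the note in the description that "higher than" the threshold includes equality.

```python
class BackInvalidationMonitor:
    """Counts return coherence invalidations inside fixed windows of clock cycles."""

    def __init__(self, window_cycles, count_threshold):
        self.window_cycles = window_cycles      # predetermined number of clock cycles
        self.count_threshold = count_threshold  # count threshold within one window
        self.window_index = 0
        self.count = 0

    def _sync(self, cycle):
        # Start a fresh count whenever the cycle falls into a new window.
        index = cycle // self.window_cycles
        if index != self.window_index:
            self.window_index = index
            self.count = 0

    def record(self, cycle):
        """Call each time the home agent sends a return coherence invalidation."""
        self._sync(cycle)
        self.count += 1

    def is_frequent(self, cycle):
        """True if the count in the current window has reached the threshold."""
        self._sync(cycle)
        return self.count >= self.count_threshold
```

A hardware design might instead use a sliding window or a saturating counter; the tumbling window is used here only because it is the simplest to reason about.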
As an optional implementation, the home agent may obtain the target data according to the target address indicated by the data access request as follows: when a target entry in the coherence directory records the coherence state of the target cache line corresponding to the target address, the target data is obtained from another processor core according to the location of the target cache line recorded by the target entry; when the coherence directory does not record the target cache line containing the target address, the target data is obtained from memory and the coherence state of the target cache line is recorded in a usable entry of the coherence directory, which then serves as the target entry.
Optionally, reference may be made to fig. 4, and fig. 4 is another flow chart of the access request processing method provided in the embodiment of the present application.
As shown, the process may include the steps of:
Step S200, detecting the sending frequency of return coherence invalidations.
The content of step S200 may refer to the content of step S100, and will not be described herein.
Step S201, determining whether the transmission frequency is higher than a preset frequency threshold, if yes, executing step S202, and if no, executing step S205.
When the transmission frequency is determined to be higher than the preset frequency threshold, it is indicated that the consistency directory may frequently invalidate the used entry, affecting the performance of the multi-core processor system, and step S202 is performed.
Step S202, judging whether the consistency state of the target cache line is recorded in the target entry in the consistency directory, if so, executing step S203, and if not, executing step S204.
For the target address indicated by the data access request issued by the processor core, it is determined whether the coherence directory contains a corresponding target entry.
Step S203, determining the cache of the processor core for caching the target cache line according to the consistency state of the target entry record; and acquiring target data of the target cache line from the determined cache of the processor core.
Step S204, obtaining the target data of the target address from the memory.
If the preset frequency threshold is not reached, the data access request is processed according to the normal flow shown in fig. 2, i.e. step S205.
Step S205, process the data access request received by the current home agent.
A counter is provided in the home agent to count the number of return coherence invalidations sent by the home agent. When the count value is higher than the preset frequency threshold, the home agent is sending return coherence invalidations frequently, which means the cache lines of used entries in the coherence directory are being invalidated frequently and the processing performance of the multi-core processor system is affected.
When the count value does not exceed the preset frequency threshold, the coherence directory is not frequently invalidating the cache lines of used entries, and the data access request currently received by the home agent can be processed according to the normal flow shown in fig. 2. In this way, without affecting the normal processing of data access requests by the multi-core processor system, the situation in which the home agent frequently sends return coherence invalidations and cache lines in used entries are frequently invalidated is prevented, the use of the coherence directory is optimized, and the processing efficiency of the multi-core processor system is improved.
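The decision structure of steps S201 to S205 can be summarized in a few lines. The function `handle_data_access`, its dictionary return value, and the representation of the coherence directory as a set of recorded addresses are hypothetical simplifications for illustration; entry allocation on the memory path is omitted.

```python
def handle_data_access(frequent, directory, address):
    """Sketch of the fig. 4 decisions.

    frequent  -- result of the threshold check (S201)
    directory -- set of addresses whose coherence state is recorded (assumed model)
    address   -- target address of the data access request
    """
    if not frequent:
        # S205: below the threshold, fall back to the normal flow of fig. 2.
        return {"flow": "normal", "source": None, "temp_flag": False}
    # S202: does a target entry record the target cache line?
    if address in directory:
        source = "peer-cache"  # S203: fetch from the core that caches the line
    else:
        source = "memory"      # S204: fetch from memory
    # On this path the response packet carries the temporary storage state identifier.
    return {"flow": "optimized", "source": source, "temp_flag": True}
```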
When it is determined that no target entry in the coherence directory records the coherence state of the target cache line, the data access request is processed according to the normal procedure of the coherence directory. Optionally, referring to fig. 5, fig. 5 is a further flowchart of the access request processing method provided in an embodiment of the present application.
As shown, the process flow of step S205 may include the steps of:
Step S300, determining that no target entry in the coherence directory records the coherence state of the target cache line corresponding to the target address.
Step S301, judging whether a usable entry exists in the coherence directory; if yes, executing step S302, and if not, executing step S304.
Since no target entry in the coherence directory records the target cache line, that is, the target cache line is not stored in the cache of any other processor core, the data of the target address corresponding to the target cache line must be obtained from memory.
Recording the coherence state of cache lines is what allows the multi-core processor system to satisfy the coherence protocol and thus operate correctly. After the data is obtained from memory, the coherence state of the target cache line must therefore be recorded in an entry of the coherence directory. As described above, the space of the coherence directory is limited and other data access requests have been processed before the current one, so before the coherence state of the target cache line is recorded, it is determined whether the coherence directory has a free usable entry that can hold it.
Step S302, allocating the usable entry to the target cache line, and using the allocated usable entry as the target entry to record the coherence state of the target cache line.
When it is determined that the coherence directory has a free usable entry, the usable entry is used as the target entry and is allocated directly to the target cache line to record the coherence state of the target cache line.
Step S303, obtaining the target data of the target address from the memory.
And acquiring target data of a target address corresponding to the target cache line from the memory according to the consistency state of the target cache line recorded in the target entry.
Step S304, releasing an occupied entry in the coherence directory to obtain a free target entry, sending the return coherence invalidation to the processor core, and recording the coherence state of the target cache line in the free target entry.
When no usable entry exists in the coherence directory, that is, the directory is full and all entries are occupied, the coherence directory releases one of the occupied entries. Under the coherence protocol, the cache line released from the coherence directory must also be released in the cache of the processor core, where it is either invalidated or evicted to memory. The occupied entry selected may be one whose recorded cache line is not frequently accessed by the processor core, so that invalidating or evicting the corresponding cache line does not unduly affect other data access requests of the processor core.
Completing the data access request according to the normal processing flow of the coherence directory neither affects the normal operation of the multi-core processor system nor misuses the coherence directory, and improves the processing performance of the multi-core processor system.
With continued reference to fig. 3, as shown, the process may include the steps of:
in step S104, the home agent feeds back the response packet to the processor core.
In response to the data access request currently received from the processor core, the home agent feeds the obtained target data back to the processor core in a response packet. After receiving the response packet, the processor core can, based on the identifier of the temporary storage state carried in the response packet, cache the target data of the target cache line in the non-last-level cache and set the target cache line in the last-level cache to the temporary storage state. When the processor core later evicts the target cache line to memory, the line can be evicted directly from the non-last-level cache to memory without passing through the last-level cache, which saves the time the processor core spends evicting the target cache line and improves the efficiency with which the coherence directory uses its entries to record the coherence state of target cache lines.
In step S105, the processor core loads the target data to the target cache line of the non-last-stage cache based on the identifier carrying the temporary storage state set in the response packet, and stores the temporary storage state of the target cache line in the last-stage cache.
It is easy to understand that when the frequency at which the home agent sends return coherence invalidations is higher than the preset frequency threshold, the home agent sets the identifier in the response packet that responds to the data access request. The temporary storage state indicates that when the target cache line is evicted from the cache of the processor core, it is written back directly to memory without passing through the last-level cache.
Setting the identifier carrying the temporary storage state in the response packet shortens the time the processor core needs to evict the target cache line from the cache to memory; correspondingly, the time the home agent spends waiting for the processor core is reduced, and the processing efficiency of the multi-core processor system is ultimately improved.
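The processor-core behavior of step S105 and the later eviction can be sketched as below. The class `CoreCaches` and its fields are hypothetical: a real cache hierarchy tracks sets, ways, and coherence states, whereas this toy model uses dictionaries and a set only to show that a line marked with the temporary storage state bypasses the last-level cache on eviction.

```python
class CoreCaches:
    """Toy model of one core's non-last-level cache and the last-level cache."""

    def __init__(self):
        self.non_llc = {}        # address -> data held in the non-last-level cache
        self.llc_data = {}       # address -> data held in the last-level cache
        self.llc_temp = set()    # addresses marked with the temporary storage state
        self.memory_writes = []  # (address, data) pairs written back to memory

    def fill(self, address, data, temp_flag):
        """S105: load the target data; record the temporary storage state if flagged."""
        self.non_llc[address] = data
        if temp_flag:
            self.llc_temp.add(address)

    def evict_from_non_llc(self, address):
        """Evict a line from the non-last-level cache and report where it went."""
        data = self.non_llc.pop(address)
        if address in self.llc_temp:
            # Temporary storage state: write back directly, skipping the last-level cache.
            self.llc_temp.discard(address)
            self.memory_writes.append((address, data))
            return "memory"
        # Ordinary line: it moves into the last-level cache first.
        self.llc_data[address] = data
        return "llc"
```

In this sketch, a flagged line never occupies last-level cache data storage, which is the source of the time saving the description claims.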
In step S106, the processor core sends a write-back request of the target cache line to the home agent.
The write-back request is used to evict the target cache line to memory and to instruct the home agent to release the target entry corresponding to the target cache line in the coherence directory.
The write-back request sent by the processor core causes the home agent to release the target entry corresponding to the target cache line in the coherence directory. The home agent can thus both complete the data acquisition for the received data access request and release the target entry occupied by the target cache line, which has the effect of releasing the target entry in advance, before the next data access request is received.
Step S107, the home agent obtains the write-back request for the target cache line and releases the target entry corresponding to the target cache line in the coherence directory.
Because the response packet was returned by the home agent in response to the data access request, the home agent, after obtaining the target data from memory, recorded the coherence state of the target cache line corresponding to the target address in an entry of the coherence directory, and that entry is the target entry. The home agent can therefore release the target entry corresponding to the target cache line as instructed by the write-back request.
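Steps S106 and S107 amount to committing the evicted line to memory and freeing its directory entry. A minimal sketch follows, assuming the coherence directory and memory are modeled as plain dictionaries and that `handle_write_back` is a hypothetical helper name.

```python
def handle_write_back(directory, memory, address, data):
    """S107: write the evicted line to memory and release its target entry.

    directory -- dict mapping address -> recorded coherence state (assumed model)
    memory    -- dict mapping address -> data (assumed model)
    Returns True if a target entry was released, False if none was recorded.
    """
    memory[address] = data
    released = address in directory
    if released:
        # The entry becomes usable before the next data access request arrives.
        del directory[address]
    return released
```

Because the entry is freed at write-back time, a later miss can be allocated an entry without first sending a return coherence invalidation.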
It can be seen that the access request processing method provided by the embodiment of the present application detects the sending frequency of return coherence invalidations; when the sending frequency reaches the frequency threshold, it is determined that return coherence invalidations are being sent frequently. To avoid the resulting loss of processing efficiency of the multi-core processor system, an identifier carrying a temporary storage state is set in the response packet with which the home agent answers the data access request of the processor core. Based on the identifier, the processor core loads the target data into a target cache line in a non-last-level cache and sets a temporary storage state for the target cache line in the last-level cache, so that the target cache line is written back directly to memory, without passing through the last-level cache, when it is evicted from the non-last-level cache.
This shortens the time the processor core needs to evict the target cache line, so the home agent does not have to wait long for the processor core to evict the target cache line from the cache to memory. Meanwhile, when the processor core writes the target cache line directly back to memory according to the temporary storage state, it sends a write-back request for the target cache line to the home agent. The home agent looks up the target entry that records the target cache line in the coherence directory and releases it according to the received write-back request, that is, as the target cache line is written back to memory. Used entries in the coherence directory are thereby released in advance, which improves the performance of the coherence directory and, in turn, the performance of the multi-core processor system.
The embodiment of the application also provides a multi-core processor system, which comprises:
a home agent, configured to: detect the sending frequency of return coherence invalidations; when the sending frequency of return coherence invalidations is higher than a preset frequency threshold, acquire, for a data access request sent by a processor core, target data according to a target address indicated by the data access request, where the coherence state of the target cache line corresponding to the target address is recorded in a target entry of a coherence directory; set an identifier carrying a temporary storage state in a response packet corresponding to the data access request, and feed the response packet back to the processor core; and obtain a write-back request for the target cache line and release the target entry in the coherence directory;
a processor core, configured to: send a data access request to the home agent; obtain the target data and the response packet corresponding to the data access request; if the response packet carries an identifier of a temporary storage state, load the target data into a target cache line of a non-last-level cache and store the temporary storage state of the target cache line in the last-level cache, where the temporary storage state indicates that when the target cache line is evicted from the cache of the processor core, it is written back directly to memory without passing through the last-level cache; and send a write-back request for the target cache line when the target cache line is evicted from the cache.
It can be seen that, in the multi-core processor system provided by the embodiment of the present application, when the home agent detects that the sending frequency of return coherence invalidations reaches the preset frequency threshold, it responds to the data access request received from the processor core and sets an identifier carrying a temporary storage state in the response packet returned to the processor core. On receiving the response packet, the processor core sets the target cache line in the last-level cache to the temporary storage state based on the identifier, so that the non-last-level cache of the processor core can evict the target cache line to memory without passing through the last-level cache. The home agent then receives the write-back request sent by the processor core and, in response, writes the target cache line back to memory and releases the target entry that records the target cache line in the coherence directory. When the home agent receives the next data access request, a usable entry is therefore already available in the coherence directory, and no return coherence invalidation needs to be sent to the processor core to free one.
In some embodiments, the home agent being configured to detect the sending frequency of the return consistency invalidation includes:
detecting the count value of a counter arranged in the home agent, wherein the counter counts the number of times the home agent sends the return consistency invalidation;
wherein, when the consistency directory has no free entry in which to record the consistency state of a new cache line, the home agent releases an occupied entry in the consistency directory and, by sending the return consistency invalidation, notifies the cache of the processor core to evict the cache line corresponding to the occupied entry; the new cache line is the cache line corresponding to the address of data that is fetched from the memory into the cache.
In some embodiments, the home agent being configured to detect that the sending frequency of the return consistency invalidation is higher than a preset frequency threshold includes:
if the count value of the counter exceeds a count threshold within a preset time, determining that the sending frequency of the return consistency invalidation is higher than the preset frequency threshold.
In some embodiments, the preset time includes a preset number of clock cycles.
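By way of non-limiting illustration only, the counter-based detection described above may be sketched as follows. The class name `BackInvalidationMonitor`, the use of non-overlapping windows that restart once the preset number of clock cycles has elapsed, and the numeric values are assumptions made for the sketch; the embodiments specify only that the count value is compared with a count threshold within a preset time.

```python
class BackInvalidationMonitor:
    """Counts return consistency invalidations within a window of clock cycles."""

    def __init__(self, window_cycles, count_threshold):
        self.window_cycles = window_cycles      # preset time, in clock cycles
        self.count_threshold = count_threshold  # count threshold for one window
        self.window_start = 0                   # cycle at which the current window began
        self.count = 0                          # invalidations sent in the current window

    def _roll_window(self, cycle):
        # Start a fresh window once the preset number of cycles has elapsed.
        if cycle - self.window_start >= self.window_cycles:
            self.window_start = cycle
            self.count = 0

    def record_back_invalidation(self, cycle):
        """Called each time the home agent sends a return consistency invalidation."""
        self._roll_window(cycle)
        self.count += 1

    def frequency_too_high(self, cycle):
        """True if the count within the current window exceeds the threshold."""
        self._roll_window(cycle)
        return self.count > self.count_threshold


monitor = BackInvalidationMonitor(window_cycles=1000, count_threshold=3)
for cycle in (10, 20, 30, 40):
    monitor.record_back_invalidation(cycle)
high_in_window = monitor.frequency_too_high(50)       # four invalidations in one window
high_after_window = monitor.frequency_too_high(1500)  # window has expired, count restarts
```

A hardware counter would typically be cleared by a cycle timer rather than checked on demand; the on-demand check is used here only to keep the sketch short.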
In some embodiments, the home agent being configured to set, in a response packet corresponding to the data access request, an identifier carrying a temporary storage state includes: setting the identifier by using an originally reserved bit in the response packet.
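By way of non-limiting illustration only, carrying the identifier in a reserved bit may be sketched as follows. The one-byte header, the bit position `SCRATCH_FLAG_BIT`, and the function names are hypothetical; the embodiments do not specify the layout of the response packet.

```python
SCRATCH_FLAG_BIT = 7  # assumed position of a reserved bit in a one-byte header


def set_scratch_flag(header: int) -> int:
    """Return the header with the temporary-storage-state identifier set."""
    return header | (1 << SCRATCH_FLAG_BIT)


def has_scratch_flag(header: int) -> bool:
    """Return True if the identifier is set in the header."""
    return bool(header & (1 << SCRATCH_FLAG_BIT))


plain_header = 0b0000_0101                      # existing fields, reserved bit clear
flagged_header = set_scratch_flag(plain_header)
```

Because only the reserved bit changes, the remaining header fields keep their original encoding, and a receiver that ignores the reserved bit still decodes those fields as before.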
In some embodiments, the temporary storage state is used to indicate that, when the target cache line is evicted from the cache of the processor core, the target cache line is written back directly to the memory without passing through the last-level cache.
In some embodiments, the processor core is configured to load the target data into a target cache line of a non-last-level cache when an identifier carrying a temporary storage state is set in the response packet, where the identifier is set by using an originally reserved bit in the response packet.
In some embodiments, the processor core being configured to send a data access request to the home agent includes:
when it is determined that the target data of the target address is not present in the cache of the processor core, sending a data access request corresponding to the target address to the home agent.
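By way of non-limiting illustration only, the processor-core side of the flow (request on a miss, loading into the non-last-level cache, recording the temporary storage state in the last-level cache, and writing back directly on eviction) may be sketched as follows. The classes `Core` and `StubHomeAgent`, and the behaviour for lines without the identifier (the evicted line is placed in the last-level cache), are assumptions made for the sketch.

```python
class StubHomeAgent:
    """Minimal stand-in for the home agent (hypothetical)."""

    def __init__(self, memory, flag_scratch):
        self.memory = memory              # address -> data
        self.flag_scratch = flag_scratch  # whether responses carry the identifier
        self.write_backs = []             # addresses written back to memory

    def request(self, address):
        return self.memory[address], self.flag_scratch

    def write_back(self, address, data):
        self.memory[address] = data
        self.write_backs.append(address)


class Core:
    def __init__(self, home_agent):
        self.home_agent = home_agent
        self.non_llc = {}          # address -> data in the non-last-level cache
        self.llc = {}              # address -> data in the last-level cache
        self.llc_scratch = set()   # addresses whose temporary storage state is kept in the LLC

    def read(self, address):
        if address in self.non_llc:
            return self.non_llc[address]
        if address in self.llc:
            return self.llc[address]
        # Miss in every cache of the core: send a data access request.
        data, flagged = self.home_agent.request(address)
        self.non_llc[address] = data
        if flagged:
            self.llc_scratch.add(address)
        return data

    def evict_from_non_llc(self, address):
        data = self.non_llc.pop(address)
        if address in self.llc_scratch:
            # Temporary storage state: bypass the LLC and write back to memory.
            self.llc_scratch.discard(address)
            self.home_agent.write_back(address, data)
            return "memory"
        self.llc[address] = data   # assumed default: victim goes to the LLC
        return "llc"


agent = StubHomeAgent({0x40: "A", 0x80: "B"}, flag_scratch=True)
core = Core(agent)
core.read(0x40)
dest_flagged = core.evict_from_non_llc(0x40)
agent.flag_scratch = False
core.read(0x80)
dest_plain = core.evict_from_non_llc(0x80)
```

In the sketch, the line loaded with the identifier is written back to the memory on eviction, while the line loaded without it is placed in the last-level cache.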
The embodiments of the present application further provide a chip comprising the multi-core processor system described above.
The embodiments of the present application further provide an electronic device comprising the chip.
Although the embodiments of the present application are disclosed above, the present application is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be defined by the appended claims.

Claims (21)

1. An access request processing method, applied to a home agent, comprising:
detecting a sending frequency of return consistency invalidation;
when the sending frequency of the return consistency invalidation is higher than a preset frequency threshold, for a data access request sent by a processor core, acquiring target data according to a target address indicated by the data access request, wherein the consistency state of the target cache line corresponding to the target address is recorded in a target entry of a consistency directory;
setting an identifier carrying a temporary storage state in a response packet corresponding to the data access request, and feeding back the response packet to the processor core, wherein the identifier is used to cause the processor core to load the target data into a target cache line in a non-last-level cache and to set the temporary storage state for the target cache line in the last-level cache, so that when the non-last-level cache evicts the target cache line, the target cache line is written back directly to the memory without passing through the last-level cache;
and acquiring a write-back request for the target cache line, and releasing the target entry in the consistency directory.
2. The access request processing method according to claim 1, wherein the detecting of the sending frequency of the return consistency invalidation includes:
detecting the count value of a counter arranged in the home agent, wherein the counter counts the number of times the home agent sends the return consistency invalidation;
wherein, when the consistency directory has no free entry in which to record the consistency state of a new cache line, the home agent releases an occupied entry in the consistency directory and, by sending the return consistency invalidation, notifies the cache of the processor core to evict the cache line corresponding to the occupied entry; the new cache line is the cache line corresponding to the address of data that is fetched from the memory into the cache.
3. The access request processing method according to claim 2, wherein detecting that the sending frequency of the return consistency invalidation is higher than a preset frequency threshold includes:
if the count value of the counter exceeds a count threshold within a preset time, determining that the sending frequency of the return consistency invalidation is higher than the preset frequency threshold.
4. The access request processing method according to claim 3, wherein the preset time comprises a preset number of clock cycles.
5. The access request processing method according to claim 1, wherein the setting of an identifier carrying a temporary storage state in a response packet corresponding to the data access request includes: setting the identifier by using an originally reserved bit in the response packet.
6. The access request processing method according to claim 1, wherein the temporary storage state is used to indicate that, when the target cache line is evicted from the cache of the processor core, the target cache line is written back directly to the memory without passing through the last-level cache.
7. The access request processing method according to claim 1, wherein the acquiring of the target data according to the target address indicated by the data access request includes:
if the target entry is recorded in the consistency directory, determining, according to the consistency state of the target cache line corresponding to the target address, the cache of the processor core that caches the target cache line, and acquiring the target data of the target cache line from the determined cache of the processor core;
if no entry recording the consistency state of the target cache line exists in the consistency directory, acquiring the target data of the target address from the memory.
8. The access request processing method according to claim 7, further comprising:
if no entry recording the consistency state of the target cache line exists in the consistency directory, judging whether a usable entry exists in the consistency directory;
when it is determined that a usable entry exists in the consistency directory, allocating the usable entry to the target cache line, and using the allocated entry as the target entry to record the consistency state of the target cache line;
when it is determined that no usable entry exists in the consistency directory, releasing an occupied entry in the consistency directory to obtain a free target entry, simultaneously sending a return consistency invalidation to the processor core, and recording the consistency state of the target cache line in the free target entry.
9. An access request processing method, applied to a processor core, comprising:
sending a data access request to a home agent;
acquiring target data and response packets corresponding to the data access request;
if the response packet is provided with an identifier carrying a temporary storage state, loading the target data into a target cache line of a non-last-level cache, and storing the temporary storage state of the target cache line in the last-level cache, wherein the identifier is set by the home agent, in the response packet responding to the data access request, when the sending frequency of the return consistency invalidation of the home agent is higher than a preset frequency threshold, and the temporary storage state is used for indicating that, when the target cache line is evicted from the cache of the processor core, the target cache line is written back directly to the memory without passing through the last-level cache;
and when the target cache line is evicted from the cache, sending a write-back request for the target cache line, wherein the write-back request is used for evicting the target cache line to the memory and instructing the home agent to release the target entry corresponding to the target cache line in the consistency directory.
10. The access request processing method according to claim 9, wherein the identifier is set by using an originally reserved bit in the response packet.
11. The access request processing method according to claim 9, wherein the sending of the data access request to the home agent includes:
when it is determined that the target data of the target address is not present in the cache of the processor core, sending a data access request corresponding to the target address to the home agent.
12. A multi-core processor system, comprising:
a home agent for detecting a sending frequency of return consistency invalidation; when the sending frequency of the return consistency invalidation is higher than a preset frequency threshold, for a data access request sent by a processor core, acquiring target data according to a target address indicated by the data access request, wherein the consistency state of the target cache line corresponding to the target address is recorded in a target entry of a consistency directory; setting an identifier carrying a temporary storage state in a response packet corresponding to the data access request, and feeding back the response packet to the processor core; and acquiring a write-back request for the target cache line, and releasing the target entry in the consistency directory;
a processor core for sending a data access request to the home agent; acquiring target data and a response packet corresponding to the data access request; if the response packet is provided with an identifier carrying a temporary storage state, loading the target data into a target cache line of a non-last-level cache, and storing the temporary storage state of the target cache line in the last-level cache, wherein the temporary storage state is used for indicating that, when the target cache line is evicted from the cache of the processor core, the target cache line is written back directly to the memory without passing through the last-level cache; and sending a write-back request for the target cache line when the target cache line is evicted from the cache.
13. The multi-core processor system of claim 12, wherein the home agent being configured to detect the sending frequency of the return consistency invalidation comprises:
detecting the count value of a counter arranged in the home agent, wherein the counter counts the number of times the home agent sends the return consistency invalidation;
wherein, when the consistency directory has no free entry in which to record the consistency state of a new cache line, the home agent releases an occupied entry in the consistency directory and, by sending the return consistency invalidation, notifies the cache of the processor core to evict the cache line corresponding to the occupied entry; the new cache line is the cache line corresponding to the address of data that is fetched from the memory into the cache.
14. The multi-core processor system of claim 13, wherein the home agent being configured to detect that the sending frequency of the return consistency invalidation is higher than a preset frequency threshold comprises:
if the count value of the counter exceeds a count threshold within a preset time, determining that the sending frequency of the return consistency invalidation is higher than the preset frequency threshold.
15. The multi-core processor system of claim 14, wherein the preset time comprises a preset number of clock cycles.
16. The multi-core processor system of claim 12, wherein the home agent being configured to set an identifier carrying a temporary storage state in a response packet corresponding to the data access request comprises: setting the identifier by using an originally reserved bit in the response packet.
17. The multi-core processor system of claim 12, wherein the temporary storage state is used to indicate that, when the target cache line is evicted from the cache of the processor core, the target cache line is written back directly to the memory without passing through the last-level cache.
18. The multi-core processor system of claim 12, wherein the processor core is configured to load the target data into a target cache line of a non-last-level cache when an identifier carrying a temporary storage state is set in the response packet, the identifier being set by using an originally reserved bit in the response packet.
19. The multi-core processor system of claim 12, wherein the processor core being configured to send a data access request to the home agent comprises:
when it is determined that the target data of the target address is not present in the cache of the processor core, sending a data access request corresponding to the target address to the home agent.
20. A chip comprising the multi-core processor system of any of claims 12-19.
21. An electronic device comprising the chip of claim 20.
CN202211593341.5A 2022-12-13 2022-12-13 Access request processing method, multi-core processor system, chip and electronic device Pending CN116126517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211593341.5A CN116126517A (en) 2022-12-13 2022-12-13 Access request processing method, multi-core processor system, chip and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211593341.5A CN116126517A (en) 2022-12-13 2022-12-13 Access request processing method, multi-core processor system, chip and electronic device

Publications (1)

Publication Number Publication Date
CN116126517A true CN116126517A (en) 2023-05-16

Family

ID=86309059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211593341.5A Pending CN116126517A (en) 2022-12-13 2022-12-13 Access request processing method, multi-core processor system, chip and electronic device

Country Status (1)

Country Link
CN (1) CN116126517A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116962259A (en) * 2023-09-21 2023-10-27 中电科申泰信息科技有限公司 Consistency processing method and system based on monitoring-directory two-layer protocol
CN116962259B (en) * 2023-09-21 2024-02-13 中电科申泰信息科技有限公司 Consistency processing method and system based on monitoring-directory two-layer protocol

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination