CN115587052A - Processing method of cache performance and related equipment thereof - Google Patents

Info

Publication number: CN115587052A
Application number: CN202211228384.3A
Authority: CN (China)
Prior art keywords: address information, target, prefetch, fetching, data
Other languages: Chinese (zh)
Inventors: 李柯, 梁剑, 齐良颉
Assignee (current and original): Nationz Technologies Inc
Application filed by Nationz Technologies Inc, with priority to CN202211228384.3A
Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch

Abstract

An embodiment of the present application provides a cache performance processing method for reducing instruction-fetch wait time. The method of the embodiment is applied to a prefetch circuit and includes the following steps: receiving a target instruction fetch request, where the target instruction fetch request carries target address information; judging whether the target address information matches locally stored prefetch address information; if so, sending the target prefetch data corresponding to the prefetch address information to a central processing unit; if not, sending a target access request to a cache memory, so that the cache memory obtains the data to be prefetched corresponding to the target address information from a main memory according to the target access request and uploads the data to be prefetched to the central processing unit.

Description

Processing method of cache performance and related equipment thereof
Technical Field
The embodiments of the present application relate to the technical field of computer storage, and in particular to a cache performance processing method and related device.
Background
In the hierarchy of a computer storage system, the gap between the access speed of the storage system and the operation speed of the processor is increasingly pronounced, and access performance has become the bottleneck of system performance; at present, processors generally adopt a cache as an effective means of improving performance.
A cache is a high-speed, small-capacity memory interposed between the central processing unit (CPU) and main memory; what it holds is data from main memory. Data is stored in the cache in ways and sets, a structure called a multi-way set-associative cache. The scheduling and transfer of information between the cache and main memory is performed automatically by hardware. At present, a conventional cache reads instructions from main memory and replaces the current instruction cache line only when an access miss (hit failure) occurs. Clearly, if misses occur repeatedly, the latency of the CPU reading an instruction stored in main memory (an instruction fetch) increases greatly.
However, because the cache capacity is small, access misses are inevitable, and accessing the slow main memory increases the CPU's instruction-fetch wait time, reducing instruction execution efficiency. Moreover, in a multi-way set-associative cache controller scheme, an instruction fetch first queries the way in which the last cache hit occurred; if that query fails, the way is adjusted and the access is reinitiated, which further increases fetch time.
Disclosure of Invention
The embodiments of the present application provide a cache performance processing method for reducing instruction-fetch wait time.
A first aspect of the embodiments of the present application provides a cache performance processing method, applied to a prefetch circuit, including:
receiving a target instruction fetch request, where the target instruction fetch request carries target address information;
judging whether the target address information matches locally stored prefetch address information;
if so, sending the target prefetch data corresponding to the prefetch address information to a central processing unit;
if not, sending a target access request to a cache memory, so that the cache memory obtains the data to be prefetched corresponding to the target address information from a main memory according to the target access request and uploads the data to be prefetched to the central processing unit.
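As a rough illustration of this first-aspect flow, the following minimal C sketch models the dispatch logic in software; the type prefetch_entry and the helpers send_to_cpu and send_access_to_cache are assumptions for the example, not circuitry from the patent.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PREFETCH_SLOTS 2

typedef struct {
    uint32_t addr;  /* prefetch address information held locally */
    uint32_t data;  /* target prefetch data stored for that address */
    bool     valid;
} prefetch_entry;

static prefetch_entry slots[PREFETCH_SLOTS] = {
    { .addr = 0x1000, .data = 0xe3a00001, .valid = true },
};

static void send_to_cpu(uint32_t data)       { printf("to CPU: 0x%08x\n", data); }
static void send_access_to_cache(uint32_t a) { printf("cache access for 0x%08x\n", a); }

/* Steps of the first aspect: match the target address against the locally
 * stored prefetch address information, then dispatch on the result. */
void handle_fetch_request(uint32_t target_addr)
{
    for (int i = 0; i < PREFETCH_SLOTS; i++) {
        if (slots[i].valid && slots[i].addr == target_addr) {
            send_to_cpu(slots[i].data);  /* match: zero-wait return */
            return;
        }
    }
    /* no match: the cache memory fetches the line from main memory
     * and uploads it to the central processing unit */
    send_access_to_cache(target_addr);
}

int main(void)
{
    handle_fetch_request(0x1000);  /* served from the prefetch circuit */
    handle_fetch_request(0x2000);  /* forwarded to the cache memory */
    return 0;
}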
Optionally, before receiving the target instruction fetch request, the method further includes:
receiving an initial instruction fetching request, wherein the initial instruction fetching request carries initial address information corresponding to the initial instruction fetching request;
if two adjacent pieces of address information exist in the initial address information, determining the two adjacent pieces of address information as the prefetch address information;
and sending a target prefetching request to the main memory according to the prefetching address information to acquire the target prefetching data corresponding to the prefetching address information.
Optionally, after receiving the initial instruction fetch request, the method further includes:
if non-adjacent address information exists in the initial address information, determining each piece of non-adjacent address information as address information to be backfilled;
and sending the address information to be backfilled to the cache memory, so that the cache memory obtains the data to be prefetched corresponding to the address information to be backfilled from the main memory according to that address information.
Optionally, if the prefetch address information includes first address information and second address information adjacent to each other, the target prefetch data includes first prefetch data and second prefetch data, and sending the target prefetch data corresponding to the prefetch address information to the central processing unit includes:
and if the address information in the target address information is matched with the prefetching address information, sequentially sending the first prefetching data corresponding to the first address information and the second prefetching data corresponding to the second address information to the central processing unit.
Optionally, the prefetch circuit includes a prefetch cache region, where the prefetch cache region includes at least a first prefetch buffer and a second prefetch buffer, and sending the target prefetch data corresponding to the prefetch address information to the central processing unit includes:
if the first prefetch buffer and the second prefetch buffer locally store the prefetch address information, and any two pieces of address information in the prefetch address information are pairwise adjacent, sending the target prefetch data corresponding to the adjacent address information in the prefetch address information to the central processing unit.
Optionally, the determining whether the target address information matches locally stored prefetch address information includes:
judging whether any address information in the target address information is matched with the prefetch address information stored in the first prefetch buffer;
if so, and the address information to be prefetched that is adjacent to that address information in the target address information does not match the prefetch address information stored in the second prefetch buffer, obtaining the target prefetch data corresponding to the address information to be prefetched from the main memory according to the address information to be prefetched.
Optionally, the prefetch cache region further includes a read buffer, and judging whether the target address information matches the locally stored prefetch address information includes:
judging whether the target address information matches the prefetch address information stored in the read buffer, to obtain a first matching result;
if the first matching result is negative, the read buffer obtains initial prefetch data corresponding to the target address information from the main memory according to the target address information, and sends the initial prefetch data to the first prefetch buffer and the second prefetch buffer.
Optionally, after determining whether the target address information matches the prefetch address information stored in the prefetch buffer, the method further includes:
judging whether the target address information is matched with the pre-fetching address information stored in the first pre-fetching buffer and the second pre-fetching buffer or not to obtain a second matching result;
if the first matching result and the second matching result are both negative, executing the step of sending a target access request to a cache memory, deleting the target prefetch data stored in the first prefetch buffer and the second prefetch buffer, and obtaining the initial prefetch data corresponding to the target address information from the main memory according to the target address information;
determining the initial prefetch data as the target prefetch data.
Optionally, after determining whether the target address information matches with the prefetch address information stored in the first prefetch buffer and the second prefetch buffer, the method further includes:
if the first matching result is negative and the second matching result is positive, judging whether to execute the step of sending the target instruction fetching request to the main memory;
if the step of sending the target instruction fetching request to the main memory is executed, acquiring the initial pre-fetching data corresponding to the target address information from the main memory according to the target address information, and determining the initial pre-fetching data as the target pre-fetching data;
and if the step of sending the target fetch request to the main memory is not executed, sending target prefetch data corresponding to the prefetch address information to a central processing unit.
A second aspect of the embodiments of the present application provides a cache performance processing method, applied to a cache memory, including:
receiving a target access request sent by a prefetch circuit, where the target access request carries target address information;
comparing target tag information in the target address information with all tag information to be prefetched stored locally in the cache memory;
if the target tag information matches any piece of tag information to be prefetched, uploading the data to be prefetched corresponding to that tag information to a central processing unit;
and if the target tag information matches none of the tag information to be prefetched, sending a target backfill request to a main memory according to the target access request to obtain the data to be prefetched corresponding to the target address information, and uploading the data to be prefetched to the central processing unit.
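As a rough illustration of this second-aspect flow, the C sketch below decomposes a target address into a tag field and a set index and compares the tag against every stored way. The 4-way, 64-set, 32-byte-line geometry and the function lookup are assumptions for illustration, not taken from the patent.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS       4
#define SETS       64
#define LINE_BYTES 32

typedef struct {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
} tag_set;

static tag_set tag_ram[SETS];

/* Compare the target tag field with all stored tags of the indexed set:
 * a match is a cache hit (upload data), no match triggers a backfill. */
int lookup(uint32_t target_addr)
{
    uint32_t index = (target_addr / LINE_BYTES) % SETS;
    uint32_t tag   = target_addr / (LINE_BYTES * SETS);

    for (int way = 0; way < WAYS; way++)
        if (tag_ram[index].valid[way] && tag_ram[index].tag[way] == tag)
            return way;  /* hit: this way's data goes to the CPU */
    return -1;           /* miss: send a backfill request to main memory */
}

int main(void)
{
    tag_ram[2].tag[1] = 5; tag_ram[2].valid[1] = true;     /* preload one line */
    uint32_t addr = 5 * (LINE_BYTES * SETS) + 2 * LINE_BYTES;
    printf("way = %d\n", lookup(addr));                    /* prints "way = 1" */
    return 0;
}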
A third aspect of the embodiments of the present application provides a cache performance processing system, which is applied to a prefetch circuit, and includes:
a receiving unit, configured to receive a target instruction fetch request, where the target instruction fetch request carries target address information;
a judging unit configured to judge whether the target address information matches prefetch address information stored locally;
a sending unit, configured to send target prefetch data corresponding to the prefetch address information to a central processing unit when the target address information matches locally-stored prefetch address information;
the sending unit is further configured to send a target access request to a cache memory when the target address information does not match the locally stored prefetch address information, so that the cache memory obtains the data to be prefetched corresponding to the target address information from a main memory according to the target access request and uploads the data to be prefetched to the central processing unit.
Optionally, the system for processing cache performance further includes: a determination unit;
the receiving unit is further configured to receive an initial instruction fetch request, where the initial instruction fetch request carries initial address information corresponding to the initial instruction fetch request;
the determining unit is configured to determine, when two adjacent pieces of address information exist in the initial address information, that the two adjacent pieces of address information are the prefetch address information;
the sending unit is further configured to send a target prefetch request to the main memory according to the prefetch address information, so as to obtain the target prefetch data corresponding to the prefetch address information.
Optionally, the cache performance processing system includes:
the determining unit is further configured to determine, when non-adjacent address information exists in the initial address information, each piece of non-adjacent address information as address information to be backfilled;
the sending unit is further configured to send the address information to be backfilled to the cache memory, so that the cache memory obtains the data to be prefetched corresponding to the address information to be backfilled from the main memory according to that address information.
Optionally, if the prefetch address information includes first address information and second address information adjacent to each other, and the target prefetch data includes the first prefetch data and the second prefetch data, the cache performance processing system includes:
the sending unit is specifically configured to send the first prefetch data corresponding to the first address information and the second prefetch data corresponding to the second address information to the central processing unit in sequence when address information in the target address information matches prefetch address information.
Optionally, the prefetch circuitry comprises a prefetch cache; wherein the prefetch cache region at least comprises a first prefetch cache and a second prefetch cache, and the cache performance processing system further comprises:
the sending unit is specifically configured to send target prefetch data corresponding to two adjacent address information in the prefetch address information to the central processing unit when the first prefetch buffer and the second prefetch buffer locally store the prefetch address information and any two address information in the prefetch address information are adjacent to each other.
Optionally, the system for processing cache performance further includes: an acquisition unit;
the judging unit is specifically configured to judge whether any address information in the target address information matches the prefetch address information stored in the first prefetch buffer;
the obtaining unit is configured to obtain the target prefetch data corresponding to the address information to be prefetched from the main memory according to the address information to be prefetched when any address information in the target address information matches with the prefetch address information stored in the first prefetch buffer, and address information to be prefetched in the target address information adjacent to any address information does not match with the prefetch address information stored in the second prefetch buffer.
Optionally, the prefetch circuitry further comprises a read buffer; the system for processing the cache performance further comprises:
the judging unit is specifically configured to judge whether the target address information matches with the prefetch address information stored in the read buffer to obtain a first matching result;
the obtaining unit is specifically configured such that, when the first matching result is negative, the read buffer obtains initial prefetch data corresponding to the target address information from the main memory according to the target address information, and sends the initial prefetch data to the first prefetch buffer and the second prefetch buffer.
Optionally, the system for processing cache performance further includes: an execution unit;
the judging unit is further configured to judge whether the target address information matches with the prefetch address information stored in the first prefetch buffer and the second prefetch buffer, so as to obtain a second matching result;
the execution unit is configured to, when both the first matching result and the second matching result are negative, execute the step of sending the target access request to the buffer memory, delete the target prefetch data stored in the first prefetch buffer and the second prefetch buffer, and obtain the initial prefetch data corresponding to the target address information from the main memory according to the target address information;
the determining unit is further configured to determine the initial prefetch data as the target prefetch data.
Optionally, the system for processing cache performance further includes:
the judging unit is further configured to judge whether to execute the step of sending the target instruction fetch request to the main memory when the first matching result is no and the second matching result is yes;
the obtaining unit is further configured to, when the step of sending the target fetch request to the main memory is executed, obtain the initial prefetch data corresponding to the target address information from the main memory according to the target address information, and determine the initial prefetch data as the target prefetch data;
the execution unit is further configured to execute the step of sending the target prefetch data corresponding to the prefetch address information to a central processing unit when the step of sending the target fetch request to the main memory is not executed.
The system of the third aspect of the embodiments of the present application is configured to perform the cache performance processing method described in the first aspect.
A fourth aspect of the embodiments of the present application provides a cache performance processing system, applied to a cache memory, including:
a receiving unit, configured to receive a target access request sent by a prefetch circuit, where the target access request carries target address information;
a judging unit, configured to judge whether target tag information in the target address information matches any of the tag information to be prefetched stored locally in the cache memory;
an uploading unit, configured to upload the data to be prefetched corresponding to the tag information to be prefetched to a central processing unit when the target tag information matches any piece of tag information to be prefetched;
and a sending unit, configured to send a target backfill request to a main memory according to the target access request when the target tag information matches none of the tag information to be prefetched, so as to obtain the data to be prefetched corresponding to the target address information and upload the data to be prefetched to the central processing unit.
The system of the fourth aspect of the embodiments of the present application is configured to perform the cache performance processing method described in the second aspect.
A fifth aspect of the present application provides a device for processing cache performance, including:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the instructions in the memory to perform the method for processing the cache performance of the first aspect or the second aspect.
A sixth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium includes instructions, which, when executed on a computer, cause the computer to perform the processing method for cache performance according to the first aspect or the second aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
according to the cache performance processing method provided by the embodiment of the application, a target instruction fetching request is received, whether target address information is matched with the locally stored prefetch address information is judged, and finally, if yes, target prefetch data corresponding to the prefetch address information is sent to a central processing unit; if not, sending a target access request to the buffer memory so that the buffer memory obtains data to be prefetched corresponding to the target address information from the main memory according to the target access request, and uploading the data to be prefetched to the central processing unit. By adding the prefetching circuit, the central processing unit can automatically acquire the prefetching data corresponding to the prefetching address and send the address information which is not stored in the prefetching circuit to the high-speed memory, so that the high-speed memory acquires the corresponding data to be prefetched, the efficiency of directly acquiring the prefetching data by the central processing unit is improved, and the waiting time for the central processing unit to fetch the data is reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to these drawings.
FIG. 1 is a schematic architectural diagram of a cache performance processing system according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a prefetch circuit according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a cache memory according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a cache performance processing method applied to a prefetch circuit according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of another cache performance processing method applied to a prefetch circuit according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a cache performance processing method applied to a cache memory according to an embodiment of the present application;
FIG. 7 is an interaction diagram of a cache performance processing method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a cache performance processing system according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of another cache performance processing system according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a cache performance processing device according to an embodiment of the present application.
Detailed Description
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be implemented in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the descriptions referring to "first", "second", etc. in this application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated; a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only where such combinations can be realized by a person skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be considered nonexistent and outside the protection scope of the present application.
In the hierarchy of a computer storage system, the gap between the access speed of the storage system and the operation speed of the processor is increasingly pronounced, and access performance has become the bottleneck of system performance; at present, processors generally adopt a cache memory as an effective means of improving performance. A cache is a high-speed, small-capacity memory between the central processing unit and main memory; what it holds is data from main memory (internal memory). Data is stored in the cache in ways and sets, a structure called a multi-way set-associative cache. The scheduling and transfer of information between the cache and main memory is performed automatically by hardware. A conventional cache reads instructions from main memory and replaces the cache's current instruction cache line only when an access miss (hit failure) occurs; clearly, if misses occur repeatedly, the instruction-fetch latency increases greatly. For convenience of understanding and description, "cache" and "cache memory" are used interchangeably below; they refer to the same component.
A conventional cache controller scheme accesses the slow main memory on a miss, reads out the instructions stored there, and replaces old instructions in the cache. Because the cache capacity is small, access misses are inevitable, and each access to the slow main memory increases the CPU's instruction-fetch wait time and reduces instruction execution efficiency. Moreover, in a multi-way set-associative cache controller scheme, an instruction fetch first queries the way in which the last instruction hit occurred; if that query fails, the way is adjusted and the access is reinitiated, which further increases fetch time.
Therefore, the embodiments of the present application provide a cache performance processing method for improving the hit rate and access performance of an instruction cache. A sequential instruction prefetch mechanism is added: when sequentially executed code is processed and the CPU requests the current instruction line, a prefetch operation reads and stores the next contiguously stored instruction line; when non-sequentially executed code (with branches) is processed, the requested instruction may be absent from the current or prefetched instruction lines, and such non-sequential instructions can be stored in the cache (cache memory). If the instruction requested by the CPU exists in the prefetch circuit or in the instruction cache memory, it can be obtained immediately, without any delay. Meanwhile, the instruction fetch circuit of the cache is optimized to reduce the CPU's instruction-read wait time. With these two measures, the system reduces the wait cycles for fetching instructions from the slow memory bank in both sequential and non-sequential fetch situations and improves access efficiency. It should be noted that an instruction is an operation the CPU executes at each step of a program, and the instructions fetched by the prefetch circuit are instruction lines read from main memory through the prefetch scheme. For convenience of understanding and description, this is not repeated below.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic diagram of the architecture of a cache performance processing system according to an embodiment of the present application. The architecture includes a bus 101, a prefetch circuit 102, a cache memory 103, and a main memory 104. The prefetch circuit 102 is connected to the bus 101, the cache memory 103, and the main memory 104, and the cache memory 103 is also connected to the bus 101 and the main memory 104.
The bus 101 is the common communication trunk over which information is transmitted between the functional units of a computer; it is a bundle of transmission lines that, according to the kind of information carried, can be divided into a data bus, an address bus, and a control bus, transmitting data, data addresses, and control signals respectively. In this embodiment, the CPU performs its instruction fetch operations through the bus 101, so the bus 101 can be understood as standing in for the CPU; in particular, the bus 101 sends instruction fetch requests to the prefetch circuit 102.
The prefetch circuit 102 includes a prefetch controller and a prefetch cache region, and mainly handles instruction fetch requests initiated by the bus 101. The prefetch controller initiates prefetches to main memory based on the fetch request, and the prefetch cache region stores instructions prefetched from the main memory 104. The prefetch circuit 102 compares the requested target address with the addresses of the instructions already stored in the prefetch cache region; on a prefetch cache hit, it returns the sequential instructions stored there directly to the bus 101 while initiating a prefetch of the next sequential address instruction. When the addresses of the stored instructions do not match the target address of the bus 101 request, i.e., on a prefetch miss, the prefetch circuit 102 initiates an access request to the cache memory 103.
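A minimal sketch of this hit/miss behavior, assuming a 32-byte instruction line; return_to_bus, launch_prefetch, and access_cache_memory are illustrative stand-ins for the agents in fig. 1:

#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 32u

static void return_to_bus(uint32_t a)       { printf("return line 0x%08x to bus\n", a); }
static void launch_prefetch(uint32_t a)     { printf("prefetch next line 0x%08x\n", a); }
static void access_cache_memory(uint32_t a) { printf("access cache for 0x%08x\n", a); }

/* On a prefetch-cache hit the line is returned at once and a prefetch of the
 * next sequential line is launched; on a miss the cache memory is queried. */
void on_bus_request(uint32_t target, int prefetch_hit)
{
    if (prefetch_hit) {
        return_to_bus(target);
        launch_prefetch(target + LINE_BYTES);  /* stay one line ahead of the CPU */
    } else {
        access_cache_memory(target);
    }
}

int main(void)
{
    on_bus_request(0x1000, 1);  /* sequential fetch, zero wait */
    on_bus_request(0x8000, 0);  /* prefetch miss, falls back to the cache */
    return 0;
}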
The cache memory portion includes cache control circuitry together with a TagRAM and a DataRAM that hold instruction tags (tag) and instruction cache lines (cacheline), respectively. After receiving an access request initiated by the prefetch controller in the prefetch circuit 102, the cache memory 103 compares the tag field of the request address with the tag fields stored in the TagRAM. If the comparison succeeds, the target instruction is stored in the cache, i.e., the cache hits, and the requested instruction read from the DataRAM is sent to the bus; if the comparison fails, i.e., the cache misses, an access request to main memory is initiated, the instructions in main memory are read out and backfilled into the DataRAM of the cache, and the instructions read from main memory are also sent to the bus.
The main memory 104 stores instructions and corresponding instruction data, and for ease of understanding and description, the contents stored in the main memory 104 will not be limited in the following.
For a detailed view of the prefetch circuit, please refer to fig. 2, which is a schematic structural diagram of the prefetch circuit according to an embodiment of the present application. It shows the prefetch circuit 102 with the prefetch controller 201, prefetch buffer0 (2021), prefetch buffer1 (2022), and the read buffer rbuffer (2023), together with the main memory 104.
The prefetch circuit 102 includes the prefetch controller 201 and a prefetch cache region consisting of prefetch buffer0 (2021), prefetch buffer1 (2022), and the read buffer rbuffer (2023). That is, the prefetch circuit contains three buffers, of which prefetch buffer0 (2021) and prefetch buffer1 (2022) form a ping-pong prefetch buffer pair managed by the prefetch controller 201. Whenever a piece of data is stored into prefetch buffer0 (2021) or prefetch buffer1 (2022), the ping-pong switch flag signal toggles, selecting where the next prefetch data will be stored. The remaining buffer, rbuffer (2023), is a read buffer: whenever an operation reads main memory, whether an instruction fetch or a prefetch, the result is also stored there. Correspondingly, each time the prefetch circuit 102 outputs an instruction, it combines the contents of prefetch buffer0 (2021), prefetch buffer1 (2022), and rbuffer (2023) and arbitrates among them to output the instruction.
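The C sketch below models the three buffers under assumed structures: the tag/vld fields echo the text, while capture_read, fill_prefetch, and arbitrate are illustrative names rather than the patent's circuitry:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t tag; uint32_t line; bool vld; } buf_t;

typedef struct {
    buf_t pbuf[2];  /* prefetch buffer0 / prefetch buffer1 (ping-pong pair) */
    buf_t rbuf;     /* read buffer: filled on every main-memory read        */
    int   pp_flag;  /* ping-pong switch flag: which pbuf the next fill uses */
} prefetch_cache;

/* Every main-memory read (instruction fetch or prefetch) lands in rbuf too. */
static void capture_read(prefetch_cache *pc, uint32_t tag, uint32_t line)
{
    pc->rbuf = (buf_t){ tag, line, true };
}

/* A completed prefetch fills the buffer the flag selects; the flag then jumps. */
static void fill_prefetch(prefetch_cache *pc, uint32_t tag, uint32_t line)
{
    pc->pbuf[pc->pp_flag] = (buf_t){ tag, line, true };
    pc->pp_flag ^= 1;
    capture_read(pc, tag, line);
}

/* Output arbitration: an instruction may be served from any of the three. */
static const buf_t *arbitrate(const prefetch_cache *pc, uint32_t tag)
{
    if (pc->rbuf.vld && pc->rbuf.tag == tag) return &pc->rbuf;
    for (int i = 0; i < 2; i++)
        if (pc->pbuf[i].vld && pc->pbuf[i].tag == tag) return &pc->pbuf[i];
    return NULL;  /* miss in all three buffers */
}

int main(void)
{
    prefetch_cache pc = {0};
    fill_prefetch(&pc, 0x10, 0xdeadbeef);
    const buf_t *hit = arbitrate(&pc, 0x10);
    printf("hit=%d flag=%d\n", hit != NULL, pc.pp_flag);  /* prints hit=1 flag=1 */
    return 0;
}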
For a detailed view of the cache memory, please refer to fig. 3, which is a schematic structural diagram of the cache memory according to an embodiment of the present application. It shows the cache memory 103 with the control circuit 301, the DataRAM 302, and the TagRAM 303, together with the bus 101.
The cache memory 103 includes the control circuit 301, the DataRAM 302, and the TagRAM 303. The control circuit 301 receives the access request and compares the tag field of the request address with the tag fields stored in the TagRAM 303. If the comparison succeeds, the target instruction is stored in the cache memory 103, i.e., the cache memory 103 hits, and the requested instruction read from the DataRAM 302 is sent to the bus; if the comparison fails, i.e., the cache memory 103 misses, an access request to the main memory 104 is issued, the instructions in the main memory 104 are read out and backfilled into the DataRAM 302 of the cache memory 103, and the instructions read from the main memory 104 are also sent to the bus 101.
Referring to fig. 4, fig. 4 is a schematic flowchart of the cache performance processing method applied to the prefetch circuit according to an embodiment of the present application, comprising steps 401-404.
401. A target fetch request is received.
As can be understood from the descriptions of the prefetch circuit in fig. 1 to fig. 3, the prefetch circuit, through its prefetch controller, mainly processes instruction fetch requests initiated by the bus. The instruction fetch request here is the target instruction fetch request described above, and it carries target address information, that is, the address information corresponding to the instruction the bus requests from main memory.
Specifically, when the CPU needs to read the main memory through the bus to obtain an instruction, the CPU sends a target instruction fetch request to the prefetch circuit through the bus, where the target instruction fetch request carries an address of a desired instruction, that is, the instruction is stored in the main memory, each instruction has an address where it is stored, and the CPU reads the main memory to obtain a corresponding instruction by searching for the address.
In one embodiment, the instruction fetch request may contain consecutive sequential-address instructions or non-sequential-address instructions (instruction jumps). Consecutive sequential-address instructions are those whose physical addresses ascend from the physical address of the last instruction fetch; non-sequential-address instructions are those whose physical addresses bear no sequential relation to the last fetch, which is treated as an instruction address jump.
In another embodiment, the prefetch circuitry may only prefetch the sequential address instructions requested in the target fetch request.
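A minimal sketch of the sequential/non-sequential test, assuming a 4-byte instruction stride for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sequential means ascending and contiguous with the previous fetch;
 * everything else is treated as an instruction address jump. */
static bool is_sequential(uint32_t fetch_addr, uint32_t last_fetch_addr)
{
    return fetch_addr == last_fetch_addr + 4;  /* assumed 4-byte stride */
}

int main(void)
{
    printf("%d %d\n", is_sequential(0x104, 0x100),   /* 1: sequential  */
                      is_sequential(0x200, 0x100));  /* 0: jump/branch */
    return 0;
}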
402. It is determined whether the target address information matches locally stored prefetch address information. If yes, go to step 403; if not, go to step 404.
After the prefetch circuit obtains the target address information, the prefetch circuit compares the target address information required by the bus with the address of the stored instruction in the prefetch cache area, and if the match is successful, step 403 is executed; if the match fails, go to step 404. It will be appreciated that local as described above is the prefetch cache in the prefetch circuitry. For convenience of understanding and description, the description thereof is omitted hereinafter.
In one embodiment, the address information in the prefetch address information is an instruction having consecutive addresses.
403. And sending the target pre-fetching data corresponding to the pre-fetching address information to the central processing unit.
If the target address of the bus request is matched with the address of the stored instruction in the prefetch cache region, namely the instruction in the prefetch cache region hits, the prefetch circuit returns the instruction in the prefetch cache region to the bus, and simultaneously initiates the prefetch operation of the next sequential address instruction, thereby sending the prefetch data corresponding to the address instruction to the bus. Correspondingly, the CPU can obtain the prefetched data through the bus.
Specifically, the stored instruction in the prefetch cache is an instruction that the prefetch circuit predicts to be read by the bus next time, and is taken out in advance to be placed in the cache. The next time the bus fetches the instruction, the instruction is output from the prefetch cache if the required instruction is in the prefetch cache.
In one embodiment, the instruction fetch initiated by the prefetch circuit targets an instruction that the prefetch circuit predicts the CPU will fetch next, taken out in advance and placed in the prefetch cache; at the CPU's next fetch, if the required instruction is in the prefetch cache, it is output from there. It should be noted that the instruction fetch behavior of the bus and the prefetch behavior are distinct but related. A bus fetch is the CPU reading the instruction at a target address through the bus; a prefetch is the prefetch circuit actively predicting the next target address from the CPU's current target address and then reading the instruction stored in main memory. A bus fetch request may be initiated at any time, whereas the prefetch circuit initiates a prefetch when the main memory and the bus are idle; that is, the prefetch circuit stores the target prefetch data corresponding to the prefetch address information while the CPU is not reading instructions from main memory.
404. And sending a target access request to the cache memory, so that the cache memory acquires the data to be prefetched corresponding to the target address information from the main memory according to the target access request and uploads the data to be prefetched to the central processing unit.
If the target address of the bus request does not match the addresses of the instructions already stored in the prefetch cache region, the prefetch cache misses. Specifically, when no stored instruction address matches the target address of the bus request, i.e., on a prefetch miss, the prefetch circuit initiates an access request to the cache. Correspondingly, the prefetch circuit sends a target access request to the cache, so that the cache obtains the data to be prefetched corresponding to the target address information from main memory according to the target access request and uploads it to the bus, i.e., to the CPU.
In one embodiment, the target access request carries a non-sequential address instruction. Specifically, the prefetch circuit initiates the read operation of the cache, and the cache hits, i.e., the non-sequential address instruction can be obtained in a zero-waiting manner. The specific process can be referred to steps 601-604 in fig. 6.
According to the cache performance processing method provided by this embodiment, a target instruction fetch request is received, it is judged whether the target address information matches locally stored prefetch address information, and then: if so, the target prefetch data corresponding to the prefetch address information is sent to the central processing unit; if not, a target access request is sent to the cache memory, so that the cache memory obtains the data to be prefetched corresponding to the target address information from the main memory and uploads it to the central processing unit. By adding the prefetch circuit, the central processing unit can obtain prefetch data corresponding to a prefetch address automatically, while address information not stored in the prefetch circuit is sent to the cache memory so that it obtains the corresponding data to be prefetched; this improves the efficiency with which the central processing unit obtains prefetch data directly and reduces its instruction-fetch wait time.
To facilitate a detailed description of the processing method applied to the cache performance of the prefetch circuit, please refer to fig. 5, where fig. 5 is a schematic flow chart of another processing method applied to the cache performance of the prefetch circuit according to an embodiment of the present disclosure. Comprising steps 501-507.
501. An initial fetch request is received.
As will be appreciated from the description of the prefetch circuitry in fig. 1-3, the prefetch circuitry or prefetch controller circuitry primarily handles bus-initiated fetch requests. Specifically, the prefetch circuitry receives an initial fetch request sent by the bus. The initial instruction fetching request carries initial address information corresponding to the initial instruction fetching request.
In one embodiment, the initial address information can be understood as belonging to the first fetch request the bus sends to the prefetch circuit, i.e., the first instruction the CPU wants to read from main memory. Correspondingly, any fetch request sent by the bus to the prefetch circuit for the first time, together with the address information it carries, counts as the initial instruction fetch request.
In another embodiment, the initial instruction fetch request may contain sequential address instructions in succession, or may be non-sequential address instructions (instruction jumps).
502. And if the initial address information contains two adjacent address information, determining the two adjacent address information as the prefetch address information.
After the prefetch circuit obtains the initial address information, it analyzes that information to separate sequential-address instructions from non-sequential-address instructions. The prefetch circuit stores the sequential-address instructions, whose correspondingly adjacent address information is the prefetch address information described above. If non-adjacent address information exists in the initial address information, each such piece is address information to be backfilled; this corresponds to the non-sequential-address instructions above, and for these the prefetch circuit stores the non-sequential instructions into the cache.
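A rough sketch of this classification, assuming line-granular addresses where adjacency means the line index increases by one; classify is a hypothetical helper:

#include <stdint.h>
#include <stdio.h>

/* Walk the initial address information: adjacent pairs become prefetch
 * address information, non-adjacent entries are marked for backfill. */
static void classify(const uint32_t *addr, int n)
{
    for (int i = 1; i < n; i++) {
        if (addr[i] == addr[i - 1] + 1)  /* assumed line-index adjacency */
            printf("0x%x,0x%x -> prefetch address information\n",
                   (unsigned)addr[i - 1], (unsigned)addr[i]);
        else
            printf("0x%x -> address information to be backfilled\n",
                   (unsigned)addr[i]);
    }
}

int main(void)
{
    uint32_t lines[] = { 0x10, 0x11, 0x40, 0x41 };  /* two runs with a jump */
    classify(lines, 4);
    return 0;
}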
503. And sending a target prefetching request to the main memory according to the prefetching address information to obtain target prefetching data corresponding to the prefetching address information.
Since the prefetch circuitry is not storing the prefetch data corresponding to the initial fetch request at this time, the prefetch circuitry needs to retrieve the prefetch data from the main memory.
Specifically, as can be seen from fig. 2, it should also be noted that prefetch buffer0 and prefetch buffer1 are ping-pong prefetch buffers: the two buffers alternately perform the prefetch operation. If prefetch buffer0 initiates a prefetch, the read instruction is stored into prefetch buffer0, and the next prefetch is initiated by prefetch buffer1. Prefetch buffer0 and prefetch buffer1 each have a tag field storing the instruction address and a valid flag indicating whether the stored instruction is valid, and the read buffer rbuffer likewise has an rtag storing the address and an rvld marking the stored data. Prefetch buffer0 and prefetch buffer1 differ from the read buffer rbuffer in that they operate in different clock domains: the operating clocks of prefetch buffer0 and prefetch buffer1 are generated by dividing the system clock, while the operating clock of rbuffer is the system clock itself. The two prefetch buffers work in the same clock domain so that the sequential logic of the ping-pong prefetch operation functions correctly, and rbuffer works in the system clock domain so that the transmission of instruction data matches the bus system clock.
In one embodiment, the prefetch cache region contains three buffers: two prefetch buffers, buffer0 and buffer1, and one read buffer rbuffer. Whenever an operation reads main memory, i.e., whenever the prefetch controller initiates a prefetch to main memory and obtains the target prefetch data corresponding to the prefetch address information, the result is also stored in the read buffer rbuffer. Specifically, when an instruction is fetched, the (sequential) prefetch address information corresponding to the instruction is cached in the prefetch cache region and used for comparison with the address of the bus fetch request to decide whether it hits; the stored address is the tag. Each buffer thus contains a tag and a valid flag: prefetch buffer0 has tag0 and vld0, prefetch buffer1 has tag1 and vld1, and the read buffer rbuffer has rtag and rvld.
When the bus fetch hits one instruction of the prefetch buffers, if the next instruction of the instruction is not in the prefetch buffer0 or prefetch buffer1, the prefetch buffer initiates the prefetch operation of the next instruction, and the ping-pong switching flag is turned over when the prefetch is completed. When an instruction in the current prefetch buffer is executed, a new prefetch operation is not initiated as long as an instruction at an address subsequent to the instruction is in another prefetch buffer.
Correspondingly, the instruction stored in a prefetch buffer should be the instruction at the address following the CPU's last fetch; if neither prefetch buffer satisfies this condition, a prefetch operation is issued to fetch the instruction at the next address. The next address is the address following the CPU's last instruction fetch (address plus 1). The ping-pong switch flag indicates which buffer the current prefetch fills: for example, a flag of 0 means prefetch buffer0 is filled and 1 means prefetch buffer1 is filled. If prefetch buffer0 finishes its prefetch, the next prefetch goes to prefetch buffer1 and the flag changes from 0 to 1.
Thus, sequentially executed code (instructions that are deposited contiguously) in the initial instruction fetch request may be sequentially cached in the prefetch buffer.
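The buffer-selection rule can be sketched as follows; on_prefetch_hit, issue_prefetch, and the plus-one line addressing are illustrative assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t tag; bool vld; } pbuf_t;

static pbuf_t pbuf[2];
static int pp_flag = 1;  /* which buffer the next prefetch fills (buffer1 first here) */

static void issue_prefetch(uint32_t tag)
{
    printf("prefetch line %u into buffer%d\n", (unsigned)tag, pp_flag);
    pbuf[pp_flag] = (pbuf_t){ tag, true };
    pp_flag ^= 1;  /* ping-pong switch flag toggles on completion */
}

/* After a hit in one prefetch buffer, prefetch the next address only
 * if the other buffer does not already hold it. */
static void on_prefetch_hit(int hit_buf, uint32_t hit_tag)
{
    uint32_t next = hit_tag + 1;  /* next sequential line address */
    int other = hit_buf ^ 1;
    if (!(pbuf[other].vld && pbuf[other].tag == next))
        issue_prefetch(next);
}

int main(void)
{
    pbuf[0] = (pbuf_t){ 7, true };  /* buffer0 holds line 7 */
    on_prefetch_hit(0, 7);          /* line 8 not in buffer1 -> prefetch it   */
    on_prefetch_hit(0, 7);          /* line 8 now in buffer1 -> no new prefetch */
    return 0;
}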
504. A target fetch request is received.
Step 504 in this embodiment is similar to step 401 in fig. 4 and is not repeated here. However, the target instruction fetch request here can be understood as the second fetch request sent by the bus to the prefetch circuit, i.e., the next instruction the CPU wants to read from main memory. Correspondingly, any fetch request sent by the bus to the prefetch circuit after the first, together with the address information it carries, counts as a target instruction fetch request.
505. It is determined whether any of the target address information matches prefetch address information stored in the first prefetch buffer. If yes, go to step 506; if not, go to step 507.
When the prefetch circuit obtains the target address information, it enters the prefetch state. Specifically, the fetch and prefetch operations initiated by the bus compare addresses to determine whether the prefetch buffers hit and whether the read buffer rbuffer hits directly.
In one embodiment, the prefetch circuit judges whether any address information in the target address information matches the prefetch address information stored in the prefetch buffers and obtains the corresponding matching result. Correspondingly, if the requested instruction is in neither prefetch buffer0 nor prefetch buffer1, the match fails and step 507 is executed. If the request is the first one sent by the bus, refer to step 503.
In another embodiment, the prefetch circuit first judges whether the target address information matches the prefetch address information stored in the read buffer rbuffer, obtaining a first matching result. If the first matching result is negative, the read buffer rbuffer obtains the prefetch data corresponding to the target address information from main memory according to the target address information and sends it to prefetch buffer0 or prefetch buffer1. It then judges whether the target address information matches the prefetch address information stored in prefetch buffer0 or prefetch buffer1, obtaining a second matching result. If the second matching result is negative, step 507 is executed; if it is positive, step 506 is executed.
Specifically, when the prefetch circuit is in the prefetch state, the fetch and prefetch operations initiated by the bus compare addresses to determine whether the prefetch buffers hit and whether rbuffer hits directly. When the bus requests the instruction at a target address and that instruction cannot be read from the read buffer rbuffer (an rbuffer miss), the prefetch controller does not yet treat the prefetch operation as a miss; the read buffer rbuffer moves the instruction it currently holds into prefetch buffer0 or prefetch buffer1. The content of rbuffer is simply whatever was last read from main memory: every main-memory read is stored into rbuffer whether or not it was read for a prefetch buffer, so it may not be the valid content the CPU currently requires. An invalid entry in rbuffer therefore does not by itself decide the prefetch judgment (hit or miss); the final hit/miss judgment is made from the prefetch buffers.
If prefetch buffer0 or prefetch buffer1 also misses, the prefetch controller treats the prefetch operation as a miss, discards the prefetch data currently held in the prefetch buffers, and initiates a read operation for the target address information. Correspondingly, step 503 above is executed.
If the bus fetch hits in a prefetch buffer, the circuit goes on to judge whether to read the next address, i.e., step 505 continues.
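Assuming the checks run in the order described above (the read buffer first, then the two prefetch buffers), a compact sketch of the match sequence; match_target and the result enum are illustrative:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t tag; bool vld; } buf_t;

typedef enum { HIT_RBUF, HIT_PBUF, MISS_ALL } match_t;

/* First matching result: rbuffer; second matching result: the two prefetch
 * buffers; a miss in all three discards the stale prefetch data and falls
 * through to a read of the target address. */
static match_t match_target(buf_t *rbuf, buf_t pbuf[2], uint32_t tag)
{
    if (rbuf->vld && rbuf->tag == tag)
        return HIT_RBUF;
    for (int i = 0; i < 2; i++)
        if (pbuf[i].vld && pbuf[i].tag == tag)
            return HIT_PBUF;
    pbuf[0].vld = pbuf[1].vld = false;  /* both results negative */
    printf("miss: read line %u from main memory\n", (unsigned)tag);
    return MISS_ALL;
}

int main(void)
{
    buf_t rbuf = { 3, true };
    buf_t pbuf[2] = { { 4, true }, { 5, true } };
    printf("%d\n", match_target(&rbuf, pbuf, 3));  /* 0: HIT_RBUF */
    printf("%d\n", match_target(&rbuf, pbuf, 5));  /* 1: HIT_PBUF */
    printf("%d\n", match_target(&rbuf, pbuf, 9));  /* 2: MISS_ALL */
    return 0;
}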
506. And sending target pre-fetching data corresponding to the pre-fetching address information to the central processing unit.
Step 506 in this embodiment is similar to step 403 in fig. 4, and is not described herein again. It should be noted that, if the address information in the target address information matches the prefetch address information, the prefetch circuit may sequentially send the prefetch data stored in the prefetch buffer0 and the prefetch buffer1 to the bus. For example, the target prefetch data in prefetch buffer0 is sent first, and the target prefetch data in prefetch buffer1 is sent next.
In one embodiment, the prefetch circuit returns only prefetched sequential instructions to the bus, i.e., the target prefetch data corresponding to two adjacent pieces of address information.
507. And sending a target access request to the cache memory, so that the cache memory acquires the data to be prefetched corresponding to the target address information from the main memory according to the target access request and uploads the data to be prefetched to the central processing unit.
Step 507 in this embodiment is similar to step 404 in fig. 4, and is not described herein again.
With the cache performance processing method provided by this embodiment, the cache control scheme is optimized by adding a prefetch circuit, which improves the hit rate of CPU instruction requests and reduces the time the CPU spends requesting instructions. Meanwhile, because the added prefetch hardware automatically prefetches the instruction at the next sequential address, the hit rate for sequential instructions fetched directly by the CPU is improved: sequential-address instructions can be read from the prefetch buffers with zero wait, improving instruction-fetch efficiency.
Fig. 6 is a flowchart of the cache performance processing method as applied to the cache memory according to an embodiment of the present disclosure, comprising steps 601-604.
601. Receive a target access request sent by the prefetch circuit.
As can be seen from fig. 1 and 3, the Cache portion includes a Cache control circuit together with a TagRAM and a DataRAM, which store the instruction tags (tag) and the instruction cache lines (cacheline), respectively. Each cache line has its own unique tag; the contents of the two RAMs are in one-to-one correspondence.
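A minimal C model of this organization, with an assumed geometry (4 ways, 64 sets, 16-byte lines) chosen purely for illustration:

#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS   4    /* assumed associativity */
#define NUM_SETS   64   /* assumed number of sets */
#define LINE_BYTES 16   /* assumed line size */

typedef struct {
    uint32_t tag;       /* instruction tag, held in the TagRAM */
    bool     valid;
} tag_entry_t;

typedef struct {
    uint8_t bytes[LINE_BYTES];   /* instruction cache line, held in the DataRAM */
} line_entry_t;

/* TagRAM and DataRAM are indexed identically: tag_ram[set][way] describes
 * data_ram[set][way], which is the one-to-one tag/cacheline correspondence. */
tag_entry_t  tag_ram[NUM_SETS][NUM_WAYS];
line_entry_t data_ram[NUM_SETS][NUM_WAYS];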
A control circuit in the Cache receives the access request sent by the prefetch circuit, the access request carrying target address information. It can be understood that the target address information is non-sequential address information, and that this access request is the target access request described in the preceding section.
602. Judge whether the target tag information in the target address information matches the tag information to be prefetched stored locally in the buffer memory. If yes, go to step 603; if not, go to step 604.
When the cache acquires the target address information, the Tag field of the request address is compared with the Tag fields stored in the TagRAM.
In one embodiment, a control circuit in the cache parses the target address information to obtain the target tag information, i.e., the Tag field of the request address, and matches it against all tag information to be prefetched stored in the cache, i.e., the Tag fields stored in the TagRAM. If the matching succeeds, go to step 603; if it fails, go to step 604.
For the cache structure described above, see Tables 1 and 2, where Table 1 shows the DataRAM cache lines and Table 2 shows the TagRAM tags.
Table 1: [DataRAM cache line contents per way; rendered as an image in the original publication]

Table 2: [TagRAM tag contents per way; rendered as an image in the original publication]
As can be understood from Tables 1 and 2, the tags in the TagRAM and the data in the DataRAM correspond one-to-one, way by way, at the way address where a cache line resides.
In another embodiment, the cache reads the tag data and data of all ways at one time and compares them directly with the target tag information. If a match is found, go to step 603; if not, go to step 604.
603. Upload the data to be prefetched corresponding to the matched tag information to be prefetched to the central processing unit.
If the comparison succeeds, the cache holds the data to be prefetched corresponding to the target address information, i.e., the cache hits, and the requested instruction read out from the DataRAM is sent to the bus.
In one embodiment, if the tag information to be prefetched of a certain way matches the target tag information (cache hit), the data of that way is sent to the bus; that is, the data of the matching way is output. Referring to Tables 1 and 2, for example, if the tag of Way1 in the TagRAM matches successfully, the data of Way1 in the DataRAM is output to the bus.
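Putting steps 602-603 together, a hedged C sketch of the way-parallel tag compare, using the assumed geometry from the earlier model (the field widths follow from that geometry):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define NUM_WAYS    4
#define NUM_SETS    64
#define LINE_BYTES  16
#define OFFSET_BITS 4   /* log2(LINE_BYTES) */
#define INDEX_BITS  6   /* log2(NUM_SETS)   */

typedef struct { uint32_t tag; bool valid; } tag_entry_t;
typedef struct { uint8_t bytes[LINE_BYTES]; } line_entry_t;
extern tag_entry_t  tag_ram[NUM_SETS][NUM_WAYS];
extern line_entry_t data_ram[NUM_SETS][NUM_WAYS];

/* Split the request address into Tag and set index, read all ways of the
 * set at once, and compare each stored tag against the request tag.
 * Returns the matching way's line on a hit, or NULL on a miss (step 604). */
const uint8_t *cache_lookup(uint32_t addr)
{
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);

    for (int way = 0; way < NUM_WAYS; way++)
        if (tag_ram[index][way].valid && tag_ram[index][way].tag == tag)
            return data_ram[index][way].bytes;   /* output the matching way */

    return NULL;
}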
604. Send a target backfill request to the main memory according to the target access request to obtain the data to be prefetched corresponding to the target address information, and upload the data to be prefetched to the central processing unit.
If the comparison fails, i.e., the cache misses, an access request to the main memory is initiated; the instruction read from the main memory is backfilled into the DataRAM of the cache and is simultaneously sent to the bus. Specifically, a target backfill request is sent to the main memory so that the instruction corresponding to the target access request is stored into the DataRAM of the cache and then sent to the bus, which also facilitates prefetching the next instruction.
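The miss path (step 604) can be sketched in the same style; main_memory_read(), send_to_bus(), and the way-0 replacement choice are assumptions, since the disclosure does not fix a replacement policy:

#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS    4
#define NUM_SETS    64
#define LINE_BYTES  16
#define OFFSET_BITS 4
#define INDEX_BITS  6

typedef struct { uint32_t tag; bool valid; } tag_entry_t;
typedef struct { uint8_t bytes[LINE_BYTES]; } line_entry_t;
extern tag_entry_t  tag_ram[NUM_SETS][NUM_WAYS];
extern line_entry_t data_ram[NUM_SETS][NUM_WAYS];
extern void main_memory_read(uint32_t addr, uint8_t *dst, int n); /* assumed */
extern void send_to_bus(const uint8_t *data, int n);              /* assumed */

/* Cache miss: read the line from main memory, backfill TagRAM/DataRAM,
 * and forward the same line to the bus in the same step. */
void cache_backfill(uint32_t addr)
{
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    int way = 0;   /* replacement policy unspecified; way 0 for illustration */

    main_memory_read(addr & ~(uint32_t)(LINE_BYTES - 1),
                     data_ram[index][way].bytes, LINE_BYTES);

    tag_ram[index][way].tag   = tag;
    tag_ram[index][way].valid = true;

    send_to_bus(data_ram[index][way].bytes, LINE_BYTES);
}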
With the cache performance processing method provided by this embodiment, non-sequential address instructions (instruction jumps) are stored in the cache; when the prefetch misses, a read of the cache is initiated, and on a cache hit the non-sequential instruction is obtained with zero wait. In other words, only on a cache miss does the cache backfill the instruction line of the current address read from the main memory, so the cache holds only the prefetch data of discontinuous execution, while the instructions fetched by the prefetch circuit during continuous execution are supplied by the prefetch mechanism, improving the hit rate and efficiency of instruction fetching.
To facilitate a detailed description of the interaction flow between the prefetch circuit and the cache memory, refer to fig. 7, which is an interaction diagram of a cache performance processing method according to an embodiment of the present disclosure, comprising steps 701-710.
701. The bus sends an initial fetch request to the prefetch circuitry.
702. The prefetch circuit determines two adjacent pieces of address information as prefetch address information.
In this embodiment, steps 701-702 are similar to steps 501-502 in fig. 5 and are not described again here. It should be noted that the prefetch circuit sends non-adjacent address information to the cache; correspondingly, the cache may obtain the data to be prefetched from the main memory according to that non-adjacent address information and upload it to the bus. For the step in which the cache acquires the data to be prefetched, refer to step 604 in fig. 6. A routing sketch follows.
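The routing decision of steps 701-702, i.e., sequential addresses feed the prefetcher while a jump goes to the cache, might look like this; start_prefetch() and send_addr_to_cache() are hypothetical helpers:

#include <stdint.h>

#define LINE_BYTES 16   /* assumed fetch granularity */

extern void start_prefetch(uint32_t addr);      /* assumed: steps 702-703 */
extern void send_addr_to_cache(uint32_t addr);  /* assumed: backfill path */

/* Adjacent addresses are treated as prefetch address information and the
 * next sequential line is prefetched; a non-adjacent address (a jump)
 * is handed to the cache for lookup and possible backfill. */
void route_fetch(uint32_t prev_addr, uint32_t addr)
{
    if (addr == prev_addr + LINE_BYTES)
        start_prefetch(addr + LINE_BYTES);
    else
        send_addr_to_cache(addr);
}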
703. The prefetch circuit sends a target prefetch request to the main memory according to the prefetch address information to obtain target prefetch data.
704. The bus sends a target fetch request to the prefetch circuitry.
705. The prefetch circuit determines whether any one of the target address information matches prefetch address information stored in the first prefetch buffer. If yes, go to step 706; if not, go to step 707.
706. The prefetch circuit sends the target prefetch data corresponding to the prefetch address information to the bus.
707. The prefetch circuit sends a target access request to the cache memory.
Step 707 in this embodiment is similar to step 503 to step 507 in fig. 5, and details are not repeated here.
708. The cache memory judges whether the target tag information in the target address information matches the tag information to be prefetched stored locally in the cache memory. If yes, go to step 709; if not, go to step 710.
709. The cache memory sends the data to be prefetched corresponding to the tag information to be prefetched to the bus.
710. The cache memory sends a target backfill request to the main memory to acquire data to be prefetched corresponding to the target address information and uploads the data to be prefetched to the bus.
Steps 708 to 710 in this embodiment are similar to steps 602 to 604 in fig. 6, and are not described herein again.
It should be noted, however, that if the cache has not yet cached the data to be prefetched, i.e., the instruction, corresponding to the target address information carried in the target access request, the cache obtains the corresponding instruction from the main memory according to the target address information. See step 604 for details.
With the cache performance processing method provided by this embodiment, a prefetch circuit is added in front of the data cache. While the CPU executes a fetched instruction, the prefetch circuit automatically reads the following sequential-address instructions; when the CPU fetch address is discontinuous, i.e., a jump occurs, the instruction at the discontinuous address is stored into the cache (note that only instructions at discontinuous addresses are stored into the cache, while subsequent addresses continue to be served by the prefetch function). When the CPU requests an instruction and the prefetched instruction hits, no cache read is initiated. When the prefetch misses, the circuit waits for the result from the cache: if the cache hits, prefetching of the next address is initiated; if the cache misses, the instruction at the current address is read from the main memory. Thus, only on a cache miss does the cache backfill the instruction line of the current address read from the main memory, so the cache holds only the prefetch data of discontinuous execution, while the instructions fetched by the prefetch circuit during continuous execution are supplied by the prefetch mechanism.
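Tying the pieces together, a hedged sketch of how one bus fetch flows through the whole scheme, reusing the functions sketched earlier (all names are illustrative, not the disclosure's literal interfaces):

#include <stdint.h>
#include <stddef.h>

#define LINE_BYTES 16

extern const uint8_t *prefetch_lookup(uint32_t addr);  /* sketched earlier */
extern const uint8_t *cache_lookup(uint32_t addr);     /* sketched earlier */
extern void cache_backfill(uint32_t addr);             /* sketched earlier */
extern void start_prefetch(uint32_t addr);             /* assumed helper   */
extern void send_to_bus(const uint8_t *data, int n);   /* assumed helper   */

/* Prefetch hit: return with zero wait and keep prefetching. Prefetch miss:
 * consult the cache; only a cache miss reaches the main memory, so the
 * cache accumulates only the instructions of discontinuous execution. */
void handle_fetch(uint32_t addr)
{
    const uint8_t *line = prefetch_lookup(addr);
    if (line != NULL) {
        send_to_bus(line, LINE_BYTES);
        start_prefetch(addr + LINE_BYTES);   /* prefetch the next address */
        return;
    }

    line = cache_lookup(addr);
    if (line != NULL) {
        send_to_bus(line, LINE_BYTES);
        start_prefetch(addr + LINE_BYTES);   /* cache hit: resume prefetch */
    } else {
        cache_backfill(addr);                /* cache miss: read main memory */
    }
}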
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least part of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
If the scenario involves sensitive information (e.g., user information, business information), it should be noted that the collection, use, and handling of the sensitive information need to comply with relevant national and regional laws and regulations and standards, and need to be performed under the permission or consent of the corresponding subject (e.g., user or business, etc.).
Referring to fig. 8, fig. 8 is a schematic structural diagram of a cache performance processing system according to an embodiment of the present disclosure, which is applied to a prefetch circuit.
A receiving unit 801, configured to receive a target instruction fetch request, where the target instruction fetch request carries target address information;
a judging unit 802 for judging whether the target address information matches with the prefetch address information stored locally;
a sending unit 803, configured to send target prefetch data corresponding to the prefetch address information to the central processing unit when the target address information matches the locally stored prefetch address information;
the sending unit 803 is further configured to send a target access request to the buffer memory when the target address information does not match the locally stored prefetch address information, so that the buffer memory obtains the data to be prefetched corresponding to the target address information from the main memory according to the target access request, and uploads the data to be prefetched to the central processing unit.
Illustratively, the cache performance processing system further comprises: a determination unit 804;
the receiving unit 801 is further configured to receive an initial instruction fetch request, where the initial instruction fetch request carries initial address information corresponding to the initial instruction fetch request;
a determining unit 804, configured to determine, when there are two adjacent pieces of address information in the initial address information, that the two adjacent pieces of address information are prefetch address information;
the sending unit 803 is further configured to send a target prefetch request to the main memory according to the prefetch address information to obtain target prefetch data corresponding to the prefetch address information.
Illustratively, the cache performance processing system further includes:
the determining unit 804 is further configured to determine, when nonadjacent address information exists in the initial address information, every two nonadjacent address information as address information to be backfilled;
the sending unit 803 is further configured to send address information to be refilled to the cache memory, so that the cache memory obtains data to be prefetched corresponding to the address information to be refilled from the main memory according to the address information to be refilled.
Illustratively, if the prefetch address information includes two adjacent first address information and second address information, the target prefetch data includes first prefetch data and second prefetch data, and the cache performance processing system includes:
the sending unit 803 is specifically configured to send, to the central processing unit, first prefetch data corresponding to the first address information and second prefetch data corresponding to the second address information in sequence when the address information in the target address information matches the prefetch address information.
Illustratively, the prefetch circuitry includes a prefetch cache, wherein the prefetch cache at least includes a first prefetch buffer and a second prefetch buffer, and the cache performance processing system further comprises:
the sending unit 803 is specifically configured to send target prefetch data corresponding to pairwise adjacent address information in the prefetch address information to the central processing unit when the first prefetch buffer and the second prefetch buffer locally store prefetch address information and any two address information in the prefetch address information are pairwise adjacent.
Illustratively, the cache performance processing system further comprises: an acquisition unit 805;
a determining unit 802, specifically configured to determine whether any address information in the target address information matches the prefetch address information stored in the first prefetch buffer;
the obtaining unit 805 is configured to, when any address information in the target address information matches with the prefetch address information stored in the first prefetch buffer, and address information to be prefetched, which is adjacent to any address information in the target address information, does not match with the prefetch address information stored in the second prefetch buffer, obtain target prefetch data corresponding to the address information to be prefetched from the main memory according to the address information to be prefetched.
Illustratively, the prefetch cache further comprises a read buffer, and the cache performance processing system further comprises:
a determining unit 802, specifically configured to determine whether the target address information matches the prefetch address information stored in the read buffer to obtain a first matching result;
the obtaining unit 805 is specifically configured to, when the first matching result is negative, the read buffer obtains initial prefetch data corresponding to the target address information from the main memory according to the target address information, and sends the initial prefetch data to the first prefetch buffer and the second prefetch buffer.
Illustratively, the cache performance processing system further comprises: an execution unit 806;
the determining unit 802 is further configured to determine whether the target address information matches with the prefetch address information stored in the first prefetch buffer and the second prefetch buffer to obtain a second matching result;
an executing unit 806, configured to execute the step of sending the target access request to the buffer memory when the first matching result and the second matching result are both negative, delete the target prefetch data stored in the first prefetch buffer and the second prefetch buffer, and obtain initial prefetch data corresponding to the target address information from the main memory according to the target address information;
the determining unit 804 is further configured to determine the initial prefetch data as the target prefetch data.
Illustratively, the cache performance processing system further comprises:
the determining unit 802 is further configured to determine whether to execute the step of sending the target fetch request to the main memory if the first matching result is negative and the second matching result is positive;
the obtaining unit 805 is further configured to, when the step of sending the target instruction fetch request to the main memory is executed, obtain initial prefetch data corresponding to the target address information from the main memory according to the target address information, and determine the initial prefetch data as target prefetch data;
the execution unit 806 is further configured to perform a step of sending target prefetch data corresponding to the prefetch address information to the central processing unit when the step of sending the target fetch request to the main memory is not performed.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a processing system with another caching performance according to an embodiment of the present disclosure, which is applied to a cache memory.
A receiving unit 901, configured to receive a target access request sent by a prefetch circuit, where the target access request carries target address information;
a judging unit 902, configured to judge whether target tag information in the target address information matches all tag information to be prefetched that is locally stored in the buffer memory;
an uploading unit 903, configured to upload, to the central processing unit, to-be-prefetched data corresponding to tag information to be prefetched when the target tag information matches any of the tag information to be prefetched;
and a sending unit 904, configured to send a target backfill request to the main memory according to the target access request when the target tag information does not match any tag information to be prefetched, so as to obtain data to be prefetched corresponding to the target address information, and upload the data to be prefetched to the central processing unit.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a cache performance processing apparatus according to an embodiment of the present disclosure, the apparatus comprising:
a central processing unit 1001, a memory 1005, an input/output interface 1004, a wired or wireless network interface 1003 and a power supply 1002;
the memory 1005 is a transient storage memory or a persistent storage memory;
the central processing unit 1001 is configured to communicate with the memory 1005 and execute the instructions in the memory 1005 to perform the method of any one of the embodiments shown in fig. 4 to fig. 7.
An embodiment of the present application further provides a chip system. The chip system includes at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by a line, and the at least one processor is configured to run a computer program or instructions to perform the method of any one of the embodiments shown in fig. 4 to fig. 7.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various media capable of storing program codes.

Claims (14)

1. A method for processing cache performance, applied to a prefetch circuit, the method comprising:
receiving a target instruction fetching request, wherein the target instruction fetching request carries target address information;
judging whether the target address information is matched with the locally stored prefetch address information;
if yes, sending target prefetch data corresponding to the prefetch address information to a central processing unit;
if not, sending a target access request to a buffer memory so that the buffer memory obtains data to be prefetched corresponding to the target address information from a main memory according to the target access request, and uploading the data to be prefetched to the central processing unit.
2. The method of claim 1, wherein before receiving the target fetch request, the method further comprises:
receiving an initial instruction fetching request, wherein the initial instruction fetching request carries initial address information corresponding to the initial instruction fetching request;
if the initial address information contains two adjacent address information, determining the two adjacent address information as the prefetch address information;
and sending a target prefetching request to the main memory according to the prefetching address information to acquire the target prefetching data corresponding to the prefetching address information.
3. The method of claim 2, wherein after receiving the initial instruction fetch request, the method further comprises:
if nonadjacent address information exists in the initial address information, determining that every two nonadjacent address information are address information to be backfilled;
and sending the address information to be backfilled to the cache memory, so that the cache memory acquires the data to be prefetched corresponding to the address information to be backfilled from the main memory according to the address information to be backfilled.
4. The method for processing cache performance as claimed in claim 1, wherein, if the prefetch address information includes first address information and second address information that are adjacent, the target prefetch data includes first prefetch data and second prefetch data, and the sending target prefetch data corresponding to the prefetch address information to a central processing unit comprises:
and if the address information in the target address information is matched with the prefetching address information, sequentially sending the first prefetching data corresponding to the first address information and the second prefetching data corresponding to the second address information to the central processing unit.
5. The method of handling cache performance of claim 1, wherein the prefetch circuitry comprises a prefetch cache; wherein the prefetch buffer at least comprises a first prefetch buffer and a second prefetch buffer, and the determining whether the target address information matches locally stored prefetch address information comprises:
judging whether any address information in the target address information is matched with the prefetch address information stored in the first prefetch buffer;
if yes, and address information to be prefetched adjacent to any address information in the target address information is not matched with the prefetching address information stored in the second prefetching buffer, the target prefetching data corresponding to the address information to be prefetched is obtained from the main memory according to the address information to be prefetched.
6. The method for processing cache performance according to claim 5, wherein the sending the target prefetch data corresponding to the prefetch address information to a central processing unit comprises:
and if the first pre-fetching buffer and the second pre-fetching buffer locally store the pre-fetching address information and any two pieces of address information in the pre-fetching address information are adjacent in pairs, sending target pre-fetching data corresponding to the adjacent address information in the pre-fetching address information to the central processing unit.
7. The method for handling cache performance according to claim 5, wherein the prefetch buffer further comprises a read buffer; the judging whether the target address information is matched with the locally stored prefetch address information comprises:
judging whether the target address information is matched with the prefetch address information stored in the reading buffer to obtain a first matching result;
if the first matching result is negative, the reading buffer acquires initial pre-fetching data corresponding to the target address information from the main memory according to the target address information, and sends the initial pre-fetching data to the first pre-fetching buffer and the second pre-fetching buffer.
8. The method for processing caching performance according to claim 7, wherein after determining whether the target address information matches with the prefetch address information stored in the prefetch buffer, the method further comprises:
judging whether the target address information is matched with the pre-fetching address information stored in the first pre-fetching buffer and the second pre-fetching buffer or not to obtain a second matching result;
if the first matching result and the second matching result are both negative, executing the step of sending a target access request to a buffer memory, deleting the target pre-fetching data stored in the first pre-fetching buffer and the second pre-fetching buffer, and acquiring the initial pre-fetching data corresponding to the target address information from the main memory according to the target address information;
determining the initial prefetch data as the target prefetch data.
9. The method for processing cache performance according to claim 8, wherein after determining whether the target address information matches with the prefetch address information stored in the first prefetch buffer and the second prefetch buffer, the method further comprises:
if the first matching result is negative and the second matching result is positive, judging whether to execute the step of sending the target instruction fetching request to the main memory;
if the step of sending the target instruction fetching request to the main memory is executed, acquiring the initial pre-fetching data corresponding to the target address information from the main memory according to the target address information, and determining the initial pre-fetching data as the target pre-fetching data;
and if the step of sending the target instruction fetching request to the main memory is not executed, sending target pre-fetching data corresponding to the pre-fetching address information to a central processing unit.
10. A method for processing cache performance, applied to a buffer memory, the method comprising:
receiving a target access request sent by a pre-fetching circuit, wherein the target access request carries target address information;
judging whether target tag information in the target address information matches all tag information to be prefetched stored locally in the buffer memory;
if the target tag information is matched with any tag information to be prefetched, uploading the data to be prefetched corresponding to the tag information to be prefetched to a central processing unit;
and if the target tag information is not matched with any tag information to be prefetched, sending a target backfill request to a main memory according to the target access request to acquire data to be prefetched corresponding to the target address information, and uploading the data to be prefetched to the central processing unit.
11. A cache performance processing system for use with a prefetch circuit, the system comprising:
the device comprises a receiving unit, a sending unit and a receiving unit, wherein the receiving unit is used for receiving a target instruction fetching request, and the target instruction fetching request carries target address information;
a judging unit configured to judge whether the target address information matches prefetch address information stored locally;
a sending unit, configured to send target prefetch data corresponding to the prefetch address information to a central processing unit when the target address information matches locally-stored prefetch address information;
the sending unit is further configured to send a target access request to a buffer memory when the target address information does not match with the locally stored prefetch address information, so that the buffer memory obtains data to be prefetched corresponding to the target address information from a main memory according to the target access request, and uploads the data to be prefetched to the central processing unit.
12. A system for handling cache performance, applied to a cache memory, the system comprising:
the device comprises a receiving unit, a pre-fetching circuit and a sending unit, wherein the receiving unit is used for receiving a target access request sent by the pre-fetching circuit, and the target access request carries target address information;
the judging unit is used for judging whether target tag information in the target address information matches all tag information to be prefetched stored locally in the buffer memory;
the uploading unit is used for uploading the data to be prefetched corresponding to the tag information to be prefetched to a central processing unit when the target tag information is matched with any one of the tag information to be prefetched;
and the sending unit is used for sending a target backfill request to a main memory according to the target access request when the target tag information is not matched with any tag information to be prefetched, so as to obtain data to be prefetched corresponding to the target address information, and uploading the data to be prefetched to the central processing unit.
13. An apparatus for handling cache performance, the apparatus comprising:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the operations of the instructions in the memory to perform the method of any one of claims 1 to 10.
14. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 10.
CN202211228384.3A 2022-10-08 2022-10-08 Processing method of cache performance and related equipment thereof Pending CN115587052A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination