CN112527395B - Data prefetching method and data processing apparatus - Google Patents

Data prefetching method and data processing apparatus

Info

Publication number
CN112527395B
CN112527395B (application CN202011307964.2A)
Authority
CN
China
Prior art keywords
access
request
data
prefetcher
prefetch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011307964.2A
Other languages
Chinese (zh)
Other versions
CN112527395A (en)
Inventor
胡世文 (Hu Shiwen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202011307964.2A
Publication of CN112527395A
Application granted
Publication of CN112527395B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer; banks

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data prefetching method and a data processing apparatus. The data prefetching method includes: receiving an operation instruction and providing an access request of the operation instruction to a data prefetcher; determining whether the access request of the operation instruction is a load access request or a store access request; in response to the access request of the operation instruction being a load access request, training a load access prefetch function in the data prefetcher with the load access request, outputting a load access prefetch request, and performing a load access prefetch based on the load access prefetch request; and in response to the access request of the operation instruction being a store access request, training a store access prefetch function in the data prefetcher with the store access request, outputting a store access prefetch request, and performing a store access prefetch based on the store access prefetch request. The data prefetching method can improve the performance of a data processing apparatus that adopts it.

Description

Data prefetching method and data processing apparatus
Technical Field
The embodiment of the disclosure relates to a data prefetching method and a data processing device.
Background
In a related Central Processing Unit (CPU) architecture, program instructions and data may be stored in Dynamic Random Access Memory (DRAM), which is a type of memory.
Disclosure of Invention
At least one embodiment of the present disclosure provides a data prefetching method, including: receiving an operation instruction and providing an access request of the operation instruction to a data prefetcher; determining whether the access request of the operation instruction is a load access request or a store access request; in response to the access request of the operation instruction being the load access request, training a load access prefetch function in the data prefetcher with the load access request, outputting a load access prefetch request, and performing a load access prefetch based on the load access prefetch request; and in response to the access request of the operation instruction being the store access request, training a store access prefetch function in the data prefetcher with the store access request, outputting a store access prefetch request, and performing a store access prefetch based on the store access prefetch request.
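The dispatch step just described (classify the request as a load or store, train only the matching prefetch function, and emit a prefetch request of the same kind) can be sketched as follows. All names here (`DataPrefetcher`, the next-line prediction rule, the 64-byte line size) are illustrative assumptions, not details fixed by the patent.

```python
class DataPrefetcher:
    """Illustrative prefetcher with separate load and store training state."""

    def __init__(self):
        self.load_history = []   # addresses used to train the load prefetch function
        self.store_history = []  # addresses used to train the store prefetch function

    def handle(self, address, is_load):
        """Train the matching prefetch function and emit a matching prefetch request.

        A trivial next-line rule (address + 64) stands in for the learned
        access regularity; a real prefetcher would predict from the history.
        """
        if is_load:
            self.load_history.append(address)
            return ("load_prefetch", address + 64)
        self.store_history.append(address)
        return ("store_prefetch", address + 64)
```

A load access request thus only ever trains (and is prefetched by) the load side, and likewise for stores, which is the separation the claim describes.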
For example, in at least one example of the data prefetching method, performing the store access prefetch based on the store access prefetch request includes: discarding the store access prefetch request in response to the target data of the store access prefetch request already being in a first-level cache memory or a second-level cache memory; and in response to the target data of the store access prefetch request being in neither the first-level cache memory nor the second-level cache memory, obtaining the target data of the store access prefetch request and writing it into the second-level cache memory.
For example, in at least one example of the data prefetching method, discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory or the second-level cache memory includes: discarding the store access prefetch request in response to the target data already being in the first-level cache memory; providing the store access prefetch request to the second-level cache memory in response to the target data not being in the first-level cache memory; and discarding the store access prefetch request in response to the target data already being in the second-level cache memory.
For example, in at least one example of the data prefetching method, providing the store access prefetch request to the second-level cache memory in response to the target data of the store access prefetch request not being in the first-level cache memory includes: in response to the target data not being in the first-level cache memory, applying to a store access prefetch sequence buffer for a first storage entry and allocating the first storage entry to the store access prefetch request; and providing, by the store access prefetch sequence buffer, the store access prefetch request to the second-level cache memory.
For example, in at least one example of the data prefetching method, obtaining the target data of the store access prefetch request and writing it into the second-level cache memory includes: requesting, by the second-level cache memory, the target data of the store access prefetch request from the storage device at the level below the second-level cache memory, and storing the target data provided by that storage device in the second-level cache memory.
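The store-prefetch policy of the preceding examples can be condensed into a small sketch: drop the request when the target line is already cached in L1 or L2, otherwise have L2 fetch it from the next-level storage and keep it in L2 (not in L1). The dict-as-cache representation and the function name are illustrative only.

```python
def do_store_prefetch(target, l1, l2, next_level):
    """Hypothetical sketch of the described store access prefetch handling."""
    if target in l1 or target in l2:
        return "discarded"           # target already cached: request dropped
    l2[target] = next_level[target]  # L2 requests the line from the level below
    return "filled_l2"
```

Filling only L2 keeps prefetched store lines from displacing load data in the smaller L1, which is consistent with the flow described above.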
For example, in at least one example of the data prefetching method, performing the load access prefetch based on the load access prefetch request includes: discarding the load access prefetch request in response to the target data of the load access prefetch request already being in a first-level cache memory; and in response to the target data not being in the first-level cache memory, obtaining the target data of the load access prefetch request and writing it into the first-level cache memory.
For example, in at least one example of the data prefetching method, obtaining the target data of the load access prefetch request and writing it into the first-level cache memory includes: applying to a miss address buffer for a second storage entry and allocating the second storage entry to the load access prefetch request; requesting, by the miss address buffer, the target data of the load access prefetch request from a second-level cache memory; obtaining, by the second-level cache memory, the target data and providing it to the miss address buffer; and writing, by the miss address buffer, the target data of the load access prefetch request into the first-level cache memory.
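The load-prefetch path of the preceding examples can be sketched the same way (illustrative names; a list stands in for the miss address buffer): an L1 hit discards the request; otherwise a MAB entry is allocated, the line is obtained via L2 or the level below it, the MAB writes the line into L1, and the entry is released.

```python
def do_load_prefetch(target, l1, l2, mab, next_level):
    """Hypothetical sketch of the described load access prefetch handling."""
    if target in l1:
        return "discarded"                # target already in L1: request dropped
    mab.append(target)                    # allocate a miss-address-buffer entry
    if target in l2:
        data = l2[target]                 # L2 serves the request
    else:
        data = next_level[target]         # fetched from the level below L2
    l1[target] = data                     # MAB writes the target data into L1
    mab.remove(target)                    # entry released after the fill
    return "filled_l1"
```

Note the asymmetry with the store sketch: load prefetches fill L1, store prefetches fill L2.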
For example, in at least one example of the data prefetching method, the data prefetcher includes a first class of prefetcher entries and a second class of prefetcher entries different from the first class. In response to the access request of the operation instruction being the load access request, training the load access prefetch function in the data prefetcher with the load access request includes: allocating the address of the load access request to a first-class prefetcher entry and training that entry with the address of the load access request. In response to the access request of the operation instruction being the store access request, training the store access prefetch function in the data prefetcher with the store access request includes: allocating the address of the store access request to a second-class prefetcher entry and training that entry with the address of the store access request.
For example, in at least one example of the data prefetching method, the first class of prefetcher entries includes a plurality of first prefetcher entries, and the second class of prefetcher entries includes a plurality of second prefetcher entries. Allocating the address of the load access request to the first class of prefetcher entries includes: determining, based on the address of the load access request, the first prefetcher entry to which that address belongs, and allocating the address to that entry. Allocating the address of the store access request to the second class of prefetcher entries includes: determining, based on the address of the store access request, the second prefetcher entry to which that address belongs, and allocating the address to that entry.
For example, in at least one example of the data prefetching method, the data prefetcher includes a plurality of prefetcher entries, and the data prefetching method further includes: marking each entry of a first portion of the plurality of prefetcher entries as a first-class prefetcher entry in response to the access request of the operation instruction corresponding to the first data in the data stream tracked by that portion being a load access; and marking each entry of a second portion of the plurality of prefetcher entries as a second-class prefetcher entry in response to the access request of the operation instruction corresponding to the first data in the data stream tracked by that portion being a store access.
For example, in at least one example of the data prefetching method, marking each entry of the first portion of the plurality of prefetcher entries as a first-class prefetcher entry includes setting the value of a load access identification data field included in each such entry to a first value; and marking each entry of the second portion of the plurality of prefetcher entries as a second-class prefetcher entry includes setting the value of the load access identification data field included in each such entry to a second value different from the first value.
For example, in at least one example of the data prefetching method, outputting the load access prefetch request includes: outputting a first access prefetch address based on the regularity of the data stream tracked by each entry of the first portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the first access prefetch address as the load access prefetch request. Outputting the store access prefetch request includes: outputting a second access prefetch address based on the regularity of the data stream tracked by each entry of the second portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the second access prefetch address as the store access prefetch request.
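The entry-tagging scheme of the last few examples can be illustrated with a minimal entry class. The concrete first/second values (1 for load, 0 for store) and all names are assumptions for illustration; the patent only requires that the two values differ.

```python
class PrefetcherEntry:
    """Illustrative prefetcher entry carrying the load access identification field.

    The first access of the tracked data stream fixes the field, and every
    prefetch address the entry later emits is tagged accordingly.
    """

    LOAD, STORE = 1, 0  # assumed first and second identification values

    def __init__(self, first_access_is_load):
        self.load_bit = self.LOAD if first_access_is_load else self.STORE

    def emit_prefetch(self, prefetch_addr):
        """Tag an emitted prefetch address with the entry's class."""
        if self.load_bit == self.LOAD:
            return ("load_access_prefetch", prefetch_addr)
        return ("store_access_prefetch", prefetch_addr)
```

Because the tag travels with the entry, the downstream cache hierarchy can route load prefetches toward L1 and store prefetches toward L2 without re-examining the original instruction.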
For example, in at least one example of the data prefetching method, the address of the load access request, the address of the store access request, the first access prefetch address, and the second access prefetch address are all virtual addresses, and the data prefetching method further includes: translating the first access prefetch address and the second access prefetch address into physical addresses, so as to perform the load access prefetch and the store access prefetch based on the physical addresses of the first access prefetch address and the second access prefetch address, respectively.
At least one embodiment of the present disclosure provides a data processing apparatus including a data prefetcher. The data prefetcher is configured to: receive an operation instruction and determine whether the access request of the operation instruction is a load access request or a store access request; in response to the access request being the load access request, train a load access prefetch function in the data prefetcher with the load access request, output a load access prefetch request, and perform a load access prefetch based on the load access prefetch request; and in response to the access request being the store access request, train a store access prefetch function in the data prefetcher with the store access request, output a store access prefetch request, and perform a store access prefetch based on the store access prefetch request.
For example, in at least one example of the data processing apparatus, the data processing apparatus further comprises a first-level cache memory and a second-level cache memory, and performing the store access prefetch based on the store access prefetch request comprises: discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory or the second-level cache memory; and in response to the target data being in neither the first-level cache memory nor the second-level cache memory, obtaining the target data of the store access prefetch request and writing it into the second-level cache memory.
For example, in at least one example of the data processing apparatus, discarding the store access prefetch request in response to the target data already being in the first-level cache memory or the second-level cache memory comprises: discarding the store access prefetch request in response to the target data already being in the first-level cache memory; providing the store access prefetch request to the second-level cache memory in response to the target data not being in the first-level cache memory; and discarding the store access prefetch request in response to the target data already being in the second-level cache memory.
For example, in at least one example of the data processing apparatus, the data processing apparatus further comprises a store access prefetch sequence buffer, and providing the store access prefetch request to the second-level cache memory in response to the target data not being in the first-level cache memory comprises: in response to the target data not being in the first-level cache memory, applying to the store access prefetch sequence buffer for a first storage entry and allocating the first storage entry to the store access prefetch request; and providing, by the store access prefetch sequence buffer, the store access prefetch request to the second-level cache memory.
For example, in at least one example of the data processing apparatus, the data prefetcher comprises a first class of prefetcher entries and a second class of prefetcher entries different from the first class. In response to the access request of the operation instruction being the load access request, training the load access prefetch function in the data prefetcher with the load access request comprises: allocating the address of the load access request to a first-class prefetcher entry and training that entry with the address of the load access request. In response to the access request of the operation instruction being the store access request, training the store access prefetch function in the data prefetcher with the store access request comprises: allocating the address of the store access request to a second-class prefetcher entry and training that entry with the address of the store access request.
For example, in at least one example of the data processing apparatus, the data prefetcher comprises a plurality of prefetcher entries; the load access identification data field included in each entry of a first portion of the plurality of prefetcher entries has a first value; the load access identification data field included in each entry of a second portion of the plurality of prefetcher entries has a second value different from the first value; the first value identifies each entry of the first portion as a first-class prefetcher entry; and the second value identifies each entry of the second portion as a second-class prefetcher entry.
For example, in at least one example of the data processing apparatus, the load access identification data field is a load access identification data bit.
For example, in at least one example of the data processing apparatus, performing the load access prefetch comprises: outputting a first access prefetch address based on the regularity of the data stream tracked by each entry of the first portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the first access prefetch address as the load access prefetch request; and performing the store access prefetch comprises: outputting a second access prefetch address based on the regularity of the data stream tracked by each entry of the second portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the second access prefetch address as the store access prefetch request.
For example, in at least one example of the data processing apparatus, the data processing apparatus further comprises an address translator; the address of the load access request, the address of the store access request, the first access prefetch address, and the second access prefetch address are all virtual addresses; and the address translator is configured to translate the first access prefetch address and the second access prefetch address into physical addresses, so that the load access prefetch and the store access prefetch are performed based on the physical addresses of the first access prefetch address and the second access prefetch address, respectively.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. It is apparent that the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 shows a flow chart of a CPU core reading data;
FIG. 2 illustrates a flow diagram of a data prefetcher for training and prefetching;
FIG. 3 is a flow diagram of a data prefetch method provided by at least one embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of a single prefetcher entry provided by at least one embodiment of the present disclosure;
FIG. 5 is one example of a data prefetch method provided by at least one embodiment of the present disclosure;
fig. 6 is an exemplary block diagram of a data processing apparatus provided by at least one embodiment of the present disclosure; and
fig. 7 is a schematic diagram of an example of the data processing apparatus shown in fig. 6.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and the like in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Similarly, the word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Terms to which at least one embodiment of the present disclosure may relate are explained as follows.
Address translation: modern operating systems usually support multiple processes running simultaneously. To simplify multi-process management and enhance security, an application uses a complete virtual address space; for example, a 32-bit application has at most a 2^32 = 4 GB virtual address space available. When a program runs, these virtual addresses are mapped onto multiple memory pages, each of which has its own physical memory address. When the application accesses instructions and data, their virtual addresses must be translated into physical addresses, the legality of the application's access to the page must be checked, and the access is then sent to the memory or cache to obtain the corresponding data and deliver it to the CPU core. The process of converting a virtual address into a physical address is called address translation.
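The page-based mapping just described can be illustrated with a minimal sketch. A single-level page table and a 4 KB page size are assumptions for illustration; real translation uses multi-level page tables and TLBs.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages

def translate(virtual_addr, page_table):
    """Split the virtual address into a virtual page number and a page offset,
    look the page number up in the page table, and carry the offset over
    unchanged into the physical address."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        # stands in for the legality check described above
        raise PermissionError("illegal access: page not mapped")
    return page_table[vpn] * PAGE_SIZE + offset
```

For example, with 4 KB pages the low 12 bits of the address pass through untranslated, and only the upper bits are remapped by the table.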
Load Queue (LDQ): a queue that holds all dispatched load instructions (read instructions) in the CPU core. The data read/write pipeline retrieves information (e.g., the address) associated with the data to be loaded from the LDQ and attempts to read the corresponding data from the cache memory. After the data is obtained, the corresponding LDQ entry is released.
Store Queue (STQ): a queue that holds all dispatched store instructions (write instructions) in the CPU core. The STQ first buffers the data to be written to memory. When a store instruction retires and is the oldest instruction in the STQ, the data read/write pipeline obtains the data to be written and the related information from the STQ and attempts to write the data into the cache. After the cache write succeeds, the corresponding STQ entry is released.
CPU core pipeline and pipeline stages: to improve performance, a modern CPU core uses pipelining, that is, the whole process of fetching, decoding, executing, and writing back the result of an instruction is divided into several pipeline stages, and in a given clock cycle an instruction can only be in one particular stage, while the CPU core can have multiple instructions running in different pipeline stages at the same time. Each pipeline stage may in turn include multiple sub-stages, each performing a limited amount of work, to further improve the performance of the CPU core.
A stream prefetcher (stream prefetcher) is used for accesses that sweep through an entire contiguous region of data.
A stride prefetcher (stride prefetcher) is used for data accesses with a fixed stride (such as access addresses A, A+2, A+4, A+6, ...).
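The stride-detection idea behind such a prefetcher can be sketched as follows. A real stride prefetcher keeps per-stream state and confidence counters in hardware; this function, its name, and the two-stride confirmation threshold are illustrative assumptions only.

```python
def predict_next(addresses):
    """Predict the next access address when recent accesses are separated
    by one constant stride; otherwise make no prediction."""
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    if len(strides) >= 2 and len(set(strides)) == 1:
        return addresses[-1] + strides[0]  # confirmed constant stride
    return None                            # irregular: no confident prediction
```

On the example pattern A, A+2, A+4, A+6 the detected stride is 2, so A+8 would be prefetched next.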
The Miss Address Buffer (MAB) is also called the MSHR (Miss Status Handling Register). When a read/write or prefetch request misses the first-level cache and must read from the next cache level, the request and its attributes are stored in the MAB until the next cache level returns the requested data.
Fig. 1 shows a flow chart of a CPU core reading data. As shown in fig. 1, the flow of the CPU core reading data includes the following steps S01 to S07.
Step S01: the CPU core outputs the virtual address of the target data of an access request to an address translator (e.g., an address translation pipeline) 511; the address translator 511 translates the virtual address into a physical address, and it is then checked whether the target data of the access request is in the first-level cache memory.
Step S02: if the target data of the access request (e.g., a read request) is in the first-level cache memory, the target data is fetched from the first-level cache memory and supplied to the CPU core through step S07 below.
Step S03: if the target data of the access request is not in the first-level cache memory, a storage entry is requested from a Miss Address Buffer (MAB) 512 and allocated to the access request.
Step S04: the miss address buffer 512 requests the target data of the access request from the cache at the next level (for example, the second-level cache memory).
Step S05: the next-level cache obtains the target data of the access request and returns it to the miss address buffer.
For example, as shown in fig. 1, when the target data of the access request is stored in the second-level cache memory, the second-level cache memory provides the target data directly. When the target data is not stored in the second-level cache memory, the second-level cache memory obtains it from the storage device at the level below the second-level cache memory; that storage device may be, for example, a third-level cache memory or a memory (e.g., a DRAM).
Step S06: the miss address buffer memory writes the target data of the access request into the first-level cache memory.
Step S07: the first-level cache memory supplies target data of the access request to the CPU core.
The cache memory can only store data recently accessed by the CPU core. When the core reads data it has never accessed, or data that has been evicted because of the limited cache size, the data must be fetched from memory (e.g., DRAM). However, because the operating frequency of the CPU core is much higher than that of DRAM, obtaining data and instructions from DRAM can take hundreds of CPU clock cycles, so the core may have to wait tens or even hundreds of clock cycles. This often causes the core to stall because dependent instructions cannot proceed, which costs CPU performance.
Therefore, a multi-level cache architecture can be used to store recently accessed data, and a prefetching technique can analyze previous accesses to obtain the data access pattern of the CPU core and prefetch the data to be used in advance based on that pattern, thereby reducing the clock cycles the CPU core spends waiting for data and improving the overall performance of the CPU.
A prefetcher that prefetches instructions is called an instruction prefetcher; a prefetcher that prefetches data is called a data prefetcher. Data prefetchers can be divided into first-level data prefetchers (i.e., data prefetchers that prefetch target data into a first-level cache memory), second-level data prefetchers (i.e., data prefetchers that prefetch target data into a second-level cache memory), last-level data prefetchers (i.e., data prefetchers that prefetch target data into a last-level cache memory), and so on. In a multi-level cache architecture, the first-level (L1) cache memory has the fastest access speed but the smallest capacity; the last-level cache memory (LLC, generally the third level) has the largest capacity but the slowest access speed; and the second-level (L2) cache memory lies between the L1 cache memory and the LLC in both access speed and capacity.
FIG. 2 shows a flow diagram of the training and prefetching performed by a data prefetcher. The prefetcher shown in FIG. 2 is a first-level data prefetcher, that is, a data prefetcher that prefetches target data into a first-level cache memory. In addition, the data prefetcher shown in FIG. 2 is trained using virtual addresses.
For example, as shown in fig. 2, the data prefetcher performs training and prefetching through steps S012 to S017 as follows.
Step S012: the data prefetcher receives the virtual addresses and other attributes of all (or some) access requests (e.g., historical access requests), is trained using these virtual addresses to discover the data access rule of the CPU core, and outputs data prefetch requests based on that rule. Data prefetching is then performed based on the data prefetch requests.
Step S013: the virtual address of the target data of the data prefetch request is translated to a physical address using an address translator, and then a check is made (e.g., the CPU core checks using the physical address of the target data of the data prefetch request) whether the target data of the data prefetch request is in the level one cache memory.
For example, if the data targeted by the data prefetch request is in the level one cache memory, the data prefetch request is discarded. Correspondingly, the following steps S014 to S017 need not be performed for the data prefetch request.
Step S014: if the target data of the data prefetch request is not in the first-level cache memory, a storage item is applied to a Missing Address Buffer (MAB) 512, and the storage item is allocated to the data prefetch request.
Step S015: the miss address buffer memory 512 requests the target data from a buffer memory of the next level (for example, a level two cache memory).
Step S016: the next-level buffer memory fetches the target data and returns it to the miss address buffer memory. For example, as shown in FIG. 2, in the case where the target data of the data prefetch request is stored in the second-level cache memory, the second-level cache memory fetches the target data from itself. In the case where the target data of the data prefetch request is not stored in the second-level cache memory, the second-level cache memory acquires it from a storage device located at the next level below the second-level cache memory.
Step S017: the miss address buffer memory writes the target data into the first level cache memory.
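The flow of steps S013 to S017 can be sketched as follows; the helper names (`translate`, `l2_fetch`) and the modeling of the cache as a dictionary and the Missing Address Buffer as a list are illustrative assumptions, not the patent's hardware:

```python
def handle_prefetch_request(virtual_addr, translate, l1_cache, mab, l2_fetch):
    """Sketch of steps S013-S017: translate, check L1, and fill via the MAB.

    `translate` maps a virtual address to a physical address (step S013);
    `mab` models the Missing Address Buffer as a list of pending entries;
    `l2_fetch` returns the target data from L2 or a lower level (step S016).
    """
    phys = translate(virtual_addr)   # S013: address translation
    if phys in l1_cache:             # S013: target data already in L1?
        return "discarded"           # drop the prefetch request
    mab.append(phys)                 # S014: allocate a storage entry in the MAB
    data = l2_fetch(phys)            # S015/S016: request data from the next level
    l1_cache[phys] = data            # S017: write the target data into L1
    mab.remove(phys)                 # release the MAB entry after the fill
    return "filled"
```

A request whose target data is already in the first-level cache is dropped without touching the MAB, matching the note under step S013.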
It should be noted that, while the data prefetcher is performing training and prefetching, the CPU core reads data to perform related operations. Correspondingly, for convenience of description, fig. 2 also shows a part of a flow of the CPU core reading data. For example, the flow of the CPU core reading data includes the following step S011.
Step S011: the CPU core outputs a virtual address of target data of the access request to an address translator (e.g., address translation pipeline) 511, which translates the virtual address into a physical address, and then, the CPU checks to see whether the target data of the access request is in the level one cache memory.
And if the target data of the access request is in the first-level cache memory, the CPU core fetches the target data of the access request from the first-level cache memory. If the target data of the access request is not in the primary cache memory, the miss address buffer memory 512 obtains the target data of the access request and writes the target data of the access request into the primary cache memory, and then the CPU core fetches the target data of the access request from the primary cache memory.
For example, the data prefetcher shown in FIG. 2 may be trained using only the addresses of load access requests (e.g., historical load access requests). In this case, the access rule obtained by the data prefetcher is only the rule of load access requests, and correspondingly, the data prefetcher can only perform load access prefetching, that is, it can only prefetch the target data of load access requests. However, since the data prefetcher cannot prefetch the target data of store access requests, in the case where the target data of a store access request is not located in the cache memories (e.g., the first-level and second-level cache memories), the data needs to be fetched from the memory (DRAM), which may adversely affect CPU performance.
For example, in order that the data prefetcher can prefetch target data of a store access request, the data prefetcher (single data prefetcher) shown in fig. 2 can be trained using both the address of a load access request (e.g., a historical load access request) and the address of a store access request (e.g., a historical store access request) such that the access requests output by the data prefetcher include both the load access request and the store access request.
However, the inventors of the present disclosure noted in their research that, for an example in which a single data prefetcher is trained using both the addresses of load access requests and the addresses of store access requests, the data prefetched into the first-level cache memory includes not only the target data of load access requests (i.e., data obtained based on load access prefetch requests) but also the target data of store access requests (i.e., data obtained based on store access prefetch requests). Because the capacity of the first-level cache memory is small, prefetching both kinds of data into it will cause data already stored there (e.g., data obtained based on load access prefetch requests or other useful data) to be replaced by the newly prefetched data obtained based on store access prefetch requests. A large amount of data obtained based on store access prefetch requests (some of which may be invalid or non-urgent) may thus replace useful data in the first-level cache memory, increasing the probability that the target data of a load access request is not in the first-level cache memory. In addition, since a load instruction stalled due to a cache miss affects CPU performance more (e.g., much more) than a store instruction stalled due to a cache miss, when data already stored in the first-level cache memory is replaced by the newly prefetched target data of store access requests, the data prefetcher has a limited effect on improving CPU performance, and may even reduce the overall performance of the CPU.
For a CPU supporting out-of-order execution (a high-performance CPU), a load instruction stalled due to a cache miss has a larger (e.g., much larger) influence on CPU performance than a store instruction stalled due to a cache miss, for the following reason. The stalling of a load instruction causes all subsequent instructions that depend on the load instruction's target data to fail to execute while still occupying resources within the CPU core (which is often a major reason why an application cannot fully utilize the CPU core's maximum performance). Generally speaking, an application program's dependency on the target data of a store instruction is small, or the dependency can be eliminated by some optimization method, such as store-to-load forwarding, so the influence of a store instruction stalled due to a cache miss on CPU performance is small. For example, only in the case where the store queue has no free space (e.g., all store instructions in the store queue need to wait for data fetched from DRAM before executing) does the CPU core stop dispatching new store instructions to the store queue, in which case CPU performance will be adversely affected.
Although researchers have noticed that the related data prefetchers provide only limited CPU performance improvement, they have focused more on improving the hit rate of the prefetched data through the design of the data prefetcher (for example, improving the performance of load prefetching through more accurate rule discovery) so as to reduce the proportion of prefetched data that goes unused, or on reducing the possibility that useful data is replaced by newly prefetched data by optimizing the replacement algorithm (for example, reducing the possibility that the target data of a load access request already stored in the first-level cache memory is replaced by newly prefetched data).
In addition, since the aforementioned load instructions stalled due to a cache miss have a greater impact on CPU performance than store instructions stalled due to a cache miss, researchers have been particularly concerned with improving the performance of load prefetching.
However, the inventors of the present disclosure have also noted in their research that after the hit rate of the data prefetched by the data prefetcher has been increased to a certain degree and/or the replacement algorithm has been optimized to a certain degree, further improvement and optimization will result in a (e.g., substantial) increase in the complexity of the data prefetcher and/or the replacement algorithm, while the benefit is limited.
At least one embodiment of the present disclosure provides a data prefetching method and a data processing apparatus. The data prefetching method comprises the following steps: receiving an operation instruction, and providing an access request of the operation instruction to a data prefetcher; determining whether the access request of the operation instruction is a load access request or a store access request; in response to the access request of the operation instruction being a load access request, training a load access prefetching function in the data prefetcher with the load access request, outputting a load access prefetch request, and performing load access prefetching based on the load access prefetch request; and in response to the access request of the operation instruction being a store access request, training a store access prefetching function in the data prefetcher with the store access request, outputting a store access prefetch request, and performing store access prefetching based on the store access prefetch request. For example, the data prefetching method and the data processing apparatus provided by at least one embodiment of the present disclosure may improve the accuracy of the rules discovered by the data prefetcher.
For example, when store access prefetching is performed based on a store access prefetch request, the data acquired based on the store access prefetch is written into another cache memory at a lower hierarchy than the first-level cache memory, so that the possibility of a load instruction stalling due to a cache miss can be reduced, and thus the performance of a data processing apparatus adopting the data prefetching method can be improved.
In the following, a data prefetching method according to at least one embodiment of the present disclosure is described in a non-limiting manner by using several examples and embodiments, and as described below, different features in these specific examples and embodiments may be combined with each other without mutual conflict, so as to obtain new examples and embodiments, which also belong to the protection scope of the present disclosure.
Fig. 3 is a flowchart of a data prefetching method according to at least one embodiment of the disclosure. For example, as shown in fig. 3, the data prefetching method includes the following steps S10 to S30.
Step S10: receiving an operation instruction, and providing an access request of the operation instruction to the data prefetcher.
Step S20: it is determined whether the access request of the operation instruction is a load access request or a store access request.
Step S30: in response to the access request of the operation instruction being a load access request, training the load access prefetching function in the data prefetcher with the load access request, outputting a load access prefetch request, and performing load access prefetching based on the load access prefetch request; and in response to the access request of the operation instruction being a store access request, training the store access prefetching function in the data prefetcher with the store access request, outputting a store access prefetch request, and performing store access prefetching based on the store access prefetch request.
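Steps S10 to S30 amount to routing each access request to the matching training path inside a single prefetcher; the class below is a minimal sketch under the assumption of a toy constant-stride rule (the patent does not fix a particular rule-discovery algorithm, and the class and method names are illustrative):

```python
class DataPrefetcher:
    """Single prefetcher that trains its load and store prefetch
    functions separately, as in steps S10-S30 (illustrative sketch)."""

    def __init__(self):
        self.load_history = []   # addresses used to train load prefetching
        self.store_history = []  # addresses used to train store prefetching

    def train(self, request_type, address):
        # Step S20: determine whether this is a load or a store access request.
        if request_type == "load":
            # Step S30, load path: train the load access prefetch function.
            self.load_history.append(address)
            return self._predict(self.load_history, "load")
        elif request_type == "store":
            # Step S30, store path: train the store access prefetch function.
            self.store_history.append(address)
            return self._predict(self.store_history, "store")
        raise ValueError("unknown request type")

    def _predict(self, history, kind):
        # Toy rule: with two or more samples, extrapolate a constant stride.
        if len(history) >= 2:
            stride = history[-1] - history[-2]
            return (kind, history[-1] + stride)  # (prefetch type, target address)
        return None
```

Because the two histories are kept apart, each emitted prefetch request carries its type, which is what later allows load-prefetched and store-prefetched data to be written to different cache levels.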
For example, the operation instructions include a load instruction and a store instruction; the access request corresponding to the loading instruction is a loading access request, and the access request corresponding to the storing instruction is a storing access request.
For example, by having the data prefetcher receive load access requests as well as store access requests, the data prefetcher may be trained with more addresses of the target data of access requests, thereby increasing the likelihood that the data prefetcher will find more data access rules.
For example, when the data prefetcher is trained, the load access prefetching function and the store access prefetching function of the data prefetcher are trained respectively, so that not only can the load access prefetching request and the store access prefetching request be distinguished, but also the accuracy of the rule found by the data prefetcher can be improved.
For example, by distinguishing the load access prefetch request from the store access prefetch request, the data obtained based on load access prefetching (i.e., the target data of the load access prefetch request) and the data obtained based on store access prefetching (i.e., the target data of the store access prefetch request) may be processed differently according to actual application requirements, so that the data prefetching method has the potential to reduce the possibility of load instructions stalling due to cache misses, and thus to improve the performance of the data processing apparatus 100 (e.g., a central processing unit, CPU) that employs it.
For example, in the case of load access prefetching based on a load access prefetch request, the data acquired based on the load access prefetch (i.e., the target data of the load access prefetch request) is written into the first-level cache memory. For example, in the case of store access prefetching based on a store access prefetch request, the data acquired based on the store access prefetch (i.e., the target data of the store access prefetch request) is written into another cache memory at a lower hierarchy than the first-level cache memory. For example, the other cache memory at a lower hierarchy than the first-level cache memory may be a second-level cache memory. For another example, it may be a last-level cache memory (e.g., a third-level cache memory).
For example, when data reading is performed based on a load access request, the CPU core may directly acquire the target data of the load access request from the first-level cache memory. For example, when data reading is performed based on a store access request, the target data of the store access request may be requested from the second-level cache memory via the miss address buffer memory 112; the second-level cache memory obtains the target data from itself or from a storage device at the next level and provides it to the miss address buffer memory 112; the miss address buffer memory 112 writes the target data into the first-level cache memory; and then the CPU core may obtain the target data of the store access request from the first-level cache memory.
For example, by writing the data acquired based on a store access prefetch request into another cache memory (e.g., the second-level cache memory) at a lower level than the first-level cache memory, it is possible to avoid replacing data already stored in the first-level cache memory (e.g., data acquired based on load access prefetching) with newly acquired store-access prefetch data, and to increase the storage space available for data acquired based on load access prefetching (i.e., the target data of load access prefetch requests). The possibility of a load instruction stalling due to a cache miss can thereby be reduced, improving the performance of the data processing apparatus 100 that employs this data prefetching method.
For example, the inventors of the present disclosure have found through research that, compared with a method of increasing the hit rate of data prefetched by the data prefetcher (e.g., the target data of load prefetch requests), a method of dividing the prefetch requests generated by the data prefetcher into load access prefetch requests and store access prefetch requests, and writing the data obtained based on store access prefetching (i.e., the target data of store access prefetch requests) into another cache memory (e.g., the second-level cache memory) at a lower level than the first-level cache memory, may be better at improving the performance of the data processing apparatus 100.
For example, the data prefetcher includes logic circuitry and a buffer memory (located within the data prefetcher). For example, the logic circuitry may be used for at least one of comparing addresses, calculating differences between addresses, and performing hash operations. For example, the buffer memory may be used to store address access rules that have been discovered or are being discovered.
For example, compared to a method of distinguishing between load access prefetch requests and store access prefetch requests by employing two independent data prefetchers (for outputting load access prefetch requests and store access prefetch requests, respectively), by distinguishing between load access prefetch requests and store access prefetch requests using a single data prefetcher, the data prefetch method provided by at least one embodiment of the present disclosure may distinguish between load access prefetch requests and store access prefetch requests with less hardware resources and with less changes to the CPU, thereby reducing the workload and cost of development, register Transfer Level (RTL) implementation and verification.
For example, "using less hardware resources" means that logic circuits and internal caches in the same data prefetcher can be shared to respectively train the load access prefetch function and the store access prefetch function of the data prefetcher, without using a first set of logic circuits and internal caches to train the load access prefetch function of the data prefetcher, and using a second set of logic circuits and internal caches, which are independent of the first set of logic circuits and internal caches to train the store access prefetch function of the data prefetcher. For example, "make fewer changes to the CPU" means that the existing data prefetcher configuration needs to be changed only a small amount, without adding an additional interface for providing load/store access instructions from the CPU to the data prefetcher. For example, in the case where the CPU does not originally support two separate data prefetchers, the development, RTL implementation, and verification overhead involved in re-implementing and optimizing the two data prefetchers is relatively large.
For example, in step S30, store access prefetching is performed based on the store access prefetch request, including the following step S31.
Step S31: discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory or the second-level cache memory; and in response to the target data of the store access prefetch request not being in the first-level cache memory or the second-level cache memory, acquiring the target data of the store access prefetch request and writing it into the second-level cache memory.
For example, the second level cache memory is a cache memory located at a level lower than the first level cache memory.
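Step S31 can be sketched as follows; modeling the caches as dictionaries and the next-level fetch as a callback (`fetch_lower`) is an illustrative assumption, not the patent's implementation:

```python
def handle_store_prefetch(addr, l1_cache, l2_cache, fetch_lower):
    """Sketch of step S31: store-prefetched data goes to L2, never to L1.

    `fetch_lower` models requesting the data from the next level (e.g., L3 or DRAM).
    """
    if addr in l1_cache or addr in l2_cache:
        return "discarded"               # target data already cached: drop the request
    l2_cache[addr] = fetch_lower(addr)   # fill the second-level cache only,
    return "filled_l2"                   # leaving the first-level cache untouched
```

The key point is the last branch: the fill lands in the second-level cache, so store-prefetched data cannot evict load-prefetched data from the small first-level cache.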
In one example, in step S31, in response to the target data of the store access prefetch request not being in the first-level cache memory, the store access prefetch request may be provided directly to the second-level cache memory.
In another example, in step S31, in response to the target data of the store access prefetch request not being in the first-level cache memory, the store access prefetch request is provided to the second-level cache memory through the following steps S311 and S312.
Step S311: in response to the target data of the store access prefetch request not being in the first-level cache memory, applying for a first storage entry from a store access prefetch queue (L2 Prefetch Queue) buffer memory, and allocating the first storage entry to the store access prefetch request.
Step S312: the store access prefetch queue buffer memory 113 provides the store access prefetch request to the second-level cache memory.
For example, in step S312, once the second-level cache memory can receive the store access prefetch request, the store access prefetch queue buffer memory 113 provides the request to it.
For example, the store access prefetch queue buffer memory 113 is used to briefly hold store access prefetch requests. For example, by applying for a first storage entry from the store access prefetch queue buffer memory 113 and allocating it to a store access prefetch request, the likelihood that the request is discarded when the second-level cache memory cannot immediately receive it, and the resulting adverse effects on the data prefetcher and the data processing apparatus 100, can be reduced.
For example, after the store access prefetch request has been provided to the second-level cache memory, the request may be deleted and the first storage entry holding it released, so the request dwells only briefly in the store access prefetch queue buffer memory 113; correspondingly, the number of entries of the store access prefetch queue buffer memory 113 may be small (e.g., 4-8), and thus its capacity may be small. For example, the first storage entry may hold only the address (e.g., the physical address) of the target data of the store access prefetch request, so the storage space occupied by each entry is smaller, further reducing the required capacity of the store access prefetch queue buffer memory 113.
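The short-dwell, small-capacity behavior described above can be sketched with a toy queue; the 4-8 entry sizing and address-only entries come from the text's example, while the class and method names and the `l2_accepts` callback are illustrative assumptions:

```python
from collections import deque

class StorePrefetchQueue:
    """Toy model of the store access prefetch queue buffer memory:
    a few entries, each holding only a physical address (steps S311/S312)."""

    def __init__(self, capacity=8):
        self.capacity = capacity  # text's example: 4-8 entries
        self.entries = deque()

    def allocate(self, phys_addr):
        """Step S311: apply for a first storage entry; fail if the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(phys_addr)
        return True

    def drain_to_l2(self, l2_accepts):
        """Step S312: hand entries to L2 once it can accept them, freeing each
        entry as soon as its request has been passed on."""
        sent = []
        while self.entries and l2_accepts():
            sent.append(self.entries.popleft())
        return sent
```

Because each entry is freed as soon as its request reaches the second-level cache, a handful of address-sized entries suffices, which is why the queue can be kept very small.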
For example, in step S31, acquiring the target data of the store access prefetch request and writing it into the second-level cache memory includes: the second-level cache memory requests the target data of the store access prefetch request from the storage device (e.g., the third-level cache memory or the DRAM) located at the next level below the second-level cache memory, and stores the target data provided by that next-level storage device into the second-level cache memory.
It should be noted that, for convenience of description, the examples in at least one embodiment of the present disclosure are described by taking the case where data obtained based on store access prefetching is written into the second-level cache memory; those skilled in the art will understand that the performance of the data processing apparatus 100 may also be improved by writing the data obtained based on store access prefetching into the third-level cache memory or a cache memory at a lower level than the third-level cache memory. In this case, store access prefetching is performed based on the store access prefetch request in step S30 through, for example, the following step S32. Step S32: discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level, second-level, or third-level cache memory; and in response to the target data of the store access prefetch request not being in the first-level, second-level, or third-level cache memory, acquiring the target data of the store access prefetch request and writing it into the third-level cache memory. For example, step S31 may be referred to for a specific implementation of step S32, which is not repeated here.
For example, in step S30, load access prefetching is performed based on the load access prefetching request, including the following step S33.
Step S33: discarding the load access prefetch request in response to the target data of the load access prefetch request already being in the level one cache memory; and in response to the target data of the load access prefetch request not being in the first-level cache memory, acquiring the target data of the load access prefetch request and writing the target data of the load access prefetch request into the first-level cache memory.
For example, in step S33, the target data of the load access prefetch request is acquired and written into the primary cache memory, including the following steps S331 to S334.
Step S331: a second memory entry is applied to the miss address buffer 112 and allocated to the load access prefetch request. For example, the load access prefetch request may include an address of target data of the load access prefetch request. For example, the load access prefetch request may further include a storage address of target data of the load access prefetch request in the level one cache memory.
Step S332: the miss address buffer 112 requests a load from the second level cache memory that accesses the target data of the prefetch request.
Step S333: the second-level cache memory fetches the target data of the load access prefetch request (from itself, or from a storage device at the next level below the second-level cache memory) and provides it to the miss address buffer memory 112.
Step S334: the miss address buffer memory 112 writes the target data of the load access prefetch request into the first-level cache memory. For example, after the miss address buffer 112 writes the target data of the load access prefetch request into the first-level cache memory, the load access prefetch request is deleted, and the second memory entry where the load access prefetch request is located is released.
For example, a data prefetcher and a method of training the data prefetcher are illustrated below.
For example, the data prefetcher includes a buffer memory that can store a master table. For example, the master table may be single-way (direct-mapped) or multi-way (set-associative).
For example, the data prefetcher may include a plurality of prefetcher entries, each corresponding to an entry of the master table. For example, each of the plurality of prefetcher entries (i.e., each entry of the master table) is used to track a data stream (correspondingly, to receive the addresses of the data in that stream), to discover the rule of the data stream (correspondingly, to save the discovered rule or the rule being discovered), and to predict the next data address to be accessed according to that rule so as to issue a prefetch request.
For example, the data prefetcher includes a first type of prefetcher entry and a second type of prefetcher entry different from the first type of prefetcher entry. For example, a first type of prefetcher entry is used (e.g., only for) tracking data streams corresponding to load access requests and a second type of prefetcher entry is used (e.g., only for) tracking data streams corresponding to store access requests.
For example, the plurality of prefetcher entries included in the data prefetcher may be marked (as prefetcher entries of the first type or of the second type) by the following method.
For example, in response to the access request of the operation instruction corresponding to the first data in the data stream tracked by a first portion of the plurality of prefetcher entries being a load access request, each of the first portion of the plurality of prefetcher entries is marked as a prefetcher entry of the first type; in response to the access request of the operation instruction corresponding to the first data in the data stream tracked by a second portion of the plurality of prefetcher entries being a store access request, each of the second portion of the plurality of prefetcher entries is marked as a prefetcher entry of the second type.
Fig. 4 illustrates a schematic diagram of a single prefetcher entry provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 4, the prefetcher entry includes a load access identification data field 201. For example, load access identification data field (e.g., a single bit data field) 201 may be a load access identification data bit.
For example, marking each of the first portion of the plurality of prefetcher entries as a prefetcher entry of the first type includes: setting the value IS_LOAD of the load access identification data field included in each of the first portion of the plurality of prefetcher entries to a first value; marking each of the second portion of the plurality of prefetcher entries as a prefetcher entry of the second type includes: setting the value IS_LOAD of the load access identification data field included in each of the second portion of the plurality of prefetcher entries to a second value different from the first value. For example, the first value may be 1 and the second value may be zero.
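The marking scheme can be sketched as follows; only the IS_LOAD values 1 and 0 come from the text, and the function names and dictionary representation of an entry are illustrative:

```python
IS_LOAD_SET = 1    # first value: entry tracks a load-access data stream
IS_LOAD_CLEAR = 0  # second value: entry tracks a store-access data stream

def mark_prefetcher_entry(entry, request_type):
    """Set the load access identification data field when the entry begins
    tracking a new data stream (sketch; the field name is illustrative)."""
    entry["is_load"] = IS_LOAD_SET if request_type == "load" else IS_LOAD_CLEAR
    return entry

def prefetch_type(entry):
    """Prefetch requests issued by an entry inherit its IS_LOAD marking,
    which is how load and store prefetch requests stay distinguishable."""
    return "load" if entry["is_load"] == IS_LOAD_SET else "store"
```

A single identification bit per entry is all that is needed to keep the two prefetch functions apart inside one shared prefetcher structure.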
It is noted that, in some examples, the buffer memory included in the data prefetcher may also store other applicable tables (e.g., for holding other data that assists in implementing the prefetching function). For example, the buffer memory may store three tables: the first table holds the currently active data streams; the second table stores the rules detected in the past; and the third table holds the prefetch requests to be output. Entries of the first and second tables need to include the IS_LOAD data field, while entries of the third table do not.
In step S30, the load access request is used to train the load access prefetch function in the data prefetcher, including the following step S34.
Step S34: in response to the access request of the operation instruction being a load access request, allocating the address of the load access request to a prefetcher entry of the first type, and training the prefetcher entry of the first type with the address of the load access request.
For example, in step S34, assigning the address of the load access request to a first type of prefetcher entry includes: determining, based on the address of the load access request (the address of the target data of the load access request), the first prefetcher entry to which the address of the load access request belongs, and assigning the address of the load access request to that first prefetcher entry.
For example, the addresses for data for load accesses that are located in the same page (e.g., the same 4k page) in the memory space are affiliated with the same first prefetcher entry; correspondingly, the first prefetcher entry receives and tracks the address of the data for load access located in the same page of the storage space, and trains with the address of the data for load access located in the same page of the storage space to find the regularity of the data stream tracked by the first prefetcher entry (i.e., the regularity of the address of the data for load access located in the same page of the storage space).
For example, determining a first prefetcher entry to which an address of a load access request is subordinate based on the address of the load access request includes: an INDEX value INDEX of the address of the load access request is calculated based on the address of the load access request, and a first prefetcher entry to which the address of the load access request belongs is determined based on at least the INDEX value INDEX of the address of the load access request.
For example, each of the plurality of prefetcher entries corresponds to an index value. For example, the index value of the address of the first data in the data stream tracked by the prefetcher entry (i.e., the first address received by the prefetcher entry) may be calculated based on the address of the first data in the data stream tracked by the prefetcher entry, and the index value of the address of the first data received by the prefetcher entry may be used as the index value corresponding to the prefetcher entry.
In one example, determining the first prefetcher entry to which the address of the load access request belongs based at least on the INDEX value INDEX of the address of the load access request includes: taking the first prefetcher entry, among the first type of prefetcher entries, whose index value is the same as that of the address of the load access request as the first prefetcher entry to which the address of the load access request belongs.
The inventors of the present disclosure noted in their research that, in the case where the data prefetcher includes a number of prefetcher entries (e.g., tens to thousands) that is much smaller than the address range of the access request (e.g., 2^32 to 2^64), a plurality of index values respectively calculated based on a plurality of addresses that are far apart may be identical to each other; however, since data prefetchers are typically used to discover data access rules that are located within a local address range (e.g., addresses within a 4 KB address range), if far-apart addresses are used in training the same data prefetcher entry, the accuracy of the rules discovered by that prefetcher entry and the accuracy of prefetching may be reduced.
In another example, as shown in FIG. 4, each of the plurality of prefetcher entries has an identification data field 202, and the value TAG of the identification data field 202 (e.g., a multi-bit data field) is a hash value computed based on the address of the first data in the data stream tracked by the prefetcher entry (i.e., the first address received by the prefetcher entry); in this case, determining the first prefetcher entry to which the address of the load access request belongs based at least on the INDEX value INDEX of the address of the load access request includes: taking a first prefetcher entry, among the first type of prefetcher entries, having the same index value as the address of the load access request as a candidate first prefetcher entry (a first prefetcher entry to which the address of the load access request may belong); obtaining the value TAG (hash value) of the identification data field 202 of the candidate first prefetcher entry; calculating a hash value of the address of the load access request; and, in response to the value TAG (hash value) of the identification data field 202 of the candidate first prefetcher entry being equal to the hash value of the address of the load access request, taking the candidate first prefetcher entry as the first prefetcher entry to which the address of the load access request belongs. For example, by taking the first prefetcher entry having the same index value and hash value as the address of the load access request as the first prefetcher entry to which that address belongs, the accuracy of the rules found by the prefetcher entry and the accuracy of prefetching can be improved.
For example, assume that the main table of a prefetcher has 16 prefetcher entries for tracking 16 data streams, respectively; the following method may be used based on the INDEX value INDEX and the hash value TAG of the address a of the target data of the access request of the operation instruction: INDEX = ((a >> 12) % 16); TAG = (a >> 16), where in the above two expressions, ">>" indicates shifting the address a to the right (e.g., by 12 bits or 16 bits), and "%" indicates taking a remainder.
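Under the 16-entry assumption of this example, the INDEX and TAG computation can be sketched as follows (the `index_of`/`tag_of` helper names and the sample addresses are illustrative, not from the patent):

```python
NUM_ENTRIES = 16  # size of the hypothetical main table from the example above
PAGE_SHIFT = 12   # 4 KB pages

def index_of(a):
    # INDEX = ((a >> 12) % 16): the page number of address a, modulo the table size
    return (a >> PAGE_SHIFT) % NUM_ENTRIES

def tag_of(a):
    # TAG = (a >> 16): a simple hash used to tell apart far-apart addresses
    # that happen to alias to the same INDEX
    return a >> 16

# Addresses in the same 4 KB page select the same entry and the same tag ...
assert index_of(0x12345010) == index_of(0x12345FF8) == 5
assert tag_of(0x12345010) == tag_of(0x12345FF8)

# ... while two far-apart addresses can share an INDEX but differ in TAG,
# so the tag comparison rejects the aliased training address.
a1, a2 = 0x00005000, 0x00105000
assert index_of(a1) == index_of(a2)
assert tag_of(a1) != tag_of(a2)
```

The second pair of assertions shows exactly the aliasing case discussed above: without the TAG check, both far-apart addresses would train the same entry.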
In step S30, the store access request is used to train the store access prefetch function in the data prefetcher, including the following step S35.
Step S35: the access request in response to the operation instruction is a memory access request, an address of the memory access request (an address of target data of the memory access request) is assigned to the second type prefetcher entry, and the second type prefetcher entry is trained using the address of the memory access request.
For example, in step S35, assigning the address of the store access request to a second type of prefetcher entry includes: determining, based on the address of the store access request, the second prefetcher entry to which the address of the store access request belongs, and assigning the address of the store access request to that second prefetcher entry.
For example, the addresses of the data for store access that are located in the same page (e.g., the same 4 KB page) in the memory space are affiliated with the same second prefetcher entry; correspondingly, the second prefetcher entry receives and tracks the addresses of the data for store access located in the same page of the memory space, and trains with those addresses to find the regularity of the data stream tracked by the second prefetcher entry (i.e., the access regularity of the addresses of the data for store access located in the same page of the memory space). For example, determining the second prefetcher entry to which the address of the store access request belongs based on the address of the store access request is similar to determining the first prefetcher entry to which the address of the load access request belongs based on the address of the load access request, and will not be described in detail herein.
For example, in step S30, outputting the load access prefetch request includes: the method includes outputting a first access prefetch address based on a regularity of a data stream tracked by a first type of prefetcher entry (e.g., each of a first portion of a plurality of prefetcher entries), and identifying an access prefetch request corresponding to the first access prefetch address as a load access prefetch request.
For example, in step S30, outputting the store access prefetch request includes: outputting a second access prefetch address based on the regularity of the data stream tracked by each of the second portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the second access prefetch address as a store access prefetch request.
For example, where the prefetcher is a strided prefetcher, each of the plurality of prefetcher entries may be trained to find a jump distance for the data stream tracked by that prefetcher entry. For example, if the data stream tracked by a prefetcher entry has a jump distance of N and the address most recently received by the prefetcher entry (the address used for training) is a, the prefetch address output by the prefetcher entry is a + N. It should be noted that the data prefetching method provided by at least one embodiment of the present disclosure is not limited to using a stride prefetcher, but may also use a stream prefetcher or any other suitable prefetcher (e.g., any prefetcher trained using virtual addresses).
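A toy model of a single stride prefetcher entry can illustrate this training. It assumes a jump distance counts as found once the same distance is observed twice in a row (real confidence mechanisms vary), and the `StrideEntry` name is invented for demonstration:

```python
class StrideEntry:
    """Minimal stride prefetcher entry: learns jump distance N, predicts A + N."""

    def __init__(self):
        self.last_addr = None
        self.stride = None
        self.confirmed = False

    def train(self, addr):
        """Feed one access address; return a prefetch address once the stride is confirmed."""
        prefetch = None
        if self.last_addr is not None:
            new_stride = addr - self.last_addr
            # stride confirmed only when the same jump distance repeats
            self.confirmed = (new_stride == self.stride)
            self.stride = new_stride
            if self.confirmed:
                prefetch = addr + self.stride  # A + N from the text
        self.last_addr = addr
        return prefetch

e = StrideEntry()
assert e.train(100) is None   # first address: nothing known yet
assert e.train(164) is None   # jump distance 64 seen once, not yet confirmed
assert e.train(228) == 292    # distance 64 confirmed -> prefetch A + N = 228 + 64
```

A stream prefetcher, as the text notes, would differ only in how the next address is predicted; the entry-tracking structure stays the same.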
For example, a prefetcher entry may tag the access prefetch request that it outputs based on the value IS_LOAD of the load access identification data field included in the prefetcher entry. For example, in the case that the value IS_LOAD of the load access identification data field included in the prefetcher entry is the first value, the access prefetch request output by the prefetcher entry is identified as a load access prefetch request; and in the case that the value IS_LOAD of the load access identification data field included in the prefetcher entry is the second value, the access prefetch request output by the prefetcher entry is identified as a store access prefetch request.
In one example, the data prefetcher may be trained using a physical address, and correspondingly, the prefetch address output by the data prefetcher is also a physical address, so that the prefetch address output by the data prefetcher need not be translated when prefetching.
However, the inventors of the present disclosure have noticed in their research that addresses in two pages with consecutive virtual addresses may be allocated to discontiguous physical addresses, yet a data prefetcher trained using physical addresses can only detect an access rule within a single page, thereby limiting the accuracy and validity of the rules found by the data prefetcher; in addition, a data prefetcher trained using physical addresses cannot generate a physical address for a page on which it has not been trained.
In another example, the data prefetcher may be trained using virtual addresses (e.g., the address of a load access request and the address of a store access request received by the data prefetcher are both virtual addresses), and correspondingly, the prefetch address output by the data prefetcher is also a virtual address (e.g., the first access prefetch address and the second access prefetch address are both virtual addresses); in this case, the data prefetching method further includes: translating the first access prefetch address and the second access prefetch address into physical addresses, and performing load access prefetching and store access prefetching based on the respective physical addresses.
For example, training the data prefetcher by using the virtual address can enable the data prefetcher to have the capability of finding out the page-crossing reading rule and output the page-crossing prefetching address, so that the accuracy and the effectiveness of the rule detected by the data prefetcher and the performance of the data prefetcher can be further improved.
For example, a specific method for determining whether an access request of an operation instruction is a load access request or a store access request, and receiving the operation instruction and providing the access request of the operation instruction to the data prefetcher may be referred to in the related art, and will not be described herein again.
Fig. 5 is an example of a data prefetching method provided by at least one embodiment of the present disclosure. A data prefetching method provided by at least one embodiment of the present disclosure is exemplified below with reference to fig. 5.
For example, as shown in fig. 5, the data prefetching method includes the following steps S2 to S4.
Step S2: the data prefetcher receives access requests (such as access requests of historical operation instructions) of all (or part of) operation instructions, trains by using virtual addresses of the access requests (such as historical access requests) to acquire data access rules of a CPU core, outputs addresses of target data of the prefetch requests based on the data access rules, and marks the prefetch requests as load access prefetch requests or store access prefetch requests.
For example, in step S2, the access request includes the virtual address of the target data of the access request. For example, the access request may further include other applicable information, such as whether the access request is a load access request or a store access request, and whether the access request misses the level one cache memory.
For example, in step S2, the data prefetcher receives a load access request and stores the load access request, so that the data prefetcher can be trained by using more addresses of target data of the access request, thereby increasing the possibility that the data prefetcher finds more data access rules. For example, in step S2, when the data prefetcher is trained, the load access prefetching function and the store access prefetching function of the data prefetcher are respectively trained, for example, the same prefetcher entry is trained only by using the address of the target data of the load access request or the address of the target data of the store access request, so that not only the load access prefetching request and the store access prefetching request can be distinguished, but also the accuracy of the rule found by the data prefetcher can be improved.
For example, the changes to the existing data prefetcher and the CPU core by using a single data prefetcher to distinguish between load access prefetch requests and store access prefetch requests are small, so that the distinguishing between load access prefetch requests and store access prefetch requests by using a single data prefetcher is low in cost, simple and easy to implement.
Step S3: the virtual address of the target data of the prefetch request is translated into a physical address using the address translator 111 (e.g., an address translation pipeline), and then it is checked (e.g., by the CPU core using the physical address of the prefetch request) whether the target data of the prefetch request is in the level one cache memory.
For example, if the data targeted by the prefetch request is in the level one cache memory, the prefetch request is discarded. Correspondingly, the following steps S4 and steps subsequent to S4 need not be performed for the prefetch request.
Step S4: if the target data of the prefetch request is not in the level one cache memory, then, in response to the prefetch request being a load access prefetch request, a second memory entry is applied for from the miss address buffer 112 and allocated to the load access prefetch request; and in response to the prefetch request being a store access prefetch request, a first memory entry is applied for from the store access prefetch sequence buffer and allocated to the store access prefetch request.
For example, as shown in fig. 5, in response to the prefetch request being a load access prefetch request and the target data of the prefetch request not being in the level one cache memory, the following steps S5 to S7 are performed (correspondingly, the data prefetch method further includes steps S5 to S7).
Step S5: the miss address buffer 112 requests the target data of the load access prefetch request from the second-level cache memory.
Step S6: the second-level cache memory fetches the target data of the load access prefetch request (from itself or a memory device located at the next level of the second-level cache memory) and supplies the target data of the load access prefetch request to the miss address buffer memory 112.
Step S7: the miss address buffer memory 112 writes the target data of the load access prefetch request into the level one cache memory.
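Steps S5 to S7 amount to filling the level one cache via the miss address buffer; a minimal sketch that models caches as plain sets of addresses (the function name is invented, and whether the L2 cache also keeps a copy of data fetched from the next level is an assumption of this sketch):

```python
def handle_load_prefetch_miss(addr, l1, l2, next_level_fetch):
    """Steps S5-S7 for a load access prefetch request that missed L1:
    the miss address buffer requests the target data from the L2 cache (S5),
    L2 fetches it from itself or from the next level (S6), and the miss
    address buffer writes the data into the L1 cache (S7)."""
    if addr not in l2:
        l2.add(next_level_fetch(addr))  # S6: L2 fetches from the next-level device
    l1.add(addr)                        # S7: fill the L1 cache
    return addr

l1_cache, l2_cache = set(), {0x2000}
handle_load_prefetch_miss(0x4000, l1_cache, l2_cache, lambda a: a)
```

Note the contrast with the store path below: the load prefetch target lands in the L1 cache, while a store prefetch target stays in the L2 cache.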
For example, as shown in fig. 5, in response to the prefetch request being a store access prefetch request and the target data of the prefetch request not being in the level one cache memory, the following steps S5′ to S7′ are performed (correspondingly, the data prefetching method further includes steps S5′ to S7′).
Step S5′: the store access prefetch sequence buffer provides the store access prefetch request to the secondary cache memory.
For example, by providing a store access prefetch sequence buffer between the data prefetcher and the secondary cache memory, the sequence of store access prefetch requests can be cached, and a store prefetch request can be sent to the secondary cache memory when the secondary cache memory is able to receive it; this reduces the possibility that a store access prefetch request is discarded because the secondary cache memory cannot immediately accept it, and thus reduces the adverse effect on the data prefetcher and the data processing apparatus 100 that would result from such a discard. For example, the capacity of the store access prefetch sequence buffer may be set according to the actual application requirements. For example, in the event that the store access prefetch sequence buffer has no free memory space, if a new store access prefetch request arrives, that request may be discarded, or a store access prefetch request already stored in the buffer may be evicted.
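The sequence buffer behaves like a bounded FIFO; a minimal sketch assuming a drop-newest policy when the buffer is full (the text permits either dropping the new request or evicting a stored one, and the class and method names are illustrative):

```python
from collections import deque

class StorePrefetchSequenceBuffer:
    """Bounded FIFO between the data prefetcher and the L2 cache."""

    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity

    def push(self, request):
        """Buffer a store access prefetch request; drop it if no entry is free."""
        if len(self.queue) >= self.capacity:
            return False  # dropped (alternative policy: evict the oldest stored request)
        self.queue.append(request)
        return True

    def pop_for_l2(self):
        """Hand the oldest buffered request to the L2 cache when it can accept one."""
        return self.queue.popleft() if self.queue else None

buf = StorePrefetchSequenceBuffer(capacity=2)
assert buf.push("req-a") and buf.push("req-b")
assert not buf.push("req-c")        # buffer full: the new request is dropped
assert buf.pop_for_l2() == "req-a"  # requests reach the L2 cache in FIFO order
```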
Step S6′: the secondary cache memory checks whether the target data of the store access prefetch request is stored in the secondary cache memory; if so, the secondary cache memory discards the store access prefetch request; if not, the secondary cache memory requests the target data of the store access prefetch request from the memory device located at the next level of the secondary cache memory.
Step S7′: the secondary cache memory acquires the target data of the store access prefetch request from the memory device located at the next level of the secondary cache memory and stores the target data of the store access prefetch request in the secondary cache memory.
For example, by storing the target data of the store access prefetch request in the secondary cache memory, more useful data (e.g., the target data of the load access prefetch request) can be stored in the primary cache memory, so that the probability that the data of the load access request is not in the primary cache memory can be reduced, and the efficiency of the data prefetcher and the overall performance of the system adopting the data prefetcher and the prefetching method can be improved.
It should be noted that, in the process of training and prefetching the data prefetcher, the CPU core reads data to execute related operations. Correspondingly, fig. 5 also shows a part of the flow of the CPU core reading data. For example, the flow of the CPU core reading data includes the following step S1.
Step S1: the CPU core outputs the virtual address of the target data of the access request to an address translator (e.g., an address translation pipeline) 111, which translates the virtual address into a physical address; then, the CPU core checks whether the target data of the access request is in the level one cache memory.
For example, if the target data of the access request is in the first-level cache memory, the target data of the access request is fetched from the first-level cache memory; if the target data of the access request is not in the first-level cache memory, the target data of the access request is obtained via the miss address buffer memory 112, and the target data of the access request is written into the first-level cache memory.
At least one embodiment of the present disclosure also provides a data processing apparatus 100. Fig. 6 is an exemplary block diagram of a data processing apparatus 100 provided by at least one embodiment of the present disclosure.
As shown in fig. 6, the data processing apparatus 100 includes a data prefetcher. The data prefetcher is configured to: receive an operation instruction, and determine whether an access request of the operation instruction is a load access request or a store access request; in response to the access request of the operation instruction being a load access request, train a load access prefetch function in the data prefetcher using the load access request, output a load access prefetch request, and perform load access prefetching based on the load access prefetch request; and in response to the access request of the operation instruction being a store access request, train a store access prefetch function in the data prefetcher using the store access request, output a store access prefetch request, and perform store access prefetching based on the store access prefetch request.
Fig. 7 is a schematic diagram of one example of the data processing apparatus 100 shown in fig. 6.
For example, as shown in fig. 6 and 7, the data processing apparatus 100 further includes a first-level cache memory and a second-level cache memory. Performing store access prefetching based on a store access prefetch request includes: discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory or the second-level cache memory; and, in response to the target data of the store access prefetch request not being in the first-level cache memory or the second-level cache memory, acquiring the target data of the store access prefetch request and writing it into the second-level cache memory.
For example, discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory or the second-level cache memory includes: discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory; providing the store access prefetch request to the second-level cache memory in response to the target data of the store access prefetch request not being in the first-level cache memory; and discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the second-level cache memory.
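The discard cascade for a store access prefetch request can be sketched as follows, with cache contents modeled as simple sets of addresses (the function and return-value names are illustrative assumptions):

```python
def handle_store_prefetch(addr, l1, l2, next_level_fetch):
    """Apply the L1 -> L2 discard cascade to a store access prefetch request.

    l1, l2: sets of addresses currently cached; next_level_fetch(addr) fetches
    the data from below the L2 cache. Returns what happened to the request."""
    if addr in l1:
        return "dropped: already in L1"
    if addr in l2:
        return "dropped: already in L2"
    l2.add(next_level_fetch(addr))  # fill the target data into L2, not L1
    return "filled into L2"

l1, l2 = {0x1000}, {0x2000}
assert handle_store_prefetch(0x1000, l1, l2, lambda a: a) == "dropped: already in L1"
assert handle_store_prefetch(0x3000, l1, l2, lambda a: a) == "filled into L2"
assert 0x3000 in l2
```

Writing the store prefetch target into the L2 cache rather than the L1 cache is exactly the design point made earlier: L1 space is reserved for more useful data such as load prefetch targets.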
For example, as shown in fig. 6 and 7, the data processing apparatus 100 further includes a memory access prefetch sequence buffer. Providing the memory access prefetch request to the secondary cache memory in response to the data targeted by the memory access prefetch request not being in the primary cache memory, comprising: responding that the target data of the storage access prefetching request is not in the first-level buffer, applying for a first storage item from the storage access prefetching sequence buffer, and distributing the first storage item to the storage access prefetching request; and a memory access prefetch sequence buffer to provide a memory access prefetch request to the secondary cache memory.
For example, the data prefetcher includes a first type of prefetcher entry and a second type of prefetcher entry different from the first type of prefetcher entry; training a load access prefetch function in the data prefetcher with the load access request in response to the access request of the operation instruction being a load access request includes: in response to the access request of the operation instruction being a load access request, allocating the address of the load access request to the first type of prefetcher entry, and training the first type of prefetcher entry using the address of the load access request; and training a store access prefetch function in the data prefetcher with the store access request in response to the access request of the operation instruction being a store access request includes: in response to the access request of the operation instruction being a store access request, allocating the address of the store access request to the second type of prefetcher entry, and training the second type of prefetcher entry using the address of the store access request.
For example, the data prefetcher includes a plurality of prefetcher entries; the load access identification data field included in each of a first portion of the plurality of prefetcher entries has a first value (e.g., the load access identification data field is a load access identification data bit); the load access identification data field included in each of a second portion of the plurality of prefetcher entries has a second value different from the first value; the first value is used to identify each of the first portion of the plurality of prefetcher entries as a first class of prefetcher entries; and the second value is used to identify each of the second portion of the plurality of prefetcher entries as a second class of prefetcher entries.
For example, outputting a load access prefetch request includes: outputting a first access prefetch address based on the regularity of the data stream tracked by each of the first portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the first access prefetch address as a load access prefetch request; outputting a store access prefetch request includes: outputting a second access prefetch address based on the regularity of the data stream tracked by each of the second portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the second access prefetch address as a store access prefetch request.
For example, the data processing apparatus 100 further includes an address translator 111. For example, the address of the load access request, the address of the store access request, the first access prefetch address, and the second access prefetch address are all virtual addresses; the address translator 111 is configured to translate the first access prefetch address and the second access prefetch address into physical addresses, for load access prefetching and store access prefetching, respectively.
It should be noted that, for a specific implementation of the data processing apparatus 100, reference may be made to relevant steps of the data prefetching method, and details are not described herein again.
Although the present disclosure has been described in detail hereinabove with respect to general illustrations and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the embodiments of the disclosure. Accordingly, such modifications and improvements are intended to be within the scope of this disclosure, as claimed.
The above description is intended to be exemplary of the present disclosure, and not to limit the scope of the present disclosure, which is defined by the claims appended hereto.

Claims (22)

1. A method of data prefetching, comprising:
receiving an operation instruction, and providing an access request of the operation instruction to a data prefetcher;
determining whether an access request of the operation instruction is a load access request or a store access request;
in response to the access request of the operation instruction being the load access request, training a load access prefetch function in the data prefetcher with the load access request, outputting a load access prefetch request, and performing load access prefetching based on the load access prefetch request; and
in response to the access request of the operation instruction being the store access request, training a memory access prefetch function in the data prefetcher with the store access request, outputting a memory access prefetch request, and performing memory access prefetching based on the memory access prefetch request.
2. The data prefetching method of claim 1, wherein the performing the memory access prefetch based on the memory access prefetch request comprises:
discarding the memory access prefetch request in response to target data of the memory access prefetch request already being in a first level cache memory or a second level cache memory; and
and responding to the condition that the target data of the storage access prefetching request is not in the first-level cache memory and the second-level cache memory, acquiring the target data of the storage access prefetching request, and writing the target data of the storage access prefetching request into the second-level cache memory.
3. The data prefetching method of claim 2, wherein the discarding the memory access prefetch request in response to the target data of the memory access prefetch request being already in a level one cache memory or a level two cache memory comprises:
discarding the memory access prefetch request in response to target data of the memory access prefetch request already being in the level one cache memory;
providing the memory access prefetch request to the secondary cache memory in response to target data of the memory access prefetch request not being in the primary cache memory; and
discarding the memory access prefetch request in response to the data targeted by the memory access prefetch request already being in the secondary cache memory.
4. The data prefetching method of claim 3, wherein the providing the memory access prefetch request to the secondary cache memory in response to the target data of the memory access prefetch request not being in the primary cache memory comprises:
responding to the target data of the storage access prefetching request not in the first-level cache memory, applying for a first storage item from a storage access prefetching sequence cache memory, and distributing the first storage item to the storage access prefetching request; and
the memory access prefetch sequence buffer memory provides the memory access prefetch request to the secondary cache memory.
5. The data prefetching method according to claim 2, wherein the obtaining and writing the target data of the memory access prefetching request into the secondary cache memory comprises:
the second-level cache memory requests target data of the memory access prefetch request from a memory device located at a next level of the second-level cache memory, and stores the target data of the memory access prefetch request provided by the memory device at the next level in the second-level cache memory.
6. The data prefetching method of claim 1, wherein the performing the load access prefetch based on the load access prefetch request comprises:
discarding the load access prefetch request in response to target data of the load access prefetch request already being in a level one cache memory; and
and responding to the condition that the target data of the load access pre-fetching request is not in the first-level cache memory, acquiring the target data of the load access pre-fetching request, and writing the target data of the load access pre-fetching request into the first-level cache memory.
7. The data prefetching method of claim 6, wherein the obtaining the target data of the load access prefetch request and writing it into the first-level cache memory comprises:
requesting a second storage entry from a miss address buffer and allocating the second storage entry to the load access prefetch request;
requesting, by the miss address buffer, the target data of the load access prefetch request from a second-level cache memory;
obtaining, by the second-level cache memory, the target data of the load access prefetch request and providing it to the miss address buffer; and
writing, by the miss address buffer, the target data of the load access prefetch request into the first-level cache memory.
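The load-prefetch path of claims 6-7 can likewise be sketched as a minimal model, with a list standing in for the miss address buffer and dictionaries for the caches; all names are illustrative:

```python
# Toy model of the load access prefetch path (claims 6-7): a miss-address-
# buffer entry tracks the outstanding request, the second-level cache (or
# the level below it) supplies the line, and the fill lands in the
# first-level cache.
def handle_load_prefetch(addr, l1, l2, next_level, miss_address_buffer):
    if addr in l1:                      # already in first-level cache: discard
        return "discarded"
    miss_address_buffer.append(addr)    # allocate the "second storage entry"
    data = l2.get(addr)
    if data is None:                    # L2 miss: L2 obtains the line below it
        data = next_level[addr]
        l2[addr] = data
    l1[addr] = data                     # miss address buffer fills L1
    miss_address_buffer.remove(addr)    # entry released when the fill completes
    return "filled_l1"
```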
8. The data prefetching method of any of claims 1-7, wherein the data prefetcher comprises a first class of prefetcher entries and a second class of prefetcher entries different from the first class of prefetcher entries;
the training the load access prefetch function in the data prefetcher with the load access request in response to the access request of the operation instruction being the load access request comprises: in response to the access request of the operation instruction being the load access request, allocating the address of the load access request to the first class of prefetcher entries, and training the first class of prefetcher entries with the address of the load access request; and
the training the store access prefetch function in the data prefetcher with the store access request in response to the access request of the operation instruction being the store access request comprises: in response to the access request of the operation instruction being the store access request, allocating the address of the store access request to the second class of prefetcher entries, and training the second class of prefetcher entries with the address of the store access request.
9. The data prefetching method of claim 8, wherein the first class of prefetcher entries comprises a plurality of first prefetcher entries and the second class of prefetcher entries comprises a plurality of second prefetcher entries;
the allocating the address of the load access request to the first class of prefetcher entries comprises: determining, based on the address of the load access request, the first prefetcher entry to which that address belongs, and allocating the address of the load access request to that first prefetcher entry; and
the allocating the address of the store access request to the second class of prefetcher entries comprises: determining, based on the address of the store access request, the second prefetcher entry to which that address belongs, and allocating the address of the store access request to that second prefetcher entry.
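Claim 9 does not say how the entry to which an address "belongs" is determined; one common convention in prefetchers of this kind is to map all addresses in the same memory region to the same entry, so each entry tracks one stream. The region size and indexing below are assumptions, not taken from the patent:

```python
# Illustrative address-to-entry mapping: addresses within the same region
# (here 4 KiB, an assumed granularity) train the same prefetcher entry.
REGION_BITS = 12  # assumed: one tracked data stream per 4 KiB region

def entry_index(addr, num_entries):
    # Drop the offset within the region, then fold into the entry table.
    return (addr >> REGION_BITS) % num_entries
```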
10. The data prefetching method of claim 8, wherein the data prefetcher comprises a plurality of prefetcher entries; and
the data prefetching method further comprises:
marking each entry of a first portion of the plurality of prefetcher entries as the first class of prefetcher entries in response to the access request of the operation instruction corresponding to the first data in the data stream tracked by that entry being a load access request; and
marking each entry of a second portion of the plurality of prefetcher entries as the second class of prefetcher entries in response to the access request of the operation instruction corresponding to the first data in the data stream tracked by that entry being a store access request.
11. The data prefetching method of claim 10, wherein the marking each entry of the first portion of the plurality of prefetcher entries as the first class of prefetcher entries comprises: setting the value of a load access identification data field included in each entry of the first portion of the plurality of prefetcher entries to a first value; and
the marking each entry of the second portion of the plurality of prefetcher entries as the second class of prefetcher entries comprises: setting the value of a load access identification data field included in each entry of the second portion of the plurality of prefetcher entries to a second value different from the first value.
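The marking scheme of claims 10-11 amounts to a single per-entry field set from the first access that opens a stream. A minimal sketch, with the field and value names chosen here for illustration (claim 20 narrows the field to a single bit, which the two values below are compatible with):

```python
# Per-entry load access identification field (claims 10-11): the entry that
# first observes a data stream records whether the triggering request was a
# load or a store, which fixes the entry's class.
LOAD_ID_FIRST_VALUE = 1    # marks a first-class (load) prefetcher entry
LOAD_ID_SECOND_VALUE = 0   # marks a second-class (store) prefetcher entry

class PrefetcherEntry:
    def __init__(self):
        self.load_access_id = None   # unset until the stream's first access

    def mark(self, is_load):
        # One data field distinguishes the two entry classes (claim 11).
        self.load_access_id = (LOAD_ID_FIRST_VALUE if is_load
                               else LOAD_ID_SECOND_VALUE)
```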
12. The data prefetching method of claim 10, wherein the outputting the load access prefetch request comprises: outputting a first access prefetch address based on the regularity of the data stream tracked by each entry of the first portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the first access prefetch address as the load access prefetch request; and
the outputting the store access prefetch request comprises: outputting a second access prefetch address based on the regularity of the data stream tracked by each entry of the second portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the second access prefetch address as the store access prefetch request.
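Claim 12 leaves open what "regularity" a tracked stream exhibits. One way it might be realized is a stride detector: once two consecutive address deltas agree, the entry emits the next address in the stream as the prefetch address. This is purely a sketch; the patent does not commit to stride prefetching specifically:

```python
# Assumed per-entry stream tracker: confirms a constant stride between
# consecutive demand addresses, then extrapolates one step ahead.
class StrideTracker:
    def __init__(self):
        self.last_addr = None
        self.stride = None

    def train(self, addr):
        """Observe one demand address; return a prefetch address once the
        stream's stride has been confirmed, else None."""
        prefetch = None
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if delta == self.stride and delta != 0:
                prefetch = addr + self.stride   # pattern confirmed: run ahead
            self.stride = delta
        self.last_addr = addr
        return prefetch
```

Under claims 10-12, each first-class entry would run such a tracker over load addresses and tag its output as a load access prefetch request, and each second-class entry would do the same for store addresses.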
13. The data prefetching method of claim 12, wherein the address of the load access request, the address of the store access request, the first access prefetch address, and the second access prefetch address are virtual addresses; and
the data prefetching method further comprises: translating the first access prefetch address and the second access prefetch address into physical addresses, so as to perform the load access prefetch and the store access prefetch based on the physical addresses of the first access prefetch address and the second access prefetch address, respectively.
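The translation step of claim 13 can be illustrated with a toy page-table lookup (4 KiB pages and a dictionary page table are assumptions; the miss behavior shown is a common choice for prefetchers, not something the claim specifies):

```python
# Toy virtual-to-physical translation for prefetch addresses (claim 13).
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def translate(vaddr, page_table):
    """Return the physical address for vaddr, or None on a translation
    miss. A prefetcher would typically drop the request on a miss rather
    than trigger a page walk or fault for speculative data."""
    frame = page_table.get(vaddr >> PAGE_SHIFT)
    if frame is None:
        return None
    return (frame << PAGE_SHIFT) | (vaddr & PAGE_MASK)
```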
14. A data processing apparatus comprising a data prefetcher, wherein the data prefetcher is configured to:
receive an operation instruction and determine whether the access request of the operation instruction is a load access request or a store access request;
in response to the access request of the operation instruction being the load access request, train a load access prefetch function in the data prefetcher with the load access request, output a load access prefetch request, and perform a load access prefetch based on the load access prefetch request; and
in response to the access request of the operation instruction being the store access request, train a store access prefetch function in the data prefetcher with the store access request, output a store access prefetch request, and perform a store access prefetch based on the store access prefetch request.
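The dispatch that claim 14 configures the prefetcher to perform can be sketched at a high level; the trainer objects below are placeholders standing in for the two separately trained prefetch functions:

```python
# High-level sketch of claim 14's dispatch: each memory-access request is
# routed to a separate load or store training pipeline by its kind.
def dispatch(request_kind, addr, load_trainer, store_trainer):
    if request_kind == "load":
        load_trainer.append(addr)    # trains the load access prefetch function
        return "load_trained"
    if request_kind == "store":
        store_trainer.append(addr)   # trains the store access prefetch function
        return "store_trained"
    raise ValueError("unknown access kind: " + str(request_kind))
```

Keeping the two pipelines separate is the point of the scheme: load streams and store streams are trained and prefetched independently, into the cache level appropriate to each (L1 for loads, L2 for stores, per claims 6 and 15).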
15. The data processing apparatus of claim 14, further comprising a first-level cache memory and a second-level cache memory,
wherein the performing the store access prefetch based on the store access prefetch request comprises:
discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory or the second-level cache memory; and
in response to the target data of the store access prefetch request being in neither the first-level cache memory nor the second-level cache memory, obtaining the target data of the store access prefetch request and writing it into the second-level cache memory.
16. The data processing apparatus of claim 15, wherein the discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory or the second-level cache memory comprises:
discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the first-level cache memory;
providing the store access prefetch request to the second-level cache memory in response to the target data of the store access prefetch request not being in the first-level cache memory; and
discarding the store access prefetch request in response to the target data of the store access prefetch request already being in the second-level cache memory.
17. The data processing apparatus of claim 16, further comprising a store access prefetch sequence buffer,
wherein the providing the store access prefetch request to the second-level cache memory in response to the target data of the store access prefetch request not being in the first-level cache memory comprises:
in response to the target data of the store access prefetch request not being in the first-level cache memory, requesting a first storage entry from the store access prefetch sequence buffer and allocating the first storage entry to the store access prefetch request; and
providing, by the store access prefetch sequence buffer, the store access prefetch request to the second-level cache memory.
18. The data processing apparatus of any of claims 14-17, wherein the data prefetcher comprises a first class of prefetcher entries and a second class of prefetcher entries different from the first class of prefetcher entries;
the training the load access prefetch function in the data prefetcher with the load access request in response to the access request of the operation instruction being the load access request comprises: in response to the access request of the operation instruction being the load access request, allocating the address of the load access request to the first class of prefetcher entries, and training the first class of prefetcher entries with the address of the load access request; and
the training the store access prefetch function in the data prefetcher with the store access request in response to the access request of the operation instruction being the store access request comprises: in response to the access request of the operation instruction being the store access request, allocating the address of the store access request to the second class of prefetcher entries, and training the second class of prefetcher entries with the address of the store access request.
19. The data processing apparatus of claim 18, wherein the data prefetcher comprises a plurality of prefetcher entries;
each entry of a first portion of the plurality of prefetcher entries includes a load access identification data field whose value is a first value;
each entry of a second portion of the plurality of prefetcher entries includes a load access identification data field whose value is a second value different from the first value;
the first value identifies each entry of the first portion of the plurality of prefetcher entries as the first class of prefetcher entries; and
the second value identifies each entry of the second portion of the plurality of prefetcher entries as the second class of prefetcher entries.
20. The data processing apparatus of claim 19, wherein the load access identification data field is a load access identification data bit.
21. The data processing apparatus of claim 19, wherein the outputting the load access prefetch request comprises: outputting a first access prefetch address based on the regularity of the data stream tracked by each entry of the first portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the first access prefetch address as the load access prefetch request; and
the outputting the store access prefetch request comprises: outputting a second access prefetch address based on the regularity of the data stream tracked by each entry of the second portion of the plurality of prefetcher entries, and identifying the access prefetch request corresponding to the second access prefetch address as the store access prefetch request.
22. The data processing apparatus of claim 21, further comprising an address translator,
wherein the address of the load access request, the address of the store access request, the first access prefetch address, and the second access prefetch address are virtual addresses; and
the address translator is configured to translate the first access prefetch address and the second access prefetch address into physical addresses, so that the load access prefetch and the store access prefetch are performed based on the physical addresses of the first access prefetch address and the second access prefetch address, respectively.
CN202011307964.2A 2020-11-20 2020-11-20 Data prefetching method and data processing apparatus Active CN112527395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011307964.2A CN112527395B (en) 2020-11-20 2020-11-20 Data prefetching method and data processing apparatus

Publications (2)

Publication Number Publication Date
CN112527395A CN112527395A (en) 2021-03-19
CN112527395B true CN112527395B (en) 2023-03-07

Family

ID=74981824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011307964.2A Active CN112527395B (en) 2020-11-20 2020-11-20 Data prefetching method and data processing apparatus

Country Status (1)

Country Link
CN (1) CN112527395B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115237585A (en) * 2021-04-23 2022-10-25 超聚变数字技术有限公司 Memory controller, data reading method and memory system
CN114358179B (en) * 2021-12-31 2024-09-17 海光信息技术股份有限公司 Pre-fetch training method of processor, processing device, processor and computing equipment
CN118035131A (en) * 2024-03-25 2024-05-14 海光信息技术股份有限公司 Data prefetching method and device, processor and computer readable storage medium
CN118210557B (en) * 2024-05-20 2024-08-16 芯来智融半导体科技(上海)有限公司 Prefetching method, device, equipment and storage medium for filtering invalid prefetching

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101751246A (en) * 2008-12-15 2010-06-23 国际商业机器公司 System and method for prefetching data
CN104636270A (en) * 2013-11-14 2015-05-20 Arm有限公司 Adaptive prefetching in a data processing apparatus
WO2016169518A1 (en) * 2015-04-23 2016-10-27 上海芯豪微电子有限公司 Instruction and data push-based processor system and method
CN107479860A (en) * 2016-06-07 2017-12-15 华为技术有限公司 A kind of forecasting method of processor chips and instruction buffer
CN108874690A (en) * 2017-05-16 2018-11-23 龙芯中科技术有限公司 The implementation method and processor of data pre-fetching
CN110704107A (en) * 2019-09-30 2020-01-17 上海兆芯集成电路有限公司 Prefetcher, operation method of prefetcher and processor

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN105955709B (en) * 2016-04-16 2018-02-27 浙江大学 Efficiency optimization self-reacting device and method are prefetched based on machine learning
CN109976903B (en) * 2019-02-22 2021-06-29 华中科技大学 Deep learning heterogeneous computing method and system based on layer width memory allocation


Non-Patent Citations (1)

Title
Neural-network-based reconfigurable instruction prefetch mechanism and its scalable architecture; Chen Zhijian et al.; Acta Electronica Sinica; 2012-07-15 (No. 07); full text *


Similar Documents

Publication Publication Date Title
CN112527395B (en) Data prefetching method and data processing apparatus
CN107111455B (en) Electronic processor architecture and method of caching data
US10877901B2 (en) Method and apparatus for utilizing proxy identifiers for merging of store operations
US5778434A (en) System and method for processing multiple requests and out of order returns
CN106537362B (en) Data processing apparatus and method of processing address conversion in data processing apparatus
KR102244191B1 (en) Data processing apparatus having cache and translation lookaside buffer
KR102588399B1 (en) Coprocessor action bundling
JP2618175B2 (en) History table of virtual address translation prediction for cache access
US10146545B2 (en) Translation address cache for a microprocessor
US7213126B1 (en) Method and processor including logic for storing traces within a trace cache
CN112416817B (en) Prefetching method, information processing apparatus, device, and storage medium
US10831675B2 (en) Adaptive tablewalk translation storage buffer predictor
US20080065809A1 (en) Optimized software cache lookup for simd architectures
US20160140042A1 (en) Instruction cache translation management
US20090006803A1 (en) L2 Cache/Nest Address Translation
US11775445B2 (en) Translation support for a virtual cache
US7039768B2 (en) Cache predictor for simultaneous multi-threaded processor system supporting multiple transactions
JPH1074166A (en) Multilevel dynamic set predicting method and its device
US9141388B2 (en) High-performance cache system and method
US10606762B2 (en) Sharing virtual and real translations in a virtual cache
US20190026228A1 (en) Private caching for thread local storage data access
KR20190087500A (en) Memory address translation
US11500779B1 (en) Vector prefetching for computing systems
US20030163643A1 (en) Bank conflict determination
KR20210037216A (en) Memory management unit capable of managing address translation table using heterogeneous memory, and address management method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant