CN114238167B - Information prefetching method, processor and electronic equipment


Info

Publication number
CN114238167B
CN114238167B · CN202111529899.2A · CN202111529899A
Authority
CN
China
Prior art keywords
page table
cache space
level
address translation
table walker
Prior art date
Legal status
Active
Application number
CN202111529899.2A
Other languages
Chinese (zh)
Other versions
CN114238167A (en)
Inventor
胡世文
Current Assignee
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202111529899.2A
Publication of CN114238167A
Application granted
Publication of CN114238167B
Legal status: Active

Classifications

    • G06F12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
    • G06F12/0811 Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Provided are an information prefetching method, a processor, and an electronic device. The method is used in a processor that includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as, and communicatively connected to, the target preset cache space. The information prefetching method includes: in response to the first page table walker being selected from the first page table walker and the second page table walker, based on a preset rule, to perform an address translation operation that yields a physical address, sending, by the first page table walker, a prefetch request to the target preset cache space; and in response to the prefetch request, performing, by the target preset cache space, an information prefetching operation based on the physical address. The method implements data/instruction prefetching while reducing address translation latency, reduces the latency of read and write operations, and improves overall system performance.

Description

Information prefetching method, processor and electronic equipment
Technical Field
Embodiments of the present disclosure relate to an information prefetching method, a processor, and an electronic device.
Background
In the field of computer technology, one of the important roles of a computer operating system is memory management. In a multi-process operating system, each process has its own virtual address space and may use any virtual address (Virtual Address) within the range specified by the system. The addresses used by a central processing unit (CPU) to execute an application are virtual addresses. When the operating system allocates memory to a process, the virtual addresses in use must be mapped to physical addresses (Physical Address); the physical address is the real address of the memory access. Splitting addresses into virtual and physical addresses simplifies program compilation: the compiler compiles a program against a contiguous, sufficient virtual address space, and the virtual addresses of different processes are mapped to different physical addresses, so the system can run multiple processes simultaneously and the operating efficiency of the whole computer system is improved. In addition, because an application program can use, but cannot change, the address translation, one process cannot access the memory contents of another process, which increases the security of the system.
Disclosure of Invention
At least one embodiment of the present disclosure provides an information prefetching method for a processor. The processor includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space, where the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link; the at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as, and communicatively connected to, the target preset cache space; and the second page table walker is arranged at the same path level as, and communicatively connected to, the first-level cache space. The method includes: in response to the first page table walker being selected from the first page table walker and the second page table walker based on a preset rule to perform an address translation operation to obtain a physical address, sending, by the first page table walker, a prefetch request to the target preset cache space, where the prefetch request includes the physical address; and in response to the prefetch request, performing, by the target preset cache space, an information prefetching operation based on the physical address.
For example, in a method provided by an embodiment of the present disclosure, the processor further includes a processor core, and performing, by the target preset cache space, the information prefetching operation based on the physical address includes: determining a prefetch cache space, where the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space; obtaining, by the target preset cache space based on the physical address, the target information stored at the physical address; and sending, by the target preset cache space, the target information to the prefetch cache space.
For example, in a method provided by an embodiment of the present disclosure, determining the prefetch cache space includes: obtaining a preset identifier, where the preset identifier indicates level information of a cache space and is stored in a specified storage space or carried in the prefetch request; and determining the prefetch cache space according to the preset identifier.
For example, in a method provided by an embodiment of the present disclosure, determining the prefetch cache space according to the preset identifier includes: in response to the target information being of an instruction type, determining the prefetch cache space to be the first-level instruction cache space; and in response to the target information being of a data type, determining the prefetch cache space to be the first-level data cache space.
For example, in a method provided by an embodiment of the present disclosure, obtaining, by the target preset cache space based on the physical address, the target information stored at the physical address includes: obtaining the target information by querying level by level along the path from the target preset cache space to the memory, based on the physical address.
For example, in a method provided by an embodiment of the present disclosure, sending, by the target preset cache space, the target information to the prefetch cache space includes: transferring the target information level by level from the target preset cache space to the prefetch cache space.
For example, a method provided by an embodiment of the present disclosure further includes: sending, by the target preset cache space, the physical address to the processor core.
For example, in a method provided by an embodiment of the present disclosure, sending, by the target preset cache space, the physical address to the processor core includes: transferring the physical address level by level from the target preset cache space to the processor core.
For example, in a method provided by an embodiment of the present disclosure, the processor further includes a page table entry cache space, the processor core is arranged at the same path level as, and communicatively connected to, the page table entry cache space, and the method further includes: generating, by the processor core, the address translation request in response to the absence, in the page table entry cache space, of the page table entry data required for address translation.
For example, a method provided by an embodiment of the present disclosure further includes: in response to the first page table walker being selected from the first page table walker and the second page table walker based on the preset rule to perform the address translation operation, performing, by the first page table walker, the address translation operation in response to the address translation request to obtain the physical address.
For example, in a method provided by an embodiment of the present disclosure, performing, by the first page table walker, the address translation operation in response to the address translation request to obtain the physical address includes: receiving, by the first page table walker, the address translation request generated by the processor core; obtaining page table entry data from a memory via the target preset cache space; and performing the address translation operation using the page table entry data to obtain the physical address.
For example, in a method provided by an embodiment of the present disclosure, obtaining, by the first page table walker, the page table entry data from the memory via the target preset cache space and performing the address translation operation using the page table entry data includes: obtaining, by the first page table walker according to the address translation request, the page table entry data by querying level by level along the path from the target preset cache space to the memory, and translating the page table entry data to obtain the physical address.
For example, a method provided by an embodiment of the present disclosure further includes: in response to the first page table walker being determined to perform the address translation operation, receiving, by the first page table walker, the address translation request forwarded by the second page table walker.
For example, in a method provided by an embodiment of the present disclosure, the address translation request includes translation information, and the translation information includes: an address translation request sequence number, the virtual address value to be translated, and the initial address of the highest-level page table.
For example, in a method provided by an embodiment of the present disclosure, the translation information further includes a request type identifier, where the request type identifier indicates whether the target information stored at the physical address is of an instruction type or a data type.
For example, a method provided by an embodiment of the present disclosure further includes: determining, by the processor core, a cache space for storing the target information according to the storage states of the first-level cache space and the at least one preset cache space, and causing the address translation request to carry, in the form of a preset identifier, the level information of the cache space for storing the target information; and parsing, by the first page table walker, the preset identifier, with the prefetch request carrying the preset identifier.
For example, a method provided by an embodiment of the present disclosure further includes: in response to the second page table walker being selected from the first page table walker and the second page table walker based on the preset rule to perform the address translation operation, performing, by the second page table walker, the address translation operation in response to the address translation request to obtain the physical address.
For example, in a method provided by an embodiment of the present disclosure, performing, by the second page table walker in response to the address translation request, the address translation operation to obtain the physical address includes: receiving, by the second page table walker, the address translation request generated by the processor core; obtaining page table entry data from a memory according to the address translation request; and performing address translation using the page table entry data to obtain the physical address.
For example, in a method provided by an embodiment of the present disclosure, the preset rule includes: determining that the address translation operation is performed by the first page table walker when the page table entry data required for address translation is absent from the page table entry cache space, or when the page table level corresponding to the page table entry data required for address translation in the page table entry cache space is greater than a threshold.
For example, in a method provided by an embodiment of the present disclosure, the processor further includes a request buffer, the request buffer is arranged at the same path level as the first page table walker and is communicatively connected to the first page table walker and to the target preset cache space, and the method further includes: sending, by the processor core, a queue of pending address translation requests to the request buffer.
For example, in a method provided by an embodiment of the present disclosure, the at least one preset cache space includes second-level to Nth-level cache spaces, where N is an integer greater than 2, the Nth-level cache space is closest to the memory and farthest from the processor core, and any one of the second-level to Nth-level cache spaces serves as the target preset cache space.
For example, in a method provided by an embodiment of the present disclosure, the Nth-level cache space is a shared cache space and serves as the target preset cache space.
For example, in a method provided by an embodiment of the present disclosure, the second-level cache space is a private or shared cache space and serves as the target preset cache space.
At least one embodiment of the present disclosure further provides a processor, including a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space, where the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link; the at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as, and communicatively connected to, the target preset cache space; and the second page table walker is arranged at the same path level as, and communicatively connected to, the first-level cache space. The first page table walker is configured to: in response to the first page table walker being selected from the first page table walker and the second page table walker based on a preset rule to perform an address translation operation to obtain a physical address, send a prefetch request to the target preset cache space, where the prefetch request includes the physical address. The target preset cache space is configured to: in response to the prefetch request, perform an information prefetching operation based on the physical address.
For example, in a processor provided by an embodiment of the present disclosure, the target preset cache space is further configured to determine a prefetch cache space, obtain, based on the physical address, the target information stored at the physical address, and send the target information to the prefetch cache space, where the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space.
At least one embodiment of the present disclosure further provides an electronic device including the processor provided by any one of the embodiments of the present disclosure.
Drawings
FIG. 1 is a schematic diagram of an address translation process;
FIG. 2 is a schematic diagram of an architecture of a multi-core processor;
FIG. 3 is a data flow diagram illustrating address translation using the processor shown in FIG. 2;
FIG. 4 is a schematic diagram of a process for address translation and requesting data using the processor shown in FIG. 2;
FIG. 5 is an architecture diagram of a processor according to some embodiments of the present disclosure;
FIG. 6 is a flow chart illustrating a method for prefetching information according to some embodiments of the present disclosure;
FIG. 7 is a schematic diagram illustrating a process of performing address translation and requesting data by using a processor according to an embodiment of the present disclosure;
FIG. 8 is an exemplary flowchart of step S20 in FIG. 6;
FIG. 9 is an exemplary flowchart of step S21 in FIG. 8;
FIG. 10 is an exemplary flowchart of step S212 in FIG. 9;
FIG. 11 is a flow chart illustrating another information prefetching method according to some embodiments of the present disclosure;
FIG. 12 is a flow chart illustrating another information prefetching method according to some embodiments of the present disclosure;
FIG. 13 is a flow chart illustrating another information prefetching method according to some embodiments of the present disclosure;
FIG. 14 is a flow chart illustrating another information prefetching method according to some embodiments of the present disclosure;
FIG. 15 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;
FIG. 16 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art from the described embodiments without creative effort fall within the scope of protection of the present disclosure.
Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
FIG. 1 is a schematic diagram of an address translation flow, showing the address translation process of a four-level page table. As shown in FIG. 1, a virtual address is divided into several segments, denoted EXT, OFFSET_lvl4, OFFSET_lvl3, OFFSET_lvl2, OFFSET_lvl1, and OFFSET_pg. In this example, the upper virtual address segment EXT is not used. The segments OFFSET_lvl4, OFFSET_lvl3, OFFSET_lvl2, and OFFSET_lvl1 are the offset values into the four levels of page tables: OFFSET_lvl4 is the offset into the fourth-level page table, OFFSET_lvl3 the offset into the third-level page table, OFFSET_lvl2 the offset into the second-level page table, and OFFSET_lvl1 the offset into the first-level page table.
The initial address of the highest-level page table (i.e., the fourth-level page table) is stored in the architectural register REG_pt, whose content is set by the operating system and cannot be changed by application programs. In the fourth-level, third-level, and second-level page tables, each page table entry stores the starting address of the next-level page table. A first-level Page Table Entry (PTE) stores the high-order bits of the physical address of the corresponding memory page, which are merged with the in-page virtual address offset (OFFSET_pg) to obtain the physical address corresponding to the virtual address. The starting address of each next-level page table is thus obtained level by level until the first-level Page Table Entry (PTE) is reached, from which the corresponding physical address is obtained, completing the translation from virtual address to physical address.
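For illustration only, the walk described above can be sketched in C. This is a minimal sketch assuming 9-bit per-level indices and 4 KiB pages (x86-64-like field widths; the disclosure does not fix them), and mem_read64 is a hypothetical helper that reads one page table entry from physical memory.

    #include <stdint.h>

    #define LEVELS    4
    #define IDX_BITS  9                                   /* bits per OFFSET_lvlN field (assumed) */
    #define PAGE_BITS 12                                  /* bits in OFFSET_pg, 4 KiB page (assumed) */
    #define IDX_MASK  ((1u << IDX_BITS) - 1)
    #define PAGE_MASK ((uint64_t)((1u << PAGE_BITS) - 1))

    extern uint64_t mem_read64(uint64_t phys_addr);       /* hypothetical memory read */

    /* Walk the four-level page table of FIG. 1: REG_pt gives the start of
     * the level-4 table; each entry of levels 4..2 gives the start of the
     * next-level table; the level-1 entry (PTE) gives the physical page. */
    uint64_t translate(uint64_t reg_pt, uint64_t va)
    {
        uint64_t table = reg_pt;
        for (int level = LEVELS; level >= 1; level--) {
            unsigned shift = PAGE_BITS + (unsigned)(level - 1) * IDX_BITS;
            unsigned idx   = (unsigned)(va >> shift) & IDX_MASK;  /* OFFSET_lvl<level> */
            uint64_t entry = mem_read64(table + idx * sizeof(uint64_t));
            table = entry & ~PAGE_MASK;                   /* next table base, or page base at level 1 */
        }
        return table | (va & PAGE_MASK);                  /* merge the page base with OFFSET_pg */
    }

Note that the loop issues one memory read per level, which is why a four-level walk costs four memory accesses, as discussed below.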
It should be noted that although FIG. 1 shows a 4-level page table, embodiments of the present disclosure are not limited thereto: any number of page table levels may be used, for example a 2-level, 3-level, or 5-level page table, and a single-level page table may also be used, as determined by actual requirements. For example, a system may support pages of different sizes, each page size corresponding to the number of bits of the virtual address offset OFFSET_pg. The larger the page, the fewer address translation levels are needed in the same system.
FIG. 2 is a schematic diagram of the architecture of a multicore processor. For example, as shown in FIG. 2, the processor has 4 processor cores (CPU cores) and multiple levels of cache, such as a first-level cache (L1 Cache), a second-level cache (L2 Cache), and a last-level cache (LLC). In this example, the last-level cache is the third-level cache (L3 Cache). Of course, embodiments of the present disclosure are not limited thereto; the processor may have any number of cache levels, and the last-level cache may accordingly be a cache of any level, depending on actual needs.
For example, in this example, the last-level cache is shared by the processor cores, while the second-level cache is private to each processor core. That is, the processor cores share one last-level cache, while each processor core is individually provided with a dedicated second-level cache. The last-level cache and the second-level caches store instructions and data, and the last-level cache is connected to the memory. It should be noted that, in other examples, the second-level cache may also be a shared cache, which is not limited by the embodiments of the present disclosure.
For example, a dedicated first-level cache is provided individually for each processor core, inside the processor core. For example, the first-level cache may include a first-level instruction cache (L1I cache) and a first-level data cache (L1D cache) for caching instructions and data, respectively. The processor further includes a memory, and the processor cores implement instruction fetching and data reading through the data caching mechanism formed by the multi-level caches and the memory.
For example, a Translation Lookaside Buffer (TLB) is provided separately for each processor core and may include an instruction-specific TLB (ITLB) and a data-specific TLB (DTLB). Both the ITLB and the DTLB are provided within the processor core.
Address translation is a very time-consuming process; with a multi-level page table, multiple memory accesses are usually required to obtain the corresponding physical address. Taking the 4-level page table shown in FIG. 1 as an example, the memory must be accessed 4 times to obtain the corresponding physical address. Therefore, to save address translation time and improve computer system performance, a TLB (e.g., including an ITLB and a DTLB) may be provided in the processor core to hold recently used first-level Page Table Entries (PTEs). When address translation is needed, the TLB is queried first for the required PTE; on a hit, the corresponding physical address is obtained immediately. Similar to CPU caches, a TLB may have various organizations, such as fully associative, set associative, or direct indexed. The TLB may also be a multi-level structure: the lowest-level TLB is the smallest and fastest, and when it misses, the next TLB level is searched.
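As an illustration, a set-associative lookup of the kind described can be sketched as follows; the set/way counts and the 4 KiB page size are assumptions, not values from the disclosure.

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_SETS 64                           /* assumed geometry */
    #define TLB_WAYS 4

    struct tlb_entry {
        bool     valid;
        uint64_t vpn;                             /* virtual page number (tag) */
        uint64_t pfn;                             /* physical frame number from the level-1 PTE */
    };

    static struct tlb_entry tlb[TLB_SETS][TLB_WAYS];

    /* Returns true on a hit and fills *pfn; on a miss the request falls
     * through to the next TLB level or to a page table walk. */
    bool tlb_lookup(uint64_t va, uint64_t *pfn)
    {
        uint64_t vpn = va >> 12;                  /* 4 KiB pages assumed */
        unsigned set = (unsigned)(vpn % TLB_SETS);
        for (int w = 0; w < TLB_WAYS; w++) {
            if (tlb[set][w].valid && tlb[set][w].vpn == vpn) {
                *pfn = tlb[set][w].pfn;           /* hit: physical address known at once */
                return true;
            }
        }
        return false;                             /* miss: walk the page table */
    }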
Although a TLB can eliminate the latency of many address translations, accesses to the page table for address translation cannot be avoided entirely during program execution. To reduce the time required for translation operations, a hardware Page Table Walker (PTW) is typically provided separately for each processor core, inside the processor core. Using a hardware page table walker, the multi-level page table can be traversed to obtain the final physical address of the memory page.
The L1I cache and the L1D cache are accessed using physical addresses (e.g., in a virtually indexed, physically tagged manner), and the second-level cache, the last-level cache, and the memory are likewise accessed using physical addresses. Therefore, address translation must be performed by the ITLB or DTLB before data can be accessed. Like normal data read requests, read requests from the hardware page table walker pass through the first-level cache and the second-level cache, reaching at the farthest the last-level cache and then the memory. If the data requested by the hardware page table walker is present in some cache level, that cache returns the data and does not pass the request on to the lower-level caches/memory.
FIG. 3 is a data flow diagram of address translation using the processor shown in FIG. 2. As shown in FIG. 3, in the possible case where the TLB misses and the memory must therefore be accessed for address translation, 4 memory accesses are needed to obtain the physical address of the final memory page. Emerging application scenarios such as big data, cloud computing, and artificial intelligence (AI) often use very large instruction and data spaces simultaneously, with many mutually scattered hot instruction segments and data segments. These applications therefore tend to suffer more cache misses (Cache Miss) and TLB misses (TLB Miss). As a result, the data requested by the hardware page table walker is often absent from every cache level, and address translation can only be completed through multiple memory accesses.
In a typical CPU architecture, the instructions and data of a program are stored in the memory, and the operating frequency of a processor core is much higher than that of the memory, so fetching data or instructions from the memory takes hundreds of clock cycles. This often leaves the processor core idle, unable to continue running dependent instructions, causing performance loss. Modern high-performance processors therefore include a multi-level cache architecture to hold recently accessed data and to prefetch into the cache, ahead of time, the data and instructions that are about to be accessed. By prefetching data and instructions into the cache in advance, the corresponding read and write operations can hit the cache, reducing latency.
When the processor shown in FIG. 2 is used, the process of address translation and requesting data is shown in FIG. 4. For example, when a data read request suffers a TLB miss (TLB Miss), a page table walk must be performed to obtain the physical address, i.e., four levels of page table entries are read from the memory for address translation, and the corresponding data is then obtained from the cache/memory according to the translated physical address. The time between the two five-pointed stars in FIG. 4 is the total latency of the operation, including the address translation latency (the longest span, between the solid lines) and the data read latency (the span between the dashed lines).
In the example shown in FIG. 4, a data read/write operation that requires address translation via a page table walk (and may itself be a data prefetch request) has no opportunity to prefetch data: the data can only be obtained from the memory through the multi-level cache after the physical address is obtained. This renders data prefetching ineffective, fails to reduce latency, and hurts overall system performance.
At least one embodiment of the present disclosure provides an information prefetching method, a processor, and an electronic device. The information prefetching method implements data/instruction prefetching while reducing address translation latency, effectively reduces the latency of data/instruction read and write operations, and improves overall system performance.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the same reference numerals in different figures will be used to refer to the same elements that have been described.
At least one embodiment of the present disclosure provides an information prefetching method for a processor. The processor includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space, where the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as, and communicatively connected to, the target preset cache space. The second page table walker is arranged at the same path level as, and communicatively connected to, the first-level cache space. The information prefetching method includes: in response to the first page table walker being selected from the first page table walker and the second page table walker based on a preset rule to perform an address translation operation to obtain a physical address, sending, by the first page table walker, a prefetch request to the target preset cache space, where the prefetch request includes the physical address; and in response to the prefetch request, performing, by the target preset cache space, an information prefetching operation based on the physical address.
At least one embodiment of the present disclosure provides a processor. The processor includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space. The first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as, and communicatively connected to, the target preset cache space. The second page table walker is arranged at the same path level as, and communicatively connected to, the first-level cache space. The first page table walker is configured to: in response to being selected from the first page table walker and the second page table walker based on a preset rule to perform an address translation operation to obtain a physical address, send a prefetch request to the target preset cache space, where the prefetch request includes the physical address. The target preset cache space is configured to: in response to the prefetch request, perform an information prefetching operation based on the physical address.
FIG. 5 is a schematic diagram of the architecture of a processor according to some embodiments of the present disclosure. The processor provided by the embodiments of the present disclosure is first described below with reference to FIG. 5, followed by the information prefetching method provided by the embodiments of the present disclosure.
As shown in FIG. 5, in some embodiments of the present disclosure, the processor includes a processor core, a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space. In some examples, the first page table walker and the second page table walker may be hardware Page Table Walkers (PTWs) as described above. Here, "may be the PTW described above" means that the address translation function implemented and the address translation principle adopted are similar, while the hardware structure, placement, and the like of the first page table walker, and likewise of the second page table walker, may differ from those of the PTW described above; embodiments of the present disclosure are not limited in this respect. The first page table walker is a newly added hardware Page Table Walker (PTW) arranged at the same path level as any cache level of the processor other than the first-level cache space, while the second page table walker may be arranged at the same path level as the first-level cache space.
For example, the first-level cache space is an L1 cache provided inside the processor core. For example, the first-level cache space is arranged at the same path level as, and communicatively connected to, the processor core, so the processor core can obtain data or instructions directly from the first-level cache space. Here, "arranged at the same path level" means that the physical locations in the chip are adjacent or close, so data interaction and transfer can be performed directly. Thus, the first-level cache space being arranged at the same path level as the processor core means that the first-level cache space is placed beside, and close to, the processor core, and the processor core can exchange and transfer data directly with the first-level cache space. For example, "communicatively connected" means that data/instructions can be transferred directly.
For example, the second page table walker is arranged at the same path level as, and communicatively connected to, the first-level cache space: the second page table walker is placed beside, and close to, the first-level cache space, so the first-level cache space can exchange and transfer data directly with the second page table walker. For another example, the second page table walker may be provided within the processor core; it may then logically be arranged at the same path level as, and communicatively connected to, the processor core.
In some examples, the first-level cache space includes an L1I cache for storing instructions and an L1D cache for storing data. Of course, embodiments of the present disclosure are not limited thereto; in other examples, a single L1 cache may be provided to store both data and instructions, without distinguishing between an L1I cache and an L1D cache.
For example, in some examples, the at least one preset cache space includes second-level to Nth-level cache spaces, where N is an integer greater than 2; the Nth-level cache space is closest to the memory and farthest from the processor core. For example, in the example shown in FIG. 5, the at least one preset cache space may include a second-level cache space (L2 cache) and a last-level cache space (LLC), i.e., N = 3. Of course, embodiments of the present disclosure are not limited thereto; N may be any integer greater than 2, such as 4, 5, or 6, the processor correspondingly having a 4-level, 5-level, or 6-level cache architecture. For example, in other examples, the at least one preset cache space includes a single cache space, namely only the second-level cache space, in which case the processor has a 2-level cache architecture. It should be noted that, in the processor provided by the embodiments of the present disclosure, the cache levels other than the first-level cache space may collectively be referred to as preset cache spaces.
For example, the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link, enabling data to be fetched level by level downward. For example, when the processor core needs data, it first queries the first-level cache space; on a miss it continues to the second-level cache space, and on a further miss it queries the last-level cache space. If the last-level cache space also misses, the data is obtained from the memory. Similarly, when the second page table walker needs to fetch data, it first queries the first-level cache space, continues to the second-level cache space on a miss, and queries the last-level cache space on a further miss. If the last-level cache space also misses, the data is obtained from the memory.
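The level-by-level query along the communication link can be sketched as below; cache_lookup and mem_fetch are hypothetical helpers, and three cache levels (L1, L2, LLC) are assumed as in FIG. 5.

    #include <stdbool.h>
    #include <stdint.h>

    #define CACHE_LEVELS 3      /* L1, L2, LLC, as in FIG. 5 (assumed) */

    extern bool     cache_lookup(int level, uint64_t pa, uint64_t *data);  /* hypothetical */
    extern uint64_t mem_fetch(uint64_t pa);                                /* hypothetical */

    /* Query each level of the communication link in order; a hit at any
     * level stops the walk, otherwise the memory is accessed last. */
    uint64_t read_data(uint64_t pa)
    {
        uint64_t data;
        for (int level = 1; level <= CACHE_LEVELS; level++) {
            if (cache_lookup(level, pa, &data))
                return data;
        }
        return mem_fetch(pa);
    }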
For example, the at least one preset cache space includes a target preset cache space, which may be any one of the preset cache spaces. For example, any one of the second-level to Nth-level cache spaces may serve as the target preset cache space. The first page table walker is arranged at the same path level as, and communicatively connected to, the target preset cache space.
For example, in the example of FIG. 5, the last-level cache space is set as the target preset cache space, and the first page table walker is arranged at the same path level as, and communicatively connected to, the last-level cache space. Here, "arranged at the same path level" means that the physical locations in the chip are adjacent or close, so data interaction and transfer can be performed directly. Thus, the first page table walker being arranged at the same path level as the last-level cache space means that the first page table walker is placed beside, and close to, the last-level cache space, and the last-level cache space can exchange and transfer data directly with the first page table walker.
For example, in some examples, the Nth-level cache space is a shared cache space and serves as the target preset cache space, which is the case shown in FIG. 5. For example, in other examples, the second-level cache space is a private or shared cache space and serves as the target preset cache space. That is, in some processor architectures the second-level cache space is provided separately for each processor core and is private, while in other processor architectures it is shared by multiple processor cores. Whether private or shared, the second-level cache space can serve as the target preset cache space.
It should be noted that although FIG. 5 shows the last-level cache space serving as the target preset cache space, with the first page table walker placed beside it, this does not limit the embodiments of the present disclosure. In other examples, where the second-level cache space serves as the target preset cache space, the first page table walker is placed beside, and communicatively connected to, the second-level cache space. In some examples, when the processor includes more cache levels, any other cache level except the first-level cache space may be set as the target preset cache space, and the placement of the first page table walker is adjusted accordingly. Note that the first page table walker is neither located within the processor core nor placed beside the first-level cache space.
For example, the first page table walker is configured to send a prefetch request to the target preset cache space in response to being selected from the first page table walker and the second page table walker, based on a preset rule, to perform an address translation operation that yields a physical address. For example, the prefetch request includes the physical address, i.e., it carries the physical address translated by the first page table walker. For example, when a virtual address needs to be translated to a physical address, if the ITLB or DTLB misses and it is determined that the address translation operation is to be performed by the first page table walker, the processor core may send an address translation request to the first page table walker. The TLB architecture is not limited to an ITLB plus DTLB; any applicable architecture may be adopted, and embodiments of the present disclosure are not limited in this respect.
For example, the address translation request may trigger the first page table walker to perform an address translation operation. The address translation request may be delivered to the first page table walker through the multi-level cache architecture, or through a pipeline inside the processor. When the address translation request is passed to the first page table walker through the multi-level cache architecture, it uses a data read request type recognizable by that architecture.
For example, the address translation operation may be the address translation process of a multi-level page table, described above with reference to FIG. 1 and not repeated here. It should be noted that the page table used for address translation is not limited to 4 levels: any number of page table levels may be used, for example a 2-level, 3-level, or 5-level page table, and a single-level page table may also be used, as determined by actual requirements; embodiments of the present disclosure are not limited in this respect. For example, the more page table levels there are, the more memory accesses each address translation requires, and thus the larger the performance improvement the processor provided by the embodiments of the present disclosure can offer. For example, the physical page size of the page table is not limited and may be determined according to actual requirements.
The address translation request may include translation information. The translation information may include: an address translation request sequence number, the virtual address value to be translated, and the initial address of the highest-level page table. After receiving the address translation request, the first page table walker is triggered to perform the address translation operation and can obtain from the translation information the contents required for the operation, such as the virtual address value and the initial address of the highest-level page table. In some examples, the address translation request may be denoted Addr_Trans_Req, the address translation request sequence number Addr_Trans_SN, the initial address of the highest-level page table (i.e., the REG_pt value of the process) REG_pt, and the virtual address value to be translated VA.
For example, the translation information may further include a request type identifier. The request type identifier indicates whether the target information stored at the physical address is of an instruction type or a data type. In some examples, whether the request corresponds to an instruction or to data may be represented by an I/D flag, e.g., I for an instruction and D for data.
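Gathering the fields named above, the translation information might be laid out as in the following sketch; the field widths and the enum encoding are assumptions, only the field names follow the notation of this text.

    #include <stdint.h>

    enum req_type { REQ_INSTRUCTION, REQ_DATA };   /* the I/D request type identifier */

    struct addr_trans_req {        /* Addr_Trans_Req */
        uint32_t      sn;          /* Addr_Trans_SN: address translation request sequence number */
        uint64_t      va;          /* VA: virtual address value to be translated */
        uint64_t      reg_pt;      /* REG_pt: initial address of the highest-level page table */
        enum req_type type;        /* instruction or data target */
    };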
Because the first page table walker is arranged beside the target preset cache space (in this example, the last-level cache space) and is therefore closer to the memory, each fetch of a page table entry from the memory by the first page table walker takes less time, which significantly improves the efficiency of address translation and greatly shortens the time it consumes. The first page table walker is not arranged beside the first-level cache space (L1 cache), removing the usual constraint of placing the page table walker inside the processor core; because the first page table walker can be closer to the memory, its memory access and address translation latency are reduced and the system performance of the processor is improved. This placement of the first page table walker suits a variety of emerging application scenarios (such as big data, cloud computing, and AI) and a variety of CPU architectures, and can further improve performance in such scenarios.
It should be noted that, in embodiments of the present disclosure, the first page table walker may be arranged beside any cache level other than the first-level cache space, or directly beside the memory, as determined by actual requirements, for example according to the processor architecture, process technology, cache sizes and latencies, memory latency, whether cache coherence is supported, the characteristics of common applications, and other factors; embodiments of the present disclosure are not limited in this respect.
For example, the target preset cache space is configured to perform an information prefetching operation based on the physical address in response to the prefetch request. For example, the target preset cache space is further configured to: determine a prefetch cache space, obtain, based on the physical address, the target information stored at the physical address, and send the target information to the prefetch cache space. For example, the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space, i.e., it may be any one or more of them. For example, in some examples, the prefetch cache space is a cache space closer to the processor core than the target preset cache space on the communication link formed by the first-level cache space and the at least one preset cache space, which improves prefetch effectiveness. Of course, in other examples, the prefetch cache space may also be a cache space farther from the processor core than the target preset cache space.
That is, before the processor core receives the physical address and requests the information (e.g., data or an instruction), the target preset cache space performs an information prefetching operation according to the physical address and stores the target information held at the physical address into the prefetch cache space. Thus, when the processor core requests the information based on the physical address, the request can hit in the prefetch cache space, effectively reducing latency and achieving data/instruction prefetching.
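A sketch of how the target preset cache space might service a prefetch request follows; fetch_toward_memory and install_line are hypothetical helpers, and the preset identifier is modeled here as the destination cache level.

    #include <stdint.h>

    struct prefetch_req {
        uint64_t pa;          /* physical address translated by the first page table walker */
        int      dest_level;  /* preset identifier: level of the prefetch cache space */
    };

    /* Hypothetical helpers: query level by level toward the memory, and
     * install the fetched line at a given cache level. */
    extern uint64_t fetch_toward_memory(int from_level, uint64_t pa);
    extern void     install_line(int level, uint64_t pa, uint64_t info);

    void handle_prefetch(int target_level, const struct prefetch_req *req)
    {
        /* Step 1: obtain the target information by querying level by level
         * from the target preset cache space toward the memory. */
        uint64_t info = fetch_toward_memory(target_level, req->pa);

        /* Step 2: transfer the information level by level toward the
         * prefetch cache space, so the later demand access hits there. */
        for (int level = target_level; level >= req->dest_level; level--)
            install_line(level, req->pa, info);
    }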
For example, as shown in FIG. 5, the processor further includes a page table entry cache space, which may be a Translation Lookaside Buffer (TLB) as described above. For example, the page table entry cache space may include an instruction TLB (ITLB) and a data TLB (DTLB). For example, the processor core is arranged at the same path level as, and communicatively connected to, the page table entry cache space: the page table entry cache space is placed beside, and close to, the processor core, so the processor core and the page table entry cache space can exchange and transfer data directly. For another example, the page table entry cache space may be provided within the processor core, at the same path level as the second page table walker that is also provided within the processor core, with the second page table walker communicatively connected to the page table entry cache space; for example, the second page table walker is arranged at the same path level as the ITLB and the DTLB.
For example, the page table entry cache space, the first-level cache space, and the at least one preset cache space are communicatively connected in sequence to form a communication link, enabling data to be fetched level by level downward. For example, when the second page table walker within the processor core needs data (e.g., page table entry data), the page table entry cache space is queried first; on a miss the query continues to the first-level cache space, then to the second-level cache space, then to the last-level cache space. If the last-level cache space also misses, the data is obtained from the memory.
For example, the page table entry cache space stores at least part of the page table entry data from the first-level page table up to the Mth-level page table, where M is an integer greater than 1. That is, the page table entry cache space may store any recently used page table entry data, such as PTEs.
For example, as shown in FIG. 5, the processor may further include a request buffer, which may also be called a Page Request Buffer (PRB). The request buffer is arranged at the same path level as the first page table walker and is communicatively connected to the first page table walker and to the target preset cache space; for example, the request buffer sits between the first page table walker and the target preset cache space. Being at the same path level means the request buffer is placed beside, and close to, the first page table walker, so the first page table walker can exchange and transfer data directly with the request buffer. The request buffer can likewise exchange and transfer data directly with the target preset cache space.
The request buffer is configured to store a queue of pending address translation requests sent by the processor core. When the processor provided by the embodiments of the present disclosure includes a plurality of processor cores, the first page table walker cannot simultaneously process address translation requests sent by the plurality of processor cores, so the request buffer may be used to store the pending address translation request queue. The first page table walker may sequentially fetch address translation requests from the request buffer and perform the corresponding address translation operations.
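As an illustrative aid, the following C sketch models the request buffer as a fixed-depth FIFO; the queue depth, the struct fields, and the prb_push/prb_pop names are assumptions, not the patent's encoding.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRB_DEPTH 16 /* illustrative queue depth; not specified by the text */

/* Minimal shape of a pending request; field names are assumptions. */
struct addr_trans_req {
    uint32_t sn;     /* Addr_Trans_SN: request sequence number */
    uint64_t va;     /* virtual address value to be translated */
    uint64_t reg_pt; /* initial address of the highest-level page table */
};

/* Fixed-size FIFO standing in for the Page Request Buffer (PRB). */
struct prb {
    struct addr_trans_req slot[PRB_DEPTH];
    unsigned head, tail, count;
};

/* Processor-core side: enqueue a request; fails when the PRB is full. */
static bool prb_push(struct prb *q, struct addr_trans_req r)
{
    if (q->count == PRB_DEPTH) return false;
    q->slot[q->tail] = r;
    q->tail = (q->tail + 1) % PRB_DEPTH;
    q->count++;
    return true;
}

/* First-page-table-walker side: dequeue requests in arrival order. */
static bool prb_pop(struct prb *q, struct addr_trans_req *r)
{
    if (q->count == 0) return false;
    *r = q->slot[q->head];
    q->head = (q->head + 1) % PRB_DEPTH;
    q->count--;
    return true;
}

int main(void)
{
    struct prb q = {0};
    prb_push(&q, (struct addr_trans_req){ .sn = 1, .va = 0x7f001000, .reg_pt = 0x100000 });
    struct addr_trans_req r;
    while (prb_pop(&q, &r))
        printf("walker handles request SN=%u VA=0x%llx\n", r.sn, (unsigned long long)r.va);
    return 0;
}
```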
For example, in some examples, the processor core is configured to generate an address translation request in response to the first page table walker being selected, from among the first page table walker and the second page table walker based on a preset rule, to perform an address translation operation, and the first page table walker performs the address translation operation in response to the address translation request to obtain the physical address. For example, the processor dynamically selects the first page table walker or the second page table walker to process a new address translation request based on the preset rule, choosing the more suitable page table walker to reduce address translation latency. For example, the preset rule includes: when page table entry data required for address translation does not exist in the page table entry cache space, or the page table level of the page table entry data required for address translation in the page table entry cache space is greater than a threshold, it is determined that the address translation operation is executed by the first page table walker; otherwise, it is determined that the address translation operation is executed by the second page table walker. For example, the threshold is related to various factors of the CPU architecture and the chip process, and may be set according to actual requirements, which is not limited by the embodiments of the present disclosure.
For example, in some examples, if the page table entry data needed for address translation is not present in the page table entry cache space, it is determined that the address translation operation is performed by the first page table walker. For example, in other examples, if the page table level of the page table entry data required for address translation in the page table entry cache space is greater than the threshold, it is determined that the address translation operation is performed by the first page table walker. Assuming the threshold is 2, when the page table entry data required for address translation in the page table entry cache space is level-3 or level-4 page table entry data, it is determined that the address translation operation is performed by the first page table walker. It should be noted that the preset rule is not limited to the above-described manner, and any applicable rule may be adopted to select one of the first page table walker and the second page table walker for the address translation operation, which may be determined according to practical requirements; the embodiments of the present disclosure are not limited thereto.
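A minimal sketch of this selection rule, assuming the probe of the page table entry cache space exposes a hit flag and the page table level of the best cached entry (both names are illustrative):

```c
#include <stdbool.h>

enum walker { FIRST_WALKER, SECOND_WALKER }; /* near-LLC vs. in-core walker */

/* Result of probing the page table entry cache space: whether a useful
 * entry exists and, if so, the page table level it belongs to. */
struct pte_cache_probe {
    bool hit;
    int  level; /* 1 = lowest-level page table ... M = highest */
};

/* Preset rule from the text: use the first page table walker when the
 * needed entry is absent, or when the best cached entry sits at a page
 * table level above the threshold (e.g. threshold 2 sends level-3/4
 * cases to the first walker); otherwise use the second walker. */
enum walker choose_walker(struct pte_cache_probe p, int threshold)
{
    if (!p.hit || p.level > threshold)
        return FIRST_WALKER;
    return SECOND_WALKER;
}
```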
For example, the processor core is configured to generate an address translation request in response to the absence of page table entry data required for address translation in the page table entry cache space, and to send the address translation request to the first page table walker if it is determined that the address translation operation is performed by the first page table walker. For example, when a virtual address needs to be translated into a physical address, if the page table entry data required for address translation misses in the ITLB or DTLB, an address translation operation needs to be performed. The processor determines, based on the preset rule, which of the first page table walker and the second page table walker performs the address translation operation. When it is determined that the first page table walker performs the operation, the processor core sends the address translation request to the first page table walker; when it is determined that the second page table walker performs the operation, the processor core sends the address translation request to the second page table walker. For example, the TLB is not limited to the ITLB and DTLB, and any suitable TLB may be used; the embodiments of the disclosure are not limited in this respect.
For example, in some examples, the first page table walker is further configured to receive an address translation request generated by the processor core, retrieve page table entry data from the memory via the target pre-set cache space, and perform an address translation operation using the page table entry data to obtain the physical address.
For example, in this example, the first page table walker is not directly communicatively connected with the memory and does not access the memory directly; instead, it accesses the memory indirectly through the target preset cache space to obtain the page table entry data. For example, the first page table walker is further configured to obtain the page table entry data in a level-by-level query manner along the path from the target preset cache space to the memory according to the address translation request. This level-by-level query is similar to the way data is obtained step by step through a multi-level cache. For example, when the target preset cache space is the last-level cache space, the first page table walker accesses the memory through the last-level cache space; when the target preset cache space is the second-level cache space or another level of cache space, the first page table walker queries downward level by level and accesses the memory through the target preset cache space. For example, the first-level cache space to the Nth-level cache space store at least part of the page table entry data of the first-level page table through the Mth-level page table, where M is an integer greater than 1.
Therefore, the page table entry data read by the first page table walker can be stored in the target preset cache space and in the cache spaces between the target preset cache space and the memory, so that page table entries possibly residing in these cache spaces can be queried during the next address translation; on a hit, the memory does not need to be accessed, which further improves address translation efficiency and yields a latency lower than that of a memory access. Moreover, in the embodiments of the present disclosure, under a multi-core architecture, since the page table entry data read by the first page table walker is stored in the target preset cache space, the cache coherence mechanism can ensure that the first page table walker obtains the correct page table entry contents.
For example, in some examples, the first page table walker may include a multi-level page table cache (PWT). The multi-level page table cache is configured to cache at least part of the page table entry data of the first-level page table through the Mth-level page table, M being an integer greater than 1. For example, the multi-level page table cache is a cache inside the page table walker, used for storing any recently used page table entries, such as first-level, second-level, third-level, and fourth-level page table entries. If an address translation finds the corresponding page table entry in the multi-level page table cache, higher-level page table accesses can be skipped, thereby reducing the number of memory accesses and the address translation delay. It should be noted that the multi-level page table cache is a micro-architectural optimization of the page table walker and may also be omitted, which may be determined according to practical requirements; the embodiments of the present disclosure do not limit this. Similarly, the second page table walker may also include a multi-level page table cache.
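As an illustrative aid, the following C sketch shows how such a multi-level page table cache can be looked up to pick the deepest cached level and skip the higher-level page table accesses; the entry layout, the x86-64-style 9-bit index tags, and the four-level assumption are all illustrative, not the patent's design.

```c
#include <stdbool.h>
#include <stdint.h>

#define PWT_WAYS 8 /* illustrative capacity */

/* One cached intermediate entry: the page table level it covers, the
 * virtual-address tag it matches, and the next-table base address. */
struct pwt_entry {
    bool     valid;
    int      level;     /* 4 = highest-level table ... 1 = last level */
    uint64_t va_tag;    /* tag derived from the relevant VA bits */
    uint64_t next_base; /* base address of the next table to visit */
};

struct pwt { struct pwt_entry e[PWT_WAYS]; };

/* Hypothetical tag function: which VA bits select an entry at a level
 * (assumes 4KB pages and 9-bit indices per level, as on x86-64). */
static uint64_t va_tag_for_level(uint64_t va, int level)
{
    return va >> (12 + 9 * (level - 1));
}

/* Return the deepest (lowest) cached level for this VA so the walk can
 * start there, skipping all higher-level page table accesses. */
int pwt_best_start(const struct pwt *c, uint64_t va, uint64_t *base)
{
    int best = 5; /* sentinel: one above the highest level */
    for (int i = 0; i < PWT_WAYS; i++) {
        const struct pwt_entry *e = &c->e[i];
        if (e->valid && e->level < best &&
            e->va_tag == va_tag_for_level(va, e->level)) {
            best = e->level;
            *base = e->next_base;
        }
    }
    return best; /* 5 means no hit: start from the highest-level table */
}
```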
For example, the first page table walker is further configured to send an error feedback instruction to the processor core in response to failing to acquire the page table entry data required for address translation, so as to indicate a page table entry read error. That is, under this processor architecture, when a page table accessed at a certain level is not in the memory, or a data operation does not match the attributes of the page table entry to which the data belongs, a page fault (Page Fault) is triggered and the operating system handles the exception. Therefore, when the page table entry data required for address translation cannot be acquired, the first page table walker sends an error feedback instruction to the processor core; the error feedback instruction is, for example, an interrupt instruction or another type of instruction indicating a page table entry read error, thereby triggering the page fault.
For example, in some examples, the first page table walker is further configured to send a data return instruction to the processor core. After the first page table walker performs the address translation operation, the corresponding physical address is obtained, so the first page table walker sends a data return instruction to the processor core to deliver the physical address to the processor core. The data return instruction may be transferred to the processor core through the multi-level cache architecture, or through a pipeline inside the processor. In the case where the data return instruction is passed to the processor core through the multi-level cache architecture, the data return instruction uses a request-response type recognizable by the multi-level cache architecture.
For example, the data return instruction includes the address translation request sequence number, the physical address and attributes of the memory page, and the like. For example, in some examples, Addr_Trans_Resp may be used to indicate that the information is a reply to the address translation request with sequence number Addr_Trans_SN (i.e., that the information is a data return instruction), Addr_Trans_SN may be used to indicate the address translation request sequence number, and PTE may be used to indicate the corresponding first-level page table entry contents, e.g., including the physical address and the attributes of the memory page.
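As an illustrative aid, the fields named above can be pictured as the following struct; the field widths and layout are assumptions for illustration only, not the patent's encoding.

```c
#include <stdint.h>

/* Sketch of a data return instruction's payload. */
struct addr_trans_resp {
    uint8_t  msg_type; /* Addr_Trans_Resp: marks a reply to a translation request */
    uint32_t sn;       /* Addr_Trans_SN: address translation request sequence number */
    uint64_t pte;      /* first-level PTE contents: physical address bits
                          plus memory page attribute bits */
};
```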
For example, the processor core is configured to generate an address translation request in response to selecting a second page table walker to perform an address translation operation among the first page table walker and the second page table walker based on a preset rule, and the second page table walker performs the address translation operation in response to the address translation request to obtain the physical address. Please refer to fig. 2 for a process of the second page table walker performing the address translation operation, which is not described herein again.
For example, in some examples, the second page table walker receives an address translation request generated by the processor core, acquires page table entry data from the memory according to the address translation request, and performs address translation using the page table entry data to obtain the physical address. For example, in the case where it is determined that the address translation operation is performed by the first page table walker, the second page table walker may instead receive the address translation request from the processor core and then forward it to the first page table walker, thereby providing a diversified transport for address translation requests. The first page table walker executes the address translation operation after receiving the request; for the specific address translation process, reference is made to the above embodiments, which are not repeated here.
It should be noted that, in the embodiment of the present disclosure, the processor may be in a single-core architecture or a multi-core architecture, and the embodiment of the present disclosure is not limited thereto. The number and arrangement of the caches are not limited, and can be determined according to actual requirements. The processor is not limited to the structure shown in fig. 5, and may include more or less components, and the connection manner between the components is not limited.
Fig. 6 is a flowchart illustrating an information prefetching method according to some embodiments of the disclosure. This information prefetching method may be used in the processor shown in FIG. 5. In some embodiments, as shown in FIG. 6, the information prefetching method comprises the following operations.
Step S10: in response to the first page table walker being selected, from the first page table walker and the second page table walker based on a preset rule, to execute an address translation operation to obtain a physical address, the first page table walker sends a prefetch request to the target preset cache space, wherein the prefetch request includes the physical address;
step S20: in response to the prefetch request, the target preset cache space performs an information prefetch operation based on the physical address.
For example, in step S10, based on the preset rule, the processor selects the first page table walker from the first page table walker and the second page table walker to perform the address translation operation so as to reduce the address translation delay, and generates an address translation request. The first page table walker receives the address translation request sent by the processor and performs the address translation operation to obtain the physical address.
For example, the preset rule includes: when page table entry data required for address translation does not exist in the page table entry cache space, or the page table level of the page table entry data required for address translation in the page table entry cache space is greater than a threshold, it is determined that the address translation operation is executed by the first page table walker; otherwise, it is determined that the address translation operation is executed by the second page table walker. For example, the threshold is related to various factors of the CPU architecture and the chip process, and may be set according to actual requirements, which is not limited by the embodiments of the present disclosure.
For example, after the first page table walker performs the address translation operation to obtain the physical address, the first page table walker sends a prefetch request to the target preset cache space. For example, the prefetch request includes a physical address, that is, the prefetch request carries the physical address, so that the physical address can be obtained by the target preset cache space.
For example, in step S20, after the target preset cache space receives the prefetch request, the information prefetch operation is performed according to the physical address carried in the prefetch request. For example, the prefetch request is a request for triggering the target preset cache space to perform the information prefetch operation, and any applicable request type may be adopted; the embodiments of the present disclosure are not limited in this respect. For example, the information prefetch operation is used to implement information prefetching: the prefetched target information may be data or instructions, and the target information is stored in the storage space indicated by the physical address.
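As an illustrative aid, the following C sketch models the target preset cache space's handling of a prefetch request; the request layout, the destination encoding, and the fetch_line/install_line stand-ins are assumptions, not the patent's interfaces.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative shape of the prefetch request sent by the first page
 * table walker; field names are assumptions. */
struct prefetch_req {
    uint64_t pa;         /* physical address carried by the request */
    int      dest_level; /* preset identifier: 1 = L1 cache, 2 = L2 cache */
    int      is_instr;   /* 1 = instruction type, 0 = data type */
};

/* Stand-ins for the fetch and delivery paths of the cache hierarchy. */
static uint64_t fetch_line(uint64_t pa) { return pa ^ 0xABCDu; }

static void install_line(int level, int is_instr, uint64_t pa, uint64_t line)
{
    (void)line;
    printf("install PA 0x%llx into L%d%s\n", (unsigned long long)pa,
           level, (level == 1) ? (is_instr ? "I" : "D") : "");
}

/* On a prefetch request, fetch the target information at PA and push it
 * toward the prefetch cache space. */
static void handle_prefetch(struct prefetch_req r)
{
    uint64_t line = fetch_line(r.pa);
    install_line(r.dest_level, r.is_instr, r.pa, line);
}

int main(void)
{
    struct prefetch_req r = { 0x2000, 1, 0 }; /* data line into the L1D cache */
    handle_prefetch(r);
    return 0;
}
```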
Fig. 7 is a schematic diagram of a process of performing address translation and requesting data by a processor provided by an embodiment of the present disclosure. As shown in fig. 7, when a data read request misses in the TLB (TLB Miss), a page table walk needs to be performed to obtain the physical address. For example, in the case of address translation by the first page table walker, the first page table walker reads four levels of page table entries from the memory for address translation, resulting in the physical address. The physical address is sent to the processor core, which then obtains the corresponding data from the cache/memory according to the physical address.
Since the first page table walker is disposed beside the target preset cache space (e.g., the LLC), when the first page table walker performs address translation by page table traversal, the first page table walker and the target preset cache space obtain the requested physical address earlier than the processor core does. At this time, before the processor core requests data based on the physical address, the target preset cache space can obtain the corresponding data from the memory in advance according to the physical address and send the data to the processor core (or to a designated cache, such as the L1 cache or the L2 cache), thereby implementing data prefetching. The time between the two five-pointed stars in fig. 7 covers all the delays of this operation, including the address translation delay (the longest distance between the solid lines) and the data prefetch delay (the distance between the dashed lines).
In this example, the time of data prefetching is shown by the dashed line in FIG. 7. Compared with the time taken to request the data in fig. 4 (the time shown by the dashed line in fig. 4), the time saved by data prefetching is roughly equal to the latency from the processor core to the target preset cache space (e.g., the LLC). Therefore, on top of placing the first page table walker close to the memory to save address translation delay, this data prefetching method can further reduce the delay of data acquisition.
In this way, the information prefetching method can provide the data/instruction prefetching function while reducing the address translation delay, effectively reducing the delay of data/instruction read and write operations and improving the overall performance of the system.
Fig. 8 is an exemplary flowchart of step S20 in fig. 6. In some examples, the step S20 may further include the following operations.
Step S21: determining a prefetch cache space;
step S22: based on the physical address, the target preset cache space obtains target information stored corresponding to the physical address;
step S23: and the target preset cache space sends the target information to the pre-fetching cache space.
For example, in step S21, a prefetch cache space is first determined; the prefetch cache space is used for caching the target information stored corresponding to the physical address. For example, the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space, that is, it may be any one or more of the first-level cache space and the preset cache spaces. For example, in some examples, the prefetch cache space is a cache space that is closer to the processor core than the target preset cache space in the communication link formed by the first-level cache space and the at least one preset cache space, whereby prefetch efficiency may be improved. In the processor architecture shown in fig. 5, the prefetch cache space may be the L2 cache or the L1 cache (L1I cache or L1D cache).
Fig. 9 is an exemplary flowchart of step S21 in fig. 8. In some examples, as shown in fig. 9, the step S21 may further include the following operations.
Step S211: acquiring a preset identifier;
step S212: determining the prefetch cache space according to the preset identifier.
For example, in step S211, the preset identifier represents level information of a cache space, that is, it indicates which level of cache space the prefetch cache space is. For example, when the preset identifier is 1, it indicates that the prefetch cache space is the L1 cache; when the preset identifier is 2, the prefetch cache space is the L2 cache, and so on. It should be noted that the specific data format and representation of the preset identifier are not limited in the embodiments of the present disclosure, as long as it can be determined from the preset identifier which level of cache space the prefetch cache space is.
For example, the preset identification is stored in a designated storage space or carried in the prefetch request.
For example, in some examples, the preset identifier is stored in a designated storage space, that is, the preset identifier may be set in advance and fixed. When the preset identifier needs to be acquired, it only needs to be read from the designated storage space. This simplifies the way the preset identifier is obtained.
For example, in other examples, the preset identifier is carried in the prefetch request. When the first page table walker sends the prefetch request to the target preset cache space, the prefetch request carries the preset identifier, so that the target preset cache space can obtain the preset identifier and determine which level of cache space the prefetch cache space is. The way the first page table walker determines the preset identifier will be described later and is not repeated here. In this way, the prefetch cache space can be dynamically selected: it is not fixed to a certain level of cache space and can be set flexibly for each prefetch, thereby improving the overall processing efficiency.
It should be noted that, in the embodiment of the present disclosure, a manner of obtaining the preset identifier is not limited to the manner described above, and may also be any other applicable manner, which may be determined according to an actual requirement, and the embodiment of the present disclosure is not limited to this.
For example, in step S212, after the preset identifier is obtained, the prefetch cache space may be determined according to the preset identifier. For example, when the preset identifier is 1, the prefetch cache space is determined to be the L1 cache; when the preset identifier is 2, the prefetch cache space is determined to be the L2 cache, and so on. In the processor architecture shown in fig. 5, the target preset cache space is the LLC, so the determined prefetch cache space (the L1 cache or the L2 cache) is closer to the processor core than the LLC is, thereby improving prefetch efficiency.
For example, in some examples, the first-level cache space includes a first-level instruction space (e.g., an L1I cache) and a first-level data space (e.g., an L1D cache). In a possible case, the level information represented by the preset identifier indicates the first level, that is, the preset identifier is 1; step S212 may then further include the following operations, as shown in fig. 10.
Step S212a: determining that the prefetch cache space is the first-level instruction space in response to the target information being of the instruction type;
step S212b: determining that the prefetch cache space is the first-level data space in response to the target information being of the data type.
For example, in steps S212a and S212b, since the first-level cache space includes the L1I cache and the L1D cache, which cache different types of information, it is necessary to further determine whether the prefetch cache space is the L1I cache or the L1D cache. If the target information stored corresponding to the physical address is of the instruction type, the prefetch cache space is determined to be the L1I cache; if it is of the data type, the prefetch cache space is determined to be the L1D cache. In this way, the target information can be prefetched into the correct cache.
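A minimal sketch of this decoding, assuming the identifier encodings 1 and 2 described above and a hypothetical enum for the candidate cache spaces:

```c
enum cache_space { L1I_CACHE, L1D_CACHE, L2_CACHE, L3_CACHE };

/* Map the preset identifier (level information) to a concrete prefetch
 * cache space; a level-1 identifier is further split by the type of the
 * target information. All encodings here are assumptions. */
enum cache_space decode_prefetch_space(int preset_id, int is_instr_type)
{
    switch (preset_id) {
    case 1:  return is_instr_type ? L1I_CACHE : L1D_CACHE;
    case 2:  return L2_CACHE;
    default: return L3_CACHE; /* e.g. the LLC for other identifiers */
    }
}
```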
Returning to fig. 8, in step S22, based on the physical address, the target preset cache space obtains the target information stored corresponding to the physical address. For example, in some examples, step S22 may include: acquiring the target information in a level-by-level query manner along the path from the target preset cache space to the memory based on the physical address. If the target information hits in a certain level of cache, it can be fetched directly; if it misses in all caches, it needs to be acquired from the memory. The level-by-level query is similar to the way data is acquired step by step through a multi-level cache.
For example, in some examples, the physical address PA may be expressed as: PA = ((first-level PTE value) << X) | OFFSET_pg. Here, OFFSET_pg denotes the virtual address offset (the offset within the memory page), and X denotes the base-2 logarithm of the memory page size. For example, for a 4KB page, X is 12. It should be noted that this is only one example of a physical address calculation and does not constitute a limitation on the embodiments of the present disclosure.
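As an illustrative aid, the calculation can be written as the following C function; treating the PTE value directly as the physical page frame number is a simplification (real PTEs also carry attribute bits that would be masked off first).

```c
#include <stdint.h>
#include <stdio.h>

/* PA = ((first-level PTE value) << X) | OFFSET_pg, X = log2(page size). */
static uint64_t make_pa(uint64_t pte_value, uint64_t va, unsigned x)
{
    uint64_t offset_pg = va & ((1ull << x) - 1); /* low X bits of the VA */
    return (pte_value << x) | offset_pg;
}

int main(void)
{
    /* 4KB page: X = 12. Frame 0x1234, offset 0xABC -> PA 0x1234ABC. */
    printf("0x%llx\n", (unsigned long long)make_pa(0x1234, 0xDEADABC, 12));
    return 0;
}
```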
For example, in step S23, after the target information is acquired, the target preset cache space sends the target information to the prefetch cache space, thereby caching the target information in the prefetch cache space. For example, in some examples, step S23 may include: sending the target information to the prefetch cache space in a level-by-level transfer manner from the target preset cache space to the prefetch cache space, similar to the way data is transferred step by step through a multi-level cache.
In this way, the target information is cached in the prefetch cache space, and when the processor core requests the target information according to the physical address, it can hit in the prefetch cache space, which effectively reduces the delay of data/instruction read and write operations and improves the overall performance of the system. This prefetching approach requires only minor hardware changes and few additional hardware resources, and is easy to implement.
Fig. 11 is a flow chart illustrating another information prefetching method according to some embodiments of the disclosure. In some embodiments, in addition to including steps S10-S20, the information prefetching method may further include steps S30-S40. Steps S10-S20 in this embodiment are substantially the same as steps S10-S20 shown in FIG. 6, and are not repeated herein.
Step S30: in response to the first page table walker being selected, from the first page table walker and the second page table walker based on the preset rule, to execute the address translation operation, the first page table walker executes the address translation operation in response to an address translation request to obtain the physical address;
step S40: the target preset cache space sends the physical address to the processor core.
For example, in step S30, when a virtual address needs to be translated into a physical address, if there is a miss in the ITLB or DTLB, the processor core sends an address translation request to the first page table walker when it is determined according to the preset rule that the address translation operation is to be performed by the first page table walker. The architecture of the TLB is not limited to the ITLB and DTLB; any applicable architecture may be adopted, and the embodiments of the present disclosure are not limited thereto. For example, the address translation request may trigger the first page table walker to perform the address translation operation. For example, the address translation operation may be the address translation process of a multi-level page table, which may be understood with reference to fig. 1 and is not repeated here. It should be noted that the page table for address translation is not limited to 4 levels; any number of page table levels may be used, for example, a 2-level, 3-level, or 5-level page table, and a single-level page table may also be used, which may be determined according to actual needs; the embodiments of the present disclosure are not limited in this respect. For example, the greater the number of page table levels, the more memory accesses each address translation requires, and thus the larger the performance improvement the processor provided by the embodiments of the present disclosure can offer. For example, the physical page size of the page table is not limited and can be determined according to actual requirements.
For example, in some examples, step S30 may include: the first page table walker receives an address translation request generated by the processor core, acquires page table entry data from the memory through a target preset cache space, and performs address translation operation by using the page table entry data to acquire a physical address. For example, the first page table walker acquires page table entry data in a step-by-step query manner from a path from a target preset cache space to the memory according to the address translation request, and translates the page table entry data to acquire a physical address.
For example, the address translation request includes translation information, the translation information including: the address translation request sequence number, the virtual address value to be translated, and the initial address of the highest-level page table. In some examples, Addr_Trans_Req may denote that the request is an address translation request, Addr_Trans_SN may denote the address translation request sequence number, REG_pt may denote the initial address of the highest-level page table (i.e., the REG_pt value of the process), and VA may denote the virtual address value to be translated.
For example, the translation information may further include a request type identifier indicating whether the target information stored corresponding to the physical address is of the instruction type or the data type. In some examples, whether the request corresponds to instructions or data may be represented by I/D, e.g., I for instructions and D for data. Therefore, in the case that the prefetch cache space is the first-level cache space, it may be determined to be the L1I cache or the L1D cache according to the type of the target information.
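As an illustrative aid, the translation information named above can be gathered into the following struct; the field widths and the I/D encoding are assumptions for illustration only.

```c
#include <stdint.h>

/* Sketch of an address translation request's payload. */
struct addr_trans_req_msg {
    uint8_t  msg_type; /* Addr_Trans_Req: marks an address translation request */
    uint32_t sn;       /* Addr_Trans_SN: request sequence number */
    uint64_t reg_pt;   /* REG_pt: initial address of the highest-level page table */
    uint64_t va;       /* VA: virtual address value to be translated */
    uint8_t  i_or_d;   /* request type identifier: instruction or data */
};
```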
For example, in step S40, after the first page table walker performs the address translation operation to obtain the physical address and sends the prefetch request to the target preset cache space, the target preset cache space sends the physical address to the processor core. Since the prefetch request carries the physical address, the physical address is available to the target pre-set cache space. Of course, the embodiments of the present disclosure are not limited thereto, and the first page table walker may also separately send the physical address to the target preset cache space, which may be determined according to actual requirements.
For example, in some examples, step S40 may include: the target preset cache space sends the physical address to the processor core in a level-by-level transfer manner, that is, the physical address is transferred to the processor core through the multi-level cache architecture. Of course, the embodiments of the present disclosure are not limited thereto; the physical address may also be transferred to the processor core through a pipeline inside the processor. For example, the transfer of the physical address may be implemented using a data return instruction. For example, the data return instruction includes the address translation request sequence number, the physical address and attributes of the memory page, and the like. For example, in some examples, Addr_Trans_Resp may be used to indicate that the information is a reply to the address translation request with sequence number Addr_Trans_SN (i.e., that the information is a data return instruction), Addr_Trans_SN may be used to indicate the address translation request sequence number, and PTE may be used to indicate the corresponding first-level page table entry contents, e.g., including the physical address and the attributes of the memory page.
It should be noted that, after the target preset cache space receives the prefetch request, the steps S20 and S40 may be executed in parallel, that is, the physical address is sent to the processor core while the information prefetch operation is performed. Here, "simultaneously" may refer to starting execution at the same time, or may refer to having a small time difference between two operations, and embodiments of the present disclosure are not limited thereto. Of course, the embodiments of the present disclosure are not limited thereto, and the steps S20 and S40 may be performed in a certain order, for example, the step S20 is performed first and then the step S40 is performed, or the step S40 is performed first and then the step S20 is performed, which may be determined according to actual requirements.
Fig. 12 is a flowchart illustrating another information prefetching method according to some embodiments of the disclosure. For example, in some examples, the processor further includes a page table entry cache space and a request cache region, the processor core is disposed at a same path level as the page table entry cache space, and the processor core is communicatively coupled to the page table entry cache space. The page table entry cache space may be provided, for example, inside the processor core. The request buffer is set at the same path level as the first page table walker. Steps S10-S40 in this embodiment are substantially the same as steps S10-S40 shown in FIG. 11, and are not described herein again. In some embodiments, in addition to comprising steps S10-S40, the information prefetching method may further comprise:
step S50: in response to the absence of page table entry data needed for address translation in the page table entry cache space, an address translation request is generated with the processor core.
Step S60: and sending the address translation request queue to be processed to the request cache region by utilizing the processor core.
For example, in step S50, when the virtual address needs to be translated into a physical address, if there is a miss in the ITLB or DTLB, i.e., there is no page table entry data required for address translation in the page table entry cache space, the processor core generates an address translation request.
For example, in step S60, when the processor includes a plurality of processor cores, the first page table walker cannot simultaneously process address translation requests sent by the plurality of processor cores, so the processor core sends the pending address translation request queue to the request buffer. The first page table walker may sequentially fetch address translation requests from the request buffer and perform the corresponding address translation operations.
For example, in some examples, the information prefetching method may further include: in response to the first page table walker being determined to perform an address translation operation, the first page table walker receives an address translation request forwarded by the second page table walker.
For example, in the case where the first page table walker is selected to perform an address translation operation, the second page table walker may receive an address translation request from the processor core and then forward the address translation request to the first page table walker, thereby providing a diversified transport for the address translation request. The first page table walker executes the address translation operation after receiving the address translation request, and for the specific address translation operation process, reference is made to the above embodiment, which is not described herein again.
Fig. 13 is a flowchart illustrating another information prefetching method according to some embodiments of the disclosure. As shown in fig. 13, in some examples, the information prefetching method may further include step S70 and step S80.
Step S70: the processor core determines the cache space for storing the target information according to the storage states of the first-level cache space and the at least one preset cache space, and makes the address translation request carry the level information of that cache space in the form of a preset identifier;
step S80: the first page table walker parses out the preset identifier and makes the prefetch request carry the preset identifier.
For example, in step S70, when a virtual address needs to be translated into a physical address and it is determined that the address translation operation is to be performed by the first page table walker, the processor core generates an address translation request and sends it to the first page table walker. At this time, the processor core determines the cache space for storing the target information according to the storage states of the first-level cache space and the preset cache space, and makes the address translation request carry a preset identifier, the preset identifier indicating the level information of the cache space for storing the target information. For example, a corresponding field may be added to the address translation request to represent the preset identifier specifying the prefetch cache space; the preset identifier is thus passed to the first page table walker. It should be noted that the cache space for storing the target information may be determined according to the number of cache misses per 1000 instructions (MPKI); of course, it may also be determined according to factors such as how empty each cache space is and the validity and hit rate of the cached data.
For example, in some examples, if the MPKI value of the first-level cache space is larger, it is determined to use the first-level cache space as the prefetch cache space, that is, the target information stored corresponding to the physical address is prefetched into the first-level cache space. At this point the processor core sets the preset identifier to 1 and sends it to the first page table walker along with the address translation request. For example, in this case, the prefetch cache space may be further narrowed down using the method of determining the L1I cache and the L1D cache shown in fig. 10.
For example, in other examples, if the MPKI value of the second-level cache space is larger, it is determined to use the second-level cache space as the prefetch cache space, that is, the target information stored corresponding to the physical address is prefetched into the second-level cache space. At this point the processor core sets the preset identifier to 2 and sends it to the first page table walker along with the address translation request.
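A minimal sketch of this MPKI-based choice, assuming only two candidate levels and the identifier encodings 1 and 2 above (a real policy could also weigh the other factors mentioned for step S70):

```c
/* Pick the prefetch cache level with the higher misses per kilo
 * instructions (MPKI): the level missing more often benefits more
 * from prefetched lines. Returns the preset identifier: 1=L1, 2=L2. */
int choose_preset_id(double mpki_l1, double mpki_l2)
{
    return (mpki_l1 >= mpki_l2) ? 1 : 2;
}
```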
For example, in step S80, when the first page table walker needs to send a prefetch request to the target preset cache space, the first page table walker sends a preset identifier obtained by parsing the address translation request to the target preset cache space along with the prefetch request, so that the target preset cache space determines the prefetch cache space according to the preset identifier. For example, a corresponding field may be added to the prefetch request to represent a preset identification for specifying the prefetch cache space.
In this way, the processor core can determine the cache space used for prefetching and pass the preset identifier to the target preset cache space through the first page table walker, so that the target preset cache space knows which cache space the prefetch uses. The prefetch cache space can thus be dynamically selected: it is not fixed to a certain level of cache space and can be set flexibly for each prefetch, thereby improving the overall processing efficiency.
Fig. 14 is a flowchart illustrating another information prefetching method according to some embodiments of the disclosure. As shown in fig. 14, in some examples, the information prefetching method may further include steps S90 and S100.
Step S90: in response to selecting a second page table walker from the first page table walker and the second page table walker to perform an address translation operation based on preset rules, the second page table walker performs an address translation operation in response to an address translation request to obtain a physical address.
For example, the processor core generates an address translation request in response to the absence of page table entry data required for address translation in the page table entry cache space, and sends the address translation request to the second page table walker if it is determined that an address translation operation is performed by the second page table walker. For example, the architecture of the TLB is not limited to the ITLB and DTLB, and any suitable architecture may be employed, and embodiments of the present disclosure are not limited thereto. In a case where it is determined that the address translation operation is performed by the second page table walker, the second page table walker receives an address translation request generated by the processor core, acquires page table entry data from the memory according to the address translation request, and performs address translation using the page table entry data to obtain the physical address.
For example, in some examples, the method may further comprise:
step S100: in response to the first page table walker being determined to perform the address translation operation, an address translation request is forwarded to the first page table walker using the second page table walker.
For example, the processor core may generate an address translation request in response to the absence of page table entry data required for address translation in the page table entry cache space, and may forward the address translation request to the first page table walker via the second page table walker in the event that it is determined that the address translation operation is performed by the first page table walker. For example, when a virtual address needs to be translated into a physical address, if the page table entry data required for address translation misses in the ITLB or DTLB, an address translation operation needs to be performed. The processor determines, based on the preset rule, which of the first page table walker and the second page table walker performs the address translation operation. When it is determined that the first page table walker performs the operation, the processor core triggers the second page table walker to forward the address translation request to the first page table walker.
It should be noted that, in the embodiment of the present disclosure, the information prefetching method is not limited to the above-described steps, and may further include more or fewer steps, and the execution order of each step is not limited, which may be determined according to actual needs. For a detailed description of the method, reference may be made to the above description of the processor, which is not repeated here.
At least one embodiment of the present disclosure also provides an electronic device including the processor provided in any one of the embodiments of the present disclosure. The electronic equipment can realize the data/instruction prefetching function while reducing the address translation time delay, effectively reduce the time delay of data/instruction reading and writing operation and improve the overall performance of a system.
Fig. 15 is a schematic block diagram of an electronic device provided in some embodiments of the present disclosure. As shown in fig. 15, the electronic device 100 includes a processor 110, and the processor 110 is a processor provided in any embodiment of the disclosure, for example, the processor shown in fig. 5. The electronic device 100 may be used in a new application scenario such as big data, cloud computing, Artificial Intelligence (AI), and correspondingly, the electronic device 100 may be a big data computing device, a cloud computing device, an artificial intelligence device, and the like, which is not limited in this embodiment of the disclosure.
Fig. 16 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. As shown in fig. 16, the electronic device 200 is, for example, suitable for implementing the information prefetching method provided by the embodiment of the disclosure. The electronic device 200 may be a terminal device or a server, etc. It should be noted that the electronic device 200 shown in fig. 16 is only an example, and does not bring any limitation to the functions and the use range of the embodiment of the present disclosure.
As shown in fig. 16, the electronic apparatus 200 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 21, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 22 or a program loaded from a storage device 28 into a Random Access Memory (RAM) 23. For example, the processing device 21 may be a processor provided in any embodiment of the present disclosure, such as the processor shown in fig. 5. In the RAM 23, various programs and data necessary for the operation of the electronic apparatus 200 are also stored. The processing device 21, the ROM 22, and the RAM 23 are connected to each other via a bus 24. An input/output (I/O) interface 25 is also connected to the bus 24.
Generally, the following devices may be connected to the I/O interface 25: input devices 26 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 27 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 28 including, for example, magnetic tape, hard disk, etc.; and a communication device 29. The communication means 29 may allow the electronic apparatus 200 to perform wireless or wired communication with other electronic apparatuses to exchange data. While fig. 16 illustrates the electronic device 200 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided, and that the electronic device 200 may alternatively be implemented or provided with more or less means.
For a detailed description and technical effects of the electronic device 100/200, reference may be made to the above description of the processor and the information prefetching method, which are not described in detail herein.
The following points need to be explained:
(1) the drawings of the embodiments of the disclosure only relate to the structures related to the embodiments of the disclosure, and other structures can refer to common designs.
(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above description is only a specific embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and the scope of the present disclosure should be subject to the scope of the claims.

Claims (25)

1. An information prefetching method is used for a processor, wherein the processor comprises a first-level cache space, a first page table walker, a second page table walker and at least one preset cache space, the first-level cache space and the at least one preset cache space are sequentially in communication connection to form a communication link, the at least one preset cache space comprises a target preset cache space, the first page table walker and the target preset cache space are arranged at the same path level, the first page table walker is in communication connection with the target preset cache space, the second page table walker and the first-level cache space are arranged at the same path level, the second page table walker is in communication connection with the first-level cache space, and being arranged at the same path level means being adjacent or close in physical position and able to directly carry out data interaction and transmission,
the method comprises the following steps:
responding to a first page table walker selected from the first page table walker and the second page table walker based on a preset rule to execute an address translation operation to obtain a physical address, and sending a pre-fetching request to the target preset cache space by the first page table walker, wherein the pre-fetching request comprises the physical address;
responding to the prefetching request, and performing information prefetching operation on the target preset cache space based on the physical address;
or, in response to selecting the second page table walker from the first page table walker and the second page table walker to perform the address translation operation based on the preset rule, the second page table walker performs the address translation operation in response to an address translation request to obtain the physical address.
2. The method of claim 1, wherein the processor further comprises a processor core,
the pre-fetching of the information based on the physical address in the target preset cache space includes:
determining a prefetch cache space, wherein the prefetch cache space is at least one of the first level cache space and the at least one preset cache space;
based on the physical address, the target preset cache space obtains target information correspondingly stored by the physical address;
and the target preset cache space sends the target information to the pre-fetching cache space.
3. The method of claim 2, wherein determining the prefetch cache space comprises:
acquiring a preset identifier, wherein the preset identifier represents the level information of a cache space, and the preset identifier is stored in a specified storage space or carried in the prefetch request;
and determining the pre-fetching cache space according to the preset identification.
4. The method of claim 3, wherein the first level cache space comprises a first level instruction space and a first level data space, the level information indicating a first level,
determining the pre-fetching cache space according to the preset identifier, including:
determining the prefetch cache space to be the first level instruction space in response to the target information being an instruction type;
and determining the prefetch cache space as the first-level data space in response to the target information being a data type.
5. The method of claim 2, wherein the obtaining, by the target pre-set cache space based on the physical address, the target information stored corresponding to the physical address includes:
and acquiring the target information from a path from the target preset cache space to the memory in a step-by-step query mode based on the physical address.
6. The method of claim 2, wherein the target pre-set cache space sending the target information to the pre-fetch cache space comprises:
and sending the target information to the pre-fetching cache space in a step-by-step transmission mode from the target preset cache space to the pre-fetching cache space.
7. The method of claim 2, further comprising:
and the target preset cache space sends the physical address to the processor core.
8. The method of claim 7, wherein the target pre-determined cache space sending the physical address to the processor core comprises:
and the target preset cache space sends the physical address to the processor core in a step-by-step transfer mode.
9. The method of claim 3, wherein the processor further comprises a page table entry cache space, the processor core being disposed at a same path level as the page table entry cache space, the processor core being communicatively coupled to the page table entry cache space,
the method further comprises the following steps:
generating, with the processor core, the address translation request in response to an absence of page table entry data needed for address translation in the page table entry cache space.
10. The method of claim 9, further comprising:
in response to selecting the first page table walker from the first page table walker and the second page table walker to perform the address translation operation based on the preset rule, the first page table walker performs the address translation operation in response to the address translation request to obtain the physical address.
11. The method of claim 10, wherein the first page table walker, in response to the address translation request, performing the address translation operation to obtain the physical address, comprises:
and the first page table walker receives the address translation request generated by the processor core, acquires page table entry data from a memory through the target preset cache space, and performs the address translation operation by using the page table entry data to acquire the physical address.
12. The method of claim 11, wherein the first page table walker obtaining the page table entry data from the memory via the target pre-determined cache space and performing the address translation operation using the page table entry data comprises:
and the first page table traversal device acquires the page table entry data in a step-by-step query mode from the path from the target preset cache space to the memory according to the address translation request, and translates the page table entry data to acquire the physical address.
13. The method of claim 12, further comprising:
in response to the first page table walker being determined to perform the address translation operation, the first page table walker receives the address translation request forwarded by the second page table walker.
14. The method of claim 12, wherein the address translation request includes translation information comprising: address translation request sequence number, virtual address value to be translated, initial address of the highest level page table.
15. The method of claim 14, wherein the translation information further comprises a request type identifier indicating that the target information that the physical address corresponds to storing is an instruction type or a data type.
16. The method of claim 9, further comprising:
the processor core determines a cache space for storing the target information according to the storage states of the first-level cache space and the at least one preset cache space, and enables the address translation request to carry level information of the cache space for storing the target information in a preset identification mode;
and the first page table traversing device analyzes to obtain the preset identification, and the pre-fetching request is made to carry the preset identification.
17. The method of claim 1, wherein the second page table walker, in response to the address translation request, performing the address translation operation to obtain the physical address, comprises:
and the second page table walker receives the address translation request generated by the processor core, acquires page table entry data from a memory according to the address translation request, and performs address translation by using the page table entry data to acquire the physical address.
18. The method of claim 10, wherein the preset rules comprise: determining that the address translation operation is executed by the first page table walker when page table entry data required for address translation does not exist in the page table entry cache space or a page table level corresponding to the page table entry data required for address translation in the page table entry cache space is greater than a threshold.
19. The method of claim 9, wherein the processor further comprises a request buffer, the request buffer being set at a same path level as the first page table walker, the request buffer being communicatively coupled to the first page table walker and to the target pre-defined buffer space,
the method further comprises the following steps:
and sending a pending address translation request queue to the request cache region by using the processor core.
20. The method of claim 2, wherein the at least one preset cache space comprises second-level to Nth-level cache spaces, N being an integer greater than 2,
the Nth-level cache space is closest to the memory and farthest from the processor core, and any one of the second-level to Nth-level cache spaces can be used as the target preset cache space.
21. The method of claim 20, wherein the Nth-level cache space is a shared-type cache space, and the Nth-level cache space is the target preset cache space.
22. The method of claim 20, wherein the second-level cache space is a private-type or shared-type cache space, and the second-level cache space is the target preset cache space.
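Claims 20 to 22 leave the choice of target preset cache space open between the second-level cache and the shared Nth-level cache. For a concrete three-level example (N = 3), the choice could be modeled as below; the enum and the selection function are illustrative assumptions, and the trade-off noted in the comment is one plausible reading of the two claims rather than a statement from the patent.

    /* Assumed three-level hierarchy (N = 3): L3 is the shared cache
       closest to memory; L1 is closest to the processor core. */
    enum cache_level { L1 = 1, L2 = 2, L3 = 3 };

    /* Design choice of claims 21 and 22: pairing the walker with the
       shared last-level cache (L3) lets one walker serve several
       cores, while pairing it with L2 keeps the walk closer to a core. */
    enum cache_level pick_target_cache(int walker_near_llc)
    {
        return walker_near_llc ? L3 : L2;
    }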
23. A processor, comprising a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space,
wherein the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link, the at least one preset cache space comprises a target preset cache space, the first page table walker is arranged at the same path level as the target preset cache space and is communicatively connected to the target preset cache space, the second page table walker is arranged at the same path level as the first-level cache space and is communicatively connected to the first-level cache space, and components at the same path level are adjacent or close in physical position and can directly exchange and transmit data,
the first page table walker is configured to: in response to the first page table walker being selected from the first page table walker and the second page table walker based on a preset rule to perform an address translation operation, perform the address translation operation to obtain a physical address and send a prefetch request to the target preset cache space, wherein the prefetch request comprises the physical address;
the target preset cache space is configured to: in response to the prefetch request, perform an information prefetching operation based on the physical address;
the second page table walker is configured to: in response to the second page table walker being selected from the first page table walker and the second page table walker based on the preset rule to perform the address translation operation, perform the address translation operation in response to an address translation request to obtain the physical address.
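Putting the pieces together, claim 23's division of labor between the two walkers can be sketched as a dispatch function: the preset rule picks a walker, and only the first (cache-side) walker issues the prefetch request carrying the physical address. The function names here are assumptions that build on the earlier sketches.

    #include <stdbool.h>
    #include <stdint.h>

    struct xlate_request; /* as defined in the claim-14 sketch */

    /* Assumed front-ends for the two walkers and the prefetch port. */
    extern uint64_t first_walker_translate(const struct xlate_request *r);  /* beside the target preset cache space */
    extern uint64_t second_walker_translate(const struct xlate_request *r); /* beside the first-level cache space */
    extern void     send_prefetch_request(uint64_t paddr);                  /* to the target preset cache space */
    extern bool     use_first_walker(uint64_t vaddr);                       /* preset rule, claim-18 sketch */

    uint64_t translate(const struct xlate_request *r, uint64_t vaddr)
    {
        if (use_first_walker(vaddr)) {
            uint64_t paddr = first_walker_translate(r);
            send_prefetch_request(paddr);   /* prefetch request comprises the physical address */
            return paddr;
        }
        return second_walker_translate(r);  /* near-core walker handles the rest */
    }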
24. The processor of claim 23, wherein the target preset cache space is further configured to: determine a prefetch cache space, obtain, based on the physical address, target information stored corresponding to the physical address, and send the target information to the prefetch cache space,
the prefetch cache space being at least one of the first-level cache space and the at least one preset cache space.
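Claim 24's information prefetch operation amounts to reading the target information at the physical address and installing it in the chosen prefetch cache space. A minimal sketch, with an assumed 64-byte line size and hypothetical helper names:

    #include <stdint.h>

    #define LINE_SIZE 64u /* assumed cache line size */

    /* Hypothetical helpers of the target preset cache space. */
    extern void read_line(uint64_t paddr, unsigned char *buf);  /* obtain the target information */
    extern void install_line(unsigned level, uint64_t paddr,
                             const unsigned char *buf);         /* fill the prefetch cache space */

    void info_prefetch(uint64_t paddr, unsigned prefetch_level)
    {
        unsigned char line[LINE_SIZE];
        read_line(paddr, line);                    /* fetch from memory or a lower level */
        install_line(prefetch_level, paddr, line); /* e.g. level 1 = first-level cache space */
    }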
25. An electronic device comprising a processor according to claim 23 or 24.
CN202111529899.2A 2021-12-14 2021-12-14 Information prefetching method, processor and electronic equipment Active CN114238167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111529899.2A CN114238167B (en) 2021-12-14 2021-12-14 Information prefetching method, processor and electronic equipment


Publications (2)

Publication Number Publication Date
CN114238167A (en) 2022-03-25
CN114238167B (en) 2022-09-09


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080464B (en) * 2022-06-24 2023-07-07 Haiguang Information Technology Co., Ltd. Data processing method and data processing device
CN115098169B (en) * 2022-06-24 2024-03-05 Haiguang Information Technology Co., Ltd. Method and device for fetching instruction based on capacity sharing

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5680565A (en) * 1993-12-30 1997-10-21 Intel Corporation Method and apparatus for performing page table walks in a microprocessor capable of processing speculative instructions
US6088780A (en) * 1997-03-31 2000-07-11 Institute For The Development Of Emerging Architecture, L.L.C. Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address
CN107608912A (en) * 2013-08-20 2018-01-19 Huawei Technologies Co., Ltd. Internal memory physics address inquiring method and device
CN107766469A (en) * 2017-09-29 2018-03-06 Beijing Kingsoft Security Management System Technology Co., Ltd. A kind of method for caching and processing and device
CN111143242A (en) * 2018-11-02 2020-05-12 Huawei Technologies Co., Ltd. Cache prefetching method and device
CN111552654A (en) * 2019-02-08 2020-08-18 Samsung Electronics Co., Ltd. Processor for detecting redundancy of page table traversal
CN113190499A (en) * 2021-05-26 2021-07-30 Beijing Sophgo Technology Co., Ltd. High-capacity on-chip cache oriented cooperative prefetcher and control method thereof
US11176055B1 (en) * 2019-08-06 2021-11-16 Marvell Asia Pte, Ltd. Managing potential faults for speculative page table access

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6775747B2 (en) * 2002-01-03 2004-08-10 Intel Corporation System and method for performing page table walks on speculative software prefetch operations
US8244978B2 (en) * 2010-02-17 2012-08-14 Advanced Micro Devices, Inc. IOMMU architected TLB support
US20130326143A1 (en) * 2012-06-01 2013-12-05 Broadcom Corporation Caching Frequently Used Addresses of a Page Table Walk
US9563562B2 (en) * 2012-11-27 2017-02-07 Nvidia Corporation Page crossing prefetches
KR102069273B1 (en) * 2013-03-11 2020-01-22 Samsung Electronics Co., Ltd. System on chip and operating method thereof
GB2528842B (en) * 2014-07-29 2021-06-02 Advanced Risc Mach Ltd A data processing apparatus, and a method of handling address translation within a data processing apparatus
US9405702B2 (en) * 2014-11-14 2016-08-02 Cavium, Inc. Caching TLB translations using a unified page table walker cache
US9928176B2 (en) * 2016-07-20 2018-03-27 Advanced Micro Devices, Inc. Selecting cache transfer policy for prefetched data based on cache test regions
US10684957B2 (en) * 2018-08-23 2020-06-16 Advanced Micro Devices, Inc. Apparatus and method for neighborhood-aware virtual to physical address translations
CN111930643B (en) * 2020-09-28 2021-01-12 Shenzhen Chipsbank Technology Co., Ltd. Data processing method and related equipment


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Exploiting Page Table Locality for Agile TLB Prefetching; Georgios Vavouliotis et al.; 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA); 2021-08-04; pp. 85-97 *
PageSeer: Using Page Walks to Trigger Page Swaps in Hybrid Memory Systems; Apostolos Kokolis et al.; 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA); 2019-03-28; pp. 596-607 *
Design of the Memory Management Unit in a SPARC V8 Processor; Xiao Jianqing et al.; Science Technology and Engineering; 2010-11-08 (No. 31); full text *
Prefetching and Caching Techniques for Intelligent Web Agents; Zhao Zheng et al.; Journal of Tianjin University (Science and Technology); 2001-05-25 (No. 05); full text *
An Efficient Compressed Page Walk Cache Structure; Jia Zhaoyang et al.; Computer Engineering & Science; 2020-09-30; Vol. 42, No. 9; pp. 1521-1528 *


Similar Documents

Publication Publication Date Title
CN106537362B (en) Data processing apparatus and method of processing address conversion in data processing apparatus
US8161246B2 (en) Prefetching of next physically sequential cache line after cache line that includes loaded page table entry
US20160224467A1 (en) Hierarchical cache structure and handling thereof
US9563568B2 (en) Hierarchical cache structure and handling thereof
CN112416817B (en) Prefetching method, information processing apparatus, device, and storage medium
US8296518B2 (en) Arithmetic processing apparatus and method
CN114238167B (en) Information prefetching method, processor and electronic equipment
JP2003067357A (en) Nonuniform memory access (numa) data processing system and method of operating the system
US20160140042A1 (en) Instruction cache translation management
US10579522B2 (en) Method and device for accessing a cache memory
US11775445B2 (en) Translation support for a virtual cache
WO2023055486A1 (en) Re-reference interval prediction (rrip) with pseudo-lru supplemental age information
KR102482516B1 (en) memory address conversion
US10754791B2 (en) Software translation prefetch instructions
CN115098410A (en) Processor, data processing method for processor and electronic equipment
CN114218132B (en) Information prefetching method, processor and electronic equipment
CN114238176B (en) Processor, address translation method for processor and electronic equipment
US11494300B2 (en) Page table walker with page table entry (PTE) physical address prediction
CN114281720B (en) Processor, address translation method for processor and electronic equipment
CN117099087A (en) Apparatus and method for processing a store request
US11853597B2 (en) Memory management unit, method for memory management, and information processing apparatus
US11157285B2 (en) Dynamic modification of instructions that do not modify the architectural state of a processor
US10977176B2 (en) Prefetching data to reduce cache misses
CN115080464A (en) Data processing method and data processing device
JP2011150486A (en) Data processing apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant