CN114925001A - Processor, page table prefetching method and electronic equipment - Google Patents

Info

Publication number
CN114925001A
Authority
CN
China
Prior art keywords
page table
address
target virtual
processor
prefetch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210548146.4A
Other languages
Chinese (zh)
Inventor
Inventor not announced
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biren Intelligent Technology Co Ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd filed Critical Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202210548146.4A
Publication of CN114925001A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10 - Address translation
    • G06F 12/1009 - Address translation using page tables, e.g. page table structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A processor, a page table prefetching method, and an electronic device are provided. The processor comprises a system memory management unit, a page table prefetch controller, and at least one address request unit. Each address request unit is configured to send at least one target request, each target request includes a target virtual address, the target virtual address includes a plurality of address segments, the plurality of address segments includes first to Nth address segments, N is an integer greater than 1, and the first to Nth address segments are respectively used for querying first- to Nth-level page tables. The system memory management unit is configured to receive the target requests sent by each address request unit to obtain a plurality of target virtual addresses, and perform address translation on the plurality of target virtual addresses to obtain the corresponding physical addresses. The page table prefetch controller is configured to determine, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the K-th level page table based on the current target virtual address, where K is an integer and 1 ≤ K < N. The processor can thus selectively prefetch page tables.

Description

Processor, page table prefetching method and electronic equipment
Technical Field
Embodiments of the present disclosure relate to a processor, a page table prefetching method, and an electronic device.
Background
In the field of computer technology, a programmer may write a program using any virtual address (VA) within a system-specified range, and the addresses used when a central processing unit (CPU) executes an application are virtual addresses. When memory is allocated to a process, each virtual address needs to be mapped to a physical address (PA), which is the real address used to access physical memory. The separate use of virtual addresses and physical addresses has become the industry mainstream.
Disclosure of Invention
At least one embodiment of the present disclosure provides a processor, including a system memory management unit, a page table prefetch controller, and at least one address request unit, wherein the at least one address request unit is communicatively coupled to the system memory management unit, and the page table prefetch controller is communicatively coupled to the system memory management unit; each address request unit is configured to send at least one target request, each target request includes a target virtual address, the target virtual address includes a plurality of address segments, the plurality of address segments includes first to Nth address segments, N is an integer greater than 1, and the first to Nth address segments are respectively used for querying first- to Nth-level page tables; the system memory management unit is configured to receive the target request sent by each address request unit to obtain a plurality of target virtual addresses, and perform address translation on the plurality of target virtual addresses to obtain a plurality of physical addresses respectively corresponding to the plurality of target virtual addresses; the page table prefetch controller is configured to determine, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the K-th level page table based on the current target virtual address, where K is an integer and 1 ≤ K < N.
For example, an embodiment of the disclosure provides a processor, where the processor includes only one address request unit, and the page table prefetch controller is located in the address request unit.
For example, in a processor provided by an embodiment of the present disclosure, the page table prefetch controller is located within the system memory management unit.
For example, in a processor provided in an embodiment of the present disclosure, determining whether to perform a prefetch operation for the K-th level page table based on the current target virtual address according to the plurality of target virtual addresses includes: determining whether regularity exists among the K-th level page tables contained in the plurality of target virtual addresses; and in response to regularity existing among the K-th level page tables, determining to perform a prefetch operation for the K-th level page table based on the current target virtual address.
For example, in a processor provided in an embodiment of the present disclosure, the regularity includes: the K-th level page tables are sequentially consecutive, or the K-th level page tables are spaced by intervals smaller than a first preset threshold.
For example, in the processor provided in an embodiment of the present disclosure, when regularity exists among the K-th level page tables and K ≥ 2, the first-level to (K-1)-th level page tables corresponding to the first to (K-1)-th address segments respectively included in the plurality of target virtual addresses are respectively the same.
For example, in the processor provided in an embodiment of the present disclosure, the page table prefetch controller is further configured to generate prefetch directive information in response to determining that a prefetch operation is performed for the kth-level page table based on a current target virtual address, and send the prefetch directive information to the system memory management unit.
For example, in the processor provided in an embodiment of the present disclosure, the prefetch indication information includes flag information and number information, where the flag information indicates that a prefetch operation for the K-th page table is required when the flag information is a valid value, the flag information indicates that a prefetch operation for the K-th page table is not required when the flag information is an invalid value, and the number information indicates the number of the K-th page tables that need to be prefetched.
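The flag-plus-count encoding of the prefetch indication information described above can be sketched as a small data structure. The field and method names below are illustrative assumptions, not the patent's actual signal names:

```python
# Illustrative sketch of the prefetch indication information: flag
# information (valid/invalid) plus number information. Names are
# hypothetical; the patent does not specify an encoding.
from dataclasses import dataclass

@dataclass
class PrefetchIndication:
    flag_valid: bool  # valid value -> a prefetch for the K-th level page table is required
    count: int        # number of K-th level page tables to prefetch

    def should_prefetch(self) -> bool:
        # Prefetch only when the flag is a valid value and the count is positive.
        return self.flag_valid and self.count > 0
```

In this reading, the page table prefetch controller would fill in such a record and hand it to the system memory management unit, which acts on it only when the flag is valid.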
For example, in the processor provided in an embodiment of the present disclosure, the system memory management unit is further configured to receive the prefetch indication information, and in response to the flag information being a valid value, perform a prefetch operation for the K-th page table according to the number information.
For example, in a processor provided by an embodiment of the present disclosure, the system memory management unit includes a translation look-aside buffer configured to determine whether there is a hit page table during address translation, and store a prefetched kth-level page table.
For example, in the processor provided in an embodiment of the present disclosure, the number of K-th level page tables obtained by prefetching is greater than or equal to the number of target virtual addresses minus one.
For example, in a processor provided in an embodiment of the present disclosure, the system memory management unit further includes an address buffer configured to store the plurality of target virtual addresses received by the system memory management unit.
For example, in the processor provided in an embodiment of the present disclosure, the page table prefetch controller is further configured to determine, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the nth-level page table based on a current target virtual address.
For example, in a processor provided in an embodiment of the present disclosure, determining whether to perform a prefetch operation for the Nth-level page table based on the current target virtual address according to the plurality of target virtual addresses includes: determining whether regularity exists among the Nth-level page tables contained in the plurality of target virtual addresses; and in response to regularity existing among the Nth-level page tables, determining to perform a prefetch operation for the Nth-level page table based on the current target virtual address.
For example, in a processor provided by an embodiment of the present disclosure, the regularity includes: the Nth-level page tables are sequentially consecutive, or the Nth-level page tables are spaced by intervals smaller than a second preset threshold.
For example, in the processor provided in an embodiment of the present disclosure, when regularity exists among the Nth-level page tables, the first-level to (N-1)-th level page tables corresponding to the first to (N-1)-th address segments included in the plurality of target virtual addresses are respectively the same.
At least one embodiment of the present disclosure further provides a page table prefetching method for use in a processor provided in any embodiment of the present disclosure, the method including: determining, according to a plurality of target virtual addresses, whether to perform a prefetch operation for the K-th level page table based on the current target virtual address.
For example, the method provided by an embodiment of the present disclosure further includes: determining, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the Nth-level page table based on the current target virtual address.
At least one embodiment of the present disclosure also provides an electronic device including the processor provided in any one of the embodiments of the present disclosure.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description only relate to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 is a basic schematic diagram of the operation of a system memory management unit;
fig. 2 is a schematic block diagram of a processor provided in some embodiments of the present disclosure;
FIG. 3 is a flowchart illustrating a page table pre-fetch controller in a processor according to some embodiments of the present disclosure determining whether to perform a pre-fetch operation on a K-th level page table;
FIG. 4 is a flowchart illustrating a page table prefetch controller in a processor determining whether to perform a prefetch operation with respect to an Nth level page table according to some embodiments of the present disclosure;
FIG. 5 is a block diagram of a processor according to some embodiments of the present disclosure;
FIG. 6 is a block diagram of another processor provided in some embodiments of the present disclosure;
FIG. 7 is a flowchart illustrating a page table prefetching method according to some embodiments of the disclosure;
fig. 8 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;
fig. 9 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used only to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
A System Memory Management Unit (SMMU) is widely used in various systems on chip (SoC), mainly to translate virtual addresses into their corresponding physical addresses. To complete one virtual-to-physical address translation, the SMMU must access memory (e.g., DDR) multiple times to read page tables. These page table reads take considerable time, so a single translation can be slow, which reduces SoC performance.
In order to improve the efficiency of address translation, a cache is generally integrated inside the SMMU. This cache is called a Translation Lookaside Buffer (TLB) and may also be referred to as a page table cache. The SMMU uses the TLB to speed up virtual-to-physical address translation. A TLB typically has a fixed number of slots for storing page table entries that map virtual addresses to physical addresses; the search key is a virtual address and the search result is a physical address. If the requested virtual address is present in the TLB, a match is returned very quickly, and the resulting physical address can then be used to access memory. If the requested virtual address is not in the TLB, the translation must be performed by walking the page tables, which requires multiple DDR reads and can take a long time.
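The hit/miss behavior described above can be illustrated with a minimal, schematic TLB model. This is a sketch only: real TLBs are set-associative hardware structures, and the class, method, and parameter names here are assumptions for illustration:

```python
# Schematic model of TLB behavior: a hit returns the cached translation
# immediately; a miss falls back to a (slow) page-table walk and caches
# the result, evicting the oldest entry when the TLB is full.
class TinyTLB:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries = {}  # virtual page number -> physical page number

    def translate(self, vpn: int, walk_page_tables):
        """Return (physical page number, hit?) for a virtual page number."""
        if vpn in self.entries:           # TLB hit: fast path
            return self.entries[vpn], True
        ppn = walk_page_tables(vpn)       # TLB miss: multiple memory reads
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (simple FIFO policy for illustration).
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = ppn
        return ppn, False
```

A second request for the same virtual page then hits the TLB directly, which is exactly the speedup the SMMU relies on; note also how a full TLB must evict an entry on every new fill, which is why blind prefetching (discussed below) can hurt.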
FIG. 1 is a basic diagram of the operation of a system memory management unit. As shown in fig. 1, a pre-designed circuit function block applied in an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another applicable scenario is called an Intellectual Property Core (IP core), or simply IP. The IP may be any logic or functional block used in an ASIC, FPGA, or other suitable scenario, such as a filter, a memory controller, an interface module, etc.; embodiments of the present disclosure are not limited in this respect. Because the IP uses addresses when implementing its preset function and issues requests for those addresses, the IP is also referred to herein as an address request unit.
In the first step, the IP issues the virtual address of an operation. After receiving the virtual address, the SMMU starts to translate it: the SMMU first looks in its internal TLB, and if the TLB hits, the translated physical address can be output directly from the hit page table entry. If the TLB misses, then in the second step the SMMU reads the page table from DDR, translates the virtual address into a physical address according to the page table contents, and outputs the physical address. Meanwhile, the page table that was read is cached in the TLB, so that when the SMMU subsequently receives the same virtual address it can hit the TLB directly and immediately translate the virtual address into a physical address for output. Finally, in the third step, the SMMU issues the translated physical address so that the corresponding operation can be performed.
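The miss path above ends in a multi-level page table walk. A schematic walk can be sketched as follows, assuming for illustration that each level is modeled as a dictionary mapping an address segment to the next-level table (or, at the last level, to a physical page base); this models the structure, not the patent's hardware:

```python
# Schematic multi-level page table walk: each lookup models one memory
# (DDR) read, which is why a TLB miss is expensive.
def walk(root, segments, offset):
    """Translate (address segments, page offset) to a physical address."""
    table = root
    for seg in segments[:-1]:
        table = table[seg]               # descend one page table level
    phys_page_base = table[segments[-1]]  # last level yields the page base
    return phys_page_base | offset
```

For a three-level layout, `walk(root, [0xAA, 0xBB, 0xCC], 0x045)` performs three lookups before the physical address can be formed, matching the three DDR reads described above.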
Schemes to increase address translation efficiency typically revolve around increasing the TLB hit rate. Two common ways to raise the probability that a page table lookup hits the TLB during virtual-to-physical address translation are described below.
One way is to increase the capacity of the SMMU internal TLB. By increasing the capacity of the TLB, the TLB can store more page tables, so that the probability of hitting the TLB by the page tables during the translation from the virtual address to the physical address is improved, and the efficiency of the translation from the virtual address to the physical address is improved. However, this approach may significantly increase the area and power consumption of the chip, and eventually increase the cost of the chip, so the increase of the capacity of the TLB is limited.
Another way is to prefetch adjacent page tables into the TLB. That is, when a virtual address arrives and its page table is fetched, the page table corresponding to an adjacent address (labeled VA1) is also fetched into the TLB, so that a later operation on VA1 can hit the TLB directly, increasing the hit rate of subsequent virtual-to-physical address translations. This approach can raise the TLB hit rate to some extent, but the prefetched page tables occupy TLB space. Because the total TLB space is limited, every time the SMMU fetches a page table from DDR it also prefetches some page tables, and when TLB resources are tight these prefetched page tables evict other page tables from the TLB. This hurts address translation efficiency, so the improvement in TLB hit rate is not obvious. The approach is therefore blind, and its effect on translation efficiency is limited.
At least one embodiment of the disclosure provides a processor, a page table prefetching method and an electronic device. The processor can selectively prefetch the page table, avoid blindness, reduce the storage space occupied by invalid page tables in the translation look-aside buffer, improve the hit probability, reduce bus congestion caused by a large number of prefetch page tables, reduce the occupation of bus resources and reduce the required increased hardware overhead.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the same reference numerals in different figures will be used to refer to the same elements already described.
At least one embodiment of the present disclosure provides a processor including a system memory management unit, a page table prefetch controller, at least one address request unit. The at least one address request unit is communicatively coupled to the system memory management unit, and the page table prefetch controller is communicatively coupled to the system memory management unit. Each address request unit is configured to send at least one target request, each target request includes a target virtual address, the target virtual address includes a plurality of address segments, the plurality of address segments includes first to nth address segments, N is an integer greater than 1, and the first to nth address segments are respectively used for querying first to nth page tables. The system storage management unit is configured to receive the target request sent by each address request unit to obtain a plurality of target virtual addresses, and perform address translation on the plurality of target virtual addresses to obtain a plurality of physical addresses respectively corresponding to the plurality of target virtual addresses. The page table pre-fetching controller is configured to judge whether to pre-fetch for the K-th level page table based on the current target virtual address according to a plurality of target virtual addresses, wherein K is more than or equal to 1 and less than N, and K is an integer.
Fig. 2 is a schematic block diagram of a processor according to some embodiments of the present disclosure. As shown in FIG. 2, in some embodiments, processor 100 includes a system memory management unit 110, a page table prefetch controller 120, at least one address request unit 130.
For example, at least one address request unit 130 is communicatively coupled to the system memory management unit 110. For example, in some examples, when the processor 100 includes a plurality of address request units 130, all of the address request units 130 are communicatively connected to the system memory management unit 110; that is, the plurality of address request units 130 are commonly connected to the same system memory management unit 110. For example, a pre-designed circuit function module applied in an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another applicable scenario is called an Intellectual Property Core (IP core), or IP, and the address request unit 130 may be such an intellectual property core. The address request unit 130 may be any logic or functional block used in an ASIC, FPGA, or other applicable scenario, such as a filter, a memory controller, an interface module, and the like, which is not limited by the embodiments of the present disclosure.
Each address request unit 130 is configured to send at least one target request. The target request may be any request that is compatible with the function of the address requesting unit 130, such as any type of request, for example, a read request, a write request, a delete request, a reply request, and the like, which is not limited by the embodiments of the present disclosure.
For example, each target request includes a target virtual address, that is, a virtual address required to be used in an operation corresponding to the target request is referred to as a target virtual address. For example, the target virtual address is a virtual address that needs to be converted, such as a virtual address carried by some request or instruction. SMMU needs to translate the target virtual address into a corresponding physical address in order to execute the corresponding request or instruction. For example, in some examples, the target virtual address may be a storage address of data that needs to be acquired in a read request, or may also be a storage address of a target location that needs to perform a write operation in a write request, or an address to be used in other types of requests, which is not limited in this embodiment of the present disclosure.
For example, the target virtual address includes a plurality of address segments including a first address segment through an nth address segment, N being an integer greater than 1. The first address segment to the nth address segment are respectively used for querying the first-level page table to the nth-level page table, that is, the first address segment is used for querying the first-level page table, the second address segment is used for querying the second-level page table, and so on, and the nth address segment is used for querying the nth-level page table.
The translation from a virtual address to a physical address is usually performed through a multi-level page table lookup, where each level of the lookup depends on different bits of the virtual address. For example, in some examples, assume a virtual address is 36 bits wide, e.g., 0xaa_bb_cc_000: "aa" is used for addressing the first-level page table and is the first address segment; "bb" is used for addressing the second-level page table and is the second address segment; "cc" is used for addressing the third-level page table and is the third address segment. The page table granularity is 4 KB. In this example, N is 3, the lookup is divided into three levels of page tables, and by performing three page table lookups the corresponding physical address can finally be obtained.
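The segment extraction for the 36-bit example above can be sketched as bit-field arithmetic. The field widths (8 bits per address segment, a 12-bit page offset for 4 KB granularity) follow the example, but the function and constant names are illustrative assumptions:

```python
# Split a 36-bit virtual address such as 0xaa_bb_cc_000 into its
# address segments (one per page table level) and the page offset.
PAGE_OFFSET_BITS = 12  # 4 KB page granularity
SEGMENT_BITS = 8       # width of each address segment in this example

def split_virtual_address(va: int, levels: int = 3):
    """Return ([first..Nth address segment], page offset)."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    segments = []
    for level in range(levels):
        # The first address segment occupies the highest bits.
        shift = PAGE_OFFSET_BITS + (levels - 1 - level) * SEGMENT_BITS
        segments.append((va >> shift) & ((1 << SEGMENT_BITS) - 1))
    return segments, offset
```

Applied to the example address, `split_virtual_address(0xAABBCC000)` yields the segments `[0xAA, 0xBB, 0xCC]` with offset `0x000`, matching how "aa", "bb", and "cc" address the first-, second-, and third-level page tables.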
It should be noted that, in the embodiment of the present disclosure, the bit number of the virtual address is not limited to 36 bits, and may also be any other bit number, and the address segment dividing manner and the representation manner of the virtual address are not limited, which may be determined according to actual requirements. The page table is not limited to three stages, and may have any other number of stages. The detailed description of the multi-level page table addressing can be made with reference to conventional designs, which are not described in detail herein. The value of N is not limited and may be any integer greater than 1, depending on the number of page table levels.
For example, the system memory management unit 110 is configured to receive the target request sent by each address request unit 130 to obtain a plurality of target virtual addresses, and perform address translation on the plurality of target virtual addresses to obtain a plurality of physical addresses respectively corresponding to them. For example, the system memory management unit 110 may translate each target virtual address one by one, thereby obtaining each corresponding physical address in turn. The manner of address translation performed by the system memory management unit 110 may refer to the process described for fig. 1, and is not repeated here.
For example, page table prefetch controller 120 is communicatively coupled to system memory management unit 110. The page table prefetch controller 120 is configured to determine, according to a plurality of target virtual addresses, whether to perform a prefetch operation for the K-th level page table based on the current target virtual address, where K is an integer and 1 ≤ K < N. It should be noted that since K < N, the K-th level page table is not the Nth-level (i.e., last-level) page table; the K-th level page table is used to continue the walk to the next-level page table, so the physical address cannot be obtained directly from the K-th level page table.
For example, the current target virtual address refers to a target virtual address currently undergoing address translation among the plurality of target virtual addresses. When performing page table walk for the K-th address segment in the current target virtual address, the page table prefetch controller 120 may determine whether to perform a prefetch operation for the K-th page table according to the multiple target virtual addresses, that is, determine whether to prefetch another K-th page table when obtaining the K-th page table corresponding to the K-th address segment in the current target virtual address.
For example, in some examples, system storage management unit 110 includes translation look-aside buffer 111. The translation look-aside buffer 111 is configured to determine whether there is a hit page table during address translation and to store a prefetched K-th level page table. The translation lookaside buffer 111 may be a TLB as previously described.
For example, in some examples, processor 100 includes only one address request unit 130, i.e., the number of address request units 130 in processor 100 is 1. In this case, page table prefetch controller 120 may be located within address request unit 130. For example, in other examples, page table prefetch controller 120 is located within system memory management unit 110, and the number of address request units 130 may be one or more.
Fig. 3 is a flowchart illustrating a page table pre-fetching controller in a processor according to some embodiments of the disclosure determining whether to perform a pre-fetching operation on a K-th level page table. For example, the page table prefetch controller 120 determines whether to perform a prefetch operation for the K-th level page table based on the current target virtual address according to the target virtual addresses, and a specific determination manner of the page table prefetch controller 120 includes the following operations, as shown in fig. 3.
Step S11: determine whether regularity exists among the K-th level page tables contained in the plurality of target virtual addresses;
Step S12: in response to regularity existing among the K-th level page tables, determine to perform a prefetch operation for the K-th level page table based on the current target virtual address.
For example, in step S11, the regularity of the K-th level page tables may include: the K-th level page tables are sequentially consecutive, or the intervals between the K-th level page tables are smaller than a first preset threshold. For example, the first preset threshold may be any value determined according to actual requirements, and the embodiments of the present disclosure are not limited in this regard. Of course, the regularity is not limited to the above manners and may be of other types, such as continuity with fixed intervals, conformance to a functional relationship, a multiple relationship, and so on; as long as the K-th level page tables corresponding to the target virtual addresses are predictable, they can be considered to exhibit regularity.
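The consecutive-or-bounded-interval check of step S11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper name `has_regularity`, the default threshold, and the representation of K-th level page tables as plain integer indices are all assumptions for the sake of the example.

```python
def has_regularity(table_indices, threshold=4):
    """Return True if the K-th level page-table indices are sequentially
    consecutive, or if every gap is below a first preset threshold."""
    ordered = sorted(set(table_indices))
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Sequentially consecutive tables, or intervals below the threshold
    return all(g == 1 for g in gaps) or all(g < threshold for g in gaps)

print(has_regularity([5, 6, 7, 8]))            # consecutive: regular
print(has_regularity([0, 3, 6], threshold=4))  # gaps of 3 < 4: regular
print(has_regularity([0, 100]))                # large jump: not regular
```

Other regularity types mentioned above (a functional or multiple relationship) would need their own predicates; the point is only that any predictable pattern can serve as the trigger.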
For example, in some examples, when the K-th level page tables exhibit regularity and K ≥ 2, the first to (K-1)-th level page tables corresponding to the first to (K-1)-th address segments of the plurality of target virtual addresses are respectively the same, which ensures that the high bits of the virtual address containing the address segments used to look up the first to (K-1)-th level page tables are consistent with the high bits of the virtual address containing the address segment corresponding to the prefetched page table. Of course, in other examples, the first to (K-1)-th level page tables corresponding to the first to (K-1)-th address segments may also differ, and the embodiments of the present disclosure are not limited in this regard.
For example, in step S12, if the K-th level page tables exhibit regularity, it is determined that a prefetch operation for the K-th level page table is performed based on the current target virtual address. That is, owing to the regularity, when the K-th level page table of the current target virtual address is obtained, at least one other K-th level page table different from it is prefetched; the prefetched K-th level page tables may be the page tables required by other target virtual addresses during address translation.
For example, the page table prefetch controller 120 is further configured to, in response to determining that a prefetch operation for the K-th level page table is to be performed based on the current target virtual address, generate prefetch indication information and send it to the system memory management unit 110.
For example, the prefetch indication information includes flag information and number information. The flag information indicates whether or not the page table needs to be prefetched, and when the flag information has a valid value (for example, 1), it indicates that a prefetch operation for the K-th page table needs to be performed, and when the flag information has an invalid value (for example, 0), it indicates that the prefetch operation for the K-th page table is not performed. The number information indicates the number of K-th level page tables that need to be prefetched. For example, the number information may be any number value such as 1, 2, 3, 5, etc., which is not limited by the embodiment of the present disclosure, and the number information is valid only when the flag information is a valid value. For example, the size (e.g., the number of bits) of the number information may be determined according to actual requirements.
The system memory management unit 110 knows whether and how many page tables to prefetch based on the prefetch indication information. For example, in some examples, if the flag information is 1 and the number information is 2, the system memory management unit 110 knows that page tables need to be prefetched and prefetches 2 page tables. For example, in other examples, if the flag information is 0 and the number information is 2, the system memory management unit 110 knows that no page table needs to be prefetched; in this case, since the flag information is 0, the number information is meaningless even though its value is 2.
For example, the system memory management unit 110 is further configured to receive the prefetch indication information and, in response to the flag information being a valid value, perform the prefetch operation for the K-th level page table according to the number information. When the received flag information is a valid value, a prefetch operation for the K-th level page table is required, so the system memory management unit 110 performs the prefetch operation according to the number information. For example, in some examples, if the value of the number information is 3, the system memory management unit 110 prefetches 3 K-th level page tables, that is, when obtaining the K-th level page table that is needed anyway, it also prefetches 3 other K-th level page tables (for example, 3 consecutive K-th level page tables).
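The interaction between the flag information and the number information described above can be sketched as follows; the dictionary encoding and helper names here are illustrative assumptions, not the actual signal format of the disclosure.

```python
def make_prefetch_indication(flag, cnt):
    """Bundle the flag information (1 = valid value, prefetch needed;
    0 = invalid value, no prefetch) with the number information."""
    return {"flag": flag, "cnt": cnt}

def pages_to_prefetch(indication):
    # The number information is meaningful only when the flag is valid
    return indication["cnt"] if indication["flag"] == 1 else 0

# flag = 1, cnt = 2: prefetch 2 page tables
print(pages_to_prefetch(make_prefetch_indication(1, 2)))  # 2
# flag = 0: no prefetch, regardless of the number information
print(pages_to_prefetch(make_prefetch_indication(0, 2)))  # 0
```

This mirrors the behavior described above: the number information is consulted only after the flag information is found to hold a valid value.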
For example, in some examples, the number of prefetched K-th level page tables is greater than or equal to the number of target virtual addresses minus 1. For example, if the system memory management unit 110 currently receives 5 target virtual addresses and performs address translation for one of them, the number of K-th level page tables obtained by prefetching is greater than or equal to 4, so that the remaining 4 target virtual addresses hit in the translation lookaside buffer 111 as often as possible during address translation, thereby improving the efficiency of address translation. Of course, the embodiments of the present disclosure are not limited thereto; the number of prefetched K-th level page tables need not satisfy the above relationship and may be determined according to actual requirements, for example, according to the regularity characteristics among the multiple target virtual addresses.
For example, in some examples, where the page table prefetch controller 120 is located within the system memory management unit 110, the system memory management unit 110 further includes an address buffer configured to store the plurality of target virtual addresses received by the system memory management unit 110, thereby facilitating the page table prefetch controller 120 in determining whether to perform page table prefetching based on the plurality of target virtual addresses. For example, the larger the storage space of the address buffer, the more virtual addresses can be stored, and the stronger the adaptive page-table prefetching capability of the page table prefetch controller 120, i.e., the more "intelligent" its decisions. For example, in some examples, the storage space of the address buffer is not set too large, e.g., enough to cache about 20 virtual addresses, so as not to cause an excessive increase in chip area and power consumption.
For example, the page table prefetch controller 120 is further configured to determine, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the N-th level page table (i.e., the last-level page table) based on the current target virtual address.
Fig. 4 is a flowchart illustrating how a page table prefetch controller in a processor according to some embodiments of the present disclosure determines whether to perform a prefetch operation for the N-th level page table. As shown in Fig. 4, the specific determination manner of the page table prefetch controller 120 includes the following operations.
Step S21: determining whether the N-th level page tables corresponding to the plurality of target virtual addresses exhibit regularity;
step S22: in response to the N-th level page tables exhibiting regularity, determining to perform a prefetch operation for the N-th level page table based on the current target virtual address.
For example, in step S21, the regularity of the N-th level page tables may include: the N-th level page tables are sequentially consecutive, or the intervals between the N-th level page tables are smaller than a second preset threshold. For example, the second preset threshold may be any value determined according to actual requirements, and the embodiments of the present disclosure are not limited in this regard. Of course, the regularity is not limited to the above manners and may be of other types, such as continuity with fixed intervals, conformance to a functional relationship, a multiple relationship, and so on; as long as the N-th level page tables corresponding to the target virtual addresses are predictable, they can be considered to exhibit regularity.
For example, in some examples, when the N-th level page tables exhibit regularity, the first to (N-1)-th level page tables corresponding to the first to (N-1)-th address segments of the plurality of target virtual addresses are respectively the same, which ensures that the high bits of the virtual address containing the address segments used to look up the first to (N-1)-th level page tables are consistent with the high bits of the virtual address containing the address segment corresponding to the prefetched page table. Of course, in other examples, the first to (N-1)-th level page tables corresponding to the first to (N-1)-th address segments may also differ, and the embodiments of the present disclosure are not limited in this regard.
For example, in step S22, if the N-th level page tables exhibit regularity, it is determined that a prefetch operation for the N-th level page table is performed based on the current target virtual address. That is, when the N-th level page table of the current target virtual address is obtained, at least one other N-th level page table different from it is prefetched at the same time; the prefetched N-th level page tables may be the page tables required by other target virtual addresses during address translation.
It should be noted that, in the embodiment of the present disclosure, the pre-fetched page table may be a K-th level page table (non-last level page table) or an N-th level page table (last level page table), that is, the pre-fetched page table may be any one of the page tables, thereby improving the capability and adaptability of the pre-fetched page table.
In the processor provided by the embodiments of the present disclosure, when page table prefetching is performed, adjacent page tables are not prefetched blindly; instead, a determination is first made as to whether to prefetch page tables and how many to prefetch. The page table prefetch controller makes this determination and transmits the corresponding prefetch indication information (whether to prefetch, and how many page tables to prefetch) to the system memory management unit, so that the system memory management unit can selectively perform or skip the prefetch operation. Therefore, the prefetched page tables are highly effective, have a high hit probability in subsequent address translation operations, and are strongly targeted.
Moreover, whether to perform a prefetch operation for the K-th level page table or the N-th level page table is determined based on the plurality of target virtual addresses, so the K-th or N-th level page table can be prefetched purposefully, blindness is avoided, and the storage space occupied by invalid page tables in the translation lookaside buffer can be reduced. If it is determined that the K-th or N-th level page table is to be prefetched, the prefetched page table is stored in the translation lookaside buffer, so that the hit probability can be improved when subsequent address translation operations of the target virtual addresses are processed. In addition, because the K-th or N-th level page table is prefetched selectively, the number of prefetched page tables can be reduced, thereby alleviating the bus congestion caused by prefetching a large number of page tables and reducing the occupation of bus resources, while requiring only a small additional hardware overhead.
Fig. 5 is a schematic structural diagram of a processor according to some embodiments of the present disclosure, and fig. 5 illustrates an example in which the page table prefetch controller 120 is located in the address request unit 130.
As shown in FIG. 5, the page table prefetch controller 120 is represented by Prefetch_ctrl, the address request unit 130 is represented by IP, and the system memory management unit 110 is represented by SMMU.
Prefetch_ctrl is arranged inside the IP; it may determine whether to perform page table prefetching according to a series of virtual addresses VA that the IP prepares to send to the SMMU, and provide prefetch indication information to the SMMU. For example, the prefetch indication information output by Prefetch_ctrl includes two signals, i.e., the flag information Prefetch_flag and the number information Prefetch_cnt. The flag information Prefetch_flag indicates whether a page table needs to be prefetched for the virtual address: a valid value (e.g., 1) indicates that a page table needs to be prefetched, and an invalid value (e.g., 0) indicates that prefetching is not needed. The number information Prefetch_cnt indicates the number of page tables to be prefetched; this signal is valid only when the flag information Prefetch_flag is valid. In the process of translating a virtual address VA to a physical address PA, the SMMU knows whether and how many page tables to prefetch based on these two signals. For example, the prefetched page table may be a K-th level page table (a non-last-level page table) or an N-th level page table (the last-level page table), that is, any level of page table may be prefetched.
For example, when the IP sends a virtual address to the SMMU, the IP is the active initiator of the virtual address. Therefore, when sending a series of virtual addresses, the Prefetch_ctrl arranged inside the IP can perceive the relationship between these virtual addresses. Here, the relationship between virtual addresses may refer to, for example, the following cases. In the first case, whether the virtual addresses are in the same address interval (the address interval corresponds to the same page table); in the second case, whether the virtual addresses are in adjacent address intervals (the page tables corresponding to adjacent address intervals are also adjacent); in the third case, how many address intervals apart the virtual addresses are. Of course, the above cases are merely examples and do not limit the embodiments of the present disclosure.
For example, in some examples, assuming a page table granularity of 4KB, the IP will issue 5 virtual addresses VA0 to VA4 in a period of time, with values VA0: 0x8020_0000, VA1: 0x8020_0100, VA2: 0x8020_1000, VA3: 0x8020_3000, VA4: 0x8020_7000. These 5 virtual addresses have the following regularity: VA0 and VA1 are in the same 4KB interval and correspond to the same page table; the page table corresponding to VA2 is immediately adjacent to the page table corresponding to VA0; VA3 differs from VA0 by 3 page-table granularities; VA4 differs from VA0 by 7 page-table granularities. Therefore, when the IP sends VA0, the Prefetch_ctrl arranged inside it can perceive the regularity among these 5 virtual addresses and determine whether to perform page table prefetching (in this case, prefetching is required).
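The regularity among VA0 to VA4 can be verified numerically. The sketch below is illustrative only; it assumes a 4KB granularity, so that dividing a virtual address by 0x1000 yields the index of the 4KB interval whose page-table entry it needs.

```python
PAGE = 0x1000  # 4KB page-table granularity

vas = [0x8020_0000, 0x8020_0100, 0x8020_1000, 0x8020_3000, 0x8020_7000]
indices = [va // PAGE for va in vas]
deltas = [i - indices[0] for i in indices]
# VA0 and VA1 share a page table (delta 0), VA2 is immediately
# adjacent (delta 1), VA3 and VA4 are 3 and 7 granularities away.
print(deltas)  # [0, 0, 1, 3, 7]
```

The deltas match the regularity stated above, which is exactly the pattern Prefetch_ctrl would perceive when deciding that prefetching is worthwhile here.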
Therefore, when the IP sends VA0 to the SMMU, it can also pass along the information on whether page table prefetching is needed. This information is carried by the prefetch indication information generated by the Prefetch_ctrl arranged inside the IP. For example, the prefetch indication information output by Prefetch_ctrl includes two signals, i.e., the flag information Prefetch_flag and the number information Prefetch_cnt. The SMMU knows whether and how many page tables need to be prefetched according to the flag information Prefetch_flag and the number information Prefetch_cnt, and performs the corresponding prefetch operation. In this example, during the translation of VA0, when the SMMU fetches a page table it can also fetch the 7 adjacent page tables, so that when the IP subsequently sends VA1 to VA4 to the SMMU, they can directly hit the corresponding page tables.
For example, in still other examples, in the processor architecture shown in FIG. 5, assuming a page table granularity of 4KB, the IP will issue 5 virtual addresses VA0 to VA4 in a period of time, with values VA0: 0x8020_0000, VA1: 0x8020_0100, VA2: 0x8020_0200, VA3: 0x8020_0300, VA4: 0x8020_0400. These 5 virtual addresses are in the same 4KB interval and correspond to the same page table. Therefore, when the IP sends VA0, the Prefetch_ctrl arranged inside it can perceive the regularity among these 5 virtual addresses and determine whether to perform page table prefetching (in this case, prefetching is not required). Therefore, when the IP sends VA0 to the SMMU, it can also pass along the information that page table prefetching is not needed. In the process of translating VA0, the SMMU only needs to fetch the corresponding page table without prefetching other page tables, and when the IP subsequently sends VA1 to VA4 to the SMMU, they can directly hit the same page table.
Fig. 6 is a schematic structural diagram of another processor according to some embodiments of the present disclosure, and fig. 6 shows an example in which the page table prefetch controller 120 is located in the system memory management unit 110.
As shown in FIG. 6, the page table prefetch controller 120 is denoted by Adaptive_Prefetch, the system memory management unit 110 is denoted by SMMU, and the 4 address request units 130 are denoted by IP0, IP1, IP2, and IP3, respectively. For example, the address request units IP0, IP1, IP2, and IP3 are in signal connection with the SMMU through a Network on Chip (NoC), and the address request units IP0, IP1, IP2, and IP3 share one set of SMMU.
Adaptive_Prefetch is arranged inside the SMMU; it may determine whether to perform page table prefetching according to a series of virtual addresses VA received by the SMMU, and generate prefetch indication information, which may be used by the SMMU (where the SMMU performs the prefetch operation) or transmitted to the DDR (where the DDR has a corresponding processing function and receives the prefetch indication information). Likewise, the prefetch indication information generated by Adaptive_Prefetch includes two signals, i.e., the flag information Prefetch_flag and the number information Prefetch_cnt, for which reference may be made to the description above; details are not repeated here. The SMMU knows from these two signals whether and how many page tables to prefetch. For example, the prefetched page table may be a K-th level page table (a non-last-level page table) or an N-th level page table (the last-level page table), that is, any level of page table may be prefetched.
For example, to strengthen the adaptive page-table prefetching capability of Adaptive_Prefetch, a sufficiently large cache space may be reserved inside the SMMU to store the received virtual address operations. As shown in fig. 6, in the case where Adaptive_Prefetch is located within the SMMU, the SMMU further includes an address buffer, denoted VA_buffer, configured to store the virtual addresses (or corresponding virtual address operations) received by the SMMU. The larger the storage space of VA_buffer, the more virtual addresses can be stored, the stronger the adaptive page-table prefetching capability of Adaptive_Prefetch, and the more intelligent its decisions. For example, in some examples, the storage space of VA_buffer is not set too large, e.g., enough to cache about 20 virtual addresses, so as not to cause an excessive increase in chip area and power consumption.
In the example shown in fig. 6, since the page table prefetch controller (Adaptive_Prefetch) is arranged inside the SMMU, development efficiency is improved, development workload is reduced, and unified processing by the SMMU is facilitated.
For example, in order to save resources, one set of SMMU is shared by multiple IPs (IP0, IP1, IP2, IP3). The multiple IPs send a series of virtual addresses to the SMMU over a period of time. After reaching the SMMU, the virtual addresses are cached in the VA_buffer of the SMMU, waiting for the page tables to be fetched and the translation to complete. Because these virtual addresses are cached within the SMMU at the same time for a period of time, Adaptive_Prefetch can resolve the relationship between them. Here, the relationship between virtual addresses may refer to, for example, the following cases. In the first case, whether the virtual addresses are in the same address interval (the address interval corresponds to the same page table); in the second case, whether the virtual addresses are in adjacent address intervals (the page tables corresponding to adjacent address intervals are also adjacent); in the third case, how many address intervals apart the virtual addresses are. Of course, the above cases are merely examples and do not limit the embodiments of the present disclosure.
It should be noted that the examples described above in connection with fig. 5 of how to determine whether and how many page tables need to be prefetched apply equally to the processor architecture shown in fig. 6, except that the multiple virtual addresses on which the determination is based have already been sent to the SMMU and buffered in VA_buffer. For example, when the multiple virtual addresses are VA0 to VA4, the regularity among VA0 to VA4 may be analyzed during the address translation operation of VA0, and it may be determined whether to perform page table prefetching.
It should be noted that, in the embodiments of the present disclosure, the system memory management unit 110, the translation lookaside buffer 111, the page table prefetch controller 120, and the address request unit 130 may each be implemented by hardware, software, firmware, or any feasible combination thereof, for example as any suitable architecture, an intellectual property (IP) core, a dedicated or general-purpose circuit, a chip or a device, or a combination of a processor and a memory. The embodiments of the present disclosure do not limit the specific implementation forms of the system memory management unit 110, the translation lookaside buffer 111, the page table prefetch controller 120, and the address request unit 130.
It should be noted that the structure of the processor shown in fig. 2, fig. 5, and fig. 6 is only exemplary and not limiting, and the processor may further include other components and structures as needed, and the embodiment of the disclosure is not limited thereto. The processor may be any type of chip or circuit with Processing capability, such as a CPU, a Graphics Processing Unit (GPU), a General-purpose Graphics Processing Unit (GPGPU), and the like, and the embodiments of the present disclosure are not limited thereto.
The processor provided by the embodiment of the disclosure can selectively prefetch the page table, and can effectively reduce the number of times of reading the page table from the memory. This is briefly described below with reference to examples.
For example, in some examples, assume that 4 virtual addresses need to be address translated and that the page table has a 4-level structure, i.e., a first-level page table through a fourth-level page table. In a conventional address translation operation, assuming all misses in the TLB, 16 (i.e., 4+4+4+4) DDR reads are required to complete the translation of the 4 virtual addresses.
In the processor provided in the embodiment of the present disclosure, if the first-level page table has regularity and can be prefetched, when the translation of the 1 st virtual address is executed, the first-level page tables corresponding to the last 3 virtual addresses are prefetched back, so that the last 3 virtual addresses may directly hit the first-level page table in the TLB during the translation process, and it is not necessary to read the corresponding first-level page table from the DDR, and therefore 13 (i.e., 4+3+3+3) DDR reads are required to complete the translation of the 4 virtual addresses.
If the second-level page table has regularity and can be prefetched, when the 1 st virtual address is converted, the second-level page tables corresponding to the last 3 virtual addresses are prefetched back, so that the last 3 virtual addresses can directly hit the second-level page table in the TLB in the conversion process, and the corresponding second-level page table does not need to be read in the DDR, and therefore, 10 (namely, 4+2+2+2) DDR reads are needed to complete the conversion of the 4 virtual addresses.
If the third-level page table has regularity and can be prefetched, when the 1 st virtual address is converted, the third-level page tables corresponding to the last 3 virtual addresses are prefetched back, so that the last 3 virtual addresses can directly hit the third-level page table in the TLB in the conversion process, and the corresponding third-level page table does not need to be read in the DDR, and therefore, 7 (namely, 4+1+1+1) DDR reads are needed to complete the conversion of the 4 virtual addresses.
If the fourth-level page table has regularity and can be prefetched, when the 1 st virtual address is converted, the fourth-level page tables corresponding to the last 3 virtual addresses are prefetched back, so that the last 3 virtual addresses can directly hit the fourth-level page table in the TLB in the conversion process, and the corresponding fourth-level page table does not need to be read in the DDR, and therefore, 4 (namely, 4+0+0+0) DDR reads are needed to complete the conversion of the 4 virtual addresses.
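The four counts above follow one pattern: the first translation always costs 4 DDR reads, and each remaining address costs one read per page-table level that was not covered by prefetching (levels 1 through the prefetched level hit, since the upper-level tables coincide as noted earlier). A small sketch of that arithmetic; the function and its signature are illustrative assumptions, not part of the disclosure.

```python
def ddr_reads(num_vas, levels, prefetched_up_to=0):
    """DDR reads needed to translate num_vas virtual addresses through a
    levels-deep page table, assuming all TLB misses, when levels
    1..prefetched_up_to hit thanks to prefetching during the first walk."""
    return levels + (num_vas - 1) * (levels - prefetched_up_to)

for k in range(5):
    print(k, ddr_reads(4, 4, prefetched_up_to=k))
# prefetched level 0 (none) -> 16, 1 -> 13, 2 -> 10, 3 -> 7, 4 -> 4
```

The outputs reproduce the 16, 13, 10, 7, and 4 read counts derived in the paragraphs above, showing how each additional prefetched level saves one read per remaining address.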
Therefore, the processor provided by the embodiment of the disclosure can selectively prefetch any one level of page table, effectively reduce the number of times of reading the page table from the memory, improve the hit probability, and improve the efficiency of address conversion.
At least one embodiment of the present disclosure also provides a page table prefetching method. The page table prefetching method can selectively prefetch page tables, avoid blindness, reduce the storage space occupied by invalid page tables in the translation look-aside buffer, improve the hit probability, reduce bus congestion caused by a large number of page table prefetching, reduce the occupation of bus resources, and reduce the required hardware overhead.
Fig. 7 is a flowchart illustrating a page table prefetching method according to some embodiments of the present disclosure, where the page table prefetching method is used for a processor according to some embodiments of the present disclosure. In some embodiments, as shown in FIG. 7, the page table prefetching method includes step S10.
Step S10: determining, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the K-th level page table based on the current target virtual address.
For example, in some examples, the page table prefetching method further includes step S20.
Step S20: determining, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the N-th level page table based on the current target virtual address.
For example, steps S10 and S20 above may be performed by the page table prefetch controller in the processor; of course, the embodiments of the present disclosure are not limited thereto, and the steps may also be performed by the page table prefetch controller in cooperation with other components in the processor. For detailed explanations and technical effects of steps S10 and S20, reference may be made to the above description of the processor, which is not repeated here.
It should be noted that, in the embodiment of the present disclosure, the page table prefetching method may further include more or fewer steps, and the execution order of each step is not limited, which may be determined according to actual needs.
At least one embodiment of the present disclosure also provides an electronic device. The electronic equipment can selectively prefetch the page table, avoid blindness, reduce the storage space occupied by invalid page tables in the translation look-aside buffer, improve the hit probability, reduce bus congestion caused by a large number of prefetch page tables, reduce the occupation of bus resources and reduce the required hardware overhead.
Fig. 8 is a schematic block diagram of an electronic device provided in some embodiments of the present disclosure. As shown in fig. 8, in some embodiments, the electronic device 200 includes a processor 210, and the processor 210 is a processor provided in any embodiment of the disclosure, for example, the processor 100 shown in fig. 2, the processor shown in fig. 5, or the processor shown in fig. 6. The electronic device 200 may be any device with computing functions, such as a computer, a server, a smart phone, a tablet computer, and the like, and the embodiment of the disclosure is not limited thereto.
Fig. 9 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. As shown in fig. 9, the electronic device 300 includes a processor provided in any embodiment of the present disclosure, and the electronic device 300 is suitable for implementing the page table prefetching method provided in the embodiment of the present disclosure. The electronic device 300 may be a terminal device or a server or the like. It should be noted that the electronic device 300 shown in fig. 9 is only an example, and does not bring any limitation to the functions and the use range of the embodiment of the present disclosure.
As shown in fig. 9, the electronic device 300 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 31, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 32 or a program loaded from a storage apparatus 38 into a random access memory (RAM) 33. For example, the processing device 31 may be a processor provided in any embodiment of the present disclosure, such as the processor 100 shown in fig. 2, the processor shown in fig. 5, or the processor shown in fig. 6. The RAM 33 also stores various programs and data necessary for the operation of the electronic device 300. The processing device 31, the ROM 32, and the RAM 33 are connected to each other via a bus 34. An input/output (I/O) interface 35 is also connected to bus 34.
Generally, the following devices may be connected to the I/O interface 35: input devices 36 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 37 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 38 including, for example, magnetic tape, hard disk, etc.; and a communication device 39. The communication means 39 may allow the electronic device 300 to communicate with other electronic devices wirelessly or by wire to exchange data. While fig. 9 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided and that the electronic device 300 may alternatively be implemented or provided with more or less means.
For a detailed description and technical effects of the electronic device 200/300, reference may be made to the description of the processor above, which is not repeated here.
The following points should be noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2) In the absence of conflict, the embodiments of the present disclosure and the features within the embodiments may be combined with one another to obtain new embodiments.
The above is only a specific implementation of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the scope of the claims.

Claims (19)

1. A processor, comprising a system memory management unit, a page table prefetch controller, and at least one address request unit, wherein
the at least one address request unit is in communication with the system memory management unit, and the page table prefetch controller is in communication with the system memory management unit;
each address request unit is configured to send at least one target request, each target request including a target virtual address, the target virtual address including a plurality of address segments, the plurality of address segments including first to N-th address segments, where N is an integer greater than 1, and the first to N-th address segments are used for querying first-level to N-th-level page tables, respectively;
the system memory management unit is configured to receive the target requests sent by each address request unit to obtain a plurality of target virtual addresses, and to perform address translation on the plurality of target virtual addresses to obtain a plurality of physical addresses respectively corresponding to the plurality of target virtual addresses; and
the page table prefetch controller is configured to determine, according to the plurality of target virtual addresses, whether to perform a prefetch operation for a K-th level page table based on a current target virtual address, where K is an integer and 1 ≤ K < N.
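The splitting of a target virtual address into first to N-th address segments described in claim 1 can be sketched as follows. This is a minimal illustration only: the segment width (9 bits per level), the page-offset width (12 bits), and N = 4 are assumptions modeled on a typical multi-level page table, not values fixed by this patent.

```python
# Hypothetical sketch of the address-segment split in claim 1.
# Field widths below are illustrative assumptions, not from the patent.

SEGMENT_BITS = 9   # bits per address segment (assumed)
OFFSET_BITS = 12   # page-offset bits (assumed)
N_LEVELS = 4       # N in the claim (assumed)

def address_segments(virtual_address: int) -> list[int]:
    """Return the first through N-th address segments of a target virtual
    address; the i-th segment indexes the i-th level page table."""
    segments = []
    for level in range(N_LEVELS):
        # The first segment occupies the highest bits, the N-th the lowest
        # (just above the page offset).
        shift = OFFSET_BITS + (N_LEVELS - 1 - level) * SEGMENT_BITS
        segments.append((virtual_address >> shift) & ((1 << SEGMENT_BITS) - 1))
    return segments
```

With these assumed widths, the N-th segment is the 9 bits directly above the 12-bit page offset, so `address_segments(1 << 12)` yields `[0, 0, 0, 1]`.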
2. The processor of claim 1, wherein the processor includes only one address request unit, the page table prefetch controller being located within the address request unit.
3. The processor of claim 1, wherein the page table prefetch controller is located within the system memory management unit.
4. The processor of claim 1, wherein determining, according to the plurality of target virtual addresses, whether to perform the prefetch operation for the K-th level page table based on the current target virtual address comprises:
determining whether regularity exists among the K-th level page tables respectively contained in the plurality of target virtual addresses; and
in response to regularity existing among the K-th level page tables, determining to perform the prefetch operation for the K-th level page table based on the current target virtual address.
5. The processor of claim 4, wherein the regularity comprises: the K-th level page tables being sequentially continuous, or the intervals between the K-th level page tables being smaller than a first preset threshold.
6. The processor of claim 4, wherein, when K ≥ 2 and regularity exists among the K-th level page tables, the first-level to (K-1)-th-level page tables corresponding to the first to (K-1)-th address segments respectively included in the plurality of target virtual addresses are respectively the same.
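The regularity test of claims 4 and 5 can be sketched as a check over the K-th level page table indices of recent target virtual addresses: either strictly consecutive, or spaced by less than a preset threshold. The threshold value and the minimum history length below are illustrative assumptions.

```python
# Hypothetical sketch of the "regularity" condition in claims 4-5.
# The default threshold of 4 and the two-address minimum are assumptions.

def has_regularity(kth_indices: list[int], threshold: int = 4) -> bool:
    """Return True when the K-th level page table indices are sequentially
    continuous, or when every gap is positive and below the threshold."""
    if len(kth_indices) < 2:
        return False
    gaps = [b - a for a, b in zip(kth_indices, kth_indices[1:])]
    sequential = all(g == 1 for g in gaps)
    small_gaps = all(0 < g < threshold for g in gaps)
    return sequential or small_gaps
```

Under this sketch, `[5, 6, 7, 8]` is regular (consecutive), `[2, 4, 6]` is regular under the default threshold, and `[1, 10, 3]` is not.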
7. The processor of claim 1, wherein the page table prefetch controller is further configured to, in response to determining to perform the prefetch operation for the K-th level page table based on the current target virtual address, generate prefetch indication information and send the prefetch indication information to the system memory management unit.
8. The processor of claim 7, wherein the prefetch indication information comprises flag information and number information,
the flag information being a valid value indicates that the prefetch operation for the K-th level page table is required, the flag information being an invalid value indicates that the prefetch operation for the K-th level page table is not to be performed, and
the number information indicates the number of K-th level page tables to be prefetched.
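The prefetch indication information of claim 8 (flag information plus number information) can be modeled as a small record; combining it with the bound of claim 11, the number of prefetched tables can be set to the number of target virtual addresses minus one. The field names and the dataclass layout are illustrative assumptions.

```python
# Hypothetical encoding of the prefetch indication information (claim 8),
# with the count chosen per the lower bound in claim 11. Names are assumed.

from dataclasses import dataclass

@dataclass
class PrefetchIndication:
    flag_valid: bool  # valid value -> prefetch required; invalid -> skip
    count: int        # number of K-th level page tables to prefetch

def make_indication(num_target_addresses: int, regular: bool) -> PrefetchIndication:
    """Build the indication sent from the page table prefetch controller
    to the system memory management unit."""
    # Claim 11: prefetched table count >= (number of target addresses - 1).
    count = max(num_target_addresses - 1, 0) if regular else 0
    return PrefetchIndication(flag_valid=regular, count=count)
```

For example, four regular target virtual addresses yield a valid flag with a count of three; without regularity the flag is invalid and nothing is prefetched.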
9. The processor of claim 8, wherein the system memory management unit is further configured to receive the prefetch indication information and, in response to the flag information being a valid value, perform the prefetch operation for the K-th level page table according to the number information.
10. The processor of claim 9, wherein the system memory management unit comprises a translation look-aside buffer,
the translation look-aside buffer is configured to determine whether a page table is hit during address translation, and to store the prefetched K-th level page tables.
11. The processor of claim 9, wherein the number of prefetched K-th level page tables is greater than or equal to the number of the target virtual addresses minus one.
12. The processor of claim 3, wherein the system memory management unit further comprises an address buffer configured to store the plurality of target virtual addresses received by the system memory management unit.
13. The processor of claim 1, wherein the page table prefetch controller is further configured to determine, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the N-th level page table based on the current target virtual address.
14. The processor of claim 13, wherein determining, according to the plurality of target virtual addresses, whether to perform the prefetch operation for the N-th level page table based on the current target virtual address comprises:
determining whether regularity exists among the N-th level page tables respectively contained in the plurality of target virtual addresses; and
in response to regularity existing among the N-th level page tables, determining to perform the prefetch operation for the N-th level page table based on the current target virtual address.
15. The processor of claim 14, wherein the regularity comprises: the N-th level page tables being sequentially continuous, or the intervals between the N-th level page tables being smaller than a second preset threshold.
16. The processor of claim 14, wherein, when regularity exists among the N-th level page tables, the first-level to (N-1)-th-level page tables corresponding to the first to (N-1)-th address segments respectively included in the plurality of target virtual addresses are respectively the same.
17. A page table prefetching method applied to the processor of any one of claims 1-16, wherein the method comprises:
determining, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the K-th level page table based on the current target virtual address.
18. The method of claim 17, further comprising:
determining, according to the plurality of target virtual addresses, whether to perform a prefetch operation for the N-th level page table based on the current target virtual address.
19. An electronic device comprising the processor of any one of claims 1-16.
CN202210548146.4A 2022-05-18 2022-05-18 Processor, page table prefetching method and electronic equipment Pending CN114925001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210548146.4A CN114925001A (en) 2022-05-18 2022-05-18 Processor, page table prefetching method and electronic equipment


Publications (1)

Publication Number Publication Date
CN114925001A true CN114925001A (en) 2022-08-19

Family

ID=82809598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210548146.4A Pending CN114925001A (en) 2022-05-18 2022-05-18 Processor, page table prefetching method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114925001A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681578A (en) * 2023-08-02 2023-09-01 南京砺算科技有限公司 Memory management method, graphic processing unit, storage medium and terminal equipment
CN116681578B (en) * 2023-08-02 2023-12-19 南京砺算科技有限公司 Memory management method, graphic processing unit, storage medium and terminal equipment
CN117785738A (en) * 2024-02-23 2024-03-29 超睿科技(长沙)有限公司 Page table prefetching method, device, chip and storage medium
CN117785738B (en) * 2024-02-23 2024-05-14 超睿科技(长沙)有限公司 Page table prefetching method, device, chip and storage medium

Similar Documents

Publication Publication Date Title
US11074190B2 (en) Slot/sub-slot prefetch architecture for multiple memory requestors
CN110275841B (en) Access request processing method and device, computer equipment and storage medium
US9292447B2 (en) Data cache prefetch controller
US11409663B2 (en) Methods and systems for optimized translation of a virtual address having multiple virtual address portions using multiple translation lookaside buffer (TLB) arrays for variable page sizes
CN114925001A (en) Processor, page table prefetching method and electronic equipment
CN112416817B (en) Prefetching method, information processing apparatus, device, and storage medium
US20230102891A1 (en) Re-reference interval prediction (rrip) with pseudo-lru supplemental age information
US9875191B2 (en) Electronic device having scratchpad memory and management method for scratchpad memory
CN112416437B (en) Information processing method, information processing device and electronic equipment
KR20160035545A (en) Descriptor ring management
CN114238167B (en) Information prefetching method, processor and electronic equipment
CN114637700A (en) Address translation method for target virtual address, processor and electronic equipment
CN112416436B (en) Information processing method, information processing device and electronic equipment
CN108874691B (en) Data prefetching method and memory controller
JP2018514861A (en) Burst conversion look-aside buffer
WO2021111217A1 (en) Methods and systems for translating virtual addresses in a virtual memory based system
KR20210037216A (en) Memory management unit capable of managing address translation table using heterogeneous memory, and address management method thereof
CN115098410A (en) Processor, data processing method for processor and electronic equipment
US11048637B2 (en) High-frequency and low-power L1 cache and associated access technique
CN114218132B (en) Information prefetching method, processor and electronic equipment
CN114281720B (en) Processor, address translation method for processor and electronic equipment
US10977176B2 (en) Prefetching data to reduce cache misses
US20050193172A1 (en) Method and apparatus for splitting a cache operation into multiple phases and multiple clock domains
KR20240070630A (en) Rereference Interval Prediction with Pseudo-LRU Supplementary Age Information (RRIP)
CN114064521A (en) Adaptive address-dependent prefetching scheme oriented to irregular access

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant after: Shanghai Bi Ren Technology Co.,Ltd.
Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.
Country or region before: China