CN115185867A - Method for processing access request

Method for processing access request

Info

Publication number
CN115185867A
CN115185867A (application CN202210302630.9A)
Authority
CN
China
Prior art keywords
access request
priority
buffer memory
access
level
Prior art date
Legal status
Pending
Application number
CN202210302630.9A
Other languages
Chinese (zh)
Inventor
邵奇
Current Assignee
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN202210302630.9A
Publication of CN115185867A
Status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/16: Handling requests for interconnection or transfer for access to memory bus
    • G06F13/18: Handling requests for interconnection or transfer for access to memory bus based on priority control
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline or look ahead

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A method of processing an access request, a processing apparatus, an electronic device, and a non-transitory readable storage medium are disclosed. A first access request is used to acquire data in a processing apparatus that includes a multi-level buffer memory. The method of processing access requests comprises: in response to the buffer memory of the current level receiving the first access request, determining the priority level corresponding to the first access request; and determining, based on the priority level corresponding to the first access request, the order in which the buffer memory processes the first access request or a second access request.

Description

Method for processing access request
Technical Field
Embodiments of the present disclosure relate to a method of processing an access request, a processing apparatus, an electronic device and a non-transitory readable storage medium.
Background
Each level of buffer memory (cache) may receive access requests from various request units. These access requests include, but are not limited to, requests to read data or instructions from the buffer memory, or requests to prefetch data or instructions into the buffer memory. After receiving these access requests, the buffer memory usually processes them in a polling manner, for example, by using polling to place individual access requests into its pipeline. While polling is simple and fair to individual requests, it is not discriminating enough, which can cost CPU performance. The scheme for processing access requests to the buffer memory therefore needs further improvement to raise the overall performance of the CPU. A minimal C++ sketch of the polling behavior appears below.
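For illustration only, the following C++ sketch shows one way a round-robin (polling) arbiter might pick requests; the type and member names are assumptions for this sketch, not structures defined by this disclosure.
#####################################################
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical access request; the fields are illustrative only.
struct AccessRequest {
    std::uint64_t address;   // address whose data is requested
    bool is_prefetch;        // ordinary read vs. prefetch
};

// Round-robin (polling) arbitration: every pending request is treated
// equally, regardless of how urgent it is for the processor core.
class PollingArbiter {
public:
    std::optional<AccessRequest> pick(std::vector<AccessRequest>& queue) {
        if (queue.empty()) return std::nullopt;
        next_ = next_ % queue.size();
        AccessRequest chosen = queue[next_];
        queue.erase(queue.begin() + static_cast<std::ptrdiff_t>(next_));
        return chosen;  // this request enters the cache pipeline next
    }
private:
    std::size_t next_ = 0;
};
#####################################################
As the sketch shows, polling never consults urgency, which is the limitation the embodiments below address.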
Disclosure of Invention
At least one embodiment of the present disclosure provides a method of processing an access request, a processing apparatus, an electronic device, and a non-transitory readable storage medium.
At least one embodiment of the present disclosure provides a method of processing an access request, where a first access request is used to acquire data in a processing apparatus including a multi-level cache memory. The method comprises: in response to the buffer memory of the current level receiving the first access request, determining the priority level corresponding to the first access request; and determining, based on the priority level corresponding to the first access request, the order in which the buffer memory processes the first access request or a second access request.
For example, in at least one embodiment of the present disclosure, the determining, in response to the buffer memory of the current level receiving an access request, the priority corresponding to the first access request includes: determining the priority corresponding to the first access request in response to the buffer memory of the current level receiving a priority update command for the first access request.
For example, in at least one embodiment of the present disclosure, the first access request includes a priority field, and the determining, in response to the buffer memory of the current level receiving the first access request, the priority corresponding to the first access request includes: determining the priority corresponding to the first access request based on the priority indicated by the priority field in the first access request.
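For illustration, a minimal C++ sketch of such a priority field follows; the field width, value encoding, and names are assumptions of this sketch and are not fixed by the disclosure.
#####################################################
#include <cstdint>

// Illustrative encoding of the priority field; three levels as in the
// low/medium/high example discussed below.
enum class Priority : std::uint8_t { Low = 0, Medium = 1, High = 2 };

struct PrioritizedAccessRequest {
    std::uint64_t address;   // requested address
    Priority priority;       // the priority field carried in the request
};

// The current-level cache can read the field directly to determine the
// priority corresponding to the request (operation S301 below).
inline Priority priority_of(const PrioritizedAccessRequest& req) {
    return req.priority;
}
#####################################################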
For example, in at least one embodiment of the present disclosure, the determining, based on the priority level corresponding to the first access request, an order in which the buffer memory processes the first access request or the second access request includes: in response to a response queue of the buffer memory including an access response that includes the data requested by the first access request, adjusting the priority level of the access response based on the priority level corresponding to the first access request.
For example, in at least one embodiment of the present disclosure, the determining, based on the priority level corresponding to the first access request, an order in which the buffer memory processes the first access request or the second access request includes: in response to a request queue of the buffer memory including a second access request that requests access to the same address as the received first access request, adjusting the priority level corresponding to the second access request based on the priority level corresponding to the first access request, and determining, based on the adjusted priority level corresponding to the second access request, the order in which the buffer memory processes the second access request in a pipelined manner.
For example, in at least one embodiment of the present disclosure, the determining, based on the priority level corresponding to the first access request, an order in which the buffer memory processes the first access request or the second access request includes: in response to the priority level corresponding to the first access request being a high priority, the buffer memory will preferentially process the first access request in a pipelined manner.
For example, in at least one embodiment of the present disclosure, the determining, based on the priority level corresponding to the first access request, an order in which the buffer memory processes the first access request or the second access request includes: in response to the first access request missing in the buffer memory of the current level, recording the first access request and its corresponding priority level in a miss history register of the current-level buffer memory; and sending the first access request, together with its corresponding priority level, to a lower-level buffer memory.
For example, in at least one embodiment of the present disclosure, the processing device further includes a processor core including a reorder buffer, the priority update command of the first access request being at least partially associated with a state of the reorder buffer.
For example, in at least one embodiment of the present disclosure, the processing apparatus further includes a processor core including an instruction fetch unit, the priority update command of the first access request being at least partially associated with a branch prediction result corresponding to the instruction fetch unit.
For example, in at least one embodiment of the present disclosure, the processing apparatus further includes a processor core including an instruction fetch unit and a reorder buffer, the priority update command of the first access request being at least partially associated with a state of the reorder buffer and a branch prediction result corresponding to the instruction fetch unit.
At least one embodiment of the present disclosure provides a processing apparatus, including: a processor core; a multi-level cache memory; wherein the processor core or the multi-level cache memory is configured to perform the above method.
At least one embodiment of the present disclosure provides an electronic device, which includes the processing apparatus described above.
At least one embodiment of the present disclosure provides a non-transitory readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processing device, perform the method described above.
The embodiment of the disclosure can optimize the response speed of the processing device to the access requests with different degrees of urgency by setting different priorities for the access requests, thereby improving the performance of the processing device.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings described below only relate to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 is a diagram of a microprocessor core architecture according to at least one embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a request queue processed by a pipeline in a cache according to at least one embodiment of the present disclosure.
Fig. 3 is a flowchart of a method for processing an access request according to at least one embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a processing apparatus for processing an access request according to at least one embodiment of the present disclosure.
Fig. 5 is an example scenario of adjusting a priority level of an access request in accordance with at least one embodiment of the present disclosure.
Fig. 6 is yet another example scenario of adjusting a priority level of an access request in accordance with at least one embodiment of the present disclosure.
Fig. 7 is yet another example scenario of adjusting a priority level of an access request in accordance with at least one embodiment of the present disclosure.
Fig. 8 is a schematic block diagram of a processing device provided in at least one embodiment of the present disclosure.
Fig. 9 is a schematic block diagram of an electronic device according to at least one embodiment of the present disclosure.
Fig. 10 is a schematic diagram of a non-transitory readable storage medium provided in at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and the like in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used only to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
The following briefly introduces terms related to the present disclosure in order to facilitate the description of the present disclosure.
Buffer memory (cache): the buffer memory is a small-capacity memory interposed between the core and the main memory (e.g., external memory) and has a higher read/write speed than the main memory, thereby enabling a high-speed supply of instructions and data to the central processor core and increasing the execution speed of the program. With the increasing integration of semiconductor devices, multi-level buffer memory systems have been developed, and all control logic of each level of buffer memory is implemented by the internal controller of the level of buffer memory.
First-level buffer memory (L1 cache): the L1 cache is the first-level cache memory, located next to the CPU core, and is the cache most tightly coupled with the CPU. The first-level buffer memory is divided into an instruction cache and a data cache; it has the smallest capacity and is private to each core/cluster.
Second-level buffer memory (L2 cache): the L2 cache is the CPU's second-level cache memory; its capacity can directly affect CPU performance, and it is private to each core/cluster.
Third-level buffer memory (L3 cache): the L3 cache is the CPU's third-level cache memory, used to further reduce memory latency; it is shared by multiple cores/clusters and has the largest capacity. In general, L3 is also the Last Level Cache (LLC) of the cores of a multi-core processor, and L3 is coupled to the external memory.
External memory: this is a concept opposed to the memory internal to the processor core; it is usually a Dynamic Random Access Memory (DRAM) connected to the third-level buffer memory through a bus. The external memory has a large capacity, but its access speed is slow.
Cache Line (Cache Line): a Cache line is the minimum unit of data exchange between the Cache and the memory, and is usually 32 bytes or 64 bytes.
Write Back (Write Back): writing data in an upper-level buffer memory back to the next-level buffer memory or to the external memory.
Directory (Directory): a record table in which the current-level buffer memory records the state of data accessed by the upper-level buffer memory.
Prefetch (Prefetch): loading data into the buffer memory in advance by predicting what will be read, thereby reducing the latency for each core to acquire the data.
Miss history register: typical examples of the miss history register are the Miss Address Buffer (MAB) and the Miss Status Handling Register (MSHR). When a read/write or prefetch request misses in the current-level buffer memory and the data must be read from the next-level buffer memory, the request and its corresponding attributes are stored in the miss history register until the next-level buffer memory returns the requested data.
Re-order buffer (ROB): the reorder buffer allows instructions to be executed out of order and then committed in their original order. In some dynamically scheduled out-of-order execution mechanisms there are four phases: issue, execute, write result, and commit. In the write-result phase, the result of an instruction is temporarily stored in the reorder buffer, and later stored into a register or main memory. If another instruction urgently requires this result, the reorder buffer can forward the data to it directly.
FIG. 1 is a diagram of a microprocessor core architecture according to at least one embodiment of the present disclosure.
As shown in FIG. 1, in at least one embodiment of the present disclosure, an exemplary microprocessor architecture includes a five-stage pipeline in which an instruction may be issued every clock cycle and executed within a fixed time, e.g., 5 clock cycles. The execution of each instruction is divided into 5 steps: an Instruction Fetch (IF) stage 1001, a read Register (RD) stage 1002, an Arithmetic/Logic Unit (ALU) stage 1003, a Memory (MEM) stage 1004, and a Write Back (WB) stage 1005. In the IF stage 1001, the specified instruction is fetched from the instruction cache. A portion of the fetched instruction specifies the source registers used to execute the instruction. In the RD stage 1002, the system fetches the contents of the specified source registers. The fetched values may be used to perform an arithmetic or logical operation in the ALU stage 1003. In the MEM stage 1004, the instruction reads or writes the data cache. Finally, in the WB stage 1005, the value obtained by executing the instruction may be written back into a register.
Generally, a processor core (core) processes data quickly, but reading data from main memory takes a long time, so current high-performance processor cores generally adopt a multi-level cache to buffer data. The cache memory may also process the cache lines corresponding to access requests in a pipelined manner; for example, each cache line in the cache memory may be read, written, and updated continuously.
However, when the above continuous operations are performed, structures that depend on one another may be operated on at the same time, causing conflicts. For example, when two instructions both need to operate on cache lines with the same index (the lines may be the same or different; in a multi-way set-associative architecture, one index may correspond to 4 or 8 cache lines), the conflict requires the arbiter to decide which access request is processed first. For example, the arbiter determines which access request's corresponding cache line will be read, written, or updated first.
Further, since the buffer memory may continuously receive a large number of access requests, a large number of conflicts may occur, and their frequent occurrence increases the buffer memory's latency. If the buffer memory processes non-critical access requests first, critical access requests may be blocked, delaying their responses.
Fig. 2 is a schematic diagram of a request queue processed in a pipelined manner in a cache memory according to at least one embodiment of the present disclosure.
For example, in at least one embodiment of the present disclosure, as shown in FIG. 2, the cache memory utilizes an arbiter 220 to pick an access request from a request queue 210 into the read-write-update pipeline described above. For example, access requests from an upper level cache or a processor are stored in the request queue 210, the requests in the request queue 210 can enter the arbiter 220 if a preset condition is met (e.g., resources are sufficient, etc.), and the arbiter 220 has arbitration selection logic, that is, selects the access requests according to a certain rule. After the arbiter 220 picks out the access request, the access request may be processed in a pipelined manner.
It should be noted that, in the embodiment of the present disclosure, the preset condition required for the request to go from the request queue 210 to the arbiter 220 is not limited, and may be set according to actual needs. It should also be noted that the arbitration selection logic of the arbiter 220 is not limited in the embodiment of the present disclosure, for example, in one example, the arbitration selection logic may be set to fetch the request that enters the arbiter first, but may also be other selection logic (for example, in a polling manner), and may be set according to actual needs.
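To make the arbitration-selection idea concrete, the following C++ sketch shows one possible priority-aware selection rule (highest priority first, ties broken by arrival order); this rule is an assumption of the sketch, not logic mandated by the present disclosure.
#####################################################
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

enum class Priority : std::uint8_t { Low, Medium, High };

struct Request {
    std::uint64_t address;
    Priority priority;
    std::uint64_t arrival;   // tick on which the request joined the queue
};

// One possible arbitration-selection rule: pick the highest-priority
// request first and break ties by arrival order, so equal-priority
// requests still behave first-in, first-out.
std::optional<Request> arbitrate(std::vector<Request>& queue) {
    if (queue.empty()) return std::nullopt;
    auto it = std::max_element(queue.begin(), queue.end(),
        [](const Request& a, const Request& b) {
            if (a.priority != b.priority) return a.priority < b.priority;
            return a.arrival > b.arrival;  // the earlier arrival wins ties
        });
    Request chosen = *it;
    queue.erase(it);
    return chosen;   // handed to the read-write-update pipeline
}
#####################################################
Under such a rule, a later-arriving high-priority request can overtake queued low-priority ones, which is the effect the priority adjustments described below aim at.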
Because the buffer memory continuously processes, in a pipelined manner, the access requests that the arbiter places into the pipeline, consecutive access requests may access the same structure, for example a structure with the same index. If that structure cannot process back-to-back requests, a conflict arises in the buffer memory's pipeline processing flow.
For example, in one example, as shown in FIG. 2, assume that there are two access requests, access request A and access request B, both attempting to access cache lines in the L1 cache corresponding to index 5, where index 5 corresponds to 8 cache lines. Assume that access request A expects to access the first cache line corresponding to index 5, and that this cache line does not store the data DATA-A that access request A expects to access (i.e., a miss). Suppose further that access request B attempts to access the second of the 8 cache lines corresponding to index 5, and that this cache line happens to store the data DATA-B that access request B expects to access (i.e., a hit).
If the arbiter 220 selects access request A into the pipeline in a round-robin manner, it may, during pipeline processing, decide based on a random replacement policy to evict the data DATA-B in the second cache line corresponding to index 5, so that access request A can subsequently obtain the data DATA-A it expects to access. For example, during pipeline processing, the data DATA-A corresponding to access request A may be fetched from the lower-level buffer memory, the data DATA-B stored in the second cache line corresponding to index 5 may be replaced with DATA-A, and the access response A' of access request A, including DATA-A, may be placed in the response queue 230 accordingly.
However, since the data DATA-B in the second cache line has been replaced with DATA-A, the access request B that enters the pipeline later will be unable to obtain DATA-B. That is, in such a case, the pipeline conflict between access request B and access request A turns access request B, which should have hit, into a miss, which in turn reduces the efficiency of obtaining DATA-B; a long wait is needed before the access response B' for access request B can be generated.
To reduce the above-mentioned conflict, at least one embodiment of the present disclosure provides a method of processing an access request. Accordingly, at least one embodiment of the present disclosure also provides a processing device, an electronic device and a non-transitory readable storage medium corresponding to the above method. Therefore, at least one embodiment of the disclosure can optimize the response to the access requests with different degrees of urgency and improve the efficiency of data access by determining the priority of the access requests.
In the following, a method for processing an access request according to at least one embodiment of the present disclosure is described in a non-limiting manner by using several examples or embodiments, and as described below, different features of these specific examples or embodiments may be combined with each other without mutual conflict, so as to obtain new examples or embodiments, which also belong to the protection scope of the present disclosure.
Fig. 3 is a flowchart of a method 30 for processing an access request according to at least one embodiment of the disclosure. Fig. 4 is a schematic diagram of a processing device 400 for processing an access request according to at least one embodiment of the present disclosure.
For example, at least one embodiment of the present disclosure provides a method 30 of processing an access request, as shown in FIG. 3. For example, the method 30 may be applied to various types of processing devices. For example, in at least one embodiment of the present disclosure, the method 30 may be applied to a processing device 400 as shown in fig. 4.
For example, in at least one embodiment of the present disclosure, as shown in fig. 3, the method 30 of processing an access request may include the following operations S301 to S302.
In operation S301, in response to the current-level buffer memory receiving the first access request, the priority level corresponding to the first access request is determined.
In operation S302, an order in which the first access request or the second access request is processed by the buffer memory is determined based on a priority level corresponding to the first access request.
For example, the buffer memory may process the first access request in a pipelined manner, or it may process the second access request in a pipelined manner. The second access request may be associated with the first access request in various ways; for example, the two requests may access the same data, or the second access request may already have obtained the data that the first access request expects to access. The present disclosure is not limited in this respect.
According to the method 30 for processing access requests provided by at least one embodiment of the present disclosure, by determining the priority level corresponding to the first access request, the response speed to access requests with different degrees of urgency can be optimized, and the efficiency of accessing data is improved.
It should be noted that, in at least one embodiment of the present disclosure, operations S301 to S302 may be executed sequentially, may be executed in parallel, or may be executed in other adjusted orders.
It should be further noted that, in at least one embodiment of the present disclosure, the method 30 for processing an access request may selectively perform some steps in operations S301 to S302, or may perform some additional steps in addition to operations S301 to S302, which is not limited by the embodiment of the present disclosure.
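For illustration, the following C++ sketch shows how one cache level might carry out operations S301 to S302, assuming a default-low policy for requests that carry no priority field; all structures and names here are assumptions of the sketch.
#####################################################
#include <cstdint>
#include <vector>

enum class Priority : std::uint8_t { Low, Medium, High };

struct Request {
    std::uint64_t address;
    Priority priority;
    bool has_priority_field;   // whether the request carried a priority
};

class CacheLevel {
public:
    void on_receive(Request req) {
        // Operation S301: determine the priority for the received request.
        if (!req.has_priority_field)
            req.priority = Priority::Low;   // assumed default policy
        // Operation S302: the priority decides where the request sits in
        // the processing order relative to other queued requests.
        insert_by_priority(req);
    }
private:
    void insert_by_priority(const Request& req) {
        auto it = queue_.begin();
        while (it != queue_.end() && it->priority >= req.priority) ++it;
        queue_.insert(it, req);   // higher priority sits nearer the front
    }
    std::vector<Request> queue_;
};
#####################################################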
Optionally, in one example of the present disclosure, the access request may include a priority field indicating the priority to which it corresponds. The current-level buffer memory is described here as an L2 buffer memory. Assume the L2 cache receives access request C, which indicates that it needs to access the data at address 0x06. Optionally, access request C may further indicate that its corresponding priority level is any one of low, medium, and high priority. Alternatively, access request C may indicate its corresponding priority numerically.
It will be appreciated by those skilled in the art that including a priority field in an access request is merely one example of an embodiment of the present disclosure; the access request may also omit the priority field. In that case, the L2 cache memory may by default set the priority level corresponding to access request C to any one of low, medium, and high priority. Alternatively, the L2 cache may analyze the access address corresponding to access request C and, in combination with the access requests for the current cache lines, assign a priority to access request C. The present disclosure is not limited in this respect.
For example, in one example of the present disclosure, operation S301 optionally includes: determining the priority corresponding to the first access request in response to the current-level buffer memory receiving a priority update command for the first access request. As shown in fig. 4, assume that in operation S301 the current-level buffer memory is an L2 buffer memory that has stored access request C in its request queue. The miss history register in the L1 cache then sends a priority update command to the L2 cache, which updates the priority level of access request C.
For example, there may be cases where the processor core determines only after an access request has been sent that it requires preferential processing. In such a case, a simple priority update command is enough to indicate a priority upgrade or downgrade for the access request. In some examples, the priority update command is used only to adjust the priority; in other examples, it may include other fields to assist in implementing other functions, and the disclosure is not limited in this respect.
For example, assume that in the request queue the priority level corresponding to access request C is low, and that the processor core determines the priority level of access request C should be raised. The priority update command for access request C may then indicate that its corresponding priority level needs to be updated to high. In such a case, the L2 cache will correspondingly set the priority level of access request C to high.
In one example, after the L2 buffer memory receives the priority update command, it further queries its built-in request queue and response queue and updates them accordingly. Examples of how the L2 buffer memory updates its request queue and response queue are detailed below, and the disclosure is not limited to them.
For example, operation S302 may further include: in response to the response queue of the buffer memory including an access response that contains the data requested by the first access request, adjusting the priority level of that access response based on the priority level corresponding to the first access request. Assume the L2 buffer memory finds in its response queue that the data at address 0x06 has already been prefetched from the L3 buffer memory based on an access request K, and that an access response K' corresponding to access request K is stored in the response queue. The L2 buffer will then adjust the priority level of access response K' in the response queue based on the priority update command, so that the data at address 0x06 can be sent to the L1 buffer as soon as possible. At this time, as one embodiment, an access response C' corresponding to access request C, which likewise includes the data at address 0x06, may also be generated directly, stored in the response queue, and given a high priority level accordingly. The response queue may then determine the order in which access responses are sent to the L1 cache based on the priority level of each access response. The present disclosure is not limited in this respect.
For another example, operation S302 may further include: in response to the request queue of the buffer memory including a second access request that requests access to the same address as the received first access request, adjusting the priority level corresponding to the second access request based on the priority level corresponding to the first access request. Assume the L2 buffer memory finds in its request queue an existing access request F for the data at address 0x06. The L2 cache will then adjust the priority level of access request F in the request queue based on the priority update command (e.g., set the priority level of access request F to high) so that the arbiter of the L2 cache preferentially picks access request F into the pipeline. For example, the data at address 0x06 may be fetched based on access request F, a corresponding access response F' may be generated, and the priority level of access response F' may be set to high accordingly. The response queue may then determine the order in which access responses are sent to the L1 cache based on the priority level of each access response. The present disclosure is not limited in this respect.
For another example, operation S302 may further include: in response to the priority level corresponding to the first access request being high, the buffer memory preferentially processes the first access request in a pipelined manner. Assume the L2 buffer memory finds no request in its request queue or response queue attempting to access the data at address 0x06. The L2 buffer will then place access request C in the request queue based on the priority update command and set a correspondingly high priority for access request C, so that the arbiter of the L2 buffer preferentially picks access request C into the pipeline.
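The three cases above can be summarized in a single dispatch routine. The C++ sketch below is a minimal illustration, assuming simple vector-based queues and a command that carries the target address and the new priority; none of these structures are prescribed by this disclosure.
#####################################################
#include <cstdint>
#include <vector>

enum class Priority : std::uint8_t { Low, Medium, High };

struct Request  { std::uint64_t address; Priority priority; };
struct Response { std::uint64_t address; Priority priority; };

// Assumed shape of a priority update command: which address it concerns
// and what the priority should become.
struct PriorityUpdate { std::uint64_t address; Priority new_priority; };

void apply_update(const PriorityUpdate& cmd,
                  std::vector<Request>& request_queue,
                  std::vector<Response>& response_queue) {
    // Case 1: a queued response already holds the requested data; raise
    // it so the data is returned to the upper level sooner.
    for (auto& rsp : response_queue) {
        if (rsp.address == cmd.address) {
            rsp.priority = cmd.new_priority;
            return;
        }
    }
    // Case 2: an earlier request to the same address is still queued;
    // raise it so the arbiter picks it into the pipeline sooner.
    for (auto& req : request_queue) {
        if (req.address == cmd.address) {
            req.priority = cmd.new_priority;
            return;
        }
    }
    // Case 3: neither queue knows the address; enqueue a new request
    // that already carries the updated priority.
    request_queue.push_back({cmd.address, cmd.new_priority});
}
#####################################################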
For another example, operation S302 may further include: in response to the first access request missing in the current-level buffer memory, recording, by the current-level buffer memory, the first access request and its corresponding priority level in its miss history register; and sending the first access request, together with its corresponding priority level, to a lower-level buffer memory.
Continuing with any of the above examples, after the arbiter of the L2 buffer picks access request C or access request F into the pipeline processing flow, the L2 buffer may determine through pipeline processing that it does not store the data at address 0x06 and must query the L3 buffer for that data. The L2 cache will correspondingly record access request C or access request F in its miss history register, set its priority to high, and then send access request C or access request F with high priority to the L3 cache. The L3 cache performs a similar process to adjust the priority level of access request C or access request F, which is not repeated here.
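A minimal C++ sketch of this miss path follows, assuming a table keyed by address stands in for the miss history register; the interfaces shown are assumptions for illustration.
#####################################################
#include <cstdint>
#include <unordered_map>

enum class Priority : std::uint8_t { Low, Medium, High };
struct Request { std::uint64_t address; Priority priority; };

struct MissEntry {
    Request request;        // the missing request and its priority
    bool sent_downstream;   // already forwarded to the next level?
};

class MissHistoryRegister {
public:
    // On a miss: record the request with its priority, then forward it,
    // still carrying that priority, to the lower-level cache.
    void record_and_forward(const Request& req) {
        entries_[req.address] = MissEntry{req, false};
        send_to_lower_level(req);
        entries_[req.address].sent_downstream = true;
    }
private:
    void send_to_lower_level(const Request& req) {
        (void)req;   // stands in for the bus transaction to the next level
    }
    std::unordered_map<std::uint64_t, MissEntry> entries_;
};
#####################################################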
It should be noted that fig. 4 is only an example, and in an embodiment of the present disclosure, a processing apparatus including a multi-level cache may further include more or less components, which is not particularly limited by the embodiment of the present disclosure and may be set according to actual needs.
Therefore, the embodiment of the disclosure can optimize the response speed of the processing device to the access requests with different degrees of urgency by setting different priorities for the access requests, thereby improving the performance of the processing device.
For example, the priority level of the access request may be further adjusted in at least one embodiment of the present disclosure for the scenarios described with reference to fig. 5-7.
First, an example scenario of adjusting the priority level of an access request is described with reference to fig. 5. In this example scenario, the processing device also includes a processor core (e.g., a CPU), and the CPU includes a reorder buffer. The priority update command of the first access request is at least partially associated with the state of the reorder buffer of the processing device.
In the example of fig. 5, it is assumed that the processing apparatus further includes a CPU. The CPU may execute multiple instructions in the pipeline of FIG. 1 in an out-of-order (OOO) manner. For example, assume a program includes 5 instructions written in order: instruction 1, instruction 2, instruction 3, instruction 4, instruction 5; the actual execution order in the CPU may be instruction 3, instruction 4, instruction 5, instruction 1, instruction 2. Although the execution order may change, the state of the CPU must still be changed strictly in the order instruction 1, instruction 2, instruction 3, instruction 4, instruction 5. Changing the state of the CPU includes, for example, changing the data corresponding to an address in the CPU and changing the state of various registers in the CPU (e.g., the CPU's architectural registers and the reorder buffer of fig. 5).
Continuing with the example in FIG. 5, in the case of out-of-order execution, instruction 2 may be the last to execute. Therefore, even if instruction 3, instruction 4, and instruction 5 have all been executed, the reorder buffer may need to wait for instruction 2 to finish before further altering the reorder buffer state. In this example, the reorder buffer will place instruction 2 at the bottom of the reorder buffer and wait for instruction 2 to return the execution result.
If, in such a case, instruction 2 needs to access data stored at an address (e.g., address 0x08) that is not already available in the cache, the CPU will spend a long time waiting to retrieve the data from the L2 buffer or even from external memory. To save this time, the reorder buffer sends an indication to the miss history register of the L1 cache, asking it to return the data stored at address 0x08 as soon as possible. The miss history register of the L1 buffer then sends a priority update command to the L2 buffer, instructing it to raise the priority of the access request for address 0x08, so that instruction 2 can be executed as soon as possible, the waiting time of instructions 3, 4, and 5 in the reorder buffer is reduced, and performance loss due to reorder buffer congestion is avoided.
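A C++ sketch of this trigger follows, assuming a simple port to the L1 miss history register; the entry layout and method names are assumptions of the sketch.
#####################################################
#include <cstdint>

struct RobHead {
    bool waiting_on_load;         // oldest instruction still waits for data
    std::uint64_t load_address;   // e.g. 0x08 in the example above
};

struct MissHistoryRegisterPort {
    // Stands in for asking the L1 miss history register to send a
    // priority update command to the L2 cache for this address.
    void send_priority_upgrade(std::uint64_t address) { (void)address; }
};

// The oldest un-committed instruction blocks everything behind it, so its
// outstanding load is the most urgent request in flight.
void check_rob_head(const RobHead& head, MissHistoryRegisterPort& mhr) {
    if (head.waiting_on_load)
        mhr.send_priority_upgrade(head.load_address);
}
#####################################################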
Thus, corresponding to the scenario of fig. 5, embodiments of the present disclosure may optimize the response speed of the processing apparatus to instructions waiting in the reorder buffer by increasing the priority of access requests associated with the state of the reorder buffer, thereby improving the performance of the processing apparatus.
Next, still another example scenario of adjusting a priority level of an access request is described with reference to fig. 6. In this example scenario, the processing apparatus further includes a processor core including an instruction fetch unit, the priority update command of the first access request being at least partially associated with a branch prediction result corresponding to the instruction fetch unit.
For example, as shown in FIG. 6, the instruction fetch module raises the priority of an access request when it finds a branch direction misprediction together with an instruction buffer memory miss. Assume the processing apparatus shown schematically in fig. 6 executes the following pseudo code.
#####################################################
If X is greater than 5, A = A + 3; # Branch (1)
Else B = B - 3; # Branch (2)
#####################################################
For the above pseudo code, the branch prediction unit in the processing apparatus may predict that the program is more likely to take branch (1); therefore, the branch prediction unit may instruct the fetch unit to prefetch data A. In practice, however, the program should execute branch (2); that is, the branch prediction result corresponding to the fetch unit is incorrect. At this point the L1 buffer memory contains only data A and not data B, so the access request for obtaining data B misses. The fetch unit will therefore, according to the branch prediction result, instruct the miss history register to send a priority update command to the L2 buffer memory indicating that the priority of the access request for fetching data B should be raised. Optionally, the fetch unit may also, according to the branch prediction result, instruct the miss history register to send a priority update command to the L2 cache memory indicating that the priority of the access request for fetching data A should be lowered.
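The following C++ sketch captures this trigger, assuming a port that can emit upgrade and downgrade priority update commands; all names here are illustrative assumptions.
#####################################################
#include <cstdint>

struct PriorityPort {
    // Both calls stand in for priority update commands sent downstream.
    void upgrade(std::uint64_t address)   { (void)address; }
    void downgrade(std::uint64_t address) { (void)address; }
};

// On a detected branch misprediction with an instruction cache miss:
// raise the fetch for the correct path (data B in the example) and,
// optionally, demote the one issued for the wrong path (data A).
void on_branch_mispredict(std::uint64_t correct_path_addr,
                          std::uint64_t wrong_path_addr,
                          PriorityPort& port) {
    port.upgrade(correct_path_addr);
    port.downgrade(wrong_path_addr);   // optional, per the text above
}
#####################################################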
Thus, corresponding to the scenario in fig. 6, the embodiment of the present disclosure may improve the performance of the processing apparatus by increasing the priority of the access request associated with the branch prediction result, so that the optimized processing apparatus can quickly compensate for the loss caused by the branch prediction error.
Next, yet another example scenario for adjusting a priority level of an access request is described with reference to FIG. 7. In this example scenario, the processing apparatus also includes a processor core including an instruction fetch unit and a reorder buffer. The priority update command of the first access request is at least partially associated with the state of the reorder buffer and the corresponding branch prediction result of the fetch unit.
For example, as shown in FIG. 7, there is a case where the instruction fetch module finds a branch direction misprediction, yet instructions on the wrong branch direction and on the correct branch direction need to read the same address. Assume the processing apparatus shown schematically in fig. 7 executes the following pseudo code.
#####################################################
if X > 5                 # Branch (1)
    A = A + 3            # instruction 0, needs to access data A
    B = B + 5            # instruction 1, needs to access data B
    C = C + 2            # instruction 2, needs to access data C
else                     # Branch (2)
    A = A + 5            # instruction 3, needs to access data A
    D = D + 5            # instruction 4, needs to access data D
    E = E + 2            # instruction 5, needs to access data E
#####################################################
With respect to the above pseudo code, a branch prediction unit in the processing apparatus may predict that the program is more likely to go to branch (1), and therefore, the branch prediction unit may instruct the fetch unit to prefetch data a, B, and C. However, in practice, the program should execute branch (2), i.e. the branch prediction result corresponding to the fetch unit is incorrect.
Referring to fig. 7, the reorder buffer records each instruction in sequence and refreshes each instruction's execution result after that instruction is executed. Correspondingly, in the reorder buffer of FIG. 7, instructions 0-2 are instructions on the wrong path and instructions 3-5 are instructions on the correct path.
Considering that instructions 3-5 and instructions 0-2 correspond to overlapping data storage addresses (for example, both instruction 3 and instruction 0 need to access the address where data A is stored), if the requests of instructions 0-2 were simply demoted as in the scenario of FIG. 6, instruction 3 might need to wait a long time for data A, so the reorder buffer could never be flushed.
Thus, for the scenario illustrated in FIG. 7, the miss history register may first be traversed for instructions 3-5 to query whether access requests for data A, data D, and data E have already been issued from the miss history register, or whether unsent access requests for data A, data D, and data E have been generated in it. During this traversal, in response to determining that an access request requests access to the same address, the 'new request' flag bit of that access request may be set to 1. For the above example, the 'new request' flag bit of the access request for accessing data A may be set to 1. If the access request for accessing data A has already been issued to the L2 cache, a priority update command may be sent indicating that the priority level of that access request should be raised. If it has not yet been issued to the L2 cache, its priority field may be modified directly, setting it to a high priority level.
Further, the traversal of the miss history register may also determine that no access requests have yet been issued for data D and data E. Since data D and data E are necessary for executing this piece of the program, access requests for accessing data D and data E then need to be generated. In some examples, the priority level of these access requests may be set to high directly. In other examples, priority update commands may be generated directly, indicating that the priority level of the access requests for accessing data D and data E should be raised.
Then, for instructions 0-2, the miss history register is traversed to query whether access requests for data A, data B, and data C have been issued from it, or whether unsent access requests for data A, data B, and data C have been generated in it. During this traversal, because the 'new request' flag bit of the access request for accessing data A is 1, the traversal does not trigger a priority change for that request. For the access request for accessing data B and the access request for accessing data C, whose flag bits are 0, the traversal will trigger a priority change.
Correspondingly, based on the 'new request' flag bits of the access request for accessing data B and the access request for accessing data C being 0, it can be determined that data B and data C are either no longer needed or not urgently needed. If an access request for accessing data B or data C has not yet been issued to the L2 cache, its priority field may be modified directly, setting it to a low priority level, or the access request may be deleted outright. If such an access request has already been issued to the L2 cache, a priority update command may be sent indicating that the priority level of the two access requests should be lowered.
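A C++ sketch of this two-pass traversal follows, assuming a map keyed by address stands in for the miss history register and callbacks stand in for sending priority update commands; all of this scaffolding is assumed for illustration.
#####################################################
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

enum class Priority : std::uint8_t { Low, Medium, High };

struct MshrEntry {
    Priority priority;
    bool issued;        // already sent to the next cache level?
    bool new_request;   // the 'new request' flag bit from the text
};

using Mshr = std::unordered_map<std::uint64_t, MshrEntry>;
using SendUpdate = std::function<void(std::uint64_t)>;

// Pass 1 (correct path, instructions 3-5): mark or create the requests
// the correct path needs and raise their priority.
void rescue_correct_path(Mshr& mshr,
                         const std::vector<std::uint64_t>& addrs,
                         const SendUpdate& send_upgrade) {
    for (std::uint64_t a : addrs) {
        auto it = mshr.find(a);
        if (it != mshr.end()) {
            it->second.new_request = true;              // e.g. data A
            if (it->second.issued) send_upgrade(a);     // already at L2
            else it->second.priority = Priority::High;  // not yet sent
        } else {
            mshr[a] = MshrEntry{Priority::High, false, true};  // e.g. D, E
        }
    }
}

// Pass 2 (wrong path, instructions 0-2): anything not re-marked in pass 1
// is demoted (or could be dropped outright if never issued).
void demote_wrong_path(Mshr& mshr,
                       const std::vector<std::uint64_t>& addrs,
                       const SendUpdate& send_downgrade) {
    for (std::uint64_t a : addrs) {
        auto it = mshr.find(a);
        if (it == mshr.end() || it->second.new_request) continue;  // keeps A
        if (it->second.issued) send_downgrade(a);      // e.g. data B, C
        else it->second.priority = Priority::Low;      // or erase the entry
    }
}
#####################################################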
Thus, corresponding to the scenario of fig. 7, embodiments of the present disclosure can preferentially process the access requests a processor core depends on (e.g., the access requests for accessing data A, data D, and data E), reduce the latency of critical data paths, and improve the performance of the processor.
Fig. 8 is a schematic structural diagram of a processing device according to at least one embodiment of the present disclosure.
For example, at least one embodiment of the present disclosure provides a processing device 80, as shown in fig. 8. The processing apparatus 80 comprises a processor core 801 and a multi-level buffer memory 802. The processor core 801 and the multi-level buffer memory 802 are communicatively coupled to each other. For example, the processor core 801 includes an L1 cache, a fetch unit, and a reorder buffer. The multi-level buffer memory comprises the L2 buffer memory, the L3 buffer memory and an external memory. Each level of buffer memory optionally includes a miss history register.
For example, the processor core 801 may be a Central Processing Unit (CPU), a digital signal processor core (DSP), or other form of processing unit having data processing capabilities and/or program execution capabilities, such as a Field Programmable Gate Array (FPGA), or the like; for example, the Central Processing Unit (CPU) may be an X86 or ARM architecture or the like. Processor core 801 may be a general purpose processor core or a special purpose processor core that may control other components in processing device 80 to perform desired functions.
For example, the multi-level cache memory 802 can be volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory can include, for example, Read-Only Memory (ROM), a hard disk, an Erasable Programmable Read-Only Memory (EPROM), a portable Compact Disc Read-Only Memory (CD-ROM), USB memory, flash memory, and the like.
For example, each level of buffer memory is configured to: in response to the current-level buffer memory receiving the first access request, determine the priority level corresponding to the first access request; and determine, based on that priority level, the order in which the buffer memory processes the first access request or the second access request in a pipelined manner. That is, each level of the buffer memory may implement operations S301 to S302; for the specific implementation, refer to the descriptions of operations S301 and S302 above, which are not repeated here.
It should be noted that each stage of the buffer memory may implement operations S301 to S302 through software, hardware, firmware, or any combination thereof. For example, each stage of the buffer memories may correspondingly include a priority determining circuit and a sequence determining circuit to implement operation S301 and operation S302, respectively. The embodiments of the present disclosure do not limit their specific embodiments.
For example, in at least one embodiment of the present disclosure, each level of buffer memory further comprises an arbiter. The arbiter and the request queue are communicatively coupled to each other. For example, in one example, an arbiter, as one implementation of the order determination circuit, is configured to determine an order in which the first access request or the second access request is to be processed in a pipelined manner by the buffer memory based at least in part on a priority level to which the first access request corresponds.
It should be noted that the processing device 80 shown in fig. 8 is only an example, the processing device may further include more or less circuits or units, and the connection relationship between the circuits or units is not limited and may be determined according to actual needs. The specific configuration of each circuit is not limited, and may be configured by an analog device, a digital chip, or other suitable configurations according to the circuit principle.
Fig. 9 is a schematic block diagram of an electronic device provided in at least one embodiment of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
For example, as shown in fig. 9, in some examples the electronic device 900 includes a processing device (e.g., a central processor core, a graphics processor core, etc.) 901, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage device 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the computer system. The processing device 901, the ROM 902, and the RAM 903 are connected to one another by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
For example, the following components may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 907 including, for example, Liquid Crystal Displays (LCDs), speakers, vibrators, and the like; storage devices 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909 including a network interface card such as a LAN card or a modem. The communication device 909 may allow the electronic device 900 to perform wireless or wired communication with other devices to exchange data, carrying out communication processing via a network such as the Internet. A drive 9010 is also connected to the I/O interface 905 as needed. A removable medium 9011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 9010 as necessary, so that a computer program read from it can be installed into the storage device 908 as needed. While fig. 9 illustrates an electronic device 900 that includes various devices, it is to be understood that not all illustrated devices are required to be implemented or included; more or fewer devices may alternatively be implemented or included.
For example, the electronic device 900 may further include a peripheral interface (not shown in the figure) and the like. The peripheral interface may be any of various types of interfaces, such as a USB interface or a Lightning interface. The communication device 909 may communicate with networks, such as the Internet, intranets, and/or wireless networks such as cellular telephone networks, wireless Local Area Networks (LANs), and/or Metropolitan Area Networks (MANs), and with other devices through wireless communication. The wireless communication may use any of a number of communication standards, protocols, and technologies, including, but not limited to, Global System for Mobile communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), WiMAX, protocols for email, instant messaging, and/or Short Message Service (SMS), or any other suitable communication protocol.
For example, the electronic device 900 may be any device such as a mobile phone, a tablet computer, a notebook computer, an electronic book, a game console, a television, a digital photo frame, and a navigator, and may also be any combination of a processing device and hardware, which is not limited in this respect by the embodiments of the present disclosure.
For example, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When executed by the processing device 901, the computer program performs the method 30 disclosed by the embodiment of the present disclosure.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium described above may be included in the processing device 400, the processing device 80, or the processing device 901; or it may exist separately without being assembled into the processing device 400, the processing device 80, or the processing device 901.
At least one embodiment of the present disclosure also provides a non-transitory readable storage medium. Fig. 10 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure. As shown in Fig. 10, the non-transitory readable storage medium 150 has stored thereon computer instructions 111 which, when executed by a processor core, perform one or more steps of the method 30 described above.
For example, the non-transitory readable storage medium 150 may be any combination of one or more computer readable storage media. For example, one computer readable storage medium may contain computer readable program code for determining, in response to a buffer memory of a current level receiving an access request, a priority level corresponding to the first access request, and another computer readable storage medium may contain computer readable program code for determining, based on the priority level corresponding to the first access request, an order in which the buffer memory processes the first access request or the second access request in a pipelined manner. Of course, the above program code may also be stored in the same computer readable medium, and the embodiments of the present disclosure are not limited thereto.
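As a purely illustrative sketch (not part of the claimed embodiments), the two pieces of program code described above could be organized as follows in C++; all type and function names here (AccessRequest, Priority, determine_priority, schedule_request) are hypothetical and chosen only for this example:

#include <cstdint>
#include <deque>

// Hypothetical priority levels; the disclosure distinguishes at least
// a high priority from lower ones, so two levels suffice for a sketch.
enum class Priority : uint8_t { Low, High };

struct AccessRequest {
    uint64_t address;    // address of the requested data
    Priority priority;   // priority field carried in the request
};

// First piece of program code: determine the priority level corresponding
// to the first access request when the current-level buffer memory
// receives it (here simply read from the request's priority field).
Priority determine_priority(const AccessRequest& request) {
    return request.priority;
}

// Second piece of program code: determine the order in which the buffer
// memory processes requests in a pipelined manner, based on that priority
// level; a high-priority request moves to the front of the queue.
void schedule_request(std::deque<AccessRequest>& pipeline_queue,
                      const AccessRequest& request) {
    if (determine_priority(request) == Priority::High) {
        pipeline_queue.push_front(request);  // processed preferentially
    } else {
        pipeline_queue.push_back(request);   // processed in arrival order
    }
}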
For example, when the program code is read by a computer, the computer may execute the program code stored in the computer storage medium to perform the method 30 provided by any of the embodiments of the present disclosure.
For example, the storage medium may include a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a flash memory, or any combination of the above, as well as other suitable storage media.
In the present disclosure, the term "plurality" means two or more unless explicitly defined otherwise.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A method of processing an access request for retrieving data in a processing device comprising a multi-level buffer memory, the method comprising:
in response to a buffer memory of a current level receiving a first access request, determining a priority level corresponding to the first access request; and
determining an order in which the buffer memory processes the first access request or a second access request based on the priority level corresponding to the first access request.
2. The method of claim 1, wherein the determining, in response to the buffer memory of the current level receiving the first access request, the priority level corresponding to the first access request further comprises:
in response to the buffer memory of the current level receiving a priority update command for the first access request, determining the priority level corresponding to the first access request.
3. The method of claim 1, wherein the first access request includes a priority field, and the determining, in response to the buffer memory of the current level receiving the first access request, the priority level corresponding to the first access request comprises: determining the priority level corresponding to the first access request based on the priority indicated by the priority field in the first access request.
4. The method of claim 1, wherein the determining the order in which the buffer memory processes the first access request or the second access request based on the priority level corresponding to the first access request comprises:
in response to a response queue of the buffer memory including an access response that includes the data requested by the first access request, adjusting a priority level of the access response based on the priority level corresponding to the first access request.
5. The method of claim 1, wherein the determining the order in which the buffer memory processes the first access request or the second access request based on the priority level corresponding to the first access request comprises:
in response to a request queue of the buffer memory including a second access request requesting access to the same address as the received first access request, adjusting the priority level corresponding to the second access request based on the priority level corresponding to the first access request; and
determining an order in which the second access request is processed by the buffer memory in a pipelined manner based on the adjusted priority level corresponding to the second access request.
6. The method of claim 1, wherein the determining the order in which the buffer memory processes the first access request or the second access request based on the priority level corresponding to the first access request comprises: in response to the priority level corresponding to the first access request being a high priority, preferentially processing the first access request in the buffer memory in a pipelined manner.
7. The method of claim 1, wherein the determining the order in which the buffer memory processes the first access request or the second access request based on the priority level corresponding to the first access request comprises:
in response to the first access request missing in the buffer memory of the current level, recording the first access request and the priority level corresponding to the first access request in a miss history register of the buffer memory of the current level; and
based on the recorded first access request and the priority level corresponding to the first access request, sending the first access request to a lower-level buffer memory.
8. The method of claim 2, wherein the processing device further comprises a processor core including a reorder buffer, and the priority update command of the first access request is at least partially associated with a state of the reorder buffer.
9. The method of claim 2, wherein the processing device further comprises a processor core including an instruction fetch unit, and the priority update command of the first access request is at least partially associated with a branch prediction result corresponding to the instruction fetch unit.
10. The method of claim 2, wherein the processing device further comprises a processor core including an instruction fetch unit and a reorder buffer, and the priority update command of the first access request is at least partially associated with a state of the reorder buffer and a branch prediction result corresponding to the instruction fetch unit.
11. A processing apparatus, comprising:
a processor core;
a multi-level buffer memory;
wherein the processor core or the multi-level buffer memory is configured to perform the method of any one of claims 1-10.
12. An electronic device comprising the processing apparatus of claim 11.
13. A non-transitory readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processing device, perform the method of any one of claims 1-10.
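To make the claim language above more concrete, the following hedged C++ sketch restates the mechanics of claims 5 and 7: raising the priority of a queued second access request that targets the same address as an incoming first access request, and, on a miss, recording the request and its priority level in a miss history register before forwarding it to the lower-level buffer memory. Every name and type below (BufferMemoryLevel, MissEntry, the always-miss contains stub) is invented for illustration and does not appear in the patent:

#include <cstdint>
#include <vector>

enum class Priority : uint8_t { Low, High };

struct AccessRequest {
    uint64_t address;
    Priority priority;
};

// One entry of the (hypothetical) miss history register: the missed
// request together with the priority level it carried (claim 7).
struct MissEntry {
    AccessRequest request;
    Priority priority;
};

struct BufferMemoryLevel {
    std::vector<AccessRequest> request_queue;  // pending pipelined requests
    std::vector<MissEntry> miss_history;       // miss history register
    BufferMemoryLevel* lower_level = nullptr;  // next cache level, if any

    // Stub lookup that always misses, so the claim-7 path is exercised.
    bool contains(uint64_t /*address*/) const { return false; }

    void receive(const AccessRequest& first_request) {
        // Claim 5: if the request queue already holds a second access
        // request for the same address, adjust its priority level based
        // on the priority level of the incoming first access request.
        for (AccessRequest& second_request : request_queue) {
            if (second_request.address == first_request.address &&
                first_request.priority == Priority::High) {
                second_request.priority = Priority::High;
            }
        }
        // Claim 7: on a miss at this level, record the request and its
        // priority level in the miss history register, then send the
        // request on to the lower-level buffer memory.
        if (!contains(first_request.address)) {
            miss_history.push_back({first_request, first_request.priority});
            if (lower_level != nullptr) {
                lower_level->receive(first_request);
            }
        }
    }
};

The sketch only records the adjusted priority level; a real design would also reorder the pipeline queue accordingly, as in the scheduling sketch given earlier.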
CN202210302630.9A 2022-03-24 2022-03-24 Method for processing access request Pending CN115185867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210302630.9A CN115185867A (en) 2022-03-24 2022-03-24 Method for processing access request

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210302630.9A CN115185867A (en) 2022-03-24 2022-03-24 Method for processing access request

Publications (1)

Publication Number Publication Date
CN115185867A true CN115185867A (en) 2022-10-14

Family

ID=83511534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210302630.9A Pending CN115185867A (en) 2022-03-24 2022-03-24 Method for processing access request

Country Status (1)

Country Link
CN (1) CN115185867A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117609110A (en) * 2023-12-19 2024-02-27 北京开源芯片研究院 Caching method, cache, electronic device and readable storage medium

Similar Documents

Publication Publication Date Title
CN110275841B (en) Access request processing method and device, computer equipment and storage medium
JP6744423B2 (en) Implementation of load address prediction using address prediction table based on load path history in processor-based system
US8601221B2 (en) Speculation-aware memory controller arbiter
CN103198025B (en) For the method and system that form near neighbor data cache is shared
US20080034146A1 (en) Systems and Methods for Transactions Between Processor and Memory
US10956328B2 (en) Selective downstream cache processing for data access
US20130282972A1 (en) Programmable memory controller
CN114356420B (en) Instruction pipeline processing method and device, electronic device and storage medium
US9384131B2 (en) Systems and methods for accessing cache memory
WO2013030628A1 (en) Integrated circuit device, memory interface module, data processing system and method for providing data access control
US10101964B2 (en) Ring buffer including a preload buffer
CN115185867A (en) Method for processing access request
US20080276045A1 (en) Apparatus and Method for Dynamic Cache Management
JP2022545848A (en) Deferring cache state updates in non-speculative cache memory in a processor-based system in response to a speculative data request until the speculative data request becomes non-speculative
US8549234B2 (en) Memory controller and methods
CN116820579A (en) Scheduling method and device of access instruction, electronic equipment and storage medium
US20140173225A1 (en) Reducing memory access time in parallel processors
US11016899B2 (en) Selectively honoring speculative memory prefetch requests based on bandwidth state of a memory access path component(s) in a processor-based system
CN105830027B (en) method and apparatus for prefetching and processing jobs for processor cores in a network processor
US20190332385A1 (en) Method, apparatus, and system for reducing live readiness calculations in reservation stations
EP4293505A1 (en) Instruction retirement unit, instruction execution unit, processing unit, cumputing device, and instruction processing method
US11915002B2 (en) Providing extended branch target buffer (BTB) entries for storing trunk branch metadata and leaf branch metadata
US11036512B2 (en) Systems and methods for processing instructions having wide immediate operands
US20240078114A1 (en) Providing memory prefetch instructions with completion notifications in processor-based devices
US20210389951A1 (en) Opportunistic consumer instruction steering based on producer instruction value prediction in a multi-cluster processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination