CN114721727B - Processor, electronic equipment and multithreading shared instruction prefetching method - Google Patents



Publication number
CN114721727B
Authority
CN
China
Prior art keywords
instruction
request
prefetching
prefetch
thread group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210649455.0A
Other languages
Chinese (zh)
Other versions
CN114721727A (en)
Inventor
李晶晶
王刚
Current Assignee
Shanghai Denglin Technology Co ltd
Chengdu Denglin Technology Co ltd
Original Assignee
Shanghai Denglin Technology Co ltd
Chengdu Denglin Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Denglin Technology Co ltd and Chengdu Denglin Technology Co ltd
Priority to CN202210649455.0A
Publication of CN114721727A
Application granted
Publication of CN114721727B
Priority to PCT/CN2022/130379 (published as WO2023236443A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a processor, an electronic device, and a multithread-shared instruction prefetching method, and belongs to the field of computer technology. The processor includes an instruction cache and a thread group unit. The thread group unit is configured to send a first instruction prefetch request to the instruction cache, the request being used to fetch the instructions corresponding to a thread group that will be created within an upcoming period of time; the instruction cache is configured to perform instruction prefetching in response to the first instruction prefetch request. In the embodiments of the application, before the thread group is created, a first instruction prefetch request for the instructions of the soon-to-be-created thread group is sent to the instruction cache in advance, giving the instruction cache enough time to prefetch the instructions. This reduces the probability that subsequent instruction fetches miss, an effect that is especially pronounced for new threads.

Description

Processor, electronic equipment and multithread shared instruction prefetching method
Technical Field
The application belongs to the technical field of computers, and particularly relates to a processor, electronic equipment and a multithread shared instruction prefetching method.
Background
Instruction fetch performance matters to a processor, so an instruction cache is usually provided to alleviate the latency of instruction fetch accesses: if a fetch request hits in the instruction cache, the instruction is available immediately. If it misses, the request must be forwarded to the next-level cache or to main memory, which usually takes a long time to respond and therefore degrades performance.
Instruction prefetching was introduced to improve the hit rate of fetch requests in the instruction cache, reduce fetch latency, and keep the system from stalling for lack of instructions. The quality of instruction prefetching hinges on two points: controlling the time at which prefetching happens (prefetching either too early or too late reduces its effectiveness) and avoiding duplicate prefetches across thread groups (which waste prefetch bandwidth).
Current instruction prefetching techniques mainly prefetch sequentially or jump-prefetch with a fixed stride; that is, they prefetch the next fixed number of instructions based on the position of an existing fetch request. This leaves too little time margin, so prefetching is not timely enough: a fetch request that accesses a prefetched instruction often finds the prefetch just starting or still in flight (the instruction has not yet arrived). The problem is especially severe for a newly started thread. Because a new thread's cold-start time is relatively long, its initial fetch requests all miss, and the corresponding instructions arrive only after a long wait, defeating the purpose of prefetching.
Disclosure of Invention
In view of this, an object of the present application is to provide a processor, an electronic device, and a multithread-shared instruction prefetching method, so as to solve the problem in existing prefetching techniques that prefetching is not timely enough, which leads to a high probability of instruction misses and long waits for the corresponding instructions.
The embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides a processor, including an instruction cache and a thread group unit. The thread group unit is configured to send a first instruction prefetch request to the instruction cache, the request being used to fetch the instructions corresponding to a thread group that will be created within an upcoming period of time; the instruction cache is configured to perform instruction prefetching in response to the first instruction prefetch request.
In the embodiments of the application, before the thread group is created, a first instruction prefetch request for the instructions of the soon-to-be-created thread group is sent to the instruction cache in advance, giving the instruction cache enough time to prefetch the instructions. This reduces the probability that subsequent instruction fetches miss, an effect that is especially pronounced for new threads.
With reference to one possible implementation manner of the embodiment of the first aspect, the processor further includes: an instruction fetching unit; the thread group unit is further configured to send the created thread group to the instruction fetch unit; the instruction fetching unit is configured to issue a corresponding instruction fetching request based on the received thread group; the instruction cache is further configured to respond to the instruction fetching request and return an instruction hit by the instruction fetching request.
In the embodiment of the application, the created thread groups are distributed to the instruction fetch unit, so that the instruction fetch unit issues the corresponding instruction fetch requests based on the received thread groups, and obtains the corresponding instructions from the instruction cache to perform subsequent operations.
With reference to one possible implementation of the first aspect, the instruction fetch unit is further configured to monitor, for each thread group, the amount of instructions already prefetched into the instruction cache, and to send a second instruction prefetch request to the instruction cache when the monitored amount is smaller than a preset threshold; the instruction cache is further configured to perform instruction prefetching in response to the second instruction prefetch request.
In the embodiments of the application, the amount of each thread group's instructions already prefetched into the instruction cache is monitored, and when that amount falls below the preset threshold a second instruction prefetch request is sent to the instruction cache so that further instructions are prefetched in advance. This raises the probability that subsequent fetch requests hit and shortens the time spent waiting for instructions.
With reference to one possible implementation of the first aspect, the instruction cache is further configured to: before performing instruction prefetching in response to the second instruction prefetch request, determine upon receiving the request that it is not already present in a prefetch state table, where the prefetch state table records second instruction prefetch requests that are in progress or that completed within a recent period of time.
In the embodiments of the application, the instruction cache performs instruction prefetching in response to a second instruction prefetch request only after determining that the request is not already in the prefetch state table, which avoids duplicate prefetches and wasted prefetch bandwidth.
With reference to a possible implementation manner of the embodiment of the first aspect, the instruction cache is further configured to, when a second instruction prefetch request is received, record the received second instruction prefetch request in the prefetch status table.
In the embodiment of the application, the received second instruction prefetching request is recorded in the prefetching status table, so that the second instruction prefetching request is managed, and the instruction prefetching condition of the second instruction prefetching request is obtained.
With reference to one possible implementation of the first aspect, the instruction cache is specifically configured to: if the prefetch state table contains no other instruction prefetch request from the thread group of the received second instruction prefetch request, record the received request directly in the table; otherwise, if such a request exists, update its prefetch range so that the updated range covers the prefetch range of the received second instruction prefetch request.
In the embodiments of the application, when a received second instruction prefetch request is to be recorded in the prefetch state table, it is recorded directly if no other request from the same thread group is present; if one is present, that request's prefetch range is widened to cover the received request's range. This avoids duplicate prefetches and wasted prefetch bandwidth later on.
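The record-or-merge behavior described above can be sketched as follows. This is an illustrative model under the assumption that the table keeps one prefetch range per thread group; the function and variable names are not taken from the patent.

```python
# Illustrative model of the prefetch state table logic. The structure
# (one prefetch range per thread group) and all names are assumptions.
def record_request(state_table, thread_group, lo, hi):
    """Record a second instruction prefetch request covering [lo, hi].

    If the thread group has no entry yet, the request is recorded directly;
    otherwise the existing entry's range is widened to cover the new request,
    so the overlapping part is not prefetched twice.
    """
    if thread_group not in state_table:
        state_table[thread_group] = (lo, hi)
    else:
        old_lo, old_hi = state_table[thread_group]
        state_table[thread_group] = (min(old_lo, lo), max(old_hi, hi))
    return state_table[thread_group]
```

For example, recording a request for addresses 0 through 31 for a thread group creates a fresh entry, and a later request for 16 through 63 from the same group widens that entry to cover 0 through 63 instead of being recorded separately.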
With reference to one possible implementation manner of the embodiment of the first aspect, the priority of the instruction cache in response to the first instruction prefetch request is lower than the priority of the instruction cache in response to the second instruction prefetch request.
In the embodiment of the application, the priority of the instruction cache responding to the first instruction prefetching request is lower than the priority of the instruction cache responding to the second instruction prefetching request, so that the instruction cache can perform instruction prefetching for the thread group by using the idle bandwidth of the instruction cache, and the normal access performance of the instruction cache cannot be influenced.
In a second aspect, an embodiment of the present application further provides an electronic device, which includes a body and a processor as provided in the foregoing first aspect and/or in connection with any possible implementation manner of the first aspect.
In a third aspect, an embodiment of the present application further provides a multithread-shared instruction prefetching method, including: an instruction cache receives a first instruction prefetch request sent by a thread group unit, where the request is sent before the thread group is created and is used to fetch the instructions corresponding to a thread group that will be created within an upcoming period of time; and the instruction cache performs instruction prefetching in response to the first instruction prefetch request.
With reference to one possible implementation of the third aspect, the first instruction prefetch request includes instruction fetch address information and a prefetch instruction count, and the instruction cache performing instruction prefetching in response to the first instruction prefetch request includes: the instruction cache, in response to the request, fetching the indicated number of instructions from a later-level cache according to the instruction fetch address information and storing them in the instruction cache.
With reference to one possible implementation manner of the embodiment of the third aspect, the method further includes: the instruction cache receives a second instruction prefetching request issued by an instruction fetching unit, wherein the second instruction prefetching request is sent when the storage capacity of the instruction prefetched into the instruction cache corresponding to the thread group is smaller than a preset threshold value; the instruction cache performs instruction prefetching in response to the second instruction prefetching request.
With reference to one possible implementation manner of the embodiment of the third aspect, before the instruction cache performs instruction prefetching in response to the second instruction prefetching request, the method further includes: and when receiving the second instruction prefetching request, the instruction cache determines that the second instruction prefetching request does not exist in a prefetching state table, wherein the prefetching state table is used for recording the second instruction prefetching request which is in progress or the second instruction prefetching request which is completed within a period of time.
With reference to one possible implementation of the third aspect, the method further includes: when a second instruction prefetch request is received, if the prefetch state table contains no other instruction prefetch request from the thread group of the received request, recording the received request directly in the table; otherwise, updating the existing request's prefetch range so that the updated range covers the prefetch range of the received second instruction prefetch request.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings required by the embodiments are briefly described below. The drawings described here show only some embodiments of the application; those skilled in the art could derive other drawings from them without creative effort. The foregoing and other objects, features, and advantages of the application will be apparent from the accompanying drawings. Like reference numerals designate like parts throughout. The drawings are not necessarily to scale, emphasis instead being placed on illustrating the principles of the application.
Fig. 1 shows a schematic structural diagram of a processor provided in an embodiment of the present application.
Fig. 2 shows a schematic structural diagram of another processor provided in an embodiment of the present application.
Fig. 3 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Fig. 4 is a flowchart illustrating a multithread shared instruction prefetching method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, relational terms such as "first," "second," and the like may be used solely in the description herein to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the description of the present application, it should also be noted that, unless otherwise explicitly specified or limited, the term "connected" is to be interpreted broadly: for example, as a fixed connection, a detachable connection, or an integral connection; as an electrical connection; as a direct connection or an indirect connection through an intermediate medium; or as internal communication between two elements. The specific meaning of the term in this application can be understood by those of ordinary skill in the art on a case-by-case basis.
Further, the term "and/or" in the present application is only one kind of association relationship describing the associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. The term "plurality" means two or more unless otherwise specified.
Recall the drawbacks of existing instruction prefetching techniques: because prefetching is not timely enough, a fetch request that accesses a prefetched instruction has a high probability of finding the prefetch just starting or still in progress (i.e., the instruction has not yet been fetched). This is especially true for a newly started thread: because its cold-start time is relatively long, the new thread's initial fetch requests all miss, and fetching the corresponding instructions takes a long time.
Based on this, the embodiments of the present application provide a new multithread-shared instruction prefetching scheme that effectively reduces the probability of instruction misses and shortens the time a fetch request waits for its instruction, with an especially pronounced effect for new threads.
For better understanding, the following description refers to the processor shown in FIG. 1. The processor includes an instruction cache and a thread group unit, with the instruction cache connected to the thread group unit.
The thread group unit is responsible for creating thread groups. Because the cold-start time of a new thread is relatively long, in order to reduce the probability of instruction misses and shorten the time fetch requests wait for instructions, the thread group unit is configured to send a first instruction prefetch request to the instruction cache in advance, before the thread group is created. The request is used to fetch the instructions corresponding to a thread group that will be created within an upcoming period of time (the length of which can be configured according to application requirements), so that there is enough time to prefetch the instructions in advance. This reduces the probability of instruction misses, with an especially noticeable effect for new threads.
The first instruction prefetch request carries control information for instruction prefetching, for example instruction fetch address information (used to obtain the global access addresses of the instructions) and a prefetch instruction count (indicating how many instructions need to be prefetched; this count can be flexibly configured as needed). If the access addresses of the instructions to be prefetched are consecutive, the fetch address information need only contain the first fetch address, and the indicated number of instructions can be fetched consecutively starting from that address. If the access addresses are not consecutive, the fetch address information must contain the access address of every instruction to be prefetched. For example, if the prefetch instruction count is 32, the fetch address information must include the access addresses of all 32 instructions.
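As a rough sketch of this two-case address scheme, the addresses a request covers can be derived as follows. Addresses are treated here as abstract units (real hardware would use byte addresses and instruction widths), and the function name is an illustrative assumption:

```python
# Hypothetical sketch of the fetch-address information carried by an
# instruction prefetch request. Addresses are abstract units here.
def prefetch_addresses(first_address, prefetch_count, explicit_addresses=None):
    """Return the access address of every instruction the request covers."""
    if explicit_addresses is not None:
        # Non-consecutive case: one explicit address per prefetched instruction.
        assert len(explicit_addresses) == prefetch_count
        return list(explicit_addresses)
    # Consecutive case: only the first address is needed; the rest follow it.
    return [first_address + i for i in range(prefetch_count)]
```

For consecutive instructions only the first address and the count travel with the request; for scattered instructions the full address list must be supplied.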
A thread group can comprise a number of threads (such as 16, 32, or 64) that fetch the same instructions. Grouping such threads into one thread group allows a single instruction fetch request to obtain the instructions needed by all of them at once, improving fetch efficiency.
The instruction cache is configured to perform instruction prefetching in response to the first instruction prefetch request. The first instruction prefetch request comprises the fetch address information and the prefetch instruction count, and the instruction cache is specifically configured to respond to the request by fetching the indicated number of instructions from the later-level cache (the next-level cache or main memory) according to the fetch address information and storing them in the instruction cache. For example, if the prefetch instruction count is 64, 64 instructions are fetched from the later-level cache based on the fetch address information.
It should be noted that the prefetch instruction count can be flexibly configured as required and is not limited to the 64 or 32 of the examples above; the values 32 and 64 in the examples should therefore not be understood as limiting the application.
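The fill behavior just described can be modeled minimally as below; the later-level cache is represented as a plain address-to-instruction map, and the class and attribute names are assumptions for the sketch:

```python
# Minimal sketch (assumed names) of the instruction cache filling itself
# from the later-level cache in response to a prefetch request.
class InstructionCache:
    def __init__(self, later_level):
        self.later_level = later_level  # next-level cache or main memory
        self.lines = {}                 # instructions held by this cache

    def prefetch(self, first_address, prefetch_count):
        # Fetch `prefetch_count` consecutive instructions and store them.
        for addr in range(first_address, first_address + prefetch_count):
            self.lines[addr] = self.later_level[addr]
        return prefetch_count
```

With a prefetch count of 64, the 64 instructions starting at the given fetch address end up resident in the cache before any thread of the group issues its first fetch.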
Optionally, the first instruction prefetch request has a relatively low priority, so that it does not interfere with normal accesses to the instruction cache: the cache's idle bandwidth is used to prefetch instructions for the new thread group, reducing the thread group's cold-start time, and the prefetch instruction count can be flexibly configured to obtain optimal performance in different application scenarios. When the instruction cache is heavily loaded, for example because it is serving instruction accesses of earlier-created thread groups when a first instruction prefetch request arrives, the request may back up unanswered; if the thread group corresponding to the request has finished executing by then, the request is invalidated and automatically deleted.
In the embodiments of the application, before the thread group is created, a first instruction prefetch request for the instructions of the soon-to-be-created thread group is sent to the instruction cache in advance, giving the instruction cache enough time to prefetch the instructions. This reduces the probability that subsequent instruction fetches miss, an effect that is especially pronounced for new threads.
In one embodiment, the processor further includes an instruction fetching unit, which is schematically shown in fig. 2. The instruction fetching unit is respectively connected with the thread group unit and the instruction cache.
The thread group unit is further configured to distribute the created thread group to the instruction fetch unit. Some time after sending the first instruction prefetch request (the interval can be configured as needed), the thread group unit formally creates the thread group corresponding to that request and then distributes it to the instruction fetch unit.
Optionally, to support parallel multithreaded instruction fetching, there may be multiple instruction fetch units (two or more); the exact number can be chosen according to the required fetch parallelism. For example, to support concurrent fetching for 8 thread groups, 8 instruction fetch units are used; for 16 thread groups, 16 units.
The instruction fetch units are functionally identical. Each is configured to issue instruction fetch requests to the instruction cache on behalf of a thread group; each fetch request carries a global access address, and the fetch is performed based on that address. The instruction cache is further configured to respond to a fetch request by returning the instruction it hits to the instruction fetch unit. If the fetch request hits in the instruction cache, the instruction is available immediately; if it misses, the instruction cache must forward the request to the later-level cache (the next-level cache or main memory) to obtain the instruction.
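The hit/miss behavior on this path can be sketched as follows; this is a hedged model with assumed names, where the later-level store is again a plain map and the miss latency is not simulated:

```python
# Sketch of the fetch path: a hit returns the instruction immediately,
# while a miss is satisfied from a later-level store (modeled as a map)
# and the fetched instruction is then filled into the cache.
class FetchPath:
    def __init__(self, later_level):
        self.later_level = later_level
        self.lines = {}

    def fetch(self, address):
        """Return (instruction, hit) for a fetch request at `address`."""
        if address in self.lines:
            return self.lines[address], True   # hit: available at once
        insn = self.later_level[address]       # miss: long-latency access
        self.lines[address] = insn             # fill for later requests
        return insn, False
```

The point of the prefetch requests described in this document is precisely to make the `hit` branch the common case by the time a thread group's fetch requests arrive.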
Optionally, the instruction fetch unit is further configured to monitor, for each thread group, the amount of instructions already prefetched into the instruction cache, and to send a second instruction prefetch request to the instruction cache when the monitored amount falls below a preset threshold (which is flexibly configurable). The instruction cache is further configured to perform instruction prefetching in response to the second instruction prefetch request. In other words, the instruction fetch unit monitors the consumption of each thread group's prefetched instructions and, on detecting that the buffered amount is insufficient (below the threshold), initiates a second instruction prefetch request for that thread group and sends it to the instruction cache.
For ease of understanding, consider an example. Suppose the preset threshold is 32, meaning that instruction prefetching is needed once fewer than 32 of a thread group's prefetched instructions remain in the instruction cache. Suppose further that, before the thread group was created, the instruction cache prefetched 128 of the instructions it needs (out of 192 in total). When the instruction fetch unit starts executing the thread group, it issues fetch requests to the instruction cache to obtain those instructions. When the 97th instruction is fetched, the buffered amount (31) drops below the threshold (32), so a second instruction prefetch request is issued to fetch the thread group's subsequent instructions, for example the 129th through 160th. Later, when the 129th instruction is about to be fetched, the buffered amount (31) is again below the threshold, so another second instruction prefetch request is issued, for example for the 161st through 192nd instructions. Note that this example serves only to illustrate the principle that the instruction fetch unit monitors, per thread group, the amount of instructions already prefetched into the instruction cache and issues a second instruction prefetch request when that amount runs low; the specific values should not be construed as limiting the application.
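The worked example above can be replayed with a small simulation. The function, its parameters, and the assumption that each prefetch batch completes before the next check are all illustrative; the numbers plugged in are the example values from the text:

```python
# Illustrative replay of the monitoring scheme: the fetch unit tracks the
# remaining prefetched instructions (prefetched minus consumed) and issues
# a second prefetch request whenever that amount drops below the threshold.
def second_prefetch_requests(total, initial_prefetch, batch, threshold):
    prefetched = initial_prefetch
    requests = []  # (first, last) instruction numbers of each request
    for consumed in range(1, total + 1):       # one instruction fetched per step
        remaining = prefetched - consumed      # amount still buffered
        if remaining < threshold and prefetched < total:
            first, last = prefetched + 1, min(prefetched + batch, total)
            requests.append((first, last))
            prefetched = last                  # assume the prefetch completes
    return requests
```

With 192 instructions in total, 128 prefetched before the thread group starts, batches of 32, and a threshold of 32, this reproduces the two requests in the example: instructions 129 through 160 and 161 through 192.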
The second instruction prefetch request is similar to the first: it also carries control information for instruction prefetching, for example instruction fetch address information (used to obtain the global access address of the instructions) and a prefetch instruction count (indicating the number of instructions to prefetch, a value that can be flexibly configured as needed).
The instruction fetching unit can compute, for each thread group, the storage amount of instructions already prefetched into the instruction cache as the number of prefetched instructions minus the number of instructions consumed (i.e., normally fetched by the instruction fetching unit). For example, if 64 instructions have been prefetched for a thread group and 32 have been consumed, the storage amount of that thread group's instructions prefetched into the instruction cache is 64 − 32 = 32.
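The monitoring and triggering logic described above can be sketched as a simplified software model (this is an illustrative Python sketch, not the patented hardware; names such as `PrefetchMonitor` are assumptions):

```python
# Simplified model of the instruction fetching unit's per-thread-group
# monitoring: storage amount = instructions prefetched - instructions consumed.
# A second instruction prefetch request is issued when the amount falls below
# the preset threshold.
class PrefetchMonitor:
    def __init__(self, threshold=32, batch=32):
        self.threshold = threshold   # preset threshold (flexibly configurable)
        self.batch = batch           # instructions per second prefetch request
        self.prefetched = {}         # thread group id -> instructions prefetched
        self.consumed = {}           # thread group id -> instructions consumed

    def storage_amount(self, tg):
        return self.prefetched.get(tg, 0) - self.consumed.get(tg, 0)

    def on_prefetch_done(self, tg, count):
        self.prefetched[tg] = self.prefetched.get(tg, 0) + count

    def on_instruction_fetched(self, tg):
        """Called on each normal instruction fetch; returns a second prefetch
        request (start_index, count) when the stock runs low, else None."""
        self.consumed[tg] = self.consumed.get(tg, 0) + 1
        if self.storage_amount(tg) < self.threshold:
            start = self.prefetched.get(tg, 0)  # range follows what was prefetched
            return (start, self.batch)
        return None
```

Running the numbers from the example above (128 instructions prefetched, threshold 32), the model triggers a second prefetch request at the 97th and 129th fetches, matching the description.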
In an optional implementation, the instruction cache may further be configured to notify the instruction fetching unit of the instruction prefetch status after performing instruction prefetching in response to the first instruction prefetch request, and again after performing instruction prefetching in response to the second instruction prefetch request. In addition, the instruction cache may notify the instruction fetching unit of the prefetch status periodically or aperiodically even when it is not responding to either request, so that the instruction fetching unit can learn in real time the storage amount of instructions prefetched into the instruction cache for each thread group.
It should be noted that, for the same thread group, the instruction fetch ranges of the first and second instruction prefetch requests may or may not overlap. The fetch range of the second request may immediately follow that of the first; for example, if the first request covers addresses 0 through 31, the second may cover addresses 32 through 63. Alternatively, the two ranges may overlap or even coincide. In other words, the fetch ranges of the first and second instruction prefetch requests can be set flexibly as required.
Optionally, the first instruction prefetch request has a relatively low priority; that is, the priority with which the instruction cache responds to the first instruction prefetch request is lower than the priority with which it responds to the second. If the instruction cache is heavily loaded, it may never respond to a thread group's first instruction prefetch request, leaving zero instructions prefetched into the instruction cache for that thread group; in that case, when the instruction fetching unit sends the second instruction prefetch request, the fetch range may start from the first address, or from some address after the first address. If the instruction cache has already responded to the first instruction prefetch request, the instruction fetching unit may, when sending the second request, use a fetch range immediately following that of the first request.
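The two-level priority described above can be modeled as a simple arbiter in which demand-driven second requests are always serviced before speculative first requests (an illustrative sketch under that assumption; `PrefetchArbiter` is not a name from the patent):

```python
from collections import deque

# Minimal model of the request priority scheme: second instruction prefetch
# requests (sent when a thread group's stock runs low) are serviced before
# first instruction prefetch requests (sent before thread-group creation).
class PrefetchArbiter:
    def __init__(self):
        self.first_q = deque()    # low priority: speculative, pre-creation
        self.second_q = deque()   # high priority: demand-driven, stock is low

    def submit(self, request, is_second):
        (self.second_q if is_second else self.first_q).append(request)

    def next_request(self):
        """Pick the next request to service; second requests always win."""
        if self.second_q:
            return self.second_q.popleft()
        if self.first_q:
            return self.first_q.popleft()
        return None
```

Under heavy load this arbiter keeps draining second requests first, which is exactly why a first request may go unserviced, as the passage above notes.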
To avoid repeated prefetching and wasted prefetch bandwidth, the instruction cache is optionally further configured to: before performing instruction prefetching in response to a second instruction prefetch request, determine upon receiving the request that it does not already exist in a prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a recent period of time. In this embodiment, upon receiving a second instruction prefetch request, the instruction cache does not respond to it directly; it performs the prefetch only after determining that the request is absent from the prefetch status table. If the request is present in the table, it is already being prefetched or has completed, and the newly received request is simply discarded.
The instruction cache is further configured to record a received second instruction prefetch request in the prefetch status table. To avoid repeated prefetching and wasted bandwidth, the instruction cache is optionally configured as follows: if the prefetch status table contains no other instruction prefetch request from the thread group corresponding to the received second instruction prefetch request (i.e., another second instruction prefetch request from the same thread group), the received request is recorded directly in the table; otherwise, if such a request exists, its prefetch range is updated so that the updated range covers the prefetch range of the newly received request.
For ease of understanding, an example follows. Suppose the instruction fetching unit issues a second instruction prefetch request for a thread group for the first time. Since the prefetch status table contains no other instruction prefetch request from that thread group, the request is recorded directly in the table. Suppose the instruction fetching unit then issues a second instruction prefetch request for the same thread group a second time. Because the table already contains another request from that thread group (the first-issued second instruction prefetch request), only the prefetch range of that recorded request needs to be updated so that it covers the range of the newly issued request; that is, the control information for instruction prefetching (for example, the instruction fetch address information and the prefetch instruction count) of the first-issued request is updated.
In addition, the instruction cache is further configured to delete expired entries from the prefetch status table; that is, it deletes second instruction prefetch requests whose completion time exceeds a preset duration, and retains those whose completion time is within the preset duration.
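The recording, range-merging, and expiry behavior of the prefetch status table can be sketched as follows (an illustrative model, assuming instruction-index ranges and a time-to-live in seconds; the field names are assumptions, not taken from the patent):

```python
import time

# Illustrative model of the prefetch status table: per thread group, duplicate
# ranges are discarded, overlapping requests are merged into one recorded
# range, and completed entries expire after `ttl` seconds.
class PrefetchStatusTable:
    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self.entries = {}  # thread group id -> {"lo", "hi", "done_at"}

    def accept(self, tg, lo, hi, now=None):
        """Return True if the request (instructions [lo, hi]) should actually
        be prefetched; merge or discard it otherwise."""
        now = now if now is not None else time.monotonic()
        # Delete expired entries (completed longer than ttl ago).
        self.entries = {g: e for g, e in self.entries.items()
                        if e["done_at"] is None or now - e["done_at"] <= self.ttl}
        e = self.entries.get(tg)
        if e is None:
            self.entries[tg] = {"lo": lo, "hi": hi, "done_at": None}
            return True
        if e["lo"] <= lo and hi <= e["hi"]:
            return False  # already covered: discard the duplicate request
        # Update the recorded range to cover the new request's range.
        e["lo"], e["hi"] = min(e["lo"], lo), max(e["hi"], hi)
        return True

    def complete(self, tg, now=None):
        if tg in self.entries:
            self.entries[tg]["done_at"] = now if now is not None else time.monotonic()
```

A duplicate request is dropped, a new range from the same thread group widens the recorded entry, and an entry completed longer than `ttl` ago no longer blocks a fresh request.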
The processor described in this application can be obtained by improving the architecture of an existing mainstream processor, so that while supporting highly concurrent access to the instruction cache, it effectively reduces the probability of instruction access misses and shortens the time an instruction fetch request waits for instructions. The existing mainstream processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a microprocessor, or any other conventional processor.
Based on the same inventive concept, an embodiment of the application further provides an electronic device comprising a body and the processor described above. The body may include a transceiver, a communication bus, a memory, and the like. A schematic of the electronic device in one embodiment is shown in fig. 3.
The transceiver, the memory, and the processor are electrically connected to one another, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected to one another via one or more communication buses or signal lines. The transceiver may be used to transmit and receive data, and the memory may be used to store data.
The memory may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The electronic device includes, but is not limited to, a smart phone, a tablet, a computer, a server, and the like.
The processor provided in this electronic device embodiment has the same implementation principle and technical effects as the processor embodiment above; for brevity, where this embodiment omits details, reference may be made to the corresponding content of the processor embodiment.
Based on the same inventive concept, an embodiment of the present application further provides a multithread-shared instruction prefetching method, as shown in fig. 4. The method is applied to the processor described above and is explained below with reference to fig. 4.
S1: the instruction cache acquires a first instruction prefetching request sent by the thread group unit, wherein the first instruction prefetching request is sent by the thread group unit before the thread group is created.
The first instruction prefetch request is used to obtain the instructions corresponding to a thread group that will be created within an upcoming period of time. By sending the first instruction prefetch request to the instruction cache ahead of time, before the thread group is created, there is enough time to prefetch instructions in advance, which reduces the probability of instruction access misses, particularly for new threads.
S2: the instruction cache performs instruction prefetching in response to the first instruction prefetch request.
The first instruction prefetch request includes instruction fetch address information and a prefetch instruction count. In response to the first instruction prefetch request, the instruction cache obtains the indicated number of instructions from the later-level cache (the next-level cache or main memory) according to the instruction fetch address information and stores them in the instruction cache. For example, if the prefetch instruction count is 64, 64 instructions are obtained from the later-level cache based on the instruction fetch address information. It should be noted that the prefetch instruction count can be flexibly configured as needed and is not limited to the 64 illustrated here; 64 should therefore not be construed as limiting the present application.
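The fill operation in response to a first instruction prefetch request can be sketched as follows (a toy software model; the patent describes a hardware cache, and the dictionary-based memories here are illustrative assumptions):

```python
# Toy model of servicing a first instruction prefetch request: the indicated
# number of instructions is read from the next-level cache (or main memory)
# starting at the instruction fetch address, and stored in the instruction
# cache so that later fetch requests from the thread group hit.
def service_first_prefetch(icache, next_level, fetch_addr, count):
    for i in range(count):
        addr = fetch_addr + i
        icache[addr] = next_level[addr]  # fill the instruction cache entry
    return count                         # number of instructions prefetched
```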
Optionally, the instruction prefetching method for multithread sharing further includes: the instruction cache receives a second instruction prefetching request issued by the instruction fetching unit, wherein the second instruction prefetching request is sent when the storage capacity of the instruction prefetched into the instruction cache corresponding to the thread group is smaller than a preset threshold value; the instruction cache performs instruction prefetching in response to the second instruction prefetch request.
Optionally, before the instruction cache performs instruction prefetching in response to the second instruction prefetch request, the multithread-shared instruction prefetching method further includes: upon receiving the second instruction prefetch request, the instruction cache determines that the request does not exist in a prefetch status table, where the prefetch status table records second instruction prefetch requests that are in progress or that have completed within a recent period of time.
Optionally, the multithread-shared instruction prefetching method further includes: when a second instruction prefetch request is received, if the prefetch status table contains no other instruction prefetch request from the corresponding thread group, recording the received request directly in the prefetch status table; or, if such a request exists in the prefetch status table, updating its prefetch range so that the updated range covers the prefetch range of the received request.
The implementation principle and the resulting technical effect of the method for prefetching instructions shared by multiple threads provided in the embodiment of the present application are the same as those of the foregoing processor embodiment, and for the sake of brief description, reference may be made to corresponding contents in the foregoing processor embodiment where no part of the method embodiment is mentioned.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A processor, comprising:
an instruction cache;
a thread group unit configured to send a first instruction prefetch request to the instruction cache, where the first instruction prefetch request is used to acquire an instruction corresponding to a thread group to be created in the future, and the thread group includes multiple threads acquiring the same instruction;
the instruction cache is configured to perform instruction prefetching in response to the first instruction prefetching request and perform instruction prefetching in response to a second instruction prefetching request, wherein the priority of the instruction cache in response to the first instruction prefetching request is lower than the priority of the instruction cache in response to the second instruction prefetching request, and the second instruction prefetching request is sent by the instruction fetching unit when the storage amount of the instruction prefetched into the instruction cache corresponding to the thread group is smaller than a preset threshold value.
2. The processor of claim 1, further comprising:
an instruction fetching unit;
the thread group unit is also configured to send the created thread group to the instruction fetching unit;
the instruction fetching unit is configured to issue corresponding instruction fetching requests based on the received thread groups;
the instruction cache is further configured to respond to the instruction fetching request and return the instruction hit by the instruction fetching request.
3. The processor of claim 2, wherein the instruction fetching unit is further configured to monitor a storage amount of the instruction prefetched into the instruction cache corresponding to each thread group, and send a second instruction prefetch request to the instruction cache when the monitored storage amount is smaller than a preset threshold.
4. The processor of claim 3, wherein the instruction cache is further configured to: before responding to the second instruction prefetching request to perform instruction prefetching, when the second instruction prefetching request is received, determining that the second instruction prefetching request does not exist in a prefetching state table, wherein the prefetching state table is used for recording the second instruction prefetching request which is in progress or the second instruction prefetching request which is completed within a period of time.
5. The processor of claim 4, wherein the instruction cache is further configured to record a second instruction prefetch request received in the prefetch status table when the second instruction prefetch request is received.
6. The processor of claim 5, wherein the instruction cache is specifically configured to:
if no other instruction prefetch request from the thread group corresponding to the received second instruction prefetch request exists in the prefetch state table, directly recording the received second instruction prefetch request in the prefetch state table; or,
and updating the prefetching range of the other instruction prefetching requests when the other instruction prefetching requests from the thread group corresponding to the received second instruction prefetching request exist in the prefetching state table, wherein the updated prefetching range of the other instruction prefetching requests comprises the prefetching range of the received second instruction prefetching request.
7. An electronic device comprising a body and a processor as claimed in any one of claims 1 to 6.
8. A method for multithreaded shared instruction prefetching, comprising:
the method comprises the steps that an instruction cache acquires a first instruction prefetching request sent by a thread group unit and receives a second instruction prefetching request sent by the instruction fetching unit, wherein the first instruction prefetching request is sent by the thread group unit before a thread group is created, the first instruction prefetching request is used for acquiring instructions corresponding to the thread group to be created in a future period of time, and the thread group comprises a plurality of threads for acquiring the same instruction;
the instruction cache responds to the first instruction prefetching request to perform instruction prefetching, and responds to the second instruction prefetching request to perform instruction prefetching, wherein the priority level of the instruction cache responding to the first instruction prefetching request is lower than the priority level of the instruction cache responding to the second instruction prefetching request, and the second instruction prefetching request is sent by an instruction fetching unit when the storage quantity of instructions, corresponding to a thread group, prefetched into the instruction cache is smaller than a preset threshold value.
9. The method of claim 8, wherein prior to the instruction cache performing instruction prefetching in response to the second instruction prefetch request, the method further comprises:
and when receiving the second instruction prefetching request, the instruction cache determines that the second instruction prefetching request does not exist in a prefetching state table, wherein the prefetching state table is used for recording the second instruction prefetching request in progress or the second instruction prefetching request completed within a period of time.
10. The method of claim 9, further comprising:
when a second instruction prefetching request is received, if no other instruction prefetching request from the thread group corresponding to the received second instruction prefetching request exists in the prefetching state table, directly recording the received second instruction prefetching request in the prefetching state table; or,
and updating the prefetch range of the other instruction prefetch requests when the other instruction prefetch requests from the thread group corresponding to the received second instruction prefetch request exist in the prefetch state table, wherein the updated prefetch range of the other instruction prefetch requests comprises the prefetch range of the received second instruction prefetch request.
CN202210649455.0A 2022-06-10 2022-06-10 Processor, electronic equipment and multithreading shared instruction prefetching method Active CN114721727B (en)
