WO2020132841A1 - Multi-thread-based instruction processing method and device - Google Patents

Multi-thread-based instruction processing method and device

Info

Publication number
WO2020132841A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
thread
execution
instruction set
execution thread
Prior art date
Application number
PCT/CN2018/123258
Other languages
English (en)
French (fr)
Inventor
Wang Jin (王锦)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201880098382.2A (CN112789593A)
Priority to PCT/CN2018/123258 (WO2020132841A1)
Publication of WO2020132841A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead

Definitions

  • The present application relates to the field of computer technology, and in particular, to a multi-thread-based instruction processing method and device.
  • Processing concurrency may refer to a metric for processing different instructions in parallel. The higher the processing concurrency, the greater the number of instructions processed at one time; the lower the processing concurrency, the smaller the number of instructions processed at one time.
  • Execution efficiency may refer to the number of instructions processed per unit time. The higher the execution efficiency, the faster the speed of processing instructions, and the lower the execution efficiency, the slower the speed of processing instructions.
  • Multi-threading is generally used to improve the concurrency and execution efficiency of computer instructions.
  • FIG. 1 is a schematic diagram of a multi-threaded instruction engine (IE) framework.
  • An IE usually includes five stages, namely instruction fetch (IF), instruction decode (ID), data select (DS), execution (EX), and write back (WB).
  • IF: instruction fetch
  • ID: instruction decode
  • DS: data select
  • EX: execution
  • WB: write back
  • PTH: pipeline thread (FIG. 1 shows eight pipeline threads)
  • Each PTH can request the round robin (RR) scheduler to fetch instructions from the instruction memory (IMEM); this scheduler is represented as RR1 in FIG. 1.
  • The IMEM caches the fetched instructions in the corresponding instruction queue (IQ), and each PTH is bound to an IQ; each IQ then outputs its cached instructions to the RR2 scheduler.
  • The RR2 scheduler sends the corresponding instructions to the ID stage for subsequent ID, DS, EX, and WB processing.
  • Multiple PTHs can fetch instructions at the same time. While one PTH is executing, the other PTHs hold instructions and wait. When the currently processed instruction is detected to be a thread switching instruction (for example, a JMP instruction or an I/O instruction), the engine immediately switches to another PTH to continue processing, which avoids the instruction fetch delay and improves instruction execution efficiency.
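The background behavior above — run one PTH until it executes a thread switching instruction, then immediately move on to the next ready PTH — can be sketched as a toy scheduler. All names and the string instruction encoding (`"JMP"`, `"IO"`) are illustrative assumptions, not taken from the patent:

```python
from collections import deque

def run_round_robin(pths):
    """pths: dict mapping PTH name -> list of instructions. Returns execution trace."""
    ready = deque((name, deque(instrs)) for name, instrs in pths.items())
    trace = []
    while ready:
        name, instrs = ready.popleft()
        while instrs:
            op = instrs.popleft()
            trace.append((name, op))
            if op in ("JMP", "IO"):           # thread switching instruction:
                ready.append((name, instrs))  # re-queue this PTH and switch
                break
    return trace

trace = run_round_robin({
    "PTH0": ["ADD", "IO", "SUB"],
    "PTH1": ["MUL", "JMP", "DIV"],
})
```

The trace interleaves the two PTHs exactly at the switching instructions, which is how the fetch delay of one thread is hidden behind the execution of another.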
  • However, the implementation cost of this framework is relatively high.
  • Embodiments of the present application provide an instruction processing method and device based on multithreading, which are used to improve instruction execution efficiency and resource utilization in a processor, thereby reducing costs.
  • In a first aspect, a multi-thread-based instruction processing method is provided, which is applied to a processor.
  • The processor includes: a thread manager (THM), an instruction memory (IMEM), and at least two execution thread schedulers for scheduling multiple execution threads (ETH, for example, ETH0-ETH7). The method includes: the thread manager receives a first instruction processing request and a second instruction processing request, where the first instruction processing request is used to request the processor to process a first instruction set and the second instruction processing request is used to request the processor to process a second instruction set; the thread manager controls the instruction memory to send the first instruction set to a first execution thread scheduler of the at least two execution thread schedulers, and controls the instruction memory to send the second instruction set to a second execution thread scheduler of the at least two execution thread schedulers, where the instruction memory is used to store the first instruction set and the second instruction set; the first execution thread scheduler selects a first execution thread for the first instruction set from the multiple execution threads, where the first execution thread is used to execute the first instruction set; and the second execution thread scheduler selects a second execution thread for the second instruction set from the multiple execution threads, where the second execution thread is used to execute the second instruction set.
  • In this way, the processor can schedule execution threads through at least two execution thread schedulers, so that two or more instruction sets can be executed at the same time.
  • Compared with a framework in which the processing concurrency is fixed, the number of execution threads in the waiting stage is reduced, which saves IE resources, improves the execution efficiency of instructions and the utilization of instruction processing resources, and further reduces costs.
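A minimal sketch of the first-aspect flow, assuming two schedulers each owning four ETHs; the class shape and method names are hypothetical, invented only to make the dispatch step concrete:

```python
class ExecThreadScheduler:
    def __init__(self, eths):
        self.idle = list(eths)      # ETHs managed by this scheduler
        self.assigned = {}          # ETH -> instruction set it executes

    def select(self, instr_set):
        eth = self.idle.pop(0)      # pick an idle ETH for this set
        self.assigned[eth] = instr_set
        return eth

def dispatch(requests, schedulers):
    """Bind each request's instruction set to an ETH on its own scheduler."""
    return {sched.select(instrs): instrs
            for sched, instrs in zip(schedulers, requests)}

sched1 = ExecThreadScheduler(["ETH0", "ETH1", "ETH2", "ETH3"])
sched2 = ExecThreadScheduler(["ETH4", "ETH5", "ETH6", "ETH7"])
bound = dispatch([["LD", "ADD"], ["LD", "MUL"]], [sched1, sched2])
```

Because each instruction set lands on a different scheduler, the two sets can proceed through the pipeline concurrently.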
  • In a possible implementation, the thread manager is used to manage multiple pipeline threads (PTH, for example, PTH0-PTH7), and the method further includes: the thread manager selects, from the pipeline threads in the idle state among the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request.
  • the waiting time when the processor processes the first instruction processing request and the second instruction processing request can be reduced, thereby improving the execution efficiency of the instruction.
  • In a possible implementation, the first execution thread scheduler selecting the first execution thread for the first instruction set from the multiple execution threads includes: the first execution thread scheduler selects the first execution thread for the first instruction set from the execution threads in the idle state among the multiple execution threads; and the second execution thread scheduler selecting the second execution thread for the second instruction set from the multiple execution threads includes: the second execution thread scheduler selects the second execution thread for the second instruction set from the execution threads in the idle state among the multiple execution threads.
  • the waiting time for the processor to execute the first instruction set and the second instruction set can be reduced, thereby improving the execution efficiency of the instruction.
  • In a possible implementation, the thread manager controlling the instruction memory to send the first instruction set to the first execution thread scheduler of the at least two execution thread schedulers includes: the thread manager sends a read request for the first instruction set to the instruction memory, so that the instruction memory selects a first instruction cache queue from multiple instruction cache queues and caches the first instruction set in the first instruction cache queue; and the first execution thread scheduler obtains the first instruction set from the first instruction cache queue. The thread manager controlling the instruction memory to send the second instruction set to the second execution thread scheduler includes: the thread manager sends a read request for the second instruction set to the instruction memory, so that the instruction memory selects a second instruction cache queue from the multiple instruction cache queues and caches the second instruction set in the second instruction cache queue; and the second execution thread scheduler obtains the second instruction set from the second instruction cache queue.
  • In this way, the multiple instruction cache queues can flexibly cache the instruction sets corresponding to different pipeline threads, thereby improving the flexibility and utilization of the instruction cache queues.
  • In a possible implementation, the method further includes: the instruction memory binds, according to the number of execution threads in the idle state among the multiple execution threads managed by the at least two execution thread schedulers, the first instruction cache queue to the first execution thread scheduler and the second instruction cache queue to the second execution thread scheduler.
  • the time when the execution thread scheduler schedules the execution thread to process the instruction set can be reduced, thereby improving the execution efficiency of the instruction.
  • In a possible implementation, after the processing of the first instruction set is completed, the method further includes: the instruction memory releases the binding relationship between the first instruction cache queue and the first execution thread scheduler; the thread manager sets the first pipeline thread to the idle state; and the first execution thread scheduler sets the first execution thread to the idle state.
  • In this way, the resources related to the first instruction set in the processor can be released in time after the processing of the first instruction set is completed, which avoids unreasonable use of resources, thereby improving the flexibility of resource use and the execution efficiency of instructions.
  • In a possible implementation, after the processing of the second instruction set is completed, the method further includes: the instruction memory releases the binding relationship between the second instruction cache queue and the second execution thread scheduler; the thread manager sets the second pipeline thread to the idle state; and the second execution thread scheduler sets the second execution thread to the idle state.
  • In this way, the resources related to the second instruction set in the processor can be released in time, which avoids unreasonable use of resources, thereby improving the flexibility of resource use and the execution efficiency of instructions.
  • In a possible implementation, when a thread switching instruction is executed, the method further includes: the thread manager controls the instruction memory according to the thread switching instruction to fetch the instruction set to be switched to after the thread switching instruction; when the thread switching instruction belongs to the first instruction set, the instruction memory clears the instructions in the first instruction cache queue, and the first execution thread scheduler schedules a third execution thread among the multiple execution threads to continue execution; or, when the thread switching instruction belongs to the second instruction set, the instruction memory clears the instructions in the second instruction cache queue, and the second execution thread scheduler schedules a third execution thread among the multiple execution threads to continue execution.
  • In this way, the instructions in the corresponding instruction cache queue are cleared in time, which improves the utilization of the instruction cache queue; at the same time, the execution thread scheduler switches to another execution thread to continue execution, which improves the processing efficiency of instructions.
  • In a second aspect, a processor is provided, which includes: a thread manager, an instruction memory, and at least two execution thread schedulers for scheduling multiple execution threads. The thread manager is configured to receive a first instruction processing request and a second instruction processing request, where the first instruction processing request is used to request the processor to process a first instruction set and the second instruction processing request is used to request the processor to process a second instruction set. The thread manager is further configured to control the instruction memory to send the first instruction set to a first execution thread scheduler of the at least two execution thread schedulers, and to control the instruction memory to send the second instruction set to a second execution thread scheduler of the at least two execution thread schedulers, where the instruction memory is used to store the first instruction set and the second instruction set. The first execution thread scheduler is configured to select a first execution thread for the first instruction set from the multiple execution threads, where the first execution thread is used to execute the first instruction set; and the second execution thread scheduler is configured to select a second execution thread for the second instruction set from the multiple execution threads, where the second execution thread is used to execute the second instruction set.
  • In a possible implementation, the thread manager is used to manage multiple pipeline threads, and the thread manager is further configured to select, from the pipeline threads in the idle state among the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request.
  • In a possible implementation, the first execution thread scheduler is specifically configured to select the first execution thread for the first instruction set from the execution threads in the idle state among the multiple execution threads, and the second execution thread scheduler is specifically configured to select the second execution thread for the second instruction set from the execution threads in the idle state among the multiple execution threads.
  • In a possible implementation, the thread manager is specifically configured to read the first instruction set from the instruction memory, so that the instruction memory selects a first instruction cache queue from multiple instruction cache queues and caches the first instruction set in the first instruction cache queue, where the first instruction cache queue is used to provide the first instruction set for the first execution thread scheduler. The thread manager is also specifically configured to read the second instruction set from the instruction memory, so that the instruction memory selects a second instruction cache queue from the multiple instruction cache queues and caches the second instruction set in the second instruction cache queue, where the second instruction cache queue is used to provide the second instruction set for the second execution thread scheduler.
  • In a possible implementation, the instruction memory is configured to bind, according to the number of execution threads in the idle state among the multiple execution threads managed by the at least two execution thread schedulers, the first instruction cache queue to the first execution thread scheduler and the second instruction cache queue to the second execution thread scheduler.
  • In a possible implementation, the instruction memory is further configured to release the binding relationship between the first instruction cache queue and the first execution thread scheduler; the thread manager is further configured to set the first pipeline thread to the idle state; and the first execution thread scheduler is further configured to set the first execution thread to the idle state.
  • In a possible implementation, the instruction memory is further configured to release the binding relationship between the second instruction cache queue and the second execution thread scheduler; the thread manager is further configured to set the second pipeline thread to the idle state; and the second execution thread scheduler is further configured to set the second execution thread to the idle state.
  • In a possible implementation, the thread manager is further configured to control the instruction memory according to a thread switching instruction to fetch the instruction set to be switched to after the thread switching instruction. When the thread switching instruction belongs to the first instruction set, the instruction memory is further configured to clear the instructions in the first instruction cache queue, and the first execution thread scheduler is further configured to schedule a third execution thread among the multiple execution threads to continue execution; or, when the thread switching instruction belongs to the second instruction set, the instruction memory is further configured to clear the instructions in the second instruction cache queue, and the second execution thread scheduler is further configured to schedule a third execution thread among the multiple execution threads to continue execution.
  • In a third aspect, a device is provided, which includes a processor and a memory. The memory is used to store code and data of the device, and the processor runs the code in the memory to cause the processor to perform the multi-thread-based instruction processing method provided by the first aspect or any possible implementation of the first aspect.
  • Optionally, the processor is the processor provided by the second aspect or any possible implementation of the second aspect.
  • Since any of the processors or devices provided above is used to perform the corresponding method provided above, for the beneficial effects that they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
  • FIG. 1 is a schematic structural diagram of a multi-threaded instruction engine;
  • FIG. 2 is a first schematic structural diagram of a multi-threaded instruction engine provided by an embodiment of the present application;
  • FIG. 3 is a second schematic structural diagram of a multi-threaded instruction engine provided by an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of a multi-thread-based instruction processing method provided by an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an instruction processing device provided by an embodiment of the present application.
  • "At least one" refers to one or more, and "multiple" refers to two or more.
  • And/or describes the relationship of the related objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist at the same time, B exists alone, where A, B can be singular or plural.
  • At least one of the following or a similar expression refers to any combination of these items, including any combination of a single item or a plurality of items.
  • At least one (a) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be single or multiple.
  • The character "/" generally indicates an "or" relationship between the associated objects.
  • the words “first” and “second” do not limit the number and the execution order.
  • Thread: a collection of associated resources required for program execution, such as the PC and the program state.
  • Pipeline thread (PTH): a thread waiting in line to enter the pipeline.
  • Execution thread (ETH): the collection of resources associated with program execution inside the IE, such as registers.
  • Instruction fetch (IF): the operation of fetching instructions from the IMEM and sending them to the instruction decoder (ID).
  • Instruction memory (IMEM): a buffer that stores instructions.
  • Instruction cache queue (IQ): a queue that caches instructions.
  • Instruction decode (ID): the operation of decoding the instructions provided by the IF stage.
  • Data select (DS): the operation of selecting the source operands required by an instruction from the program state associated with the execution thread.
  • Execution (EX): the operation of executing the corresponding instruction on the operands selected by DS.
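The terms above can be modeled, purely for illustration, as small Python data classes; every field name here is an assumption, not part of the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    pc: int = 0                  # program pointer indexing the instruction set
    state: str = "idle"          # program state, simplified to a label

@dataclass
class PipelineThread(Thread):    # PTH: waits in line to enter the pipeline
    pass

@dataclass
class ExecutionThread(Thread):   # ETH: per-thread execution resources
    registers: dict = field(default_factory=dict)

@dataclass
class InstructionQueue:          # IQ: caches instructions read from the IMEM
    instrs: list = field(default_factory=list)

pth = PipelineThread(pc=0x40)
eth = ExecutionThread()
eth.registers["r0"] = 7
iq = InstructionQueue(instrs=["LD", "ADD"])
```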
  • As shown in FIG. 2, the IE includes: a thread manager (TH Manager) 201, an IMEM 202, at least two execution thread schedulers 203, and at least two instruction processing modules 204.
  • The thread manager 201 includes multiple PTHs and a pipeline thread scheduler.
  • the thread manager 201 can be used to manage multiple PTHs.
  • The pipeline thread scheduler can obtain instructions from the IMEM 202 according to the requests of the multiple PTHs. For example, when the thread manager 201 receives an instruction processing request, it may select one PTH from the multiple PTHs to bind to the instruction processing request, and the bound PTH sends an instruction fetch request to the pipeline thread scheduler.
  • After receiving the instruction fetch request, the pipeline thread scheduler can read the instruction set corresponding to the instruction processing request from the IMEM 202.
  • FIG. 2 and FIG. 3 take multiple PTHs including PTH0-PTH7 as examples; RR1 represents the pipeline thread scheduler, where RR stands for round robin.
  • The IMEM 202 can be used to store the instructions that need to be processed. The IMEM 202 can also be used to manage multiple IQs, and each IQ can be used to cache an instruction set read from the IMEM 202. For example, after one or more instructions of an instruction set are fetched from the IMEM 202, the IMEM 202 may select one IQ from the multiple IQs to cache the one or more instructions, and other instructions of the same instruction set fetched later can be cached directly in the selected IQ. Both FIG. 2 and FIG. 3 take multiple IQs including IQ0-IQ7 as examples.
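The IQ selection behavior just described — pick an empty IQ on the first fetch of an instruction set, then keep routing later fetches of the same set to that IQ — might be sketched as follows (the `set_id` key and the class shape are assumptions for illustration):

```python
class IMem:
    def __init__(self, n_iqs=8):
        self.iqs = {f"IQ{i}": [] for i in range(n_iqs)}
        self.binding = {}               # instruction-set id -> IQ name

    def cache(self, set_id, instrs):
        if set_id not in self.binding:
            # first fetch of this set: select an empty (idle) IQ for it
            free = next(name for name, q in self.iqs.items() if not q)
            self.binding[set_id] = free
        # later fetches of the same set land directly in the bound IQ
        self.iqs[self.binding[set_id]].extend(instrs)
        return self.binding[set_id]

imem = IMem()
first = imem.cache("set1", ["LD"])
imem.cache("set2", ["LD"])
again = imem.cache("set1", ["ADD"])   # same set, same IQ
```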
  • The at least two execution thread schedulers 203 can be used to manage multiple ETHs.
  • The at least two execution thread schedulers include two or more execution thread schedulers, and each execution thread scheduler is used to manage a part of the multiple ETHs.
  • For example, the at least two execution thread schedulers include a first execution thread scheduler and a second execution thread scheduler, the multiple ETHs include eight ETHs, and the first execution thread scheduler and the second execution thread scheduler are each used to manage four ETHs.
  • In FIG. 2 and FIG. 3, RR2 represents an execution thread scheduler. FIG. 2 takes two RR2 schedulers and multiple ETHs including ETH0-ETH7 as an example for illustration, and FIG. 3 takes four RR2 schedulers and multiple ETHs including ETH0-ETH7 as an example for description.
  • The at least two instruction processing modules 204 may include two or more instruction processing modules, and each processing module may include ID, DS, EX, and WB stages; that is, each processing module may perform a series of operations on the instructions executed by an execution thread, such as decoding, data selection, execution, and write back.
  • FIG. 2 takes the case in which the at least two instruction processing modules 204 include two instruction processing modules as an example.
  • FIG. 3 takes the case in which the at least two instruction processing modules 204 include four instruction processing modules as an example.
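A toy model of one instruction processing module, walking a text-encoded instruction through the four stages; the instruction format and register file are invented purely for illustration:

```python
def process(instr, regs):
    op, dst, a, b = instr.split()              # ID: decode the instruction text
    x, y = regs[a], regs[b]                    # DS: select the source operands
    result = {"ADD": x + y, "SUB": x - y}[op]  # EX: execute the operation
    regs[dst] = result                         # WB: write the result back
    return result

regs = {"r0": 0, "r1": 5, "r2": 3}
process("ADD r0 r1 r2", regs)   # r0 = r1 + r2
process("SUB r1 r1 r2", regs)   # r1 = r1 - r2
```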
  • FIG. 4 is a flowchart of an instruction processing method based on multithreading provided by an embodiment of the present application, which may be applied to a device including a processor, and the structure of the IE in the processor may be as shown in FIG. 2 or FIG. 3 above.
  • the method includes the following steps.
  • S401: The thread manager receives the first instruction processing request and the second instruction processing request.
  • the first instruction processing request is used to request processing of the first instruction set
  • the second instruction processing request is used to request processing of the second instruction set.
  • the first instruction processing request may include a program pointer (PC) corresponding to the first instruction set.
  • the PC corresponding to the first instruction set may be used to index the first instruction set.
  • The first instruction set may be an independently executable program segment, and the program segment may include multiple instructions.
  • the second instruction processing request may include a PC corresponding to the second instruction set.
  • the PC corresponding to the second instruction set may be used to index the second instruction set, and the second instruction set may also be a program segment that can be independently executed.
  • The thread manager may receive the first instruction processing request before receiving the second instruction processing request; or the thread manager may receive the second instruction processing request before receiving the first instruction processing request; or the thread manager may receive the first instruction processing request and the second instruction processing request at the same time.
  • Optionally, the thread manager may also select, from the multiple pipeline threads, the first pipeline thread for the first instruction processing request and the second pipeline thread for the second instruction processing request.
  • the thread manager is used to manage multiple pipeline threads.
  • After the thread manager receives the first instruction processing request and the second instruction processing request, it can select, from the multiple pipeline threads it manages, the first pipeline thread for the first instruction processing request and the second pipeline thread for the second instruction processing request.
  • the first pipeline thread and the second pipeline thread may be pipeline threads in an idle state among the plurality of pipeline threads, and the first pipeline thread and the second pipeline thread are two different pipeline threads.
  • Optionally, the thread manager may select a pipeline thread in the idle state for the first instruction processing request from the multiple pipeline threads it manages, and select a pipeline thread in the idle state for the second instruction processing request. By selecting pipeline threads in the idle state, the thread manager can reduce the waiting time when the processor processes the first instruction processing request and the second instruction processing request.
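The idle-PTH selection described above can be sketched as follows; the request identifiers, state labels, and class shape are hypothetical:

```python
class ThreadManager:
    def __init__(self, n_pths=8):
        self.state = {f"PTH{i}": "idle" for i in range(n_pths)}
        self.bound = {}                    # request id -> bound PTH

    def select_pth(self, request_id):
        # pick any PTH still in the idle state and bind it to the request
        pth = next(p for p, s in self.state.items() if s == "idle")
        self.state[pth] = "busy"           # no longer selectable
        self.bound[request_id] = pth
        return pth

tm = ThreadManager()
first = tm.select_pth("req1")    # first instruction processing request
second = tm.select_pth("req2")   # second request gets a different idle PTH
```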
  • S402: The thread manager controls the instruction memory to send the first instruction set to the first execution thread scheduler of the at least two execution thread schedulers, and controls the instruction memory to send the second instruction set to the second execution thread scheduler.
  • The process in which the thread manager controls the instruction memory to send the first instruction set to the first execution thread scheduler of the at least two execution thread schedulers may be as follows: after the thread manager selects the first pipeline thread for the first instruction processing request, the first pipeline thread can send a first instruction fetch request to the pipeline thread scheduler in the thread manager, where the first instruction fetch request is used to read the first instruction set; when the pipeline thread scheduler receives the first instruction fetch request, it may read the first instruction set from the instruction memory IMEM according to the PC corresponding to the first instruction set, so that the instruction memory fetches the first instruction set and sends it to the first execution thread scheduler.
  • The IMEM may select the first IQ for the first instruction set from the multiple IQs and cache the fetched instructions of the first instruction set in the first IQ; after the other instructions of the first instruction set are subsequently fetched by the IMEM, they can be cached directly in the first IQ. After that, the first execution thread scheduler obtains the instructions of the first instruction set from the first IQ.
  • Similarly, the process in which the thread manager controls the instruction memory to send the second instruction set to the second execution thread scheduler of the at least two execution thread schedulers may be as follows: after the thread manager selects the second pipeline thread for the second instruction processing request, the second pipeline thread can send a second instruction fetch request to the pipeline thread scheduler in the thread manager, where the second instruction fetch request is used to read the second instruction set; when the pipeline thread scheduler receives the second instruction fetch request, it may read the second instruction set from the instruction memory IMEM according to the PC corresponding to the second instruction set, thereby causing the instruction memory to fetch the second instruction set and send it to the second execution thread scheduler.
  • The IMEM can select the second IQ for the second instruction set from the multiple IQs and cache the fetched instructions of the second instruction set in the second IQ; after the other instructions of the second instruction set are subsequently fetched by the IMEM, they can be cached directly in the second IQ.
  • The first IQ and the second IQ may be IQs in the idle state among the multiple IQs; that is, the first IQ and the second IQ are empty, or no instructions of other instruction sets are cached in the first IQ or the second IQ.
  • By selecting idle or empty IQs, the IMEM can reduce the waiting time of the processor when processing the first instruction set and the second instruction set, and at the same time allows the multiple instruction cache queues to flexibly cache the instruction sets corresponding to different pipeline threads.
  • Optionally, the IMEM can also bind the first execution thread scheduler to the first IQ and the second execution thread scheduler to the second IQ; then the first IQ can continuously provide instructions to the first execution thread scheduler, and the second IQ can continuously provide instructions to the second execution thread scheduler.
  • Optionally, the IMEM binds the first IQ to the first execution thread scheduler and the second IQ to the second execution thread scheduler according to the number of ETHs in the idle state among the multiple execution threads managed by the at least two execution thread schedulers.
  • For example, the number of execution thread schedulers is four, and each execution thread scheduler can be used to manage two ETHs. The first execution thread scheduler and the second execution thread scheduler may be the last two execution thread schedulers, that is, the execution thread schedulers whose number of ETHs in the idle state is two each.
  • In this way, the multiple instruction cache queues can flexibly provide instructions to different execution thread schedulers, thereby improving the flexibility and utilization of the multiple instruction cache queues.
  • Optionally, each of the at least two execution thread schedulers can count the number of ETHs in the idle state that it manages. For example, when an ETH managed by an execution thread scheduler is put to use, the execution thread scheduler decreases the count of ETHs currently in the idle state by one; when an ETH managed by an execution thread scheduler is released, the execution thread scheduler increases the count by one. By counting the number of ETHs in the idle state and then selecting ETHs based on that count, the time for the execution thread scheduler to schedule an ETH to process an instruction set can be reduced, thereby improving the execution efficiency of instructions.
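The idle-ETH counting can be sketched as a simple counter per scheduler. Binding to the scheduler with the most idle ETHs is one plausible reading of the selection rule, offered as an assumption rather than a statement of the patented method:

```python
class EthScheduler:
    def __init__(self, name, n_eth):
        self.name = name
        self.idle_count = n_eth     # all managed ETHs start idle

    def acquire(self):              # an ETH is put to use
        self.idle_count -= 1

    def release(self):              # an ETH is released
        self.idle_count += 1

def pick_scheduler(schedulers):
    """Bind to the scheduler that currently has the most idle ETHs."""
    return max(schedulers, key=lambda s: s.idle_count)

s0, s1 = EthScheduler("RR2-0", 2), EthScheduler("RR2-1", 2)
s0.acquire()                        # RR2-0 now has one idle ETH
chosen = pick_scheduler([s0, s1])   # RR2-1 still has two idle ETHs
```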
  • S403: The first execution thread scheduler selects the first execution thread for the first instruction set from the multiple execution threads, and the first execution thread is used to execute the first instruction set.
  • Specifically, the first execution thread scheduler may select the first execution thread for the first instruction set from the execution threads it manages, and the first execution thread executes the instructions in the first instruction set. After the ID, DS, EX, and WB operations are performed on an instruction in the first instruction set, the execution of that instruction is completed.
  • the thread switching instruction can also be called a control flow instruction, such as a JMP instruction or an I/O instruction
  • EX can stop fetching instructions through IF and update the destination address corresponding to the thread switching instruction to continue fetching Means that at the same time, instructions that have been fetched but not yet executed in the first instruction set can also be cleared, such as clearing instructions cached in the first IQ, and not performing WB operations for instructions that have entered the IE.
  • the first execution thread scheduler may schedule other ETH carrying instructions that it manages to continue execution, so that the ID switching to the first execution thread scheduler continues carrying out other ETH carrying instructions.
  • the first execution thread scheduler may select one execution thread from the execution threads in the idle state as the first execution thread, that is, the first execution thread is The first execution thread scheduler is idle when selected.
  • the method may further include: the instruction memory releases the binding relationship between the first IQ and the first execution thread scheduler; the thread manager sets the first pipeline thread to an idle state; An execution thread scheduler sets the first execution thread to an idle state.
  • the resources related to the first instruction set in the processor are released in time to avoid the unreasonable utilization of resources, thereby improving the flexibility of resource use, and thereby improving the execution efficiency of instructions .
  • S404: the second execution thread scheduler selects, from the multiple execution threads, a second execution thread for the second instruction set; the second execution thread is used to execute the second instruction set.
  • S403 and S404 may be performed in either order; FIG. 4 shows simultaneous execution of S403 and S404 as an example.
  • specifically, the second execution thread scheduler may select the second execution thread from the execution threads it manages, and the second execution thread executes the instructions in the second instruction set. After a series of operations such as ID, DS, EX, and WB, execution of those instructions is complete.
  • if the executed instruction is a thread switching instruction, such as a JMP instruction or an I/O instruction,
  • EX can stop fetching through IF and update the fetch address to the destination address of the thread switching instruction before fetching resumes, and can also clear the instructions of the second instruction set
  • that have been fetched but not yet executed, for example by clearing the instructions cached in the second IQ and skipping the WB operation for instructions that have already entered the IE.
  • the second execution thread scheduler may then schedule another instruction-carrying ETH that it manages to continue execution, so that the ID stage switches to that other instruction-carrying ETH of the second execution thread scheduler.
  • when selecting the second execution thread, the second execution thread scheduler may pick one of the execution threads in the idle state as the second execution thread; that is, the second execution thread is idle at the moment the second execution thread scheduler selects it.
  • after the second instruction set is processed, the method may further include: the instruction memory releases the binding between the second IQ and the second execution thread scheduler; the thread manager sets the second pipeline thread to the idle state; and the second execution thread scheduler sets the second execution thread to the idle state.
  • in this way, the resources in the processor related to the second instruction set are released in time, avoiding unreasonable occupation of resources, improving the flexibility of resource use, and thereby improving instruction execution efficiency.
  • each of the at least two execution thread schedulers can schedule one execution thread to execute instructions, so that two or more instruction sets can be executed simultaneously; with a fixed number of execution threads, fewer threads sit in the waiting phase, which saves IE resources, improves instruction execution efficiency and the utilization of instruction processing resources, and thus also reduces cost.
  • the pipeline thread, IQ, and execution thread scheduler are selected flexibly, and are released in time once the instruction set has been processed, avoiding unreasonable occupation of these resources, reducing the delay caused by resource shortage, and further improving instruction processing efficiency.
  • the processor includes a hardware structure and/or a software module corresponding to each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the processor may be divided into function modules according to the above method examples.
  • each function module may correspond to one function, or two or more functions may be integrated into one processing module.
  • the integrated modules can be implemented in the form of hardware or of software function modules. It should be noted that the division into modules in the embodiments of the present application is schematic and is only a division of logical functions; in actual implementation there may be other division manners.
  • an embodiment of the present application further provides a processor.
  • the structure of the processor may be as shown in FIG. 2 or FIG. 3 above.
  • the processor includes: a thread manager, an instruction memory, and at least two execution thread schedulers for scheduling multiple execution threads.
  • the thread manager is used to receive a first instruction processing request and a second instruction processing request, the first instruction processing request being used to request the processor to process a first instruction set and the second instruction processing request being used to request the processor to process a second instruction set; the thread manager is also used to control the instruction memory to send the first instruction set to the first execution thread scheduler of the at least two execution thread schedulers, and to control the instruction memory to send the second instruction set to the second execution thread scheduler of the at least two schedulers; the instruction memory is used to store the first instruction set and the second instruction set;
  • the first execution thread scheduler is used to select a first execution thread for the first instruction set from the multiple execution threads,
  • and the first execution thread is used to execute the first instruction set;
  • the second execution thread scheduler is used to select a second execution thread for the second instruction set from the multiple execution threads, and the second execution thread is used to execute the second instruction set.
  • the thread manager is used to manage multiple pipeline threads, and is also used to select, from the pipeline threads in the idle state among the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request.
  • the first execution thread scheduler is specifically used to select the first execution thread for the first instruction set from the execution threads in the idle state among the multiple execution threads; the second execution thread scheduler is specifically used to select the second execution thread for the second instruction set from the execution threads in the idle state among the multiple execution threads.
  • the thread manager is further specifically configured to read the first instruction set from the instruction memory, so that the instruction memory selects a first instruction cache queue from the multiple instruction cache queues and caches the first instruction set in the first instruction cache queue, the first instruction cache queue being used to provide the first instruction set to the first execution thread scheduler; the thread manager is also specifically used to read the second instruction set from the instruction memory, so that the instruction memory selects a second instruction cache queue from the multiple instruction cache queues and caches the second instruction set in the second instruction cache queue, the second instruction cache queue being used to provide the second instruction set to the second execution thread scheduler.
  • the instruction memory is used to bind the first instruction cache queue to the first execution thread scheduler, and the second instruction cache queue to the second execution thread scheduler, according to the number of idle execution threads among the multiple execution threads managed by the at least two execution thread schedulers.
  • after the first instruction set is processed, the instruction memory is also used to release the binding between the first instruction cache queue and the first execution thread scheduler; the thread manager is also used to set the first pipeline thread to the idle state; the first execution thread scheduler is also used to set the first execution thread to the idle state.
  • after the second instruction set is processed, the instruction memory is also used to release the binding between the second instruction cache queue and the second execution thread scheduler; the thread manager is also used to set the second pipeline thread to the idle state; the second execution thread scheduler is also used to set the second execution thread to the idle state.
  • the thread manager is also used to control, according to the thread switching instruction, the instruction memory to fetch the instruction set switched to by the thread switching instruction; when the thread switching instruction belongs to the first instruction set, the instruction memory is also used to clear the instructions in the first instruction cache queue, and the first execution thread scheduler is also used to schedule a third execution thread among the multiple execution threads to continue execution; or, when the thread switching instruction belongs to the second instruction set, the instruction memory is also used to clear the instructions in the second instruction cache queue, and the second execution thread scheduler is also used to schedule a third execution thread among the multiple execution threads to continue execution.
  • an embodiment of the present application further provides an instruction processing device.
  • the instruction processing device includes: a memory 501 and a processor 502.
  • the memory 501 is used to store the program code and data of the device.
  • the processor 502 is used to control and manage the operation of the device shown in FIG. 5, and the structure of the IE in the processor 502 may be as shown in FIG. 2 or FIG. 3 above.
  • for example, the processor is specifically used to support the instruction processing device in executing S401-S404 in the above method embodiments, and/or other processes of the techniques described herein.
  • the instruction processing device shown in FIG. 5 may further include a communication interface 503, which is used to support the device in communicating.
  • the processor 502 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a processing chip, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in conjunction with the disclosure of the embodiments of the present application.
  • the processor 502 may also be a combination that realizes a computing function, for example one including one or more microprocessors, or a combination of a digital signal processor and a microprocessor.
  • the communication interface 503 may be a transceiver, a transceiver circuit, or a transceiver interface.
  • the memory 501 may be a volatile memory, a non-volatile memory, or the like.
  • the communication interface 503, the processor 502, and the memory 501 are connected to each other through a bus 504;
  • the bus 504 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • PCI: Peripheral Component Interconnect
  • EISA: Extended Industry Standard Architecture
  • the bus 504 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus.
  • the memory 501 may be included in the processor 502.
  • the processor may schedule one execution thread to execute instructions through each of the at least two execution thread schedulers, so that two or more instruction sets can be executed simultaneously.
  • with a fixed number of execution threads, the number of execution threads in the waiting phase is reduced, which saves IE resources, improves instruction execution efficiency and the utilization of instruction processing resources, and thus reduces cost.
  • the pipeline thread, IQ, and execution thread scheduler are selected flexibly, and are released in time once the instruction set has been processed, avoiding unreasonable occupation of these resources, reducing the delay caused by resource shortage, and further improving instruction processing efficiency.
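The two-scheduler scheme summarized above can be illustrated with a toy model. This is a minimal Python sketch, not the patent's hardware: the `Scheduler` class, the ETH names, and the "first idle wins" selection rule are all our own illustrative assumptions.

```python
# End-to-end toy sketch: two RR2-style execution thread schedulers each
# pick an idle ETH from the subset they manage, so two instruction sets
# can be executed at the same time instead of one waiting on the other.

class Scheduler:
    def __init__(self, eths):
        # Track the state of every ETH this scheduler manages.
        self.eths = {e: "idle" for e in eths}

    def pick_idle_eth(self):
        """Claim and return the first idle ETH, or None if all are busy."""
        for eth, state in self.eths.items():
            if state == "idle":
                self.eths[eth] = "busy"
                return eth
        return None

# Two schedulers, each managing half of ETH0-ETH7 (as in FIG. 2):
sched_1 = Scheduler(["ETH0", "ETH1", "ETH2", "ETH3"])
sched_2 = Scheduler(["ETH4", "ETH5", "ETH6", "ETH7"])

# Each instruction set gets its own execution thread:
eth_for_set_1 = sched_1.pick_idle_eth()  # -> "ETH0"
eth_for_set_2 = sched_2.pick_idle_eth()  # -> "ETH4"
```

Because each scheduler only touches its own ETH pool, the two selections are independent and could proceed concurrently in hardware.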

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

A multithreading-based instruction processing method and device, relating to the field of computer technology and used to improve instruction execution efficiency and reduce cost. The method is applied to a processor that includes a thread manager, an instruction memory, and at least two execution thread schedulers, and includes: the thread manager receives a first instruction processing request and a second instruction processing request; the thread manager controls the instruction memory to send a first instruction set to a first execution thread scheduler of the at least two execution thread schedulers, and to send a second instruction set to a second execution thread scheduler of the at least two schedulers; the first execution thread scheduler selects, from multiple execution threads, a first execution thread for executing the first instruction set; and the second execution thread scheduler selects, from the multiple execution threads, a second execution thread for executing the second instruction set.

Description

Multithreading-based instruction processing method and device

Technical Field

The present application relates to the field of computer technology, and in particular, to a multithreading-based instruction processing method and device.

Background

With the rapid development of computer technology, ever higher processing concurrency and execution efficiency are required of computers when processing instructions. Processing concurrency is a measure of how many different instructions are processed in parallel: the higher the concurrency, the more instructions are processed at one time; the lower the concurrency, the fewer instructions are processed at one time. Execution efficiency is the number of instructions processed per unit time: the higher the efficiency, the faster instructions are processed; the lower the efficiency, the slower. At present, multithreading is commonly used to improve the processing concurrency and execution efficiency of computer instructions.

FIG. 1 is a schematic diagram of the framework of a multithreaded instruction engine (IE). An IE usually comprises five stages: instruction fetching (IF), instruction decoding (instruction decoder, ID), data selection (data selector, DS), execution (EX), and write back (WB). Specifically, in the IF stage, each of the 8 pipeline threads (PTHs) can ask the round-robin (RR) scheduler, denoted RR1 in FIG. 1, to fetch instructions from the instruction memory (IMEM); for each PTH, the IMEM buffers the fetched instructions in the corresponding instruction queue (IQ), with one PTH bound to one IQ; an IQ then outputs its buffered instructions to the RR2 scheduler, which sends the corresponding instructions to ID for the subsequent ID, DS, EX, and WB processing.

In the above instruction processing flow, multiple PTHs can fetch instructions at the same time; while one PTH is executing, the other PTHs wait with their instructions, and when the instruction currently being processed is detected to be a thread switching instruction (for example, a JMP instruction or an I/O instruction), processing immediately switches to another PTH, avoiding fetch latency and improving instruction execution efficiency. However, in this processing flow, although there are multiple PTHs, the subsequent execution stages have only one RR2 scheduler and a single ID, DS, EX, and WB, so at any one time only the instructions of one PTH are being processed, which makes the implementation costly.
Summary

Embodiments of the present application provide a multithreading-based instruction processing method and device, which are used to improve instruction execution efficiency and the utilization of resources in a processor, thereby reducing cost.

To achieve the above objective, the embodiments of the present application adopt the following technical solutions:

According to a first aspect, a multithreading-based instruction processing method is provided, applied to a processor that includes a thread manager (THM), an instruction memory (IMEM), and at least two execution thread schedulers for scheduling multiple execution threads (ETHs, for example ETH 0-ETH 7). The method includes: the thread manager receives a first instruction processing request and a second instruction processing request, the first instruction processing request being used to request the processor to process a first instruction set and the second instruction processing request being used to request the processor to process a second instruction set; the thread manager controls the instruction memory to send the first instruction set to a first execution thread scheduler of the at least two execution thread schedulers, and controls the instruction memory to send the second instruction set to a second execution thread scheduler of the at least two schedulers, the instruction memory being used to store the first instruction set and the second instruction set; the first execution thread scheduler selects, from the multiple execution threads, a first execution thread for the first instruction set, the first execution thread being used to execute the first instruction set; and the second execution thread scheduler selects, from the multiple execution threads, a second execution thread for the second instruction set, the second execution thread being used to execute the second instruction set.

In the above technical solution, the processor can schedule one execution thread to execute instructions through each of the at least two execution thread schedulers, so that two or more instruction sets can be executed simultaneously. With a fixed number of execution threads, fewer threads sit in the waiting phase, which saves IE resources, improves instruction execution efficiency and the utilization of instruction processing resources, and thus also reduces cost.

In a possible implementation of the first aspect, the thread manager is used to manage multiple pipeline threads (PTHs, for example PTH 0-PTH 7), and the method further includes: the thread manager selects, from the pipeline threads in the idle state among the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request. This implementation reduces the waiting time when the processor handles the first and second instruction processing requests, thereby improving instruction execution efficiency.

In a possible implementation of the first aspect, the first execution thread scheduler selecting the first execution thread for the first instruction set from the multiple execution threads includes: the first execution thread scheduler selects the first execution thread for the first instruction set from the execution threads in the idle state among the multiple execution threads; and the second execution thread scheduler selecting the second execution thread for the second instruction set from the multiple execution threads includes: the second execution thread scheduler selects the second execution thread for the second instruction set from the execution threads in the idle state among the multiple execution threads. This implementation reduces the waiting time for executing the first and second instruction sets, thereby improving instruction execution efficiency.

In a possible implementation of the first aspect, the thread manager controlling the instruction memory to send the first instruction set to the first execution thread scheduler of the at least two execution thread schedulers includes: the thread manager sends a read request for the first instruction set to the instruction memory, so that the instruction memory selects a first instruction cache queue from multiple instruction cache queues and caches the first instruction set in the first instruction cache queue; the first execution thread scheduler obtains the first instruction set from the first instruction cache queue. The thread manager controlling the instruction memory to send the second instruction set to the second execution thread scheduler of the at least two schedulers includes: the thread manager sends a read request for the second instruction set to the instruction memory, so that the instruction memory selects a second instruction cache queue from the multiple instruction cache queues and caches the second instruction set in the second instruction cache queue; the second execution thread scheduler obtains the second instruction set from the second instruction cache queue. In this implementation, the multiple instruction cache queues can flexibly cache the instruction sets corresponding to different pipeline threads and provide instructions to different execution thread schedulers, improving the flexibility and utilization of the instruction cache queues.

In a possible implementation of the first aspect, the method further includes: the instruction memory binds the first instruction cache queue to the first execution thread scheduler, and the second instruction cache queue to the second execution thread scheduler, according to the number of idle execution threads among the multiple execution threads managed by the at least two execution thread schedulers. This implementation reduces the time the execution thread schedulers spend dispatching execution threads to process the instruction sets, thereby improving instruction execution efficiency.

In a possible implementation of the first aspect, after the first instruction set is processed, the method further includes: the instruction memory releases the binding between the first instruction cache queue and the first execution thread scheduler; the thread manager sets the first pipeline thread to the idle state; and the first execution thread scheduler sets the first execution thread to the idle state. In this implementation, once the first instruction set has been processed, the resources in the processor related to the first instruction set are released in time, avoiding unreasonable occupation of resources, improving the flexibility of resource use, and thereby improving instruction execution efficiency.

In a possible implementation of the first aspect, after the second instruction set is processed, the method further includes: the instruction memory releases the binding between the second instruction cache queue and the second execution thread scheduler; the thread manager sets the second pipeline thread to the idle state; and the second execution thread scheduler sets the second execution thread to the idle state. In this implementation, once the second instruction set has been processed, the resources in the processor related to the second instruction set are released in time, avoiding unreasonable occupation of resources, improving the flexibility of resource use, and thereby improving instruction execution efficiency.

In a possible implementation of the first aspect, during the processing of the first or second instruction set, if a thread switching instruction exists in the first or second instruction set, the method further includes: the thread manager controls, according to the thread switching instruction, the instruction memory to fetch the instruction set switched to by the thread switching instruction; when the thread switching instruction belongs to the first instruction set, the instruction memory clears the instructions in the first instruction cache queue and the first execution thread scheduler schedules a third execution thread among the multiple execution threads to continue execution; or, when the thread switching instruction belongs to the second instruction set, the instruction memory clears the instructions in the second instruction cache queue and the second execution thread scheduler schedules a third execution thread among the multiple execution threads to continue execution. In this implementation, when a thread switching instruction exists in the instruction set currently being processed, the instructions in the corresponding instruction cache queue are cleared promptly, improving the utilization of the instruction cache queues, while the execution thread scheduler switches to another execution thread to continue execution, improving instruction processing efficiency.
According to a second aspect, a processor is provided, including a thread manager, an instruction memory, and at least two execution thread schedulers for scheduling multiple execution threads. The thread manager is used to receive a first instruction processing request and a second instruction processing request, the first instruction processing request being used to request the processor to process a first instruction set and the second instruction processing request being used to request the processor to process a second instruction set; the thread manager is also used to control the instruction memory to send the first instruction set to a first execution thread scheduler of the at least two execution thread schedulers, and to control the instruction memory to send the second instruction set to a second execution thread scheduler of the at least two schedulers, the instruction memory being used to store the first instruction set and the second instruction set; the first execution thread scheduler is used to select, from the multiple execution threads, a first execution thread for the first instruction set, the first execution thread being used to execute the first instruction set; and the second execution thread scheduler is used to select, from the multiple execution threads, a second execution thread for the second instruction set, the second execution thread being used to execute the second instruction set.

In a possible implementation of the second aspect, the thread manager is used to manage multiple pipeline threads, and is also used to select, from the pipeline threads in the idle state among the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request.

In a possible implementation of the second aspect, the first execution thread scheduler is specifically used to select the first execution thread for the first instruction set from the execution threads in the idle state among the multiple execution threads; and the second execution thread scheduler is specifically used to select the second execution thread for the second instruction set from the execution threads in the idle state among the multiple execution threads.

In a possible implementation of the second aspect, the thread manager is further specifically used to read the first instruction set from the instruction memory, so that the instruction memory selects a first instruction cache queue from multiple instruction cache queues and caches the first instruction set in the first instruction cache queue, the first instruction cache queue being used to provide the first instruction set to the first execution thread scheduler; and the thread manager is further specifically used to read the second instruction set from the instruction memory, so that the instruction memory selects a second instruction cache queue from the multiple instruction cache queues and caches the second instruction set in the second instruction cache queue, the second instruction cache queue being used to provide the second instruction set to the second execution thread scheduler.

In a possible implementation of the second aspect, the instruction memory is used to bind the first instruction cache queue to the first execution thread scheduler, and the second instruction cache queue to the second execution thread scheduler, according to the number of idle execution threads among the multiple execution threads managed by the at least two execution thread schedulers.

In a possible implementation of the second aspect, after the first instruction set is processed, the instruction memory is also used to release the binding between the first instruction cache queue and the first execution thread scheduler; the thread manager is also used to set the first pipeline thread to the idle state; and the first execution thread scheduler is also used to set the first execution thread to the idle state.

In a possible implementation of the second aspect, after the second instruction set is processed: the instruction memory is also used to release the binding between the second instruction cache queue and the second execution thread scheduler; the thread manager is also used to set the second pipeline thread to the idle state; and the second execution thread scheduler is also used to set the second execution thread to the idle state.

In a possible implementation of the second aspect, during the processing of the first or second instruction set, if a thread switching instruction exists in the first or second instruction set: the thread manager is also used to control, according to the thread switching instruction, the instruction memory to fetch the instruction set switched to by the thread switching instruction; when the thread switching instruction belongs to the first instruction set, the instruction memory is also used to clear the instructions in the first instruction cache queue, and the first execution thread scheduler is also used to schedule a third execution thread among the multiple execution threads to continue execution; or, when the thread switching instruction belongs to the second instruction set, the instruction memory is also used to clear the instructions in the second instruction cache queue, and the second execution thread scheduler is also used to schedule a third execution thread among the multiple execution threads to continue execution.

According to a third aspect, a device is provided, including a processor and a memory, the memory being used to store the code and data of the device, and the processor running the code in the memory so that the processor performs the multithreading-based instruction processing method provided by the first aspect or any possible implementation thereof. Optionally, the processor is the processor provided by the second aspect or any possible implementation of the second aspect.

It can be understood that the processor or device for any of the multithreading-based instruction processing methods provided above is used to perform the corresponding method provided above; therefore, for the beneficial effects it can achieve, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a multithreaded instruction engine;

FIG. 2 is a first schematic structural diagram of a multithreaded instruction engine according to an embodiment of the present application;

FIG. 3 is a second schematic structural diagram of a multithreaded instruction engine according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a multithreading-based instruction processing method according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an instruction processing device according to an embodiment of the present application.
Detailed Description

In the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple. The character "/" generally indicates an "or" relationship between the associated objects. In addition, in the embodiments of the present application, words such as "first" and "second" do not limit quantity or execution order.

It should be noted that in the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the present application should not be construed as being preferred over or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the relevant concepts in a concrete manner.

In addition, before the embodiments of the present application are introduced, the technical terms involved are first explained.
Thread (TH): the set of resources associated with program execution, such as the PC and the Program State.

Pipeline thread (PTH): a thread queued to enter the pipeline.

Execution thread (ETH): the set of resources, such as registers, associated with a program executing inside the IE.

Instruction fetching (IF): the operation of fetching instructions from the IMEM and sending them to the ID (Instruction Decoder).

Instruction memory (IMEM): a buffer that stores instructions.

Instruction cache queue (IQ): a queue that buffers instructions.

Instruction decoding (instruction decoder, ID): the operation of decoding the instructions provided by IF.

Data selection (data selector, DS): the operation of selecting the required source operands from the Program State associated with the execution thread, according to the instruction.

Execution (EX): the operation of executing the corresponding instruction on the operands selected by DS.

Write back (WB): the operation of writing the data generated by EX when executing an instruction back to the Program State.
FIG. 2 and FIG. 3 are schematic diagrams of the framework of an instruction engine (IE) in a processor according to an embodiment of the present application. The IE includes: a thread manager (TH Manager, THM) 201, an IMEM 202, at least two execution thread schedulers 203, and at least two instruction processing modules 204.

The thread manager 201 includes multiple PTHs and a pipeline thread scheduler; the thread manager 201 can be used to manage the multiple PTHs, and the pipeline thread scheduler can fetch instructions from the IMEM 202 according to the request of each of the multiple PTHs. For example, when the thread manager 201 receives an instruction processing request, it can select one PTH from the multiple PTHs and bind it to the request; the bound PTH sends a fetch request to the pipeline thread scheduler, which, upon receiving the fetch request, reads the instruction set corresponding to the instruction processing request from the IMEM 202. In both FIG. 2 and FIG. 3, the multiple PTHs are PTH0-PTH7 as an example, and RR1 denotes the pipeline thread scheduler (RR stands for Round Robin).

The IMEM 202 can be used to store the instructions to be processed. The IMEM 202 can also be used to manage multiple IQs, each of which can buffer one instruction set read out of the IMEM 202. For example, after one or more instructions of an instruction set are fetched from the IMEM 202, the IMEM 202 can select one IQ from the multiple IQs to buffer the one or more instructions; when further instructions of the same set are fetched later, they can be buffered directly in the selected IQ. In both FIG. 2 and FIG. 3, the multiple IQs are IQ0-IQ7 as an example.

The at least two execution thread schedulers 203 can be used to manage multiple ETHs. For example, there are two or more execution thread schedulers, each managing a subset of the multiple ETHs. For instance, with a first and a second execution thread scheduler and 8 ETHs, each of the two schedulers manages 4 ETHs. In FIG. 2 and FIG. 3, RR2 denotes an execution thread scheduler; FIG. 2 shows two RR2 schedulers and ETH0-ETH7 as an example, and FIG. 3 shows four RR2 schedulers and ETH0-ETH7 as an example.

The at least two instruction processing modules 204 can include two or more instruction processing modules, each of which can include ID, DS, EX, and WB; that is, each processing module can perform the series of operations of instruction decoding, data selection, execution, and write back on the instructions executed by an execution thread. FIG. 2 shows two instruction processing modules as an example, and FIG. 3 shows four.
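The four stages of one instruction processing module can be sketched as a toy Python model. Everything here is an illustrative assumption (the string instruction format, the register names, the opcode set); the patent does not specify these details.

```python
# Toy model of one instruction processing module (ID -> DS -> EX -> WB).

def instruction_decode(raw):
    """ID: split a raw instruction string into opcode and operand names."""
    op, *operands = raw.split()
    return op, operands

def data_select(operands, program_state):
    """DS: read the source operands from the thread's Program State."""
    return [program_state[name] for name in operands]

def execute(op, values):
    """EX: perform the operation selected by the opcode."""
    if op == "ADD":
        return values[0] + values[1]
    if op == "MUL":
        return values[0] * values[1]
    raise ValueError(f"unknown opcode: {op}")

def write_back(program_state, dest, value):
    """WB: write the EX result back into the Program State."""
    program_state[dest] = value

def run_instruction(raw, dest, program_state):
    """Push one instruction through the four stages of a processing module."""
    op, operands = instruction_decode(raw)
    values = data_select(operands, program_state)
    result = execute(op, values)
    write_back(program_state, dest, result)
    return result

state = {"r0": 2, "r1": 3}
run_instruction("ADD r0 r1", "r2", state)  # state["r2"] becomes 5
```

With two or more such modules instantiated, two execution threads can move through ID, DS, EX, and WB at the same time, which is the point of the architecture in FIG. 2 and FIG. 3.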
FIG. 4 is a flowchart of a multithreading-based instruction processing method according to an embodiment of the present application, applicable in a device that includes a processor; the structure of the IE in the processor may be as shown in FIG. 2 or FIG. 3 above. The method includes the following steps.

S401: the thread manager receives a first instruction processing request and a second instruction processing request, the first instruction processing request being used to request processing of a first instruction set and the second instruction processing request being used to request processing of a second instruction set.

The first instruction processing request may include the program counter (PC) corresponding to the first instruction set, which can be used to index the first instruction set; the first instruction set may be an independently executable program fragment, and the program fragment may include multiple instructions. Similarly, the second instruction processing request may include the PC corresponding to the second instruction set, which can be used to index the second instruction set; the second instruction set may likewise be an independently executable program fragment.

Specifically, the thread manager may receive the first instruction processing request before the second; or receive the second before the first; or receive both at the same time.

Further, when receiving the first and second instruction processing requests, the thread manager may also select, from the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request.

The thread manager is used to manage multiple pipeline threads; when it receives the first and second instruction processing requests, it can select, from the pipeline threads it manages, a first pipeline thread for the first request and a second pipeline thread for the second request.

Optionally, the first and second pipeline threads may be pipeline threads in the idle state among the multiple pipeline threads, and they are two different pipeline threads. Specifically, the thread manager can select one idle pipeline thread for the first instruction processing request and one idle pipeline thread for the second instruction processing request from the pipeline threads it manages. By selecting idle pipeline threads, the thread manager reduces the waiting time for processing the first and second instruction processing requests.
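The idle pipeline thread selection in S401 can be sketched as follows. This is a minimal Python sketch under our own assumptions: the class name, the list-based idle tracking, and the "lowest index first" policy are illustrative, not taken from the patent.

```python
# Sketch of the thread manager choosing idle pipeline threads (PTHs)
# for incoming instruction processing requests.

class ThreadManager:
    def __init__(self, num_pths=8):
        # True means the corresponding pipeline thread is idle.
        self.idle = [True] * num_pths
        self.bound = {}  # request id -> PTH index

    def select_pipeline_thread(self, request_id):
        """Bind the request to the first idle PTH; None if all are busy."""
        for pth, is_idle in enumerate(self.idle):
            if is_idle:
                self.idle[pth] = False
                self.bound[request_id] = pth
                return pth
        return None

    def release(self, request_id):
        """Return the PTH bound to a finished request to the idle pool."""
        pth = self.bound.pop(request_id)
        self.idle[pth] = True

thm = ThreadManager()
first = thm.select_pipeline_thread("req-1")   # binds PTH 0
second = thm.select_pipeline_thread("req-2")  # binds PTH 1
thm.release("req-1")                          # PTH 0 becomes idle again
third = thm.select_pipeline_thread("req-3")   # reuses PTH 0
```

The two requests get two different pipeline threads, and a released thread immediately becomes available to later requests, which is the behavior the text describes.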
S402: the thread manager controls the instruction memory to send the first instruction set to the first execution thread scheduler of the at least two execution thread schedulers, and controls the instruction memory to send the second instruction set to the second execution thread scheduler of the at least two schedulers.

The process by which the thread manager controls the instruction memory to send the first instruction set to the first execution thread scheduler may be as follows: after the thread manager selects the first pipeline thread for the first instruction processing request, the first pipeline thread can send a first fetch request to the pipeline thread scheduler in the thread manager, the first fetch request being used to read the first instruction set; on receiving the first fetch request, the pipeline thread scheduler can read the first instruction set from the IMEM according to the PC corresponding to the first instruction set, so that the instruction memory fetches the first instruction set to send to the first execution thread scheduler. Further, after first fetching instructions of the first instruction set, the IMEM can select a first IQ for the first instruction set from the multiple IQs and buffer the fetched instructions of the first instruction set in the first IQ; instructions of the first instruction set fetched later can be buffered directly in the first IQ. The first execution thread scheduler then obtains the instructions of the first instruction set from the first IQ.

Similarly, the process by which the thread manager controls the instruction memory to send the second instruction set to the second execution thread scheduler may be as follows: after the thread manager selects the second pipeline thread for the second instruction processing request, the second pipeline thread can send a second fetch request to the pipeline thread scheduler, the second fetch request being used to read the second instruction set; on receiving the second fetch request, the pipeline thread scheduler can read the second instruction set from the IMEM according to the PC corresponding to the second instruction set, so that the instruction memory fetches the second instruction set to send to the second execution thread scheduler. Further, after first fetching instructions of the second instruction set, the IMEM can select a second IQ for the second instruction set from the multiple IQs and buffer the fetched instructions of the second instruction set in the second IQ; instructions of the second instruction set fetched later can be buffered directly in the second IQ.

Optionally, when the IMEM selects the first and second IQs, the first and second IQs may be IQs in the idle state among the multiple IQs, or IQs that are empty, or IQs in which no instructions of another instruction set are buffered. By selecting idle or empty IQs, the IMEM reduces the waiting time for processing the first and second instruction sets, while allowing the multiple instruction cache queues to flexibly cache the instruction sets corresponding to different pipeline threads.
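The IQ selection just described can be sketched in Python. The data structures and the "first unclaimed empty IQ" policy are our illustrative reading of the scheme, not the patent's hardware design.

```python
# Sketch of the instruction memory picking an instruction cache queue (IQ)
# per instruction set: the first fetch of a set claims an empty, unowned
# IQ; later fetches of the same set reuse it.

from collections import deque

class InstructionMemory:
    def __init__(self, num_iqs=8):
        self.iqs = [deque() for _ in range(num_iqs)]
        self.owner = {}  # instruction-set id -> IQ index

    def cache(self, set_id, instructions):
        """Buffer fetched instructions in the IQ owned by this set."""
        if set_id not in self.owner:
            # First fetch of this set: claim an IQ that is empty and
            # not already owned by another set (raises if none is free).
            self.owner[set_id] = next(
                i for i, q in enumerate(self.iqs)
                if not q and i not in self.owner.values()
            )
        self.iqs[self.owner[set_id]].extend(instructions)

imem = InstructionMemory()
imem.cache("set-1", ["ADD", "JMP"])
imem.cache("set-2", ["MUL"])       # gets a different IQ
imem.cache("set-1", ["SUB"])       # reuses set-1's IQ
```

Each instruction set ends up with its own queue, so two execution thread schedulers can drain their instructions independently.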
Further, the IMEM can also bind the first IQ to the first execution thread scheduler and the second IQ to the second execution thread scheduler, after which the first IQ continuously provides instructions to the first execution thread scheduler and the second IQ continuously provides instructions to the second execution thread scheduler. Specifically, the IMEM binds the first IQ to the first execution thread scheduler and the second IQ to the second execution thread scheduler according to the number of idle ETHs among the multiple execution threads managed by the at least two execution thread schedulers. For example, if there are 4 execution thread schedulers, each managing 2 ETHs, and the numbers of idle ETHs of the 4 schedulers are 0, 1, 2, and 2 respectively, the first and second execution thread schedulers may be the last two schedulers, that is, the schedulers whose number of idle ETHs is 2. By binding IQs to execution thread schedulers, the multiple instruction cache queues can flexibly provide instructions to different execution thread schedulers, improving the flexibility and utilization of the instruction cache queues.

Optionally, each of the at least two execution thread schedulers can count the number of idle ETHs it manages. For example, when an ETH managed by an execution thread scheduler is taken into use, the scheduler decrements its count of currently idle ETHs by one; when an ETH it manages is released, the scheduler increments the count by one. By counting the idle ETHs and selecting an ETH based on that count, the time the execution thread scheduler spends dispatching an ETH to process an instruction set is reduced, thereby improving instruction execution efficiency.
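The idle-ETH bookkeeping and the binding decision can be sketched together. The "prefer the scheduler with the most idle ETHs" rule and all names below are our reading of the scheme (mirroring the 0, 1, 2, 2 example above), not language from the patent itself.

```python
# Sketch of binding IQs to execution thread schedulers by idle-ETH count.

class EthScheduler:
    """Tracks idle ETHs as described: -1 when an ETH is used, +1 on release."""
    def __init__(self, num_eths=2):
        self.idle_count = num_eths

    def acquire(self):
        self.idle_count -= 1

    def release(self):
        self.idle_count += 1

def bind_iqs(schedulers, num_iqs):
    """Return indices of the schedulers to bind, most idle ETHs first
    (ties broken by lower index, so the result is deterministic)."""
    ranked = sorted(range(len(schedulers)),
                    key=lambda s: (-schedulers[s].idle_count, s))
    return ranked[:num_iqs]

# Four schedulers managing 2 ETHs each, with 0, 1, 2 and 2 idle:
scheds = [EthScheduler() for _ in range(4)]
scheds[0].acquire(); scheds[0].acquire()  # 0 idle
scheds[1].acquire()                       # 1 idle
chosen = bind_iqs(scheds, 2)              # the two schedulers with 2 idle ETHs
```

With the example counts, the two IQs bind to the last two schedulers, matching the walkthrough in the text.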
S403: the first execution thread scheduler selects, from the multiple execution threads, a first execution thread for the first instruction set; the first execution thread is used to execute the first instruction set.

When the first execution thread scheduler obtains the instructions of the first instruction set, it can select the first execution thread for the first instruction set from the execution threads it manages, and the first execution thread executes the instructions of the first instruction set. After the ID, DS, EX, and WB operations are performed on the instructions of the first instruction set, execution of those instructions is complete. If the executed instruction is a thread switching instruction (also called a control flow instruction), such as a JMP instruction or an I/O instruction, EX can stop fetching through IF and update the fetch address to the destination address of the thread switching instruction before fetching resumes; at the same time, instructions of the first instruction set that have been fetched but not yet executed can be cleared, for example by clearing the instructions buffered in the first IQ and skipping the WB operation for instructions that have already entered the IE. At this point, the first execution thread scheduler can schedule another instruction-carrying ETH it manages to continue execution, so that the ID stage switches to that other instruction-carrying ETH of the first execution thread scheduler.
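The thread switch handling just described (redirect fetch, flush the IQ, suppress write-back) can be sketched as a toy function. The field names and the dict-based instruction records are illustrative assumptions.

```python
# Toy sketch of handling a thread switching instruction (e.g. JMP or I/O):
# fetching stops, the fetch address is redirected to the switch target,
# the set's IQ is flushed, and in-flight instructions skip write-back.

from collections import deque

def handle_thread_switch(target, iq, in_flight):
    """Redirect fetch and discard the not-yet-executed tail of the set."""
    iq.clear()                        # drop fetched-but-unexecuted instructions
    for instr in in_flight:
        instr["write_back"] = False   # instructions already in the IE skip WB
    return target                     # fetching resumes at the destination

iq = deque(["ADD", "SUB"])                      # fetched but not executed
in_flight = [{"op": "MUL", "write_back": True}]  # already inside the IE
new_pc = handle_thread_switch(target=240, iq=iq, in_flight=in_flight)
```

Flushing the IQ frees it immediately for another instruction set, while suppressing WB keeps the abandoned instructions from mutating the Program State.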
Optionally, when selecting the first execution thread, the first execution thread scheduler can pick one of the execution threads in the idle state as the first execution thread; that is, the first execution thread is idle at the moment the first execution thread scheduler selects it.

Further, after the first instruction set is processed, the method may further include: the instruction memory releases the binding between the first IQ and the first execution thread scheduler; the thread manager sets the first pipeline thread to the idle state; and the first execution thread scheduler sets the first execution thread to the idle state. Once the first instruction set has been processed, the resources in the processor related to the first instruction set are released in time, avoiding unreasonable occupation of resources, improving the flexibility of resource use, and thereby improving instruction execution efficiency.
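The tear-down after an instruction set completes can be sketched in a few lines. The dictionaries standing in for the IQ binding table and the thread state tables are illustrative assumptions.

```python
# Sketch of releasing everything a finished instruction set was holding:
# drop the IQ-to-scheduler binding, and return the pipeline thread and
# execution thread to the idle pool.

def release_resources(set_id, iq_bindings, pth_state, eth_state, pth, eth):
    """Free the resources associated with a completed instruction set."""
    iq_bindings.pop(set_id, None)  # unbind the set's IQ from its scheduler
    pth_state[pth] = "idle"
    eth_state[eth] = "idle"

iq_bindings = {"set-1": ("IQ0", "scheduler-0")}
pth_state = {0: "busy"}
eth_state = {0: "busy"}
release_resources("set-1", iq_bindings, pth_state, eth_state, pth=0, eth=0)
```

Releasing all three resources at completion is what lets the next instruction processing request find an idle PTH, a free IQ, and an idle ETH without waiting.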
S404: the second execution thread scheduler selects, from the multiple execution threads, a second execution thread for the second instruction set; the second execution thread is used to execute the second instruction set. S403 and S404 may be performed in either order; FIG. 4 shows simultaneous execution of S403 and S404 as an example.

When the second execution thread scheduler obtains the instructions of the second instruction set, it can select the second execution thread for the second instruction set from the execution threads it manages, and the second execution thread executes the instructions of the second instruction set. After a series of operations such as ID, DS, EX, and WB, execution of those instructions is complete. If the executed instruction is a thread switching instruction, such as a JMP instruction or an I/O instruction, EX can stop fetching through IF and update the fetch address to the destination address of the thread switching instruction before fetching resumes; at the same time, instructions of the second instruction set that have been fetched but not yet executed can be cleared, for example by clearing the instructions buffered in the second IQ and skipping the WB operation for instructions that have already entered the IE. At this point, the second execution thread scheduler can schedule another instruction-carrying ETH it manages to continue execution, so that the ID stage switches to that other instruction-carrying ETH of the second execution thread scheduler.

Optionally, when selecting the second execution thread, the second execution thread scheduler can pick one of the execution threads in the idle state as the second execution thread; that is, the second execution thread is idle at the moment the second execution thread scheduler selects it.

Further, after the second instruction set is processed, the method may further include: the instruction memory releases the binding between the second IQ and the second execution thread scheduler; the thread manager sets the second pipeline thread to the idle state; and the second execution thread scheduler sets the second execution thread to the idle state. Once the second instruction set has been processed, the resources in the processor related to the second instruction set are released in time, avoiding unreasonable occupation of resources, improving the flexibility of resource use, and thereby improving instruction execution efficiency.

In the multithreading-based instruction processing method provided by the embodiments of the present application, each of the at least two execution thread schedulers can schedule one execution thread to execute instructions, so that two or more instruction sets can be executed simultaneously. With a fixed number of execution threads, fewer threads sit in the waiting phase, which saves IE resources, improves instruction execution efficiency and the utilization of instruction processing resources, and thus also reduces cost. In addition, during the processing of an instruction set, the pipeline thread, IQ, and execution thread scheduler are selected flexibly, and are released in time once the instruction set has been processed, avoiding unreasonable occupation of these resources, reducing the delay caused by resource shortage, and further improving instruction processing efficiency.
The above describes the multithreading-based instruction processing method provided by the embodiments of the present application mainly from the perspective of the processor. It can be understood that, to realize the above functions, the processor includes the hardware structures and/or software modules corresponding to each function. A person skilled in the art will readily appreciate that, in combination with the network elements and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.

The embodiments of the present application may divide the processor into function modules according to the above method examples; for example, each function module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated modules can be implemented in the form of hardware or of software function modules. It should be noted that the division into modules in the embodiments of the present application is schematic and is only a division of logical functions; in actual implementation there may be other division manners.
An embodiment of the present application further provides a processor, whose structure may be as shown in FIG. 2 or FIG. 3 above. The processor includes: a thread manager, an instruction memory, and at least two execution thread schedulers for scheduling multiple execution threads.

The thread manager is used to receive a first instruction processing request and a second instruction processing request, the first instruction processing request being used to request the processor to process a first instruction set and the second instruction processing request being used to request the processor to process a second instruction set; the thread manager is also used to control the instruction memory to send the first instruction set to a first execution thread scheduler of the at least two execution thread schedulers, and to control the instruction memory to send the second instruction set to a second execution thread scheduler of the at least two schedulers, the instruction memory being used to store the first instruction set and the second instruction set; the first execution thread scheduler is used to select, from the multiple execution threads, a first execution thread for the first instruction set, the first execution thread being used to execute the first instruction set; and the second execution thread scheduler is used to select, from the multiple execution threads, a second execution thread for the second instruction set, the second execution thread being used to execute the second instruction set.

In a possible implementation, the thread manager is used to manage multiple pipeline threads, and is also used to select, from the pipeline threads in the idle state among the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request.

In a possible implementation, the first execution thread scheduler is specifically used to select the first execution thread for the first instruction set from the execution threads in the idle state among the multiple execution threads; and the second execution thread scheduler is specifically used to select the second execution thread for the second instruction set from the execution threads in the idle state among the multiple execution threads.

In a possible implementation, the thread manager is further specifically used to read the first instruction set from the instruction memory, so that the instruction memory selects a first instruction cache queue from multiple instruction cache queues and caches the first instruction set in the first instruction cache queue, the first instruction cache queue being used to provide the first instruction set to the first execution thread scheduler; and the thread manager is further specifically used to read the second instruction set from the instruction memory, so that the instruction memory selects a second instruction cache queue from the multiple instruction cache queues and caches the second instruction set in the second instruction cache queue, the second instruction cache queue being used to provide the second instruction set to the second execution thread scheduler.

In a possible implementation, the instruction memory is used to bind the first instruction cache queue to the first execution thread scheduler, and the second instruction cache queue to the second execution thread scheduler, according to the number of idle execution threads among the multiple execution threads managed by the at least two execution thread schedulers.

In a possible implementation, after the first instruction set is processed: the instruction memory is also used to release the binding between the first instruction cache queue and the first execution thread scheduler; the thread manager is also used to set the first pipeline thread to the idle state; and the first execution thread scheduler is also used to set the first execution thread to the idle state.

In a possible implementation, after the second instruction set is processed: the instruction memory is also used to release the binding between the second instruction cache queue and the second execution thread scheduler; the thread manager is also used to set the second pipeline thread to the idle state; and the second execution thread scheduler is also used to set the second execution thread to the idle state.

In a possible implementation, during the processing of the first or second instruction set, if a thread switching instruction exists in the first or second instruction set: the thread manager is also used to control, according to the thread switching instruction, the instruction memory to fetch the instruction set switched to by the thread switching instruction; when the thread switching instruction belongs to the first instruction set, the instruction memory is also used to clear the instructions in the first instruction cache queue, and the first execution thread scheduler is also used to schedule a third execution thread among the multiple execution threads to continue execution; or, when the thread switching instruction belongs to the second instruction set, the instruction memory is also used to clear the instructions in the second instruction cache queue, and the second execution thread scheduler is also used to schedule a third execution thread among the multiple execution threads to continue execution.
As shown in FIG. 5, an embodiment of the present application further provides an instruction processing device. Referring to FIG. 5, the instruction processing device includes: a memory 501 and a processor 502. The memory 501 is used to store the program code and data of the device; the processor 502 is used to control and manage the operation of the device shown in FIG. 5, and the structure of the IE in the processor 502 may be the structure shown in FIG. 2 or FIG. 3 above; for example, the processor is specifically used to support the instruction processing device in executing S401-S404 in the above method embodiments, and/or other processes of the techniques described herein. Optionally, the instruction processing device shown in FIG. 5 may further include a communication interface 503, which is used to support the device in communicating.

The processor 502 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a processing chip, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in conjunction with the disclosure of the embodiments of the present application. The processor 502 may also be a combination that realizes a computing function, for example one including one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The communication interface 503 may be a transceiver, a transceiver circuit, or a transceiver interface. The memory 501 may be a volatile memory, a non-volatile memory, or the like.

For example, the communication interface 503, the processor 502, and the memory 501 are connected to each other through a bus 504; the bus 504 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 504 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus. Optionally, the memory 501 may be included in the processor 502.

In the embodiments of the present application, the processor can schedule one execution thread to execute instructions through each of the at least two execution thread schedulers, so that two or more instruction sets can be executed simultaneously. With a fixed number of execution threads, fewer threads sit in the waiting phase, which saves IE resources, improves instruction execution efficiency and the utilization of instruction processing resources, and thus also reduces cost. In addition, during the processing of an instruction set, the pipeline thread, IQ, and execution thread scheduler are selected flexibly, and are released in time once the instruction set has been processed, avoiding unreasonable occupation of these resources, reducing the delay caused by resource shortage, and further improving instruction processing efficiency.

Finally, it should be noted that the above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any variation or replacement within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (17)

  1. A multithreading-based instruction processing method, applied to a processor, the processor comprising: a thread manager, an instruction memory, and at least two execution thread schedulers for scheduling multiple execution threads, the method comprising:
    the thread manager receives a first instruction processing request and a second instruction processing request, wherein the first instruction processing request is used to request the processor to process a first instruction set, and the second instruction processing request is used to request the processor to process a second instruction set;
    the thread manager controls the instruction memory to send the first instruction set to a first execution thread scheduler of the at least two execution thread schedulers, and controls the instruction memory to send the second instruction set to a second execution thread scheduler of the at least two schedulers, wherein the instruction memory is used to store the first instruction set and the second instruction set;
    the first execution thread scheduler selects, from the multiple execution threads, a first execution thread for the first instruction set, wherein the first execution thread is used to execute the first instruction set; and the second execution thread scheduler selects, from the multiple execution threads, a second execution thread for the second instruction set, wherein the second execution thread is used to execute the second instruction set.
  2. The method according to claim 1, wherein the thread manager is used to manage multiple pipeline threads, and the method further comprises:
    the thread manager selects, from the pipeline threads in the idle state among the multiple pipeline threads, a first pipeline thread for the first instruction processing request and a second pipeline thread for the second instruction processing request.
  3. The method according to claim 1 or 2, wherein the first execution thread scheduler selecting a first execution thread for the first instruction set from the multiple execution threads comprises: the first execution thread scheduler selecting the first execution thread for the first instruction set from the execution threads in the idle state among the multiple execution threads;
    and the second execution thread scheduler selecting a second execution thread for the second instruction set from the multiple execution threads comprises: the second execution thread scheduler selecting the second execution thread for the second instruction set from the execution threads in the idle state among the multiple execution threads.
  4. The method according to any one of claims 1-3, wherein the thread manager controlling the instruction memory to send the first instruction set to the first execution thread scheduler of the at least two execution thread schedulers comprises: the thread manager reading the first instruction set from the instruction memory, so that the instruction memory selects a first instruction cache queue from the multiple instruction cache queues and caches the first instruction set in the first instruction cache queue, the first instruction cache queue being used to provide the first instruction set to the first execution thread scheduler;
    and the thread manager controlling the instruction memory to send the second instruction set to the second execution thread scheduler of the at least two schedulers comprises: the thread manager reading the second instruction set from the instruction memory, so that the instruction memory selects a second instruction cache queue from the multiple instruction cache queues and caches the second instruction set in the second instruction cache queue, the second instruction cache queue being used to provide the second instruction set to the second execution thread scheduler.
  5. The method according to claim 4, further comprising:
    the instruction memory binds the first instruction cache queue to the first execution thread scheduler, and the second instruction cache queue to the second execution thread scheduler, according to the number of idle execution threads among the multiple execution threads managed by the at least two execution thread schedulers.
  6. The method according to claim 5, wherein after the first instruction set is processed, the method further comprises:
    the instruction memory releases the binding between the first instruction cache queue and the first execution thread scheduler;
    the thread manager sets the first pipeline thread to the idle state;
    and the first execution thread scheduler sets the first execution thread to the idle state.
  7. The method according to claim 5 or 6, wherein after the second instruction set is processed, the method further comprises:
    the instruction memory releases the binding between the second instruction cache queue and the second execution thread scheduler;
    the thread manager sets the second pipeline thread to the idle state;
    and the second execution thread scheduler sets the second execution thread to the idle state.
  8. The method according to any one of claims 1-7, wherein during the processing of the first instruction set or the second instruction set, if a thread switching instruction exists in the first instruction set or the second instruction set, the method further comprises:
    the thread manager controls, according to the thread switching instruction, the instruction memory to fetch the instruction set switched to by the thread switching instruction;
    when the thread switching instruction belongs to the first instruction set, the instruction memory clears the instructions in the first instruction cache queue, and the first execution thread scheduler schedules a third execution thread among the multiple execution threads to continue execution;
    or,
    when the thread switching instruction belongs to the second instruction set, the instruction memory clears the instructions in the second instruction cache queue, and the second execution thread scheduler schedules a third execution thread among the multiple execution threads to continue execution.
  9. 一种处理器,其特征在于,所述处理器包括:线程管理器、指令存储器、以及用于调度多个执行线程的至少两个执行线程调度器;其中,
    所述线程管理器,用于接收第一指令处理请求和第二指令处理请求,所述第一指令处理请求用于请求所述处理器处理第一指令集,所述第二指令处理请求用于请求所述处理器处理第二指令集;
    所述线程管理器,还用于控制所述指令存储器向所述至少两个执行线程调度器中的第一执行线程调度器发送所述第一指令集,以及控制所述指令存储器向所述至少两个调度器中的第二执行线程调度器发送所述第二指令集,所述指令存储器用于存储所述第一指令集和所述第二指令集;
    所述第一执行线程调度器,用于从所述多个执行线程中为所述第一指令集选择第一执行线程,所述第一执行线程用于执行所述第一指令集;所述第二执行线程调度器用于,从所述多个执行线程中为所述第二指令集选择第二执行线程,所述第二执行线 程用于执行所述第二指令集。
  10. 根据权利要求9所述的处理器,其特征在于,所述线程管理器用于管理多个流水线程,所述线程管理器还用于:
    从所述多个流水线程中处于空闲状态的流程线程中,分别为所述第一指令处理请求选择第一流水线程,为所述第二指令处理请求选择第二流水线程。
  11. 根据权利要求9或10所述的处理器,其特征在于,所述第一执行线程调度器,具体用于:从所述多个执行线程中处于空闲状态的流程线程中,为所述第一指令集选择第一执行线程;
    所述第二执行线程调度器,具体用于:从所述多个执行线程中处于空闲状态的流程线程中,为所述第二指令集选择第二执行线程。
  12. The processor according to any one of claims 9 to 11, wherein the thread manager is further specifically configured to: read the first instruction set from the instruction memory, so that the instruction memory selects a first instruction cache queue from the plurality of instruction cache queues and buffers the first instruction set in the first instruction cache queue, the first instruction cache queue being used to provide the first instruction set to the first execution thread scheduler; and
    the thread manager is further specifically configured to read the second instruction set from the instruction memory, so that the instruction memory selects a second instruction cache queue from the plurality of instruction cache queues and buffers the second instruction set in the second instruction cache queue, the second instruction cache queue being used to provide the second instruction set to the second execution thread scheduler.
  13. The processor according to claim 12, wherein the instruction memory is configured to:
    bind the first instruction cache queue to the first execution thread scheduler and bind the second instruction cache queue to the second execution thread scheduler, respectively, according to the number of execution threads in an idle state among the plurality of execution threads managed by the at least two execution thread schedulers.
  14. The processor according to claim 13, wherein after processing of the first instruction set is completed:
    the instruction memory is further configured to release the binding between the first instruction cache queue and the first execution thread scheduler;
    the thread manager is further configured to set the first pipeline thread to an idle state; and
    the first execution thread scheduler is further configured to set the first execution thread to an idle state.
  15. The processor according to claim 13 or 14, wherein after processing of the second instruction set is completed:
    the instruction memory is further configured to release the binding between the second instruction cache queue and the second execution thread scheduler;
    the thread manager is further configured to set the second pipeline thread to an idle state; and
    the second execution thread scheduler is further configured to set the second execution thread to an idle state.
  16. The processor according to any one of claims 9 to 15, wherein during processing of the first instruction set or the second instruction set, if a thread switch instruction is present in the first instruction set or the second instruction set:
    the thread manager is further configured to, according to the thread switch instruction, control the instruction memory to fetch the instruction set targeted by the thread switch; and
    when the thread switch instruction belongs to the first instruction set, the instruction memory is further configured to clear the instructions in the first instruction cache queue, and the first execution thread scheduler is further configured to schedule a third execution thread among the plurality of execution threads to continue execution;
    or,
    when the thread switch instruction belongs to the second instruction set, the instruction memory is further configured to clear the instructions in the second instruction cache queue, and the second execution thread scheduler is further configured to schedule a third execution thread among the plurality of execution threads to continue execution.
  17. An instruction processing device, wherein the instruction processing device comprises a processor and a memory coupled to the processor, the processor being the processor according to any one of claims 9 to 16.
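Taken together, claims 9-17 describe a thread manager that assigns a pipeline thread per instruction processing request and releases it when the request completes. The toy sketch below shows only that request-lifecycle bookkeeping; the `ThreadManager` class and the stand-in `process` function are assumptions for illustration, and real execution threads, schedulers, and cache queues are omitted.

```python
from collections import deque

class ThreadManager:
    """Illustrative manager of pipeline threads: one per in-flight request."""
    def __init__(self, n):
        self.idle_pipeline_threads = deque(range(n))

    def accept_request(self):
        """Assign an idle pipeline thread to a new instruction processing request."""
        return self.idle_pipeline_threads.popleft()

    def release(self, tid):
        """Return the pipeline thread to the idle state after completion."""
        self.idle_pipeline_threads.append(tid)

def process(instruction_set):
    """Stand-in for execution by an execution thread: marks each op as executed."""
    return [op + ":done" for op in instruction_set]

mgr = ThreadManager(2)
p1 = mgr.accept_request()   # first pipeline thread, for the first request
p2 = mgr.accept_request()   # second pipeline thread, for the second request
r1 = process(["ld", "add"]) # first instruction set
r2 = process(["mul"])       # second instruction set
mgr.release(p1)             # both pipeline threads go back to the idle state
mgr.release(p2)
```

Because each request holds its own pipeline thread for its whole lifetime, the two requests never contend for the same bookkeeping state, which is what lets the two instruction sets be dispatched to different execution thread schedulers in parallel.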
PCT/CN2018/123258 2018-12-24 2018-12-24 Multithread-based instruction processing method and apparatus WO2020132841A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880098382.2A CN112789593A (zh) 2018-12-24 2018-12-24 Multithread-based instruction processing method and apparatus
PCT/CN2018/123258 WO2020132841A1 (zh) 2018-12-24 2018-12-24 Multithread-based instruction processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/123258 WO2020132841A1 (zh) 2018-12-24 2018-12-24 Multithread-based instruction processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2020132841A1 true WO2020132841A1 (zh) 2020-07-02

Family

ID=71126763

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123258 WO2020132841A1 (zh) 2018-12-24 2018-12-24 Multithread-based instruction processing method and apparatus

Country Status (2)

Country Link
CN (1) CN112789593A (zh)
WO (1) WO2020132841A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201219B (zh) * 2021-12-21 2023-03-17 Hygon Information Technology Co., Ltd. Instruction scheduling method, instruction scheduling apparatus, processor, and storage medium
CN114168202B (zh) * 2021-12-21 2023-01-31 Hygon Information Technology Co., Ltd. Instruction scheduling method, instruction scheduling apparatus, processor, and storage medium
CN115408153B (zh) * 2022-08-26 2023-06-30 Hygon Information Technology Co., Ltd. Instruction dispatch method and apparatus for a multithreaded processor, and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102004719A (zh) * 2010-11-16 2011-04-06 Tsinghua University VLIW processor architecture supporting simultaneous multithreading
US20110307688A1 (en) * 2010-06-10 2011-12-15 Carnegie Mellon University Synthesis system for pipelined digital circuits
US20120030657A1 (en) * 2010-07-30 2012-02-02 Qi Gao Method and system for using a virtualization system to identify deadlock conditions in multi-threaded programs by controlling scheduling in replay
CN105808357A (zh) * 2016-03-29 2016-07-27 Shenyang Aerospace University Multi-core multithreaded processor with precisely controllable performance

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN102023844B (zh) * 2009-09-18 2014-04-09 Shenzhen Zhongweidian Technology Co., Ltd. Parallel processor and thread processing method thereof
CN104298552B (zh) * 2013-07-15 2018-06-19 Huawei Technologies Co., Ltd. Thread instruction fetch scheduling method and system for a multithreaded processor, and multithreaded processor
US9665466B2 (en) * 2014-09-02 2017-05-30 Nxp Usa, Inc. Debug architecture for multithreaded processors
CN105786448B (zh) * 2014-12-26 2019-02-05 Shenzhen ZTE Microelectronics Technology Co., Ltd. Instruction scheduling method and apparatus


Also Published As

Publication number Publication date
CN112789593A (zh) 2021-05-11

Similar Documents

Publication Publication Date Title
US11880687B2 (en) System having a hybrid threading processor, a hybrid threading fabric having configurable computing elements, and a hybrid interconnection network
US11836524B2 (en) Memory interface for a multi-threaded, self-scheduling reconfigurable computing fabric
US10929323B2 (en) Multi-core communication acceleration using hardware queue device
US11531543B2 (en) Backpressure control using a stop signal for a multi-threaded, self-scheduling reconfigurable computing fabric
US11567766B2 (en) Control registers to store thread identifiers for threaded loop execution in a self-scheduling reconfigurable computing fabric
US11635959B2 (en) Execution control of a multi-threaded, self-scheduling reconfigurable computing fabric
TWI628594B (zh) User-level fork and join processors, methods, systems, and instructions
US6829697B1 (en) Multiple logical interfaces to a shared coprocessor resource
KR101486025B1 (ko) Thread scheduling in a processor
US8963933B2 (en) Method for urgency-based preemption of a process
US20230153258A1 (en) Multi-Threaded, Self-Scheduling Reconfigurable Computing Fabric
CN107066408B (zh) 用于数字信号处理的方法、系统和装置
WO2001016714A1 (en) Fast write instruction for micro engine used in multithreaded parallel processor architecture
WO2001048619A2 (en) Distributed memory control and bandwidth optimization
WO2020132841A1 (zh) Multithread-based instruction processing method and apparatus
US9170816B2 (en) Enhancing processing efficiency in large instruction width processors
US20050193186A1 (en) Heterogeneous parallel multithread processor (HPMT) with shared contexts
CN110908716B (zh) Implementation method of a vector gather-load instruction
CN111045800A (zh) Method and system for optimizing GPU performance based on shortest-job-first scheduling
CN112540796A (zh) Instruction processing apparatus, processor, and processing method thereof
US10771554B2 (en) Cloud scaling with non-blocking non-spinning cross-domain event synchronization and data communication
US20170147345A1 (en) Multiple operation interface to shared coprocessor
WO2024041625A1 (zh) Instruction dispatch method and apparatus for a multithreaded processor, and storage medium
US10133578B2 (en) System and method for an asynchronous processor with heterogeneous processors
US7191309B1 (en) Double shift instruction for micro engine used in multithreaded parallel processor architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18944511

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18944511

Country of ref document: EP

Kind code of ref document: A1