WO2016101664A1 - Instruction scheduling method and apparatus - Google Patents

Instruction scheduling method and apparatus

Info

Publication number
WO2016101664A1
WO2016101664A1 · PCT/CN2015/090154
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
thread
module
fetch
scheduling
Prior art date
Application number
PCT/CN2015/090154
Other languages
English (en)
French (fr)
Inventor
周峰
安康
王志忠
刘衡祁
Original Assignee
深圳市中兴微电子技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2016101664A1 publication Critical patent/WO2016101664A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline, look ahead

Definitions

  • the present invention relates to network processor technologies, and in particular, to an instruction scheduling method and apparatus.
  • the core routers at the backbone of the Internet have undergone one technological change after another.
  • network processors, with their outstanding packet-processing performance and programmability, have become an irreplaceable part of the routing and forwarding engine.
  • the micro engine (ME, Micro Engine) is the core component of the network processor, and is responsible for parsing and processing the message according to the microcode instructions.
  • the microcode instruction is a necessary condition for the ME to work; therefore, ME instruction fetching and instruction scheduling affect the overall performance of the ME. If fetch and instruction-scheduling efficiency is low, the ME will not have enough instructions to execute and will remain in its initial state. It is therefore necessary to design a reasonable and efficient scheme to implement ME fetch and instruction scheduling while giving the ME high processing performance.
  • although the ME of the fine-grained multi-threaded structure can use thread switching to hide fetch bubbles (empty fetch cycles), frequent thread switching aggravates, to some extent, the out-of-order degree of ME packet processing.
  • this out-of-order degree becomes larger and larger, eventually increasing the out-of-order degree of packets entering and leaving the ME, which puts greater pressure on the network processor's subsequent order-preserving module and degrades overall system performance.
  • the embodiments of the present invention provide:
  • An instruction scheduling method includes:
  • prefetching and caching a preset number of instructions of each thread; and performing instruction scheduling according to the thread state and the cached prefetched instructions.
  • the prefetching instruction includes:
  • Receiving a response message from the instruction cache module; when it is determined that the response message carries fetch-success information and instruction content, acquiring the instruction content for local caching; when it is determined that the response message carries fetch-failure information and a fetch address, re-adding the fetch address to the fetch queue and re-fetching according to the schedule.
  • the fetch address includes one or more of the following: a new packet fetch address, a branch fetch address, a refetch address, and a sequential fetch address.
  • the method further includes:
  • when it is determined that the number of instructions cached for a thread is not greater than a preset value, a new round of instruction prefetching for the thread is initiated, that is, a preset number of instructions of the thread are prefetched and cached.
  • the thread state includes: an initial state, a wait state, and a ready state
  • the instruction scheduling according to the thread state and the cached prefetched instructions includes:
  • scheduling, according to the least-recently-used (LRU) algorithm and the order in which the threads' packets arrived, the threads whose state is the ready state.
  • An embodiment of the present invention further provides an instruction scheduling apparatus, including: an instruction fetching module, an instruction register module, a control state machine module, and an instruction scheduling module; wherein
  • the fetching module is configured to prefetch instructions of each thread
  • the instruction register module is configured to cache a preset number of instructions of each thread prefetched
  • the control state machine module is configured to perform thread state control
  • the instruction scheduling module is configured to perform instruction scheduling according to a thread state provided by the control state machine module and a prefetch instruction cached by the instruction register module.
  • the fetch module is specifically configured to: send a fetch request to the instruction cache module, the fetch request carrying at least a fetch address; and receive a response message from the instruction cache module; when the response message carries fetch-failure information and a fetch address, re-add the fetch address to the fetch queue and re-fetch according to the schedule;
  • the control state machine module is further configured to receive a response message from the instruction cache module, where the response message carries the instruction fetch success information and the instruction content, and then acquires the instruction content and sends the instruction content to the instruction register module for caching.
  • control state machine module is further configured to determine whether the number of instructions of the thread cached by the instruction register module is not greater than a preset value
  • the fetching module is further configured to start a new round of instruction prefetching of the thread when the control state machine module determines that the number of instructions of the thread is not greater than a preset value.
  • the instruction scheduling module is specifically configured to schedule, according to the LRU, a thread whose thread state is in a ready state.
  • the instruction scheduling method and device prefetch and cache a preset number of instructions of each thread; and perform instruction scheduling according to the thread state and the cached prefetch instruction.
  • the embodiment of the present invention first performs instruction prefetching and then performs scheduling according to the prefetched instructions, thereby avoiding instruction bubbles and improving instruction-scheduling efficiency and the overall performance of the ME; in addition, if instruction priority scheduling is further performed, the out-of-order degree can be reduced, further improving instruction-scheduling efficiency and the overall performance of the ME.
  • FIG. 1 is a schematic flowchart of an instruction scheduling method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of an ME instruction scheduling process according to Embodiment 1 of the present invention.
  • FIG. 4 is a schematic diagram of an ME instruction prefetching process according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic diagram of a state transition of a control state machine according to Embodiment 2 of the present invention.
  • FIG. 6 is a schematic flowchart of ME thread level instruction scheduling according to Embodiment 3 of the present invention.
  • FIG. 7 is a schematic structural diagram of an instruction scheduling module according to Embodiment 3 of the present invention.
  • the embodiment of the present invention provides an instruction scheduling method, as shown in FIG. 1 , the method includes:
  • Step 101 Prefetch and cache a preset number of instructions of each thread.
  • the prefetching instruction includes:
  • Sending a fetch request to the instruction cache module, the request carrying at least a fetch address; here, the fetch address may include one or more of the following: a new-packet fetch address, a branch fetch address, a re-fetch address, and a sequential fetch address.
  • Receiving a response message from the instruction cache module (i.e., the cache); when it is determined that the response message carries fetch-success information and instruction content (i.e., a fetch-success response is received), acquiring the instruction content for local caching; when it is determined that the response message carries fetch-failure information and a fetch address (i.e., a fetch-failure response is received), re-adding the fetch address to the fetch queue and re-fetching according to the schedule.
  • Step 102 Perform instruction scheduling according to a thread state and the cached prefetch instruction.
  • the thread state includes: an initial state, a wait state, and a ready state.
  • the instruction scheduling is performed according to the thread state and the cached prefetched instructions, including: scheduling, according to the least-recently-used (LRU) algorithm, the threads whose state is the ready state.
  • a thread state being the ready state indicates that the control state machine module has asserted the thread's ready-state signal. It should be noted that the ready-state signal is not asserted immediately after the thread becomes ready; two successive ready-state signals of the same thread are generally separated by a preset interval. It should also be noted that if an instruction is scheduled, the instruction is removed from the cache.
  • the method further includes:
  • when it is determined that the number of instructions cached for a thread is not greater than a preset value, a new round of instruction prefetching for the thread is initiated, that is, a preset number of instructions of the thread are prefetched and cached.
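For illustration only, the prefetch-and-refill flow described above can be sketched in Python. This is a behavioral sketch, not the patented hardware: all class and method names are invented, and the refill threshold of 1 remaining cached instruction is an assumed concrete value for the "not greater than a preset value" condition.

```python
from collections import deque

REFILL_THRESHOLD = 1  # assumed "preset value" triggering a new prefetch round


class Prefetcher:
    """Behavioral sketch of per-thread instruction prefetching."""

    def __init__(self, num_threads):
        self.fetch_queue = deque()   # pending (thread, fetch address) requests
        self.buffers = {t: [] for t in range(num_threads)}  # per-thread instruction cache

    def request(self, thread_id, fetch_addr):
        # A fetch request carries at least the fetch address.
        self.fetch_queue.append((thread_id, fetch_addr))

    def on_response(self, thread_id, ok, fetch_addr=None, instructions=None):
        # Handle the response message from the instruction cache module.
        if ok:
            # Fetch success: acquire the instruction content for local caching.
            self.buffers[thread_id].extend(instructions)
        else:
            # Fetch failure: re-add the fetch address to the fetch queue
            # so the fetch is re-issued according to the schedule.
            self.fetch_queue.append((thread_id, fetch_addr))

    def maybe_refill(self, thread_id, next_pc):
        # Start a new prefetch round when the thread's cached instruction
        # count is not greater than the preset value.
        if len(self.buffers[thread_id]) <= REFILL_THRESHOLD:
            self.request(thread_id, next_pc)
```

A failed fetch thus simply re-enters the queue, while a near-empty per-thread buffer triggers the next sequential prefetch before the last instruction issues.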
  • An embodiment of the present invention further provides an instruction scheduling apparatus.
  • the apparatus includes: an instruction fetching module 201, an instruction register module 202, a control state machine module 203, and an instruction scheduling module 204.
  • the fetching module 201 is configured to prefetch instructions of each thread
  • the instruction register module 202 is configured to cache a preset number of instructions of each thread prefetched
  • the control state machine module 203 is configured to perform thread state control
  • the instruction scheduling module 204 is configured to perform instruction scheduling according to a thread state provided by the control state machine module and a prefetch instruction cached by the instruction register module.
  • the fetch module 201 is specifically configured to: send a fetch request to the instruction cache module, the fetch request carrying at least a fetch address; and receive a response message from the instruction cache module; when the response message carries fetch-failure information and a fetch address, re-add the fetch address to the fetch queue and re-fetch according to the schedule;
  • control state machine module 203 is further configured to receive a response message from the instruction cache module, where the response message carries the instruction fetch success information and the instruction content, and then acquires the instruction content and sends the instruction content to the instruction register module. Cache.
  • control state machine module 203 is further configured to determine whether the number of instructions of the thread buffered by the instruction register module is not greater than a preset value
  • the fetching module 201 is further configured to start a new round of instruction prefetching of the thread when the control state machine module determines that the number of instructions of the thread is not greater than a preset value.
  • the instruction scheduling module 204 is specifically configured to schedule, according to the LRU, a thread whose thread state is in a ready state.
  • Deploying the instruction scheduling apparatus according to the embodiment of the present invention on the ME can avoid instruction fetch bubbles and reduce the degree of out-of-order processing, thus improving instruction-scheduling efficiency and the overall performance of the ME.
  • the instruction scheduling apparatus is disposed in the ME, and the ME performs instruction fetching and instruction scheduling of the multi-thread by the instruction scheduling apparatus.
  • the instruction scheduling device generates each thread's fetch PC in advance according to the execution status of all threads, obtains the corresponding thread's instructions from the cache, and loads them into the internal cache.
  • the instruction scheduling device also manages the instruction scheduling of each thread, completes thread-level instruction scheduling according to the least-recently-used (LRU) algorithm, and transmits instructions from the cache to the pipeline, ensuring that the packet that enters the ME first has its instructions executed first.
  • the instruction scheduling apparatus involved in this embodiment mainly includes an instruction fetching module (gen_pc), a control state machine module (ctrl_fsm), an instruction register module (instr_rf), and an instruction scheduling module.
  • the instruction scheduling apparatus generates the respective fetching PCs in advance according to the execution status of all the threads, and obtains the instructions of the corresponding threads from the cache, and loads them into the internal cache, which specifically includes:
  • when a new packet enters the ME, it carries the new-packet fetch address (pkt_pc).
  • the fetch module parses the packet information, generates pkt_pc, writes it into the fetch queue (pc_queue), and issues a fetch request to the cache.
  • after the cache returns an instruction, the instruction scheduling device writes the returned instruction into the corresponding cache in the instruction register module according to its thread number.
  • the instruction register module completes the loading and pre-parsing of the returned instructions; each thread can store 4 instructions. Parsed halt-class, jump-class, or illegal instructions are reported to the control state machine module.
  • the control state machine module monitors the instruction register of each thread; when a thread is executing its second-to-last instruction, the thread's next fetch request (sequential fetch address) is issued in advance, to avoid the situation in which, after the thread's last instruction has been issued, no valid instruction remains in the cache to issue and a bubble occurs.
  • when a fetch misses, the cache returns the current pc (re-fetch address) value; the instruction scheduling device writes the pc back into the pc queue and waits to fetch again.
  • the instruction scheduling apparatus further manages the instruction scheduling of each thread, completes thread-level instruction scheduling according to the least-recently-used (LRU) algorithm on the basis of instruction prefetching, and transmits instructions from the cache to the pipeline, while ensuring that the packet that enters the ME first has its instructions executed first.
  • when a thread's instruction register is loaded with a valid instruction, the state machine is in the rdy state; after asserting the ready signal, it waits for the instruction scheduling module's grant.
  • the instruction scheduling module gives the highest priority to the thread whose packet entered first, so that this thread's instructions are processed first.
  • the instruction scheduling module grants according to the ready signals asserted by the threads; the granted thread issues its instruction to the pipeline, which completes the instruction's processing.
  • when all of a thread's cached instructions have been issued (the four instructions have been sent to the kernel pipeline) or a branch instruction is parsed, the state machine jumps to the wait state. If only the last instruction of a thread remains in the cache, the instruction scheduling device writes the fetch address (pc) into the pc queue and fetches the next group of four instructions from the cache. If it is a branch instruction, it waits for the pipeline to resolve the new pc and writes it into the pc queue. The ready signal is not asserted at this time, i.e., the thread does not participate in thread-level instruction scheduling. When the new instructions return, the state machine re-enters the rdy state. If a halt instruction is parsed, the state machine jumps to the idle state, deasserts the ready signal, and the thread no longer participates in thread-level instruction scheduling.
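The per-thread state machine described above (idle, rdy, wait) can be sketched as follows. This is a minimal illustrative model, not the hardware implementation; the class name, method names, and event granularity are assumptions made for the sketch.

```python
IDLE, RDY, WAIT = "idle", "rdy", "wait"


class ThreadFSM:
    """Sketch of the per-thread control state machine: idle -> rdy -> wait -> rdy,
    with halt returning the thread to idle."""

    def __init__(self):
        self.state = IDLE
        self.ready = False  # ready signal toward the instruction scheduling module

    def on_instructions_loaded(self):
        # Valid instructions loaded into the thread's instruction register.
        self.state, self.ready = RDY, True

    def on_drained_or_branch(self):
        # All cached instructions issued, or a branch instruction parsed:
        # deassert ready and wait for new instructions (or the branch pc) to return.
        self.state, self.ready = WAIT, False

    def on_refill(self):
        # New instructions returned from the cache: re-enter rdy.
        self.state, self.ready = RDY, True

    def on_halt(self):
        # halt instruction: the packet is done; release the thread and go idle.
        self.state, self.ready = IDLE, False
```

While in `WAIT` or `IDLE` the `ready` flag stays low, so the thread drops out of thread-level instruction scheduling exactly as the text describes.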
  • FIG. 3 is a schematic diagram of an ME instruction scheduling process according to Embodiment 1 of the present invention; as shown in FIG. 3, it includes the following steps:
  • Step 301: The fetch module parses the fetch-pc information carried in the new packet and issues a fetch request;
  • the fetch module consists of four queues plus an arbitration module; the fetch requests cached in the queues are sent to the cache after 4-to-1 arbitration.
  • the four queues correspond to the four fetch-pc request types: new-packet fetch (pkt_pc), sequential packet fetch, jump fetch, and cache-miss re-fetch.
  • when a new packet enters, the packet information is parsed, the fetch pc is extracted, and a fetch request is issued.
  • for a packet being processed there are, in fact, sequential (pc+1) fetches, fetches after jump instructions, and repeated fetches after cache misses; to ease management and implement the fetch operations, queues are used for management and caching, where the depth of each queue need only be set to the number of ME threads.
  • the arbitration priority of the queues is: pc+1 fetch, jump fetch, cache-miss fetch, new-packet pc fetch (highest priority first), which helps ensure optimal performance.
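The four-queue, 4-to-1 fixed-priority arbitration just described can be sketched in Python. The queue names and class interface are invented for illustration; only the priority order (sequential pc+1, then jump, then cache-miss re-fetch, then new-packet pc) and the queue depth equal to the thread count come from the text.

```python
from collections import deque

# Fixed arbitration priority, highest first, per the description above.
QUEUE_PRIORITY = ["seq", "jump", "miss", "new_pkt"]


class FetchArbiter:
    """Sketch of the fetch module: four request queues plus 4-to-1 arbitration."""

    def __init__(self, num_threads):
        # Each queue's depth need only equal the number of ME threads.
        self.queues = {name: deque(maxlen=num_threads) for name in QUEUE_PRIORITY}

    def push(self, kind, pc):
        # Enqueue a fetch pc into one of the four request queues.
        self.queues[kind].append(pc)

    def grant(self):
        # 4-to-1 arbitration: serve the highest-priority non-empty queue.
        for name in QUEUE_PRIORITY:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None  # no pending fetch request
```

With this ordering, a pending sequential fetch always wins over a new-packet fetch, matching the stated priority.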
  • Step 302: After the fetch request for the new packet is issued, the cache returns an instruction to the control state machine module, and the instruction register module acquires the instruction and completes its loading and pre-parsing;
  • the instruction cache in the instruction register module is composed of register groups, and each thread can cache four instructions.
  • the instruction register module completes the pre-parsing of the instruction, analyzes the instruction's operation type, and provides control signals for the control state machine.
  • Step 303: The instruction scheduling module completes instruction-level scheduling and issues instructions to the pipeline;
  • a thread whose packet has instructions prepared issues a ready request to the instruction scheduling module (the current thread has prepared instructions and requests to issue them to the pipeline).
  • when the instruction scheduling module grants the request, the thread where the packet is located issues an instruction to the pipeline.
  • the instruction scheduling module needs a scheduling policy that ensures packets that enter first can always be granted with higher priority.
  • the lru scheduling policy is an RR scheduling algorithm that dynamically updates the base value.
  • the instruction scheduling module uses the thread numbers held in a queue as the base values of the RR scheduling algorithm.
  • the thread number of the packet that enters the ME first is placed at the head of the queue and used as the base value of the RR scheduling algorithm.
  • the result of the RR scheduling therefore always gives high priority to the packet at the head of the queue.
  • thread-level grant is thus an RR scheduling algorithm with a dynamically updated base value.
  • Step 304: The control state machine module monitors the working state of each thread, issues fetch requests in advance, or releases threads that have completed processing.
  • the control state machine module monitors each thread's instruction-cache count; when only one instruction is left in a thread's instruction cache, the thread's pc+1 fetch request is sent to the fetch module in advance, to avoid the situation in which, after the thread's last instruction has been issued, no valid instruction remains in the cache to issue and a bubble occurs.
  • the control state machine module sets a state machine for each thread.
  • when a thread executes a jump-class instruction, the state machine enters the wait state; after the pipeline resolves the jump instruction, the fetch request is sent to the fetch module, and when the instruction returns, the thread's state machine is reactivated and enters the rdy working state.
  • Step 305: When a thread executes the halt instruction, the state machine enters the idle state.
  • the halt instruction is the last instruction of a packet's processing and issuance; therefore, after this instruction is executed, the control state machine module puts the thread's state machine into the idle state, releases all of the thread's cache resources, and waits for the thread to be reassigned to a new packet before it is used again.
  • FIG. 4 is a schematic diagram of the ME instruction prefetching process according to Embodiment 2 of the present invention; as shown in FIG. 4, it includes the following steps:
  • Step 401: A new packet enters the ME; the instruction scheduling device parses and extracts the fetch pc and sends a fetch request to the fetch module;
  • Step 402: The fetch module performs arbitration scheduling and sends the fetch request to the cache;
  • Step 403: The instruction is returned, loaded into the instruction register module, and scheduled for transmission to the pipeline.
  • the control state machine module monitors the instruction issuance of each thread; when there is only one instruction left in a thread's instruction cache, the pc+1 fetch request is issued to the fetch module in advance, and execution continues after the instruction returns.
  • the control state machine module sets a state machine for each thread; when a thread executes a jump-class instruction, the control state machine module puts the state machine into the wait state and sends the pipeline-resolved jump-address request to the fetch module. After the instruction returns, the state machine is activated into the rdy state, waits for scheduling, and continues transmitting instructions to the pipeline.
  • when a thread executes the halt instruction, it indicates that the packet has been fully executed and issued.
  • the control state machine module then puts the state machine into the idle state and releases all of the thread's resources; the packet's processing is complete.
  • FIG. 5 is a schematic diagram of a state transition of a control state machine according to Embodiment 2 of the present invention. As shown in FIG. 5, the state jump specifically relates to:
  • in the rdy state, a ready request is issued (a ready-request signal for a thread is issued every 4 cycles). If the data-port instruction is parsed as halt and the instruction is granted, the state machine branches to the idle state: the thread's packet has been processed, and the thread returns to the initial state.
  • if the data-port instruction is parsed as a jump-class instruction and the instruction is granted, the state machine branches to the wait state.
  • in the wait state, the instructions in the instruction cache have all been transmitted, and the thread waits for new instructions to return.
  • the ready request is not asserted at this time; after the cache returns new instructions, the state machine transfers back to the rdy state and reissues the ready request.
  • the instruction scheduling apparatus manages the instruction scheduling of each thread through the instruction scheduling module.
  • the instruction scheduling module completes thread-level instruction scheduling according to the least-recently-used (LRU) algorithm and transmits instructions from the cache to the pipeline, ensuring that the packet that enters the ME first has its instructions executed first.
  • FIG. 6 is a schematic flowchart of ME thread level instruction scheduling according to Embodiment 3 of the present invention. As shown in FIG. 6, the method includes the following steps:
  • Step 601: Extract the thread number of the new packet and write it into the base queue;
  • the essence of the thread-level instruction-scheduling strategy of the present invention is RR polling scheduling with a dynamically updated base.
  • in this way, the thread whose packet entered first can always obtain the highest-priority grant.
  • the base queue depth is the same as the number of ME threads and is used to store each thread's thread number; each thread number corresponds to a base value.
  • Step 602: Divide the base values of all threads into 4 groups for storage, and complete the final scheduling across the four groups;
  • new packets are written into the four groups in order 0-3; when the current group is full, the next group is written.
  • the four groups each give the scheduling result within their own group, and the final grant is then given across the four groups in the order 0-3.
  • Step 603: When the base value corresponding to a new packet is written into a group, the thread's corresponding bitmap bit is set valid; when the packet has been processed, the thread's bitmap bit is set invalid.
  • FIG. 7 is a schematic structural diagram of an instruction scheduling module according to Embodiment 3 of the present invention.
  • each thread in a group has a corresponding bitmap flag bit, used to mark whether the thread stored in the group is being executed.
  • when the flag is set to 1, the thread is executing.
  • when a thread's corresponding bitmap flag is 0, the thread has been processed and no longer participates in instruction scheduling.
  • each group checks, every cycle, the bitmap flag bit corresponding to the base value at the queue head; when the flag is 0, the head base value is read out and its thread no longer participates in instruction scheduling; meanwhile, the next base value is read out to the head position, and its corresponding thread participates in instruction scheduling with the highest priority.
  • Step 604: The queue outputs to the RR scheduling module in first-in-first-out order, implementing RR polling scheduling with a dynamically updated base.
  • the thread at the head of the queue enjoys the highest-priority instruction scheduling, while the base values are written in the order the packets arrived; the lru scheduling policy therefore realizes the design in which instructions of the packet that first enters the ME are executed first.
  • Step 605: After a thread finishes processing, its base value is read out, the next base value is read out to the head position, and the corresponding thread participates in instruction scheduling with the highest priority.
  • in this way, thread-level instruction scheduling is completed according to the least-recently-used (LRU) algorithm, realizing the design in which instructions of the packet that first enters the ME are executed first.
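The dynamic-base round-robin policy above (the base queue holds thread numbers in packet-arrival order, the head supplies the RR base, and a finished thread is removed) can be sketched as follows. The class and method names are invented, and the sketch collapses the 4-group bitmap organization into a single arrival-order queue for clarity; only the base-update rule comes from the text.

```python
from collections import deque


class DynamicBaseRR:
    """Sketch of RR polling with a dynamically updated base: the thread whose
    packet entered the ME first sits at the queue head and supplies the RR
    base, so it always wins the grant when its ready signal is asserted."""

    def __init__(self, num_threads):
        self.n = num_threads
        self.base_queue = deque()  # thread numbers in packet-arrival order

    def new_packet(self, thread_id):
        # A new packet's thread number is appended in arrival order.
        self.base_queue.append(thread_id)

    def finish(self, thread_id):
        # A finished thread's base value is removed; the next base value
        # moves to the head and gains the highest priority.
        self.base_queue.remove(thread_id)

    def grant(self, ready):
        # ready: set of thread ids currently asserting their ready signal.
        if not self.base_queue:
            return None
        base = self.base_queue[0]
        # Round-robin search starting from the base value.
        for off in range(self.n):
            t = (base + off) % self.n
            if t in ready:
                return t
        return None
```

Because the base is always the earliest-arriving unfinished thread, first-in packets are served first whenever they are ready, which is the ordering guarantee the embodiment relies on.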
  • the foregoing embodiments of the present invention provide a micro-engine instruction scheduling scheme.
  • the ME completes multi-threaded instruction fetching and instruction scheduling through the instruction scheduling device, which generates each thread's fetch PC in advance according to the execution status of all threads, obtains the corresponding thread's instructions from the cache, and loads them into the internal cache; at the same time, it manages the instruction scheduling of each thread, completes thread-level instruction scheduling according to the LRU algorithm, and transmits instructions from the cache to the pipeline, ensuring that the packet that enters the ME first has its instructions executed first;
  • this structure effectively avoids ME fetch bubbles, effectively improving ME performance; at the same time, it ensures that packets entering the ME can be executed in order, improving the overall performance of the network processor; and the solution is relatively simple and easy to implement.
  • Each of the above units may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.
  • embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data-processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data-processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

An embodiment of the present invention discloses an instruction scheduling method and apparatus. The method includes: prefetching and caching a preset number of instructions of each thread; and performing instruction scheduling according to thread states and the cached prefetched instructions. The embodiment of the present invention first performs instruction prefetching and then schedules according to the prefetched instructions, thereby avoiding instruction bubbles and improving instruction-scheduling efficiency and the overall performance of the ME; in addition, if instruction priority scheduling is further performed, the out-of-order degree can be reduced, further improving instruction-scheduling efficiency and the overall performance of the ME.

Description

Instruction scheduling method and apparatus — Technical Field
The present invention relates to network processor technologies, and in particular to an instruction scheduling method and apparatus.
Background
To meet the needs of future network development and improve router performance, core routers at the backbone of the Internet have undergone one technological change after another. Especially in the high-end router market, network processors, with their outstanding packet-processing performance and programmability, have become an irreplaceable part of the routing and forwarding engine.
In a network processor system, the micro engine (ME, Micro Engine) is the core component of the network processor and is responsible for parsing and processing packets according to microcode instructions. Microcode instructions are a necessary condition for the ME to work; therefore, ME instruction fetching and instruction scheduling affect the overall performance of the ME. If fetch and scheduling efficiency is low, the ME will not have enough instructions to execute and will remain in its initial state. It is therefore necessary to design a reasonable and efficient scheme to implement ME fetch and instruction scheduling while giving the ME high processing performance.
Traditional multi-threaded network processors use a cache to store microcode instructions. Since cache accesses miss with some probability, for an ME with a coarse-grained multi-threaded structure, when the fetch and instruction-scheduling method is inefficient, instruction bubbles leave the kernel pipeline idle, degrading ME performance.
In addition, although an ME with a fine-grained multi-threaded structure can use thread switching to hide fetch bubbles, frequent thread switching aggravates, to some extent, the out-of-order degree of ME packet processing. This out-of-order degree grows larger and larger, eventually increasing the out-of-order degree of packets entering and leaving the ME, putting greater pressure on the network processor's subsequent order-preserving module and degrading overall system performance.
发明内容
有鉴于此,为解决现有存在的技术问题,本发明实施例提供:
一种指令调度方法,包括:
预取并缓存各线程预设数量的指令;
根据线程状态及所述缓存的预取指令进行指令调度。
In a specific embodiment, the prefetching of instructions includes:

sending a fetch request to an instruction cache module, the fetch request carrying at least a fetch address; and

receiving a response message from the instruction cache module; when it is determined that the response message carries fetch-success information and instruction content, obtaining the instruction content for local buffering; and when it is determined that the response message carries fetch-failure information and a fetch address, re-adding the fetch address to a fetch queue and refetching according to scheduling.

In a specific embodiment, the fetch address includes one or more of the following: a new-packet fetch address, a branch fetch address, a refetch address, and a sequential fetch address.
In a specific embodiment, the method further includes:

when it is determined that the number of instructions buffered for a thread is not greater than a preset value, starting a new round of instruction prefetch for the thread, that is, prefetching and buffering the preset number of instructions for the thread.

In a specific embodiment, the thread states include an initial state, a waiting state, and a ready state; and

the performing instruction scheduling according to thread states and the buffered prefetched instructions includes:

scheduling threads whose state is the ready state according to the least recently used (LRU) algorithm and the order in which the threads' packets arrived.
An embodiment of the present invention further provides an instruction scheduling apparatus, including: a fetch module, an instruction register module, a control state machine module, and an instruction scheduling module, wherein:

the fetch module is configured to prefetch instructions for each thread;

the instruction register module is configured to buffer the prefetched preset number of instructions for each thread;

the control state machine module is configured to control thread states; and

the instruction scheduling module is configured to perform instruction scheduling according to the thread states provided by the control state machine module and the prefetched instructions buffered by the instruction register module.
In a specific embodiment, the fetch module is specifically configured to: send a fetch request to the instruction cache module, the fetch request carrying at least a fetch address; receive a response message from the instruction cache module; and, when the response message carries fetch-failure information and a fetch address, re-add the fetch address to the fetch queue and refetch according to scheduling; and

the control state machine module is further configured to receive a response message from the instruction cache module and, when the response message carries fetch-success information and instruction content, obtain the instruction content and send it to the instruction register module for buffering.

In a specific embodiment, the control state machine module is further configured to determine whether the number of instructions buffered by the instruction register module for a thread is not greater than a preset value; and

the fetch module is further configured to start a new round of instruction prefetch for the thread when the control state machine module determines that the number of instructions for the thread is not greater than the preset value.

In a specific embodiment, the instruction scheduling module is specifically configured to schedule threads whose state is the ready state according to the LRU algorithm.
In the instruction scheduling method and apparatus described in the embodiments of the present invention, a preset number of instructions are prefetched and buffered for each thread, and instruction scheduling is performed according to thread states and the buffered prefetched instructions. The embodiments first perform instruction prefetch and then schedule according to the prefetched instructions, thereby avoiding instruction stalls and improving instruction scheduling efficiency and the overall performance of the ME. In addition, if instruction priority scheduling is further performed, the out-of-order degree can be reduced, further improving instruction scheduling efficiency and the overall performance of the ME.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of an instruction scheduling method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of ME instruction scheduling according to Embodiment 1 of the present invention;

FIG. 4 is a schematic flowchart of ME instruction prefetch according to Embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of the state transitions of the control state machine in Embodiment 2 of the present invention;

FIG. 6 is a schematic flowchart of ME thread-level instruction scheduling according to Embodiment 3 of the present invention;

FIG. 7 is a schematic structural diagram of the instruction scheduling module in Embodiment 3 of the present invention.
Detailed Description

To improve instruction scheduling efficiency and the overall performance of the ME, an embodiment of the present invention proposes an instruction scheduling method. As shown in FIG. 1, the method includes:

Step 101: prefetch and buffer a preset number of instructions for each thread.
In a specific embodiment, the prefetching of instructions includes:

sending a fetch request to the instruction cache module, the fetch request carrying at least a fetch address; here, the fetch address may include one or more of the following: a new-packet fetch address, a branch fetch address, a refetch address, and a sequential fetch address; and

receiving a response message from the instruction cache module (i.e., the cache); when it is determined that the response message carries fetch-success information and instruction content (i.e., a fetch-success response is received), obtaining the instruction content for local buffering; and when it is determined that the response message carries fetch-failure information and a fetch address (i.e., a fetch-failure response is received), re-adding that fetch address to the fetch queue and refetching according to scheduling.
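The request/response handling above can be sketched as a small software model (a hypothetical Python sketch for illustration only; the class and field names are assumptions, not part of the patented hardware):

```python
from collections import deque

class FetchUnit:
    """Minimal model of the prefetch protocol: on a fetch-success response
    the instruction content is buffered locally; on a fetch-failure response
    (cache miss) the fetch address is re-added to the fetch queue for retry."""

    def __init__(self):
        self.pc_queue = deque()   # pending (thread id, fetch address) requests
        self.buffers = {}         # thread id -> locally buffered instructions

    def request(self, thread_id, pc):
        # Enqueue a fetch request carrying at least the fetch address.
        self.pc_queue.append((thread_id, pc))

    def on_response(self, thread_id, pc, hit, instructions=None):
        if hit:
            # Fetch success: obtain the instruction content and buffer it.
            self.buffers.setdefault(thread_id, []).extend(instructions)
        else:
            # Fetch failure: re-add the fetch address to the fetch queue.
            self.pc_queue.append((thread_id, pc))
```

A miss therefore does not lose the request; it simply re-enters arbitration with the other pending fetch addresses.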
Step 102: perform instruction scheduling according to thread states and the buffered prefetched instructions.

In a specific embodiment, the thread states include an initial state, a waiting state, and a ready state. Correspondingly, performing instruction scheduling according to thread states and the buffered prefetched instructions includes: scheduling threads whose state is the ready state according to the least recently used (LRU) algorithm. Here, a thread being in the ready state means that the control state machine module has asserted the thread's ready signal. It should be noted that the ready signal is not asserted immediately after the thread becomes ready; a preset interval is generally required between two ready signals of the same thread.

It should also be noted that once an instruction is scheduled, it is removed from the buffer.
In a specific embodiment, the method further includes:

when it is determined that the number of instructions buffered for a thread is not greater than a preset value, starting a new round of instruction prefetch for the thread, that is, prefetching and buffering the preset number of instructions for the thread.
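The refill condition can be expressed as a one-line check (an illustrative Python sketch; the threshold of 1 and group size of 4 are taken from the embodiments below, and the function name is an assumption):

```python
def check_prefetch(counts, threshold=1, group=4):
    """Return the fetch requests (thread id, instruction count) to issue for
    every thread whose buffered instruction count is not greater than the
    preset value, so a new prefetch round starts before the buffer drains."""
    return [(tid, group) for tid, n in counts.items() if n <= threshold]
```

For example, with buffers holding 4, 1, and 0 instructions, only the last two threads trigger a new round of four-instruction prefetch.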
An embodiment of the present invention correspondingly proposes an instruction scheduling apparatus. As shown in FIG. 2, the apparatus includes: a fetch module 201, an instruction register module 202, a control state machine module 203, and an instruction scheduling module 204, wherein:

the fetch module 201 is configured to prefetch instructions for each thread;

the instruction register module 202 is configured to buffer the prefetched preset number of instructions for each thread;

the control state machine module 203 is configured to control thread states; and

the instruction scheduling module 204 is configured to perform instruction scheduling according to the thread states provided by the control state machine module and the prefetched instructions buffered by the instruction register module.
In a specific embodiment, the fetch module 201 is specifically configured to: send a fetch request to the instruction cache module, the fetch request carrying at least a fetch address; receive a response message from the instruction cache module; and, when the response message carries fetch-failure information and a fetch address, re-add that fetch address to the fetch queue and refetch according to scheduling.

Correspondingly, the control state machine module 203 is further configured to receive a response message from the instruction cache module and, when the response message carries fetch-success information and instruction content, obtain the instruction content and send it to the instruction register module for buffering.

In a specific embodiment, the control state machine module 203 is further configured to determine whether the number of instructions buffered by the instruction register module for a thread is not greater than a preset value; and

the fetch module 201 is further configured to start a new round of instruction prefetch for the thread when the control state machine module determines that the number of instructions for the thread is not greater than the preset value.

In a specific embodiment, the instruction scheduling module 204 is specifically configured to schedule threads whose state is the ready state according to the LRU algorithm.
Deploying the instruction scheduling apparatus described in the embodiments of the present invention on the ME avoids instruction stalls and reduces the out-of-order degree, thereby improving instruction scheduling efficiency and the overall performance of the ME.

The technical solutions of the present invention are further described in detail below through specific embodiments. In the following embodiments, the instruction scheduling apparatus is deployed on the ME, and the ME completes multithreaded instruction fetch and scheduling through the instruction scheduling apparatus. According to the execution status of all threads, the apparatus generates each thread's fetch PC in advance, fetches the corresponding thread's instructions from the cache, and loads them into its internal buffer. The apparatus manages each thread's instruction scheduling, completes thread-level scheduling according to the least recently used (LRU) algorithm, issues instructions from the buffer to the pipeline, and ensures that packets entering the ME earlier finish executing their instructions first.
Referring to FIG. 2, the instruction scheduling apparatus involved in this embodiment mainly includes a fetch module (gen_pc), a control state machine module (ctrl_fsm), an instruction register module (instr_rf), and an instruction scheduling module. In this embodiment, the apparatus generates each thread's fetch PC in advance according to the execution status of all threads, fetches the corresponding thread's instructions from the cache, and loads them into the internal buffer, specifically as follows:
1) When a new packet enters the ME, its information carries a new-packet fetch address (pkt_pc). The fetch module parses the packet information, generates pkt_pc, writes it into the fetch queue (pc_queue), and issues a fetch request to the cache. After the cache returns the instructions, the instruction scheduling apparatus writes the returned instructions, by thread number, into the corresponding buffers in the instruction register module.

2) The instruction register module loads and pre-decodes the fetched instructions; each thread can store four instructions. Decoded halt-class, branch-class, or illegal instructions are reported to the control state machine module.

3) The control state machine module monitors each thread's instruction registers. When a thread has executed down to its second-to-last instruction, the module issues that thread's fetch request (sequential fetch address) in advance, so that after the thread's last instruction is issued the buffer is not left without valid instructions and no empty slot occurs.

4) When a fetch misses (cache miss), the cache returns the current PC value (the refetch address), and the instruction scheduling apparatus writes this PC back into the pc_queue to await another fetch.
In the following embodiments, the instruction scheduling apparatus also manages each thread's instruction scheduling. On top of instruction prefetch, it completes thread-level instruction scheduling according to the LRU algorithm, issues instructions from the buffer to the pipeline, and ensures that packets entering the ME earlier finish executing their instructions first, specifically as follows:

After a thread's instruction register is loaded with valid instructions, its state machine is in the rdy state and asserts the ready signal, then waits for a grant from the instruction scheduling module. Using the least recently used algorithm, the instruction scheduling module tries to give the highest priority to the thread whose packet entered first, so that thread's instructions finish processing first. The instruction scheduling module grants according to the ready signals asserted by the various threads, and the granted thread sends an instruction to the pipeline for processing.

When a thread's instruction register has dispatched all its instructions (all four have been sent to the core pipeline) or a branch instruction is decoded, the state machine transitions to the wait state. If only the thread's last instruction remains unissued in the buffer, the instruction scheduling apparatus writes the fetch address PC into the pc_queue and fetches the next group of four instructions from the cache. If the instruction is a branch, the apparatus waits for the pipeline to resolve the new PC before writing it into the pc_queue. In the wait state the ready signal is not asserted, i.e., the thread does not participate in thread-level instruction scheduling. After the new instructions return, the state machine re-enters the rdy state. If a halt instruction is decoded, the state machine jumps to the idle state, deasserts the ready signal, and no longer participates in thread-level instruction scheduling.
Embodiment 1

FIG. 3 is a schematic flowchart of ME instruction scheduling according to Embodiment 1 of the present invention. As shown in FIG. 3, the method includes the following steps:

Step 301: the fetch module parses the fetch PC information carried in a new packet and issues a fetch request.

Here, the fetch module consists of four queues plus an arbitration module; the queues buffer their corresponding fetch requests, and after 4-to-1 arbitration a fetch request is issued to the cache. Considering actual packet fetch requests, the four queues correspond to the four kinds of fetch PC requests: new-packet fetch (pkt_pc), sequential packet fetch, branch fetch, and refetch after a cache miss.

In actual packet processing, for a new packet, the packet information is parsed, the fetch PC extracted, and a fetch request issued. For a packet already being processed, there are sequential PC+1 fetches, fetches following branch-class instructions, and refetches after cache misses. To simplify managing and implementing fetch operations, the requests are classified and buffered in queues, whose depth only needs to equal the number of ME threads.

Actual performance testing shows that setting the queues' arbitration priority order to PC+1 fetch, branch fetch, cache-miss fetch, and new-packet PC fetch (highest priority first) yields optimal performance.
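The 4-to-1 fixed-priority arbitration just described can be modeled as follows (a hypothetical Python sketch; the queue names are assumptions chosen to mirror the four request kinds):

```python
from collections import deque

# Priority order from the performance tests: sequential PC+1 fetch first,
# then branch fetch, cache-miss refetch, and new-packet PC fetch last.
PRIORITY = ("seq", "branch", "miss", "new_pkt")

def arbitrate(queues):
    """4-to-1 fixed-priority arbitration: pop one request from the
    highest-priority non-empty queue, or return None if all are empty."""
    for name in PRIORITY:
        if queues[name]:
            return queues[name].popleft()
    return None
```

A pending branch fetch is thus served before any queued new-packet fetch, matching the stated priority ordering.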
Step 302: after the new packet's fetch request is issued by the fetch module, the cache returns the instructions to the control state machine module, and the instruction register module obtains the instructions and completes their loading and pre-decoding.

In practical applications, the instruction buffer in the instruction register module consists of register banks, and each thread can buffer four instructions. Before a thread issues an instruction, the instruction register module pre-decodes it, determines its operation type, and provides control signals to the control state machine.

Step 303: the instruction scheduling module completes instruction-level scheduling and issues instructions to the pipeline.

Specifically, after a packet's instructions have been loaded into the instruction register module, a ready request is issued to the instruction scheduling module (the current thread has its instructions ready and requests to issue an instruction to the pipeline). After the instruction scheduling module grants the request, the thread holding the packet issues one instruction to the pipeline.

In practical applications, the ME runs multiple threads, so multiple threads may issue ready requests; the instruction scheduling module needs a scheduling policy that ensures packets that entered earlier always receive a higher grant priority.

Specifically, the LRU scheduling policy is a round-robin (RR) scheduling algorithm with a dynamically updated base value. The instruction scheduling module records each thread's thread number in a queue and uses it as the base value of the RR algorithm. In this way, the thread number of the packet that entered the ME first sits at the head of the queue; with it as the RR base value, the RR scheduling result always grants high priority to the packet at the head of the queue.
Step 304: the control state machine module monitors each thread's working state, issuing fetch requests in advance or releasing threads whose processing is complete.

Specifically, the control state machine module monitors each thread's instruction buffer count. When only one instruction remains in a thread's instruction buffer, it issues that thread's PC+1 fetch request to the fetch module in advance, so that after the thread's last instruction is issued the buffer is not left without valid instructions and no empty slot occurs.

The control state machine module sets up a state machine for each thread. When a thread executes a branch-class instruction, the state machine enters the wait state; after the pipeline resolves the branch, the fetch request is issued to the fetch module, and once the new instructions return, the thread's state machine is reactivated into the rdy working state.

Step 305: when a thread executes a halt instruction, the state machine enters the idle state. The halt instruction signals that packet processing is complete and the packet has been sent out, so after this instruction executes, the control state machine module puts the thread's state machine into the idle (sleep) state and releases all of the thread's buffer resources; the thread is used again once it is reallocated to a new packet.
Embodiment 2

In this embodiment, the instruction scheduling apparatus completes instruction prefetch through the fetch module and the control state machine module. FIG. 4 is a schematic flowchart of ME instruction prefetch according to Embodiment 2 of the present invention. As shown in FIG. 4, the method includes the following steps:

Step 401: a new packet enters the ME; the instruction scheduling apparatus parses and extracts the fetch PC and issues a fetch request to the fetch module.

Step 402: the fetch module, after arbitration, issues the fetch request to the cache.

Step 403: the instructions return and are loaded into the instruction register module, then scheduled and issued to the pipeline. During execution, the control state machine module monitors each thread's instruction issue status. When only one instruction remains in the instruction buffer, it issues a PC+1 fetch request to the fetch module in advance, and execution continues once the instructions return.

The control state machine module sets up a state machine for each thread. When a thread executes a branch-class instruction, the control state machine module puts the state machine into the wait state and sends the branch address resolved by the pipeline as a request to the fetch module. After the instructions return, the state machine is activated into the rdy state, awaits scheduling, and continues issuing instructions to the pipeline.

When a thread executes a halt instruction, meaning the packet has been fully processed and sent out, the control state machine module puts the state machine into the idle state and releases all the thread's resources; processing of the packet is complete.
FIG. 5 is a schematic diagram of the state transitions of the control state machine in Embodiment 2 of the present invention. As shown in FIG. 5, the state transitions specifically involve:

1) Initially, when the thread holds no packet, the state machine is in the idle state;

2) when the first instruction is fetched back, it transitions to the rdy state;

3) in rdy, a ready request is issued (one thread's ready request signal is issued every four cycles). If the instruction at the data port is decoded as halt and that instruction is granted, the state machine transitions to the idle state; the thread's packet is fully processed, and it returns to the initial state.

If the instruction at the data port is decoded as a branch-class instruction and that instruction is granted, the state machine transitions to the wait state.

4) The wait state indicates that the instructions in the instruction buffer have all been issued and new instructions are awaited; no ready request is issued at this time. After the cache returns new instructions, the state machine transitions back to rdy and reasserts the ready request.
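The per-thread state machine above can be sketched in software (an illustrative Python model with assumed method names; the idle/rdy/wait states and transitions follow the description):

```python
class ThreadFSM:
    """Per-thread control state machine: idle -> rdy when instructions return;
    rdy -> wait on a granted branch (awaiting the resolved PC); rdy -> idle on
    a granted halt (thread released); wait -> rdy when new instructions return."""

    def __init__(self):
        self.state = "idle"

    def on_instructions_returned(self):
        # First instructions (from idle) or refilled instructions (from wait).
        if self.state in ("idle", "wait"):
            self.state = "rdy"

    def on_grant(self, instr_kind):
        if self.state != "rdy":
            return
        if instr_kind == "halt":
            self.state = "idle"    # packet done: release the thread
        elif instr_kind == "branch":
            self.state = "wait"    # await the pipeline-resolved new PC
        # ordinary instructions keep the thread in rdy

    @property
    def ready(self):
        # Only rdy threads assert the ready request to the scheduler.
        return self.state == "rdy"
```

In the wait and idle states `ready` is False, matching the rule that such threads do not participate in thread-level scheduling.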
Embodiment 3

In this embodiment, the instruction scheduling apparatus manages each thread's instruction scheduling through the instruction scheduling module. The instruction scheduling module completes thread-level instruction scheduling according to the LRU algorithm, issues instructions from the buffer to the pipeline, and ensures that packets entering the ME earlier finish executing their instructions first.
FIG. 6 is a schematic flowchart of ME thread-level instruction scheduling according to Embodiment 3 of the present invention. As shown in FIG. 6, the method includes the following steps:

Step 601: extract the new packet's thread number and write it into the base queue.

The essence of the thread-level instruction scheduling policy of the present invention is RR polling with a dynamically updated base. When the base value is set to a given thread's value, that thread always receives the highest-priority grant. The depth of the base queue equals the number of ME threads; it stores each thread's thread number, and each thread number corresponds to a base value.

Step 602: divide the base values of all threads into four groups for storage, and complete the final scheduling across the four groups.

New packets are written into the four groups in order 0-3; when the current group is full, the next group is written. During grant scheduling, each of the four groups gives its intra-group scheduling result, and the final grant is then given across the four groups in order 0-3.

Step 603: when a new packet's base value is written into a group, the thread's corresponding flag bit (bitmap) is set valid; when the packet finishes processing, the thread's bitmap bit is set invalid.

FIG. 7 is a schematic structural diagram of the instruction scheduling module in Embodiment 3 of the present invention. As shown in FIG. 7, each thread within a group has a corresponding bitmap flag bit, marking whether the thread stored in the group is currently executing. When a new packet's base value is written into a group, the flag bit is set to 1, indicating that the thread is executing; when the packet finishes processing, the thread's bitmap flag bit is set to 0, indicating that the thread has finished and no longer participates in instruction scheduling. Each cycle, the group checks the bitmap flag bit of the base value at the head of the queue; when the flag bit is 0, the head base value is read out of the queue and its thread no longer participates in instruction scheduling, while the next base value is read out to the head position and its thread participates in instruction scheduling with the highest priority.

Step 604: the queue outputs base values to the RR scheduling module in first-in-first-out order, implementing RR polling with a dynamically updated base.

Here, the thread at the head of the queue enjoys the highest-priority instruction scheduling, and since base values are written in packet-arrival order, the LRU scheduling policy realizes the design whereby packets entering the ME earlier finish executing their instructions first.

Step 605: when a thread finishes processing, its base value is read out and the next base value is read out to the head position, whose corresponding thread then participates in instruction scheduling with the highest priority. Thread-level instruction scheduling according to the LRU algorithm is thus completed, ensuring that packets entering the ME earlier finish executing their instructions first.
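Steps 601-605 can be condensed into a small software model (a hypothetical Python sketch for illustration; the single flat base queue and the `active` set standing in for the per-thread bitmap bits are simplifications of the four-group hardware structure):

```python
from collections import deque

class LruScheduler:
    """Thread-level scheduler sketch: a base queue holds thread numbers in
    packet-arrival order; the head thread number is the RR base, so the
    earliest packet's thread wins arbitration whenever it is ready.
    Finished threads (bitmap bit cleared) are read out from the head."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.base_queue = deque()   # thread numbers in packet-arrival order
        self.active = set()         # stands in for the per-thread bitmap bits

    def new_packet(self, tid):
        self.base_queue.append(tid)   # step 601: write thread number as a base value
        self.active.add(tid)          # step 603: set the bitmap bit valid

    def finish(self, tid):
        self.active.discard(tid)      # step 603: clear the bitmap bit
        while self.base_queue and self.base_queue[0] not in self.active:
            self.base_queue.popleft() # step 605: read out finished base values

    def grant(self, ready):
        """Steps 604-605: round-robin starting from the head-of-queue base."""
        if not self.base_queue:
            return None
        base = self.base_queue[0]
        for i in range(self.n):
            tid = (base + i) % self.n
            if tid in ready and tid in self.active:
                return tid
        return None
```

Because the base is always the head of a FIFO written in arrival order, the earliest-arriving packet's thread is granted first whenever it asserts ready, which is exactly the order-preserving property the embodiment targets.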
The above embodiments of the present invention provide a micro engine instruction scheduling scheme. The ME completes multithreaded instruction fetch and instruction scheduling through the instruction scheduling apparatus: according to the execution status of all threads, it generates each thread's fetch PC in advance, fetches the corresponding thread's instructions from the cache, loads them into the internal buffer, manages each thread's instruction scheduling, completes thread-level instruction scheduling according to the LRU algorithm, issues instructions from the buffer to the pipeline, and ensures that packets entering the ME earlier finish executing their instructions first. The hardware structure effectively avoids ME fetch stalls and improves ME performance, while ensuring that packets entering the ME complete execution in order, improving the overall performance of the network processor; the scheme is also relatively simple and easy to implement.

Each of the above units can be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.
Those skilled in the art should understand that embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (9)

  1. An instruction scheduling method, the method comprising:
    prefetching and buffering a preset number of instructions for each thread; and
    performing instruction scheduling according to thread states and the buffered prefetched instructions.
  2. The method according to claim 1, wherein the prefetching of instructions comprises:
    sending a fetch request to an instruction cache module, the fetch request carrying at least a fetch address; and
    receiving a response message from the instruction cache module; when it is determined that the response message carries fetch-success information and instruction content, obtaining the instruction content for local buffering; and when it is determined that the response message carries fetch-failure information and a fetch address, re-adding the fetch address to a fetch queue and refetching according to scheduling.
  3. The method according to claim 2, wherein the fetch address comprises one or more of the following: a new-packet fetch address, a branch fetch address, a refetch address, and a sequential fetch address.
  4. The method according to claim 1, further comprising:
    when it is determined that the number of instructions buffered for a thread is not greater than a preset value, starting a new round of instruction prefetch for the thread, that is, prefetching and buffering the preset number of instructions for the thread.
  5. The method according to any one of claims 1 to 4, wherein the thread states comprise: an initial state, a waiting state, and a ready state; and
    the performing instruction scheduling according to thread states and the buffered prefetched instructions comprises:
    scheduling threads whose state is the ready state according to the least recently used (LRU) algorithm and the order in which the threads' packets arrived.
  6. An instruction scheduling apparatus, the apparatus comprising: a fetch module, an instruction register module, a control state machine module, and an instruction scheduling module, wherein:
    the fetch module is configured to prefetch instructions for each thread;
    the instruction register module is configured to buffer the prefetched preset number of instructions for each thread;
    the control state machine module is configured to control thread states; and
    the instruction scheduling module is configured to perform instruction scheduling according to the thread states provided by the control state machine module and the prefetched instructions buffered by the instruction register module.
  7. The apparatus according to claim 6, wherein:
    the fetch module is configured to send a fetch request to the instruction cache module, the fetch request carrying at least a fetch address; receive a response message from the instruction cache module; and, when the response message carries fetch-failure information and a fetch address, re-add the fetch address to the fetch queue and refetch according to scheduling; and
    the control state machine module is configured to receive a response message from the instruction cache module and, when the response message carries fetch-success information and instruction content, obtain the instruction content and send it to the instruction register module for buffering.
  8. The apparatus according to claim 6, wherein:
    the control state machine module is configured to determine whether the number of instructions buffered by the instruction register module for a thread is not greater than a preset value; and
    the fetch module is configured to start a new round of instruction prefetch for the thread when the control state machine module determines that the number of instructions for the thread is not greater than the preset value.
  9. The apparatus according to any one of claims 6 to 8, wherein
    the instruction scheduling module is configured to schedule threads whose state is the ready state according to the LRU algorithm.
PCT/CN2015/090154 2014-12-26 2015-09-21 Instruction scheduling method and apparatus WO2016101664A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410829197.X 2014-12-26
CN201410829197.XA CN105786448B (zh) 2014-12-26 Instruction scheduling method and apparatus

Publications (1)

Publication Number Publication Date
WO2016101664A1 true WO2016101664A1 (zh) 2016-06-30

Family

ID=56149185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/090154 WO2016101664A1 (zh) 2014-12-26 2015-09-21 Instruction scheduling method and apparatus

Country Status (2)

Country Link
CN (1) CN105786448B (zh)
WO (1) WO2016101664A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280A (zh) * 2017-07-14 2019-01-22 深圳市中兴微电子技术有限公司 Micro engine and packet processing method thereof

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
CN106909343B (zh) * 2017-02-23 2019-01-29 北京中科睿芯科技有限公司 Dataflow-based instruction scheduling method and apparatus
CN109101276B (zh) * 2018-08-14 2020-05-05 阿里巴巴集团控股有限公司 Method for executing instructions in a CPU
CN109308194B (zh) * 2018-09-29 2021-08-10 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN111176729A (zh) * 2018-11-13 2020-05-19 深圳市中兴微电子技术有限公司 Information processing method and apparatus, and computer-readable storage medium
CN112789593A (zh) * 2018-12-24 2021-05-11 华为技术有限公司 Multithreading-based instruction processing method and apparatus
US11016771B2 (en) * 2019-05-22 2021-05-25 Chengdu Haiguang Integrated Circuit Design Co., Ltd. Processor and instruction operation method
CN114168202B (zh) * 2021-12-21 2023-01-31 海光信息技术股份有限公司 Instruction scheduling method, instruction scheduling apparatus, processor, and storage medium
CN114721727B (zh) * 2022-06-10 2022-09-13 成都登临科技有限公司 Processor, electronic device, and instruction prefetch method shared by multiple threads
CN116414463B (zh) * 2023-04-13 2024-04-12 海光信息技术股份有限公司 Instruction scheduling method, instruction scheduling apparatus, processor, and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN1364261A (zh) * 1999-04-29 2002-08-14 英特尔公司 用于在一个多线程处理器内进行线程切换的方法和装置
CN102567117A (zh) * 2010-09-30 2012-07-11 国际商业机器公司 调度处理器中的线程的方法和系统

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20130166882A1 (en) * 2011-12-22 2013-06-27 Jack Hilaire Choquette Methods and apparatus for scheduling instructions without instruction decode

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN1364261A (zh) * 1999-04-29 2002-08-14 英特尔公司 用于在一个多线程处理器内进行线程切换的方法和装置
CN102567117A (zh) * 2010-09-30 2012-07-11 国际商业机器公司 调度处理器中的线程的方法和系统

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN109257280A (zh) * 2017-07-14 2019-01-22 深圳市中兴微电子技术有限公司 Micro engine and packet processing method thereof

Also Published As

Publication number Publication date
CN105786448A (zh) 2016-07-20
CN105786448B (zh) 2019-02-05

Similar Documents

Publication Publication Date Title
WO2016101664A1 (zh) Instruction scheduling method and apparatus
US9898409B2 (en) Issue control for multithreaded processing
US6978350B2 (en) Methods and apparatus for improving throughput of cache-based embedded processors
US9934037B2 (en) Decoding a complex program instruction corresponding to multiple micro-operations
US7734897B2 (en) Allocation of memory access operations to memory access capable pipelines in a superscalar data processing apparatus and method having a plurality of execution threads
US20060242365A1 (en) Method and apparatus for suppressing duplicative prefetches for branch target cache lines
KR100936601B1 (ko) 멀티 프로세서 시스템
CN110457238B (zh) 减缓GPU访存请求及指令访问cache时停顿的方法
US9772851B2 (en) Retrieving instructions of a single branch, backwards short loop from a local loop buffer or virtual loop buffer
US20120137077A1 (en) Miss buffer for a multi-threaded processor
KR101635395B1 (ko) 멀티포트 데이터 캐시 장치 및 멀티포트 데이터 캐시 장치의 제어 방법
JP2007207248A (ja) 複数のキャッシュ・ミス後の命令リスト順序付けのための方法
US10019283B2 (en) Predicting a context portion to move between a context buffer and registers based on context portions previously used by at least one other thread
US9870315B2 (en) Memory and processor hierarchy to improve power efficiency
CN110908716B (zh) 一种向量聚合装载指令的实现方法
US9152566B2 (en) Prefetch address translation using prefetch buffer based on availability of address translation logic
US20110022802A1 (en) Controlling data accesses to hierarchical data stores to retain access order
US10740029B2 (en) Expandable buffer for memory transactions
WO2013185660A1 (zh) Instruction storage device of a network processor and instruction storage method thereof
EP3872642B1 (en) Caching device, cache, system, method and apparatus for processing data, and medium
US9158696B2 (en) Hiding instruction cache miss latency by running tag lookups ahead of the instruction accesses
KR20140131781A (ko) 메모리 제어 장치 및 방법
CN109257280B (zh) 一种微引擎及其处理报文的方法
CN105786758B (zh) 一种具有数据缓存功能的处理器装置
US8977815B2 (en) Control of entry of program instructions to a fetch stage within a processing pipepline

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15871738

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15871738

Country of ref document: EP

Kind code of ref document: A1