WO2017016255A1 - Parallel processing method and apparatus for multiple launch instructions of micro-engine, and storage medium - Google Patents

Parallel processing method and apparatus for multiple launch instructions of micro-engine, and storage medium

Info

Publication number
WO2017016255A1
WO2017016255A1 PCT/CN2016/080579 CN2016080579W WO2017016255A1 WO 2017016255 A1 WO2017016255 A1 WO 2017016255A1 CN 2016080579 W CN2016080579 W CN 2016080579W WO 2017016255 A1 WO2017016255 A1 WO 2017016255A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
instructions
class
thread
register
Prior art date
Application number
PCT/CN2016/080579
Other languages
French (fr)
Chinese (zh)
Inventor
周峰
安康
王志忠
刘衡祁
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司 filed Critical 深圳市中兴微电子技术有限公司
Publication of WO2017016255A1 publication Critical patent/WO2017016255A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead

Definitions

  • The present invention relates to network processor technology, and in particular to a multi-issue instruction parallel processing method and device for a network processor micro-engine (ME, Micro Engine), and a storage medium.
  • The core routers in the backbone of the Internet have undergone one technological change after another.
  • Network processors, with their outstanding packet processing performance and programmability, have become an irreplaceable part of the routing and forwarding engine.
  • The ME is the core component of the network processor and is responsible for parsing and processing packets according to the microcode instructions. The processing performance of the ME is therefore an important parameter that determines the overall performance of the network processor.
  • A traditional single-issue instruction pipeline processes only one instruction at a time and completes only one type of operation (logic calculation, jump, or data movement), which leaves many other execution units idle.
  • The kernel's resources are not fully utilized, i.e. the ME performance is not maximized.
  • Existing multi-issue instruction pipelines mainly use very long instruction word (VLIW) technology.
  • Users must use as many different executable units as possible within one very long instruction, according to their requirements, to improve instruction parallelism.
  • This kind of scheme relies mainly on the pre-compilation stage: the user arranges the parallel use of the execution units, which increases programming complexity and therefore labor cost.
  • Storing very long instructions also requires a larger instruction memory, which increases the cost of the chip.
  • An embodiment of the present invention provides a multi-issue instruction parallel processing method and device for a micro-engine, and a storage medium.
  • the instructions are parsed by the parallel decoding unit to obtain an instruction type of each instruction and an address of the source operand;
  • the determining and marking the correlation between the instructions includes:
  • the determining, according to the marking, whether to transmit the instructions in parallel comprises:
  • the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
  • the multi-port core register has eight data read ports and four data write ports, and supports four instruction accesses, each of which accesses two source operands and one destination operand.
  • the instruction types are mainly classified into three large classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each large class includes a plurality of instruction subclasses; each thread corresponds to a set of executable units, including: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit;
  • the assigning the corresponding executable unit to the instruction according to the instruction type of the instruction comprises:
  • the respective logic calculation class execution unit is allocated in the thread;
  • the instruction type is an upload/download class instruction
  • the respective data upload/download class execution units are allocated in the thread
  • the respective executable units are allocated according to the constraints.
  • a compiling unit configured to determine and mark the correlation between the instructions, and to determine according to the mark whether to issue the instructions in parallel;
  • a parallel decoding unit configured to parse the instructions in parallel when the instructions are issued in parallel, to obtain the instruction type of each instruction and the address of the source operand;
  • a read unit configured to obtain a source operand in the multi-port kernel register according to an address of a source operand of the instruction
  • An instruction allocating unit configured to allocate, according to an instruction type of the instruction, a corresponding executable unit to process the source operand
  • a write unit configured to store processing results in a multiport core register.
  • the compiling unit is further configured to: determine whether the destination registers of the preceding and following instructions are in the same area; when they are not in the same area, determine whether there is a data hazard between the two instructions; when there is no data hazard, determine whether the instruction types of the two instructions differ; when the instruction types differ, determine whether the preceding instruction is a jump instruction; and when the preceding instruction is not a jump instruction, determine that the two instructions are irrelevant and set an irrelevant flag on the following instruction.
  • the compiling unit is further configured so that, when the following instruction carries an irrelevant flag, one thread issues the preceding and following instructions simultaneously.
  • the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
  • the multi-port kernel register has eight data read ports and four data write ports, and supports four instruction accesses, each of which accesses two source operands and one destination operand.
  • the instruction types are mainly classified into three large classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each large class includes a plurality of instruction subclasses; each thread corresponds to a set of executable units, including: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit;
  • the instruction allocating unit is further configured to: when the two instructions of one thread belong to different large classes, allocate each instruction to its corresponding executable unit; and when the two instructions of one thread belong to the same large class but to different subclasses, allocate as follows: when the instructions are logical computing class instructions, the thread allocates its respective logical computing class execution units; when the instructions are data upload/download class instructions, the thread allocates its respective data upload/download class execution units; when one of the instructions is a jump class instruction, the respective executable units are allocated according to the constraint.
  • the storage medium provided by the embodiment of the present invention stores a computer program for executing the multi-issue instruction parallel processing method of the micro-engine.
  • In the embodiments of the present invention, the compiling unit determines and marks the correlation between the instructions, which reduces programming complexity for microcode developers, and whether to issue the instructions in parallel is determined according to the mark. When the instructions are issued in parallel, the parallel decoding unit parses them to obtain the instruction type of each instruction and the address of the source operand, implementing parallel parsing of the multi-issue instructions. The source operand is then obtained from the multi-port kernel register according to its address, a corresponding executable unit is allocated to each instruction according to its instruction type to process the source operand, and the processing result is stored in the multi-port kernel register.
  • The multi-port kernel register structure supports parallel processing of multiple instructions, and the corresponding executable units can also process the source operands in parallel, which improves the performance of the micro-engine.
  • FIG. 1 is a schematic flowchart of a multi-issue instruction parallel processing method of a micro-engine according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of parallel processing of multi-issue instructions according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of determining and marking the correlation between instructions according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of reading source operands and writing back destination registers in the pipeline according to an embodiment of the present invention.
  • FIG. 5 is a structural diagram of a multi-port kernel register according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of parallel instruction processing in the pipeline according to an embodiment of the present invention.
  • FIG. 7 is a structural diagram of a parallel decoding unit and an instruction allocation unit according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a multi-issue instruction parallel processing apparatus of a micro-engine according to an embodiment of the present invention.
  • In the multi-issue instruction parallel processing method and apparatus of the micro-engine, the compiling unit determines and marks the inter-instruction correlation, a multi-port kernel register structure is designed, and a parallel decoding unit and executable units complete the parallel processing of multi-issue instructions.
  • FIG. 1 is a schematic flowchart of a multi-issue instruction parallel processing method of a micro-engine according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step 101 Determine and mark the correlation between the instructions, and determine according to the mark whether to issue the instructions in parallel.
  • the correlation between the instructions includes:
  • Embodiments of the present invention support simultaneous scheduling of two threads, namely thread A and thread B.
  • At compile time, the compiling unit determines the correlation between each preceding and following pair of instructions and, when the two instructions are irrelevant, sets the irrelevant flag of the following instruction to valid.
  • Each thread decides according to the irrelevant flag whether to issue one instruction or two instructions at the same time.
  • In this way, instruction parallelism and execution-unit efficiency are maximized and the performance loss caused by idle units is reduced, improving the overall performance of the ME.
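  • As a minimal sketch of this issue decision, assume each instruction carries a boolean irrelevant flag that the compiling unit has already set. The names `Instr`, `irrelevant`, and `issue_groups` are hypothetical and do not come from the patent; the sketch only shows how one thread's instruction stream could be split into groups of one or two instructions issued together.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    name: str
    # Hypothetical flag: True means "irrelevant to the preceding instruction",
    # so the pair may be issued together.
    irrelevant: bool = False


def issue_groups(stream: List[Instr]) -> List[List[str]]:
    """Split one thread's instruction stream into issue groups of 1 or 2."""
    groups: List[List[str]] = []
    i = 0
    while i < len(stream):
        # Dual-issue only when the following instruction carries the flag.
        if i + 1 < len(stream) and stream[i + 1].irrelevant:
            groups.append([stream[i].name, stream[i + 1].name])
            i += 2
        else:
            groups.append([stream[i].name])
            i += 1
    return groups


stream = [Instr("add"), Instr("load", irrelevant=True), Instr("sub"), Instr("jump")]
groups = issue_groups(stream)
```

  • With two threads scheduled at once, each contributing at most two instructions per group, at most four instructions are in flight together, which matches the four-instruction limit stated for the pipeline.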
  • the determining and marking the correlation between the instructions includes:
  • the determining, according to the marking, whether to transmit the instructions in parallel comprises:
  • Step 102 When the instructions are issued in parallel, the instructions are parsed by the parallel decoding unit to obtain the instruction type of each instruction and the address of the source operand.
  • the instruction enters the pipeline decoding stage and performs instruction parsing 201.
  • the embodiment of the present invention provides four parallel decoding units.
  • the decoding unit decodes the instruction and parses out the instruction type.
  • the instruction type includes:
  • The instruction classes are divided into logical computing class instructions, data upload/download class instructions, and jump class instructions.
  • Each instruction class includes multiple instruction subclasses; for example, logical computing class instructions include addition operations, subtraction operations, logical OR operations, etc. Each instruction subclass has its own separate instruction code.
  • The instruction type described in the embodiments of the present invention mainly refers to the instruction subclass of each instruction.
  • the parallel decoding unit also parses the address of the source operand required by the instruction in the multiport core register.
  • Step 103 Obtain a source operand in a multi-port kernel register according to an address of a source operand of the instruction.
  • the multi-port kernel register is accessed to obtain the source operand 202.
  • the multi-port kernel register of the embodiment of the present invention provides eight data read ports and four data write ports, and can simultaneously support four instruction accesses, each of which can access two source operands and one destination operand.
  • the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units.
  • Step 104 Process the source operand by assigning a corresponding executable unit to the instruction according to the instruction type of the instruction.
  • the instruction allocation unit starts the allocation of the executable unit according to the instruction type to maximize the processing performance 203.
  • the executable unit includes: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit.
  • the three types of execution units described in the embodiments of the present invention respectively perform the execution functions of the three types of instructions.
  • Embodiments of the present invention provide two sets of logical computing class execution units, two sets of data uploading/downloading class execution units, and two sets of jump class execution units.
  • The pipeline of the embodiment of the present invention executes at most four instructions at the same time. The instruction allocation unit allocates the instructions to the respective executable units according to their instruction types and ensures that instructions of the same type are allocated to different groups of executable units, so that no resource conflict arises to trigger a structural hazard.
  • The instruction types in the embodiments of the present invention are classified into logical computing class instructions, data upload/download class instructions, and jump class instructions.
  • the respective logic calculation class execution unit is allocated in the thread;
  • the instruction type is an upload/download class instruction
  • the respective data upload/download class execution units are allocated in the thread
  • the respective executable units are allocated according to the constraints.
  • Step 105 Store the processing result in a multi-port kernel register.
  • the instructions are allocated to the respective executable units and the execution is completed.
  • The result produced by execution is written back to the specified destination register; for a jump class instruction, the instruction memory is re-addressed with the new address (204).
  • The kernel register of the embodiment of the invention provides four data write ports and supports up to four instructions completing data write-back at once. Once the operation result has been written back, processing of the instruction is complete.
  • FIG. 3 is a flowchart of determining and marking the correlation between instructions according to an embodiment of the present invention, where the process includes the following steps:
  • Step 301 Determine whether the destination registers of the two instructions before and after are in the same area.
  • the same area is mainly:
  • The multi-port kernel register provides 32 registers for each thread, numbered in order from register 0 to register 31, and each register is 4 bytes wide. Register 0 to register 15 form one area, and register 16 to register 31 form another area.
  • If the destination registers of the two instructions are in the same area, the preceding and following instructions are determined to be related and, as shown in FIG. 3, the compiling unit does not set the irrelevant flag. If the destination registers of the two instructions are not in the same area, the determination continues with step 302.
  • Step 302 Determine whether there is a data hazard between the preceding and following instructions.
  • The data hazard here mainly means: whether a source operand register of the following instruction is the destination register of the preceding instruction.
  • If there is a data hazard between the two instructions, the preceding and following instructions are determined to be related and, as shown in FIG. 3, the compiling unit does not set the irrelevant flag. If there is no data hazard between the two instructions, the determination continues with step 303.
  • Step 303 Determine whether the instruction types of the preceding and following instructions differ, so that they do not use the same executable unit.
  • The instruction type judged here is the instruction subclass, except for jump class instructions, for which only the instruction class is judged. If the two instruction types are the same, the preceding and following instructions are determined to be related and, as shown in FIG. 3, the compiling unit does not set the irrelevant flag. If the two instruction types differ, the determination continues with step 304.
  • Step 304 Determine whether the preceding instruction is a jump instruction.
  • Step 305 If the preceding instruction is a jump instruction, the preceding and following instructions are determined to be related, and the compiling unit does not set the irrelevant flag.
  • Step 306 If the preceding instruction is not a jump instruction, the two instructions are determined to be irrelevant, and an irrelevant flag is set on the following instruction.
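  • The decision chain of steps 301 to 306 can be sketched as a single predicate. This is an illustrative reading of the flow, not the patent's implementation: the `Instr` record, the class labels `"logic"`, `"mem"`, and `"jump"`, and the function names are hypothetical, and the sketch assumes every instruction names a destination register.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Instr:
    cls: str               # hypothetical labels: "logic", "mem" (upload/download), "jump"
    subclass: str          # e.g. "add", "sub", "load"
    dst: int               # destination register number, 0..31
    srcs: Tuple[int, ...]  # source register numbers, 0..31


def area(reg: int) -> int:
    """Registers 0-15 form area 0; registers 16-31 form area 1."""
    return 0 if reg < 16 else 1


def is_irrelevant(prev: Instr, nxt: Instr) -> bool:
    """Return True when the irrelevant flag would be set on `nxt`."""
    # Step 301: destination registers must lie in different areas.
    if area(prev.dst) == area(nxt.dst):
        return False
    # Step 302: data hazard if nxt reads the register that prev writes.
    if prev.dst in nxt.srcs:
        return False
    # Step 303: types must differ; jumps are compared by class only,
    # all other instructions by subclass.
    if prev.cls == "jump" or nxt.cls == "jump":
        same_type = prev.cls == nxt.cls
    else:
        same_type = prev.subclass == nxt.subclass
    if same_type:
        return False
    # Steps 304-306: a preceding jump always makes the pair related.
    return prev.cls != "jump"


add = Instr("logic", "add", dst=3, srcs=(1, 17))
load = Instr("mem", "load", dst=20, srcs=(2, 18))
flag = is_irrelevant(add, load)  # different areas, no hazard, different types
```

  • Ordering the checks as in FIG. 3 means the cheapest structural test (register area) rejects most pairs before the type comparison is reached.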
  • FIG. 4 is a flowchart of a pipeline read source operand and a write-back destination register according to an embodiment of the present invention, and the flow includes the following steps:
  • Step 401 According to the thread allocation, the multi-port kernel registers are divided into two groups.
  • The multi-port kernel register module of the embodiment of the present invention is divided into two groups, one group of registers for thread A and one for thread B, and each group of registers provides four register units.
  • The four register units of thread A are: one set of registers 0 to 15 as register unit 0 and another set of registers 0 to 15 as register unit 2; one set of registers 16 to 31 as register unit 1 and another set of registers 16 to 31 as register unit 3.
  • The four register units of thread B follow the same division rule as thread A and are register units 4, 5, 6, and 7, respectively.
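  • The register unit numbering above can be written as a small index calculation. The function name `register_unit` and the `copy` parameter (selecting the first or second set of the same register range) are hypothetical; the sketch also assumes thread B's units 4 to 7 follow thread A's pattern with an offset of 4.

```python
def register_unit(thread: str, reg: int, copy: int) -> int:
    """Map (thread, register number, copy) to a register unit index 0..7.

    `copy` is a hypothetical selector: 0 for the first set of a register
    range, 1 for the "another set" of the same range.
    """
    if thread not in ("A", "B"):
        raise ValueError("thread must be 'A' or 'B'")
    if not 0 <= reg <= 31:
        raise ValueError("register number must be in 0..31")
    if copy not in (0, 1):
        raise ValueError("copy must be 0 or 1")
    area = 0 if reg < 16 else 1         # registers 0-15 vs registers 16-31
    base = 0 if thread == "A" else 4    # thread B uses units 4..7
    return base + 2 * copy + area


units_a = [register_unit("A", r, c) for r, c in [(0, 0), (16, 0), (0, 1), (16, 1)]]
```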
  • Step 402 Within the group, the source operands are read according to the constraints and the instruction.
  • The two source operands of an instruction should come from the two areas wherever possible, that is, one from register 0 to register 15 and the other from register 16 to register 31.
  • Read port 0 and read port 1 are provided to instruction 0 to read its source operands; likewise, read ports 2 and 3 are provided to instruction 1, read ports 4 and 5 to instruction 2, and read ports 6 and 7 to instruction 3. In this way one instruction can access all 32 registers and obtain two different operands, the kernel register read ports are fully used, and up to four instructions can access the multi-port kernel register simultaneously.
  • Step 403 Within the group, the operation result is written back to the destination register according to the constraints and the instruction.
  • The destination registers of the two instructions of a thread likewise use registers in the two areas.
  • Write port 0 is provided to instruction 0 to write its operation result back to the destination register; likewise, write port 1 is provided to instruction 1, write port 2 to instruction 2, and write port 3 to instruction 3. In this way the kernel register write ports are fully used, and up to four instructions can access the kernel registers simultaneously.
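  • The port assignment in steps 402 and 403 follows a fixed pattern: instruction slot i uses read ports 2i and 2i+1 and write port i. A minimal sketch, with the hypothetical function name `ports_for`:

```python
from typing import Dict, Tuple, Union


def ports_for(slot: int) -> Dict[str, Union[Tuple[int, int], int]]:
    """Return the read and write ports used by instruction slot 0..3."""
    if not 0 <= slot <= 3:
        raise ValueError("at most four instructions access the register file at once")
    return {"read": (2 * slot, 2 * slot + 1), "write": slot}


# All four slots together use each of the eight read ports and four write
# ports exactly once.
read_ports = [p for s in range(4) for p in ports_for(s)["read"]]
write_ports = [ports_for(s)["write"] for s in range(4)]
```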
  • FIG. 6 is a flowchart of a pipeline parallel processing instruction according to an embodiment of the present invention, where the process includes the following steps:
  • Step 601 Decode the instructions in parallel and parse out the instruction types.
  • Four decoding units decode four instructions and parse out their respective instruction types.
  • The three instruction classes described in the embodiments are: logical computing class instructions, data upload/download class instructions, and jump class instructions.
  • Step 602 Group the executable units according to the type of the instruction.
  • The embodiment of the present invention provides two sets of logical computing class execution units, two sets of data upload/download class execution units, and two sets of jump class execution units, so that threads A and B are each provided with one set of logical computing class execution units, data upload/download class execution units, and jump class execution units.
  • The allocation rules here mainly address the case where the two instructions of one thread belong to the same class but to different subclasses. If the two instructions belong to different classes, each instruction only needs to be assigned to its corresponding executable unit, and there is no conflict.
  • First case: the instructions are logical computing class instructions of different subclasses, and the respective calculation units are allocated within the thread.
  • Second case: the instructions are upload/download class instructions of different subclasses, and the respective data upload/download units are allocated within the thread.
  • Third case: one of the instructions is a jump class instruction, and the respective execution units are allocated according to the constraint.
  • The compiling unit constraint is: if the preceding instruction is a jump instruction, the thread issues only one instruction, so only the jump execution unit is assigned to it; if the following instruction is a jump instruction, the execution units are assigned according to the instruction types.
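  • The allocation rules can be sketched as follows. This is one possible reading, not the patent's implementation: it assumes each thread's set of execution units contains a separate sub-unit per instruction subclass, so an executable unit is identified by the hypothetical triple (thread, class, subclass). The function name `allocate` and the class labels are likewise hypothetical.

```python
from typing import List, Tuple

Unit = Tuple[str, str, str]  # hypothetical identifier: (thread, class, subclass)


def allocate(thread: str, instrs: List[Tuple[str, str]]) -> List[Unit]:
    """Assign executable units to the 1 or 2 instructions a thread issues together.

    Each instruction is given as (class, subclass), in program order.
    """
    if not 1 <= len(instrs) <= 2:
        raise ValueError("a thread issues one or two instructions at a time")
    # Compiling unit constraint: a preceding jump is issued alone.
    if len(instrs) == 2 and instrs[0][0] == "jump":
        raise ValueError("a preceding jump instruction must be issued alone")
    units = [(thread, cls, sub) for cls, sub in instrs]
    # Two instructions on the same unit would be a structural hazard.
    if len(set(units)) != len(units):
        raise ValueError("structural hazard: both instructions need the same unit")
    return units


pair = allocate("A", [("logic", "add"), ("logic", "or")])
```

  • Because the compiling unit only flags pairs whose types differ, the hazard branch should never fire for correctly marked code; it is kept here as a consistency check.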
  • Step 603 The instruction allocating unit completes the effective allocation of the executable unit.
  • FIG. 8 is a schematic structural diagram of a multi-issue instruction parallel processing apparatus of a micro-engine according to an embodiment of the present invention. As shown in FIG. 8, the apparatus includes:
  • the compiling unit 81, configured to determine and mark the correlation between the instructions, and to determine according to the mark whether to issue the instructions in parallel;
  • the parallel decoding unit 82, configured to parse the instructions in parallel when the instructions are issued in parallel, to obtain the instruction type of each instruction and the address of the source operand;
  • the reading unit 83, configured to obtain a source operand in the multi-port kernel register according to the address of the source operand of the instruction;
  • the instruction allocating unit 84, configured to allocate, according to the instruction type of the instruction, the corresponding executable unit to process the source operand;
  • the write unit 85, configured to store the processing results in the multi-port kernel register.
  • The compiling unit 81 is further configured to: determine whether the destination registers of the preceding and following instructions are in the same area; when they are not in the same area, determine whether there is a data hazard between the two instructions; when there is no data hazard, determine whether the instruction types of the two instructions differ; when the instruction types differ, determine whether the preceding instruction is a jump instruction; and when the preceding instruction is not a jump instruction, determine that the two instructions are irrelevant and set an irrelevant flag on the following instruction.
  • The compiling unit 81 is further configured so that, when the following instruction carries an irrelevant flag, one thread issues the preceding and following instructions in parallel.
  • The multi-port kernel registers are divided into two groups of registers according to threads, each group including four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
  • the multi-port kernel register has eight data read ports and four data write ports, and supports four instruction accesses, each of which accesses two source operands and one destination operand.
  • The instruction types are divided into three large classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each large class includes a plurality of instruction subclasses; each thread corresponds to a set of executable units, including: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit;
  • the instruction allocating unit 84 is further configured to allocate each instruction to its corresponding executable unit when the two instructions of one thread belong to different large classes; when the two instructions of one thread belong to the same large class but to different subclasses, they are processed according to the following three situations: when the instructions are logical computing class instructions, the respective logical computing class execution units are allocated within the thread; when the instructions are upload/download class instructions, the respective data upload/download class execution units are allocated within the thread; when one of the instructions is a jump class instruction, the respective executable units are allocated according to the constraint.
  • In the multi-issue instruction parallel processing method and device of the micro-engine, the compiling unit determines and marks the inter-instruction correlation, a kernel register structure supporting multi-port access is designed, and the decoding unit and the instruction allocation unit perform parallel processing of the multi-issue instructions.
  • The embodiment of the present invention first reduces programming complexity for microcode developers by having the compiling unit determine and mark the correlation between instructions; in addition, the kernel register structure with multi-port access supports parallel processing of multiple instructions; and the parallel decoding unit and the instruction allocation unit implement parallel processing of multi-issue instructions. The scheme is relatively simple to implement and improves the performance of the micro-engine.
  • Each of the above units may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.
  • If the above apparatus of the embodiments of the present invention is implemented in the form of a software function module and sold or used as a separate product, it may also be stored in a computer readable storage medium.
  • The technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product that is stored in a storage medium and includes a plurality of instructions causing a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read only memory (ROM), a magnetic disk, or an optical disk.
  • The embodiment of the present invention further provides a storage medium storing a computer program for executing the multi-issue instruction parallel processing method of the micro-engine of the embodiment of the present invention.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The judgment and marking of the correlation between instructions is completed by the compiling unit, which reduces the programming complexity for microcode programmers; whether to issue instructions in parallel is determined according to the mark; when instructions are issued in parallel, the instructions are parsed to obtain the instruction type of each instruction and the address of its source operand.
  • According to its instruction type, each instruction is allocated a corresponding executable unit to process the source operand.
  • The distinctive multi-port kernel register structure supports parallel processing of multiple instructions well, and the allocated executable units can also process the source operands in parallel, which greatly improves the performance of the micro-engine.

Abstract

Disclosed are a parallel processing method and apparatus for multiple launch instructions of a micro-engine, and a storage medium. The method comprises: judging and marking correlation between instructions, and judging, according to marks, whether to process launch instructions in parallel; when the launch instructions are processed in parallel, parsing the instructions by means of a parallel decoding unit to obtain an instruction type of each instruction and an address of a source operand; acquiring a source operand in a multi-port kernel register according to the address of the source operand of each instruction; allocating corresponding executable units for the instructions to process the source operand according to the instruction type of each instruction; and storing a processing result in the multi-port kernel register.

Description

Multi-issue instruction parallel processing method and apparatus for a micro-engine, and storage medium
Technical Field
The present invention relates to network processor technology, and in particular to a multi-issue instruction parallel processing method and apparatus for a network processor micro-engine (ME, Micro Engine), and a storage medium.
Background
To meet the needs of future network development and improve router performance, core routers at the backbone of the Internet have undergone one technological change after another. In the high-end router market in particular, network processors, with their outstanding packet processing performance and programmability, have become an irreplaceable part of the routing and forwarding engine.
In a network processor system, the ME is the core component of the network processor and is responsible for parsing and processing packets according to microcode instructions. The processing performance of the micro-engine is therefore an important parameter of the network processor and determines its overall performance.
In existing micro-engine technology, a traditional single-issue instruction pipeline can process only one instruction at a time, completing one kind of operation among logic computation, jump, and data movement. Many other execution units are therefore left idle, the resources of the core are not fully used, and micro-engine performance is not maximized.
Existing multi-issue instruction pipelines mainly use very long instruction word (VLIW) technology. When writing microcode, the user tries, as requirements allow, to use as many different executable units as possible within one very long instruction in order to improve instruction parallelism. This scheme relies mainly on the user arranging the parallel use of execution units at the pre-compilation stage, which tends to increase the complexity of user programming and hence labor cost. In addition, storing very long instructions requires a larger instruction memory, which increases chip cost.
Summary of the Invention
To solve the above technical problem, embodiments of the present invention provide a multi-issue instruction parallel processing method and apparatus for a micro-engine, and a storage medium.
The multi-issue instruction parallel processing method for a micro-engine provided by an embodiment of the present invention includes:
judging and marking the correlation between instructions, and determining according to the mark whether to issue instructions in parallel;
when instructions are issued in parallel, parsing the instructions with a parallel decoding unit to obtain the instruction type of each instruction and the address of its source operand;
acquiring the source operand from a multi-port kernel register according to the address of the source operand of the instruction;
allocating, according to the instruction type of the instruction, a corresponding executable unit to the instruction to process the source operand;
storing the processing result in the multi-port kernel register.
In an embodiment of the present invention, judging and marking the correlation between instructions includes:
determining whether the destination registers of two consecutive instructions are in the same region;
when the destination registers of the two instructions are not in the same region, determining whether a data hazard exists between the two instructions;
when no data hazard exists between the two instructions, determining whether the instruction types of the two instructions differ;
when the instruction types of the two instructions differ, determining whether the earlier instruction is a jump instruction;
when the earlier instruction is not a jump instruction, determining that the two instructions are not correlated, and setting a non-correlation flag on the later instruction.
In an embodiment of the present invention, determining according to the mark whether to issue instructions in parallel includes:
when the later instruction carries the non-correlation flag, issuing the two consecutive instructions of one thread in parallel.
In an embodiment of the present invention, the multi-port kernel register is divided by thread into two groups of registers, each group including four register units; the two source operands of one instruction reside in two different register units; the destination operands of the two instructions of one thread reside in two different register units.
The multi-port kernel register has eight data read ports and four data write ports and supports simultaneous access by four instructions, each instruction accessing two source operands and one destination operand.
In an embodiment of the present invention, the instruction types are divided into three major classes: logic computation instructions, data upload/download instructions, and jump instructions; each major class in turn includes a plurality of instruction subclasses; each thread corresponds to one group of executable units, including a logic computation execution unit, a data upload/download execution unit, and a jump execution unit.
Allocating a corresponding executable unit to the instruction according to its instruction type includes:
when the major classes of the two instructions of one thread differ, assigning the instructions of each group to their corresponding executable units;
when the major classes of the two instructions of one thread are the same and their subclasses differ, handling according to the following three cases:
when the major class is the logic computation class, allocating the respective logic computation execution units within the thread;
when the major class is the upload/download class, allocating the respective data upload/download execution units within the thread;
when one of the instructions is a jump instruction, allocating the respective executable units according to constraints.
The multi-issue instruction parallel processing apparatus for a micro-engine provided by an embodiment of the present invention includes:
a compiling unit, configured to judge and mark the correlation between instructions, and to determine according to the mark whether to issue instructions in parallel;
a parallel decoding unit, configured to parse the instructions in parallel when instructions are issued in parallel, to obtain the instruction type of each instruction and the address of its source operand;
a reading unit, configured to acquire the source operand from the multi-port kernel register according to the address of the source operand of the instruction;
an instruction allocation unit, configured to allocate, according to the instruction type of the instruction, a corresponding executable unit to the instruction to process the source operand;
a writing unit, configured to store the processing result in the multi-port kernel register.
In an embodiment of the present invention, the compiling unit is further configured to: determine whether the destination registers of two consecutive instructions are in the same region; when they are not in the same region, determine whether a data hazard exists between the two instructions; when no data hazard exists, determine whether the instruction types of the two instructions differ; when the instruction types differ, determine whether the earlier instruction is a jump instruction; and when the earlier instruction is not a jump instruction, determine that the two instructions are not correlated and set a non-correlation flag on the later instruction.
In an embodiment of the present invention, the compiling unit is further configured so that, when the later instruction carries the non-correlation flag, one thread issues the two consecutive instructions in parallel.
In an embodiment of the present invention, the multi-port kernel register is divided by thread into two groups of registers, each group including four register units; the two source operands of one instruction reside in two different register units; the destination operands of the two instructions of one thread reside in two different register units.
The multi-port kernel register has eight data read ports and four data write ports and supports simultaneous access by four instructions, each instruction accessing two source operands and one destination operand.
In an embodiment of the present invention, the instruction types are divided into three major classes: logic computation instructions, data upload/download instructions, and jump instructions; each major class in turn includes a plurality of instruction subclasses; each thread corresponds to one group of executable units, including a logic computation execution unit, a data upload/download execution unit, and a jump execution unit.
The instruction allocation unit is further configured to: when the major classes of the two instructions of one thread differ, assign the instructions of each group to their corresponding executable units; and when the major classes of the two instructions of one thread are the same and their subclasses differ, handle them according to the following three cases: when the major class is the logic computation class, allocate the respective logic computation execution units within the thread; when the major class is the upload/download class, allocate the respective data upload/download execution units within the thread; when one of the instructions is a jump instruction, allocate the respective executable units according to constraints.
The storage medium provided by an embodiment of the present invention stores a computer program for executing the above multi-issue instruction parallel processing method of the micro-engine.
In the technical solution of the embodiments of the present invention, the compiling unit first judges and marks the correlation between instructions, which reduces the programming complexity for microcode programmers. Whether to issue instructions in parallel is determined according to the mark. When instructions are issued in parallel, the parallel decoding unit parses the instructions to obtain the instruction type of each instruction and the address of its source operand, thereby achieving parallel parsing of multi-issue instructions. The source operand is then acquired from the multi-port kernel register according to its address; a corresponding executable unit is allocated to the instruction according to its instruction type to process the source operand; and the processing result is stored in the multi-port kernel register. The distinctive multi-port kernel register structure supports parallel processing of multiple instructions well, and the allocated executable units can also process the source operands in parallel, which greatly improves the performance of the micro-engine.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a multi-issue instruction parallel processing method for a micro-engine according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the principle of multi-issue instruction parallel processing according to an embodiment of the present invention;
FIG. 3 is a flowchart of judging and marking the correlation between instructions according to an embodiment of the present invention;
FIG. 4 is a flowchart of the pipeline reading source operands and writing back the destination register according to an embodiment of the present invention;
FIG. 5 is a structural diagram of the multi-port kernel register according to an embodiment of the present invention;
FIG. 6 is a flowchart of the pipeline processing instructions in parallel according to an embodiment of the present invention;
FIG. 7 is a structural diagram of the parallel decoding unit and the instruction allocation unit according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a multi-issue instruction parallel processing apparatus for a micro-engine according to an embodiment of the present invention.
Detailed Description
In the multi-issue instruction parallel processing method and apparatus for a micro-engine according to the embodiments of the present invention, a compiling unit judges and marks the correlation between instructions; a distinctive multi-port kernel register structure is provided; and parallel decoding units and executable units carry out the parallel processing of multi-issue instructions. To make the objectives, technical solutions, and advantages of the embodiments clearer, the embodiments are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a multi-issue instruction parallel processing method for a micro-engine according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step 101: Judge and mark the correlation between instructions, and determine according to the mark whether to issue instructions in parallel.
In an embodiment of the present invention, the correlation between instructions covers:
whether, between two consecutive instructions of one thread, there is a data hazard, a shared source operand, a shared destination operand, or a shared executable unit. If any one of these conditions holds, the two instructions are judged to be correlated; otherwise they are not. Whether the two instructions are correlated determines whether they can be issued simultaneously within one thread and executed in parallel.
The embodiment of the present invention supports scheduling two threads for execution at the same time, namely thread A and thread B.
At compile time, the compiling unit judges the correlation between two consecutive instructions; when the two instructions are not correlated, it sets the non-correlation flag of the instruction to valid. When a thread is scheduled, it decides according to the non-correlation flag whether to issue one instruction or two instructions at the same time.
By exploiting the absence of correlation between instructions, instruction parallelism can be achieved to the greatest extent, the execution units are put to full use, and the performance loss caused by idle execution units is reduced, thereby improving the overall performance of the ME.
In one implementation, judging and marking the correlation between instructions includes:
determining whether the destination registers of two consecutive instructions are in the same region;
when the destination registers of the two instructions are not in the same region, determining whether a data hazard exists between the two instructions;
when no data hazard exists between the two instructions, determining whether the instruction types of the two instructions differ;
when the instruction types of the two instructions differ, determining whether the earlier instruction is a jump instruction;
when the earlier instruction is not a jump instruction, determining that the two instructions are not correlated, and setting a non-correlation flag on the later instruction.
In an embodiment of the present invention, determining according to the mark whether to issue instructions in parallel includes:
when the later instruction carries the non-correlation flag, issuing the two consecutive instructions of one thread in parallel.
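The flag-driven issue decision can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the hardware described here: the function name `issue_groups` and the list-of-booleans representation are hypothetical, and `flags[i]` is assumed to be true when instruction `i` carries the non-correlation flag relative to instruction `i - 1`.

```python
def issue_groups(flags):
    """Group one thread's instruction indices into issue bundles.

    flags[i] is True when instruction i carries the non-correlation flag,
    i.e. it may be issued together with instruction i - 1 (assumed encoding).
    """
    groups = []
    i = 0
    while i < len(flags):
        # Dual issue: the next instruction is flagged as independent of this one.
        if i + 1 < len(flags) and flags[i + 1]:
            groups.append((i, i + 1))
            i += 2
        else:
            # Single issue: no following instruction, or it is correlated.
            groups.append((i,))
            i += 1
    return groups


if __name__ == "__main__":
    print(issue_groups([False, True, True, False]))
```

In this sketch a flag is ignored when the flagged instruction's predecessor has already been issued as the second half of a pair; whether the real scheduler behaves this way is not stated in the text.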
Step 102: When instructions are issued in parallel, parse the instructions with the parallel decoding unit to obtain the instruction type of each instruction and the address of its source operand.
Referring to FIG. 2, the instructions enter the decode stage of the pipeline, where instruction parsing 201 is performed.
To support the processing of up to four instructions at the same time, the embodiment of the present invention provides four parallel decoding units. Each decoding unit decodes an instruction and parses out its instruction type.
In an embodiment of the present invention, the instruction types are as follows:
Instructions are divided into three major classes: logic computation instructions, data upload/download instructions, and jump instructions. Each major class in turn includes a plurality of instruction subclasses; for example, the logic computation class includes addition, subtraction, and AND/OR logic operations. Each instruction subclass has its own instruction encoding. The instruction type referred to in the embodiments of the present invention is mainly the subclass of each instruction.
At the same time, the parallel decoding unit also parses out the address, in the multi-port kernel register, of the source operand required by the instruction.
Step 103: Acquire the source operand from the multi-port kernel register according to the address of the source operand of the instruction.
As shown in FIG. 2, once the address of the source operand in the multi-port kernel register has been obtained, the multi-port kernel register is accessed to fetch the source operand 202.
To allow up to four instructions to execute at the same time, and given the accesses to source and destination operands, the kernel register must have a multi-port structure. The multi-port kernel register of the embodiment provides eight data read ports and four data write ports, so it can support simultaneous access by four instructions, each of which can access two source operands and one destination operand.
In an embodiment of the present invention, the multi-port kernel register is divided by thread into two groups of registers, each group including four register units; the two source operands of one instruction reside in two different register units; the destination operands of the two instructions of one thread reside in two different register units.
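The port budget and the register-unit constraints above can be checked with a small Python sketch. The port counts and the two constraints come from the text; the mapping from register number to register unit (`reg // 8`, that is, 32 registers per thread split evenly over four units) and the names `unit_of` and `check_access` are assumptions made purely for illustration.

```python
READ_PORTS = 8      # from the text: eight data read ports
WRITE_PORTS = 4     # from the text: four data write ports
REGS_PER_UNIT = 8   # ASSUMPTION: 32 registers per thread, evenly split over 4 units


def unit_of(reg):
    """Hypothetical mapping from a register number (0..31) to a register unit."""
    return reg // REGS_PER_UNIT


def check_access(bundle):
    """Validate one cycle of register accesses.

    bundle maps a thread name to a list of (src_a, src_b, dst) tuples,
    at most two instructions per thread. Returns (reads, writes).
    """
    reads = writes = 0
    for thread, instrs in bundle.items():
        if len(instrs) > 2:
            raise ValueError(f"thread {thread}: more than two instructions")
        for src_a, src_b, _dst in instrs:
            # The two source operands of one instruction sit in different units.
            if unit_of(src_a) == unit_of(src_b):
                raise ValueError(f"thread {thread}: sources share a register unit")
            reads += 2
            writes += 1
        # The destinations of a thread's two instructions sit in different units.
        dst_units = [unit_of(dst) for _a, _b, dst in instrs]
        if len(set(dst_units)) != len(dst_units):
            raise ValueError(f"thread {thread}: destinations share a register unit")
    if reads > READ_PORTS or writes > WRITE_PORTS:
        raise ValueError("port budget exceeded")
    return reads, writes


if __name__ == "__main__":
    full = {"A": [(0, 8, 1), (16, 24, 17)], "B": [(2, 9, 3), (18, 25, 30)]}
    print(check_access(full))
```

With two threads issuing two instructions each, the sketch uses exactly the eight read ports and four write ports that the text specifies.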
Step 104: According to the instruction type of the instruction, allocate a corresponding executable unit to the instruction to process the source operand.
As shown in FIG. 2, after the source operands have been fetched from the multi-port kernel register, the instruction allocation unit allocates executable units according to instruction type so as to maximize processing performance 203.
In an embodiment of the present invention, the executable units include logic computation execution units, data upload/download execution units, and jump execution units. These three classes of execution units carry out the execution of the three major classes of instructions respectively. The embodiment provides two groups of logic computation execution units, two groups of data upload/download execution units, and two groups of jump execution units.
The pipeline described in the embodiment executes at most four instructions at the same time. The instruction allocation unit assigns the instructions to executable units according to their instruction types and ensures that instructions of the same type are assigned to different groups of executable units, so that no resource conflict gives rise to a structural hazard.
In the embodiment of the present invention, the instruction types are divided into three major classes: logic computation instructions, data upload/download instructions, and jump instructions; each major class in turn includes a plurality of instruction subclasses; each thread corresponds to one group of executable units, including a logic computation execution unit, a data upload/download execution unit, and a jump execution unit. Accordingly, allocating a corresponding executable unit to the instruction according to its instruction type includes:
when the major classes of the two instructions of one thread differ, assigning the instructions of each group to their corresponding executable units;
when the major classes of the two instructions of one thread are the same and their subclasses differ, handling according to the following three cases:
when the major class is the logic computation class, allocating the respective logic computation execution units within the thread;
when the major class is the upload/download class, allocating the respective data upload/download execution units within the thread;
when one of the instructions is a jump instruction, allocating the respective executable units according to constraints.
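A rough picture of allocation without structural hazards is given by the Python sketch below. It substitutes a simple first-free allocator for the per-thread rules above, which the text does not spell out in full (in particular the constraints for jump instructions); the class labels `"logic"`, `"data"`, `"jump"` and the function `allocate` are hypothetical. Only the figure of two groups per execution-unit class is taken from the text.

```python
from collections import Counter

UNITS_PER_CLASS = 2                  # from the text: two groups of each class
CLASSES = ("logic", "data", "jump")  # hypothetical labels for the major classes


def allocate(bundle):
    """Assign each issued instruction to a free execution unit of its class.

    bundle is a list of (thread, major_class) pairs, at most four in total.
    Returns a list of (thread, major_class, unit_index).
    Raises RuntimeError when a class runs out of units (structural hazard).
    """
    used = Counter()
    assignment = []
    for thread, major in bundle:
        if major not in CLASSES:
            raise ValueError(f"unknown major class: {major}")
        if used[major] >= UNITS_PER_CLASS:
            raise RuntimeError(f"structural hazard on {major} units")
        # Take the next unused unit of this class.
        assignment.append((thread, major, used[major]))
        used[major] += 1
    return assignment


if __name__ == "__main__":
    issued = [("A", "logic"), ("A", "data"), ("B", "logic"), ("B", "jump")]
    for entry in allocate(issued):
        print(entry)
```

Instructions of the same major class always land on different unit indices here, which mirrors the stated goal of avoiding resource conflicts; the actual allocation policy of the apparatus may differ.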
Step 105: Store the processing result in the multi-port kernel register.
As shown in FIG. 2, the instructions are assigned to their executable units and executed; the processing result must then be written back to the specified destination register, and in the case of a jump instruction, instructions are fetched anew from the instruction memory 204.
The kernel register of the embodiment of the present invention provides four data write ports and supports data write-back for up to four instructions. Once the operation result has been written back, the processing of an instruction is complete.
At compile time, the compiling unit judges the correlation between two consecutive instructions; the resulting flag determines whether a thread can issue one instruction or two instructions at the same time. FIG. 3 is a flowchart of judging and marking the correlation between instructions according to an embodiment of the present invention. The flow includes the following steps:
Step 301: Determine whether the destination registers of the two consecutive instructions are in the same region.
In a specific embodiment, "the same region" is defined as follows:
The multi-port kernel register can provide 32 registers for each thread, numbered register 0 to register 31, each register holding 4 bytes. Registers 0 to 15 form one region, and registers 16 to 31 form the other region.
If the destination registers of the two instructions are in the same region, the two instructions are judged to be correlated; as shown in FIG. 3, the condition is not met and the compiling unit does not set the non-correlation flag. If the destination registers of the two instructions are not in the same region, the flow proceeds to the check of step 302.
步骤302:判断前后两条指令的目的寄存器是否存在数据冒险。Step 302: Determine whether there is a data risk in the destination register of the two instructions before and after.
一具体实施例中,所述数据冒险,主要是:后一条指令的源操作数寄存器是否为前一条指令的目的寄存器。In a specific embodiment, the data adventure is mainly: whether the source operand register of the latter instruction is the destination register of the previous instruction.
如果前后两条指令存在数据冒险,那么就判定为前后指令相关,如图3所示,不符合条件,编译单元放弃置不相关标记。如果前后两条指令不存在数据冒险,那么就继续进行步骤303的判定。If there are data risks in the two instructions before and after, then it is determined that the instructions are related before and after, as shown in Figure 3, the compiler unit discards the irrelevant flag as shown in Figure 3. If there is no data risk in the previous two instructions, then the determination in step 303 is continued.
步骤303:判断前后两条指令的指令类型是否不同,不使用同一个可执行单元。Step 303: Determine whether the instruction types of the two instructions before and after are different, and do not use the same executable unit.
Except for jump instructions, the instruction type compared here is the instruction subclass. If the two instructions have the same subclass, they are judged to be dependent. If both are jump instructions, only the major class needs to be compared, and they are judged to be dependent. In either case, as shown in FIG. 3, the condition is not met and the compiling unit does not set the independent flag. If the two instruction types differ, the pair has passed the register and type checks; subject to the jump check in steps 304 to 306, the two instructions are judged to be independent and the independent flag is set in the later instruction.
Step 304: Determine whether the earlier instruction is a jump instruction.
Step 305: If the earlier instruction is a jump instruction, the two instructions are judged to be dependent; the condition is not met and the compiling unit does not set the independent flag.
Step 306: If the earlier instruction is not a jump instruction, the two instructions are judged to be independent, and the independent flag is set in the later instruction.
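Steps 301 to 306 can be sketched as a single predicate. This is only an illustrative sketch, not the compiling unit's actual implementation: the `Instr` record, the class labels ("logic", "data", "jump"), the subclass names, and the simplification that every instruction names one destination register are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    major: str                   # hypothetical labels: "logic", "data", "jump"
    sub: str                     # hypothetical subclass name within the major class
    dest: int                    # destination register, 0..31 (assumed always present)
    srcs: Tuple[int, ...] = ()   # source operand registers, 0..31


def region(reg: int) -> int:
    """Region 0 holds registers 0..15, region 1 holds registers 16..31."""
    return 0 if reg < 16 else 1


def same_type(prev: Instr, nxt: Instr) -> bool:
    """Jump instructions are compared by major class, all others by subclass."""
    if prev.major == "jump" and nxt.major == "jump":
        return True
    return prev.major == nxt.major and prev.sub == nxt.sub


def independent(prev: Instr, nxt: Instr) -> bool:
    """Return True when the independent flag would be set on `nxt`."""
    if region(prev.dest) == region(nxt.dest):   # step 301: same destination region
        return False
    if prev.dest in nxt.srcs:                   # step 302: data hazard
        return False
    if same_type(prev, nxt):                    # step 303: same execution unit
        return False
    if prev.major == "jump":                    # steps 304 and 305: jump comes first
        return False
    return True                                 # step 306: set the flag
```

For example, under these assumptions an "add" writing register 3 followed by a "shift" writing register 20 that reads registers 2 and 18 passes every check, so the pair could be issued together.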
In the embodiments of the present invention, an instruction needs to access the multi-port core register to fetch its source operands and to write back its destination register. FIG. 4 is a flowchart of the pipeline reading source operands and writing back destination registers according to an embodiment of the present invention. The flow includes the following steps:
Step 401: Divide the multi-port core register into two groups, allocated by thread.
As shown in FIG. 5, the multi-port core register module of the embodiment is divided into two groups, one group of registers for thread A and one for thread B, and each group provides four register units.
In a specific embodiment, the four register units of thread A are as follows: one set of registers 0 to 15 is register unit 0, and another set of registers 0 to 15 is register unit 2; one set of registers 16 to 31 is register unit 1, and another set of registers 16 to 31 is register unit 3. The four register units of thread B are divided by the same rule as for thread A and are register units 4, 5, 6, and 7.
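The unit numbering above follows a regular pattern, which the sketch below writes as a formula. The function name `register_unit`, the encoding of thread A as 0 and thread B as 1, and the `copy` index (0 for the first set of a region, 1 for the second) are hypothetical names introduced for illustration; the text gives the unit numbers but not how a set is selected.

```python
def register_unit(thread: int, reg: int, copy: int) -> int:
    """Map (thread, register number, set index) to a register unit 0..7.

    thread: 0 for thread A, 1 for thread B (assumed encoding)
    reg:    register number 0..31
    copy:   0 for the first set of a region, 1 for the second (assumed encoding)
    """
    if thread not in (0, 1) or copy not in (0, 1) or not 0 <= reg <= 31:
        raise ValueError("argument out of range")
    region = 0 if reg < 16 else 1        # registers 0..15 vs registers 16..31
    return 4 * thread + 2 * copy + region


# Thread A: units 0 and 2 hold registers 0..15, units 1 and 3 hold registers 16..31.
# Thread B follows the same rule, shifted to units 4..7.
```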
Step 402: Within a group, read the source operands according to the constraints and the instructions.
By constraint, the two source operands of one instruction should, as far as possible, come from the two different regions, that is, one from register 0 to register 15 and the other from register 16 to register 31.
As shown in FIG. 5, read port 0 and read port 1 are provided to instruction 0 for reading its source operands. Likewise, read port 2 and read port 3 are provided to instruction 1, read port 4 and read port 5 to instruction 2, and read port 6 and read port 7 to instruction 3. In this way one instruction can access all 32 registers and fetch two different operands, and the read ports of the core register are fully used; at most four instructions can access the multi-port core register at the same time.
Step 403: Within a group, write the operation result back to the destination register according to the constraints and the instructions.
By constraint, the destination registers of the two instructions of one thread likewise use registers in the two different regions.
As shown in FIG. 5, write port 0 is provided to instruction 0 for writing the operation result back to the destination register. Likewise, write port 1 is provided to instruction 1, write port 2 to instruction 2, and write port 3 to instruction 3. In this way the write ports of the core register are fully used, and at most four instructions can access the core register at the same time.
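The port assignment described for FIG. 5 is a fixed mapping from instruction slot to ports, sketched below. The function name `ports_for_slot` and the returned dictionary layout are assumptions made for the example; only the slot-to-port numbering comes from the text.

```python
def ports_for_slot(slot: int) -> dict:
    """Return the read and write ports reserved for instruction slot 0..3."""
    if not 0 <= slot <= 3:
        raise ValueError("at most four instructions access the register file at once")
    return {
        "read": (2 * slot, 2 * slot + 1),   # two read ports, one per source operand
        "write": slot,                      # one write port for the destination
    }


# Four slots together use 8 distinct read ports and 4 distinct write ports.
all_read_ports = [p for s in range(4) for p in ports_for_slot(s)["read"]]
all_write_ports = [ports_for_slot(s)["write"] for s in range(4)]
```

Because every slot owns its ports outright, no arbitration between the four instructions is needed in this sketch.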
FIG. 6 is a flowchart of the pipeline processing instructions in parallel according to an embodiment of the present invention. The flow includes the following steps:
Step 601: Decode the instructions in parallel and parse out the instruction types.
As shown in FIG. 7, four decoding units decode four instructions and parse out the instruction type of each. The embodiments described in this patent use three instruction types: logic calculation instructions, data upload/download instructions, and jump instructions.
Step 602: Group the execution units according to instruction type.
As shown in FIG. 7, the embodiment provides two sets of logic calculation execution units, two sets of data upload/download execution units, and two sets of jump execution units, so that thread A and thread B each have one set of logic calculation, data upload/download, and jump execution units.
The allocation rules here mainly address the case in which the two instructions of one thread have the same major class but different subclasses. If the major classes of the two instructions differ, the instructions of each group only need to be assigned to their corresponding execution units, and no conflict arises.
When the two instructions have the same major class but different subclasses, there are three cases, as follows:
Case 1: The major class is logic calculation and the subclasses differ; each instruction is assigned its own calculation unit within the thread.
If both instructions of a thread are logic calculation instructions, the compiling unit's constraint guarantees that their subclasses differ, so each instruction only needs to be assigned, according to its type, to the calculation unit it requires within the thread.
Case 2: The major class is upload/download and the subclasses differ; each instruction is assigned its own data upload/download unit within the thread.
If both instructions of a thread are upload/download instructions, the compiling unit's constraint guarantees that their subclasses differ, so each instruction only needs to be assigned, according to its type, to the unit it requires.
Case 3: One of the instructions is a jump instruction; the execution units are assigned according to the constraints.
Under the compiling unit's constraint, if the earlier instruction is a jump instruction, the thread issues only one instruction, so only that instruction is assigned a jump execution unit. If the later instruction is a jump instruction, each instruction is assigned an execution unit according to its type.
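The allocation cases above can be sketched as follows. This is a hedged illustration only: the function `allocate_units`, the tuple used to identify an execution unit, and the class and subclass labels are hypothetical, and the sketch assumes that each subclass of a logic calculation or upload/download instruction has its own unit inside the thread's set, which is one reading of "each instruction is assigned its own unit".

```python
from typing import List, Tuple

Unit = Tuple[str, ...]   # hypothetical unit identifier, e.g. ("A", "logic", "add")


def allocate_units(thread: str, issued: List[Tuple[str, str]]) -> List[Unit]:
    """Assign one execution unit to each (major, sub) instruction a thread issues.

    `issued` holds one or two instructions in program order.
    """
    if not 1 <= len(issued) <= 2:
        raise ValueError("a thread issues one or two instructions")
    if len(issued) == 2 and issued[0][0] == "jump":
        # Compiler constraint: a leading jump is issued alone.
        raise ValueError("a jump in the first slot must be issued alone")
    units: List[Unit] = []
    for major, sub in issued:
        # Jumps share the thread's single jump unit; other classes pick by subclass.
        unit: Unit = (thread, "jump") if major == "jump" else (thread, major, sub)
        if unit in units:
            raise ValueError("both instructions need the same execution unit")
        units.append(unit)
    return units
```

In this sketch a conflict can only occur if the compile-time checks were skipped, since those checks never flag two instructions of the same subclass as independent.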
Step 603: The instruction allocating unit completes the effective allocation of the execution units.
FIG. 8 is a schematic structural diagram of a multi-issue instruction parallel processing apparatus for a micro-engine according to an embodiment of the present invention. As shown in FIG. 8, the apparatus includes:
a compiling unit 81, configured to determine and mark the dependence between instructions, and to determine, according to the mark, whether to issue instructions in parallel;
a parallel decoding unit 82, configured to parse the instructions in parallel when instructions are issued in parallel, to obtain the instruction type and the source operand addresses of each instruction;
a reading unit 83, configured to obtain the source operands from the multi-port core register according to the source operand addresses of the instructions;
an instruction allocating unit 84, configured to allocate, according to the instruction type of each instruction, a corresponding execution unit to the instruction to process the source operands;
a writing unit 85, configured to store the processing result in the multi-port core register.
The compiling unit 81 is further configured to: determine whether the destination registers of two consecutive instructions are in the same region; when the destination registers are not in the same region, determine whether a data hazard involving the destination register exists between the two instructions; when no data hazard exists, determine whether the instruction types of the two instructions differ; when the instruction types differ, determine whether the earlier instruction is a jump instruction; and when the earlier instruction is not a jump instruction, determine that the two instructions are independent and set the independent flag on the later instruction.
The compiling unit 81 is further configured so that, when the later instruction carries the independent flag, one thread issues the two consecutive instructions in parallel.
The multi-port core register is divided by thread into two groups of registers, and each group includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
The multi-port core register has 8 data read ports and 4 data write ports and supports access by four instructions at the same time, each instruction accessing two source operands and one destination operand.
The major classes of instruction type are logic calculation instructions, data upload/download instructions, and jump instructions; each major class in turn includes multiple subclasses; each thread corresponds to one set of execution units, including a logic calculation execution unit, a data upload/download execution unit, and a jump execution unit;
The instruction allocating unit 84 is further configured to: when the major classes of the two instructions of one thread differ, assign the instructions of each group to their corresponding execution units; when the major classes of the two instructions of one thread are the same and the subclasses differ, handle the following three cases: when the major class is logic calculation, assign each instruction its own logic calculation execution unit within the thread; when the major class is upload/download, assign each instruction its own data upload/download execution unit within the thread; when one of the instructions is a jump instruction, assign the execution units according to the constraints.
Those skilled in the art should understand that the functions implemented by the units of the multi-issue instruction parallel processing apparatus shown in FIG. 8 can be understood with reference to the foregoing description of the multi-issue instruction parallel processing method for a micro-engine.
In the multi-issue instruction parallel processing method and apparatus for a micro-engine described in the above embodiments, the compiling unit determines and marks the dependence between instructions; a dedicated core register structure supporting multi-port access is provided; and a parallel decoding unit and an instruction allocating unit carry out the parallel processing of multi-issue instructions. Because the compiling unit determines and marks the dependence between instructions, the programming complexity for microcode software developers is reduced. In addition, the multi-port core register structure supports the parallel processing of multiple instructions, and the parallel decoding unit and the instruction allocating unit implement the parallel processing of multi-issue instructions. The scheme is relatively simple to implement and greatly improves the performance of the micro-engine.
Each of the above units may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.
If the above apparatus of the embodiments of the present invention is implemented in the form of a software function module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present invention further provides a storage medium storing a computer program, the computer program being used to execute the multi-issue instruction parallel processing method for a micro-engine of the embodiments of the present invention.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
Industrial applicability
In the embodiments of the present invention, determining and marking the dependence between instructions reduces the programming complexity for microcode developers. Whether to issue instructions in parallel is determined according to the mark. When instructions are issued in parallel, the instructions are parsed to obtain the instruction type and the source operand addresses of each instruction, which implements parallel parsing of multi-issue instructions. The source operands are obtained from the multi-port core register according to the source operand addresses of the instructions, and a corresponding execution unit is allocated to each instruction according to its instruction type to process the source operands. The multi-port core register structure supports the parallel processing of multiple instructions, and the allocated execution units can also process the source operands in parallel, which greatly improves the performance of the micro-engine.

Claims (11)

  1. A multi-issue instruction parallel processing method for a micro-engine, the method comprising:
    determining and marking dependence between instructions, and determining, according to the mark, whether to issue instructions in parallel;
    when instructions are issued in parallel, parsing the instructions with a parallel decoding unit to obtain the instruction type and the source operand addresses of each instruction;
    obtaining the source operands from a multi-port core register according to the source operand addresses of the instructions;
    allocating, according to the instruction type of each instruction, a corresponding execution unit to the instruction to process the source operands;
    storing the processing result in the multi-port core register.
  2. The multi-issue instruction parallel processing method for a micro-engine according to claim 1, wherein the determining and marking dependence between instructions comprises:
    determining whether the destination registers of two consecutive instructions are in the same region;
    when the destination registers of the two instructions are not in the same region, determining whether a data hazard involving the destination register exists between the two instructions;
    when no data hazard exists, determining whether the instruction types of the two instructions differ;
    when the instruction types of the two instructions differ, determining whether the earlier instruction is a jump instruction;
    when the earlier instruction is not a jump instruction, determining that the two instructions are independent, and setting an independent flag on the later instruction.
  3. The multi-issue instruction parallel processing method for a micro-engine according to claim 2, wherein the determining, according to the mark, whether to issue instructions in parallel comprises:
    when the later instruction carries the independent flag, issuing, by one thread, the two consecutive instructions in parallel.
  4. The multi-issue instruction parallel processing method for a micro-engine according to claim 1, wherein
    the multi-port core register is divided by thread into two groups of registers, each group including four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
    the multi-port core register has 8 data read ports and 4 data write ports and supports access by four instructions at the same time, each instruction accessing two source operands and one destination operand.
  5. The multi-issue instruction parallel processing method for a micro-engine according to any one of claims 1 to 4, wherein the major classes of instruction type are logic calculation instructions, data upload/download instructions, and jump instructions; each major class in turn includes multiple subclasses; each thread corresponds to one set of execution units, including a logic calculation execution unit, a data upload/download execution unit, and a jump execution unit;
    the allocating, according to the instruction type of each instruction, a corresponding execution unit to the instruction comprises:
    when the major classes of the two instructions of one thread differ, assigning the instructions of each group to their corresponding execution units;
    when the major classes of the two instructions of one thread are the same and the subclasses differ, handling the following three cases:
    when the major class is logic calculation, assigning each instruction its own logic calculation execution unit within the thread;
    when the major class is upload/download, assigning each instruction its own data upload/download execution unit within the thread;
    when one of the instructions is a jump instruction, assigning the execution units according to the constraints.
  6. 一种微引擎的多发射指令并行处理装置,所述装置包括:A multi-engineering instruction parallel processing device for a microengine, the device comprising:
    编译单元,配置为对指令间的相关性进行判断和标记,根据所述标记判断是否并行发射指令; a compiling unit configured to determine and mark the correlation between the instructions, and determine whether to transmit the instructions in parallel according to the marking;
    并行解码单元,配置为当并行发射指令时,并行对所述指令进行解析,得到各个指令的指令类型和源操作数的地址;a parallel decoding unit configured to parse the instructions in parallel when the instructions are transmitted in parallel, to obtain an instruction type of each instruction and an address of the source operand;
    读取单元,配置为根据所述指令的源操作数的地址,在多端口内核寄存器中获取源操作数;a read unit configured to obtain a source operand in the multi-port kernel register according to an address of a source operand of the instruction;
    指令分配单元,配置为根据所述指令的指令类型,为所述指令分配相应的可执行单元对所述源操作数进行处理;An instruction allocating unit configured to allocate, according to an instruction type of the instruction, a corresponding executable unit to process the source operand;
    写入单元,配置为将处理结果存储至多端口内核寄存器中。A write unit configured to store processing results in a multiport core register.
  7. 根据权利要求6所述的微引擎的多发射指令并行处理装置,其中,所述编译单元,还配置为判断前后两条指令的目的寄存器是否在同一个区域;当前后两条指令的目的寄存器不在同一个区域时,判断前后两条指令的目的寄存器是否存在数据冒险;当前后两条指令的目的寄存器不存在数据冒险时,判断前后两条指令的指令类型是否不同;当前后两条指令的指令类型不同时,判断前一条指令是否为跳转指令;当前一条指令不是跳转指令时,确定两条指令不相关,并在后一条指令上置上不相关标记。The multi-engine instruction parallel processing device of the microengine according to claim 6, wherein the compiling unit is further configured to determine whether the destination registers of the two instructions are in the same area; the destination registers of the current two instructions are not In the same area, it is judged whether there is data risk in the destination register of the two instructions before and after; when the destination register of the last two instructions does not have data adventure, it is judged whether the instruction types of the two instructions before and after are different; the instructions of the current two instructions When the type is different, it is judged whether the previous instruction is a jump instruction; when the current one instruction is not a jump instruction, it is determined that the two instructions are irrelevant, and an irrelevant flag is set on the latter instruction.
  8. 根据权利要求7所述的微引擎的多发射指令并行处理装置,其中,所述编译单元,还配置为当所述后一条指令置有不相关标记时,一个线程并行发射前后两条指令。The multi-engine instruction parallel processing apparatus of the microengine according to claim 7, wherein the compiling unit is further configured to: when the subsequent instruction is placed with an irrelevant flag, one thread transmits two instructions before and after in parallel.
  9. 根据权利要求6所述的微引擎的多发射指令并行处理装置,其中,A multi-transmission instruction parallel processing apparatus for a microengine according to claim 6, wherein
    所述多端口内核寄存器按照线程分为两组寄存器,每组寄存器包括4个寄存器单元;一条指令的两个源操作数分别在两个不同的寄存器单元中;一个线程的两条指令的目的操作数分别在两个不同的寄存器单元中;The multi-port kernel registers are divided into two sets of registers according to threads, each set of registers including four register units; two source operands of one instruction are respectively in two different register units; the purpose operation of two instructions of one thread The numbers are in two different register units;
    所述多端口内核寄存器具有8个数据读口和4个数据写口,同时支持四条指令访问,每条指令访问两个源操作数和一个目的操作数。The multi-port core register has eight data read ports and four data write ports, and supports four instruction accesses, each of which accesses two source operands and one destination operand.
  10. 根据权利要求6至9任一项所述的微引擎的多发射指令并行处理装置,其中,所述指令的指令类型大类分为逻辑计算类指令、数据上传/下 载类指令、跳转类指令;每一指令类型大类中又包括多个指令小类;每个线程对应一组可执行单元,包括:逻辑计算类执行单元、数据上传/下载类执行单元、跳转类执行单元;The multi-transmission instruction parallel processing apparatus of the microengine according to any one of claims 6 to 9, wherein the instruction type of the instruction is largely classified into a logical calculation type instruction, a data upload/down Load class instructions, jump class instructions; each instruction type large class includes a plurality of instruction subclasses; each thread corresponds to a set of executable units, including: a logical computing class execution unit, a data upload/download class execution unit, Jump class execution unit;
    所述指令分配单元,还配置为当一个线程的两条指令大类不一致时,将各组的指令分配到各自对应的可执行单元;当一个线程的两条指令大类一致且指令小类不一致时,根据以下三种情况处理:指令大类属逻辑计算类指令时,线程内分配各自的逻辑计算类执行单元;指令大类属上传/下载类指令时,线程内分配各自的数据上传/下载类执行单元;其中一条指令属跳转类指令时,按约束分配各自的可执行单元。The instruction allocating unit is further configured to allocate instructions of each group to respective corresponding executable units when two instructions of one thread are inconsistent; when two instructions of one thread are in a large class and the instruction class is inconsistent When processing a large class of logical computing class instructions, the thread allocates its own logical computing class execution unit; when the instruction is a generic class upload/download class instruction, the thread allocates its own data upload/download. Class execution unit; when one of the instructions is a jump class instruction, the respective executable unit is allocated according to the constraint.
  11. A storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the multi-issue instruction parallel processing method of a micro-engine according to any one of claims 1 to 6.
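
The register and allocation rules of claims 9 and 10 can be illustrated with a small model. The following is a minimal sketch, not the claimed hardware: the names Major, Instr, ports_suffice, registers_ok and dispatch are hypothetical, the labels such as "logic#0" are invented for illustration, and the sketch assumes two threads that each issue two instructions per cycle and two instances of each execution unit type per thread. The jump constraints and the case of two instructions with the same subtype are not spelled out in these claims, so the sketch refuses to guess at them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Major(Enum):
    LOGIC = "logic"   # logic computation instructions
    DATA = "data"     # data upload/download instructions
    JUMP = "jump"     # jump instructions


@dataclass(frozen=True)
class Instr:
    major: Major
    minor: str                  # hypothetical subtype name, e.g. "add"
    src_units: Tuple[int, int]  # register units (0-3) holding the two sources
    dst_unit: int               # register unit (0-3) receiving the result


READ_PORTS, WRITE_PORTS = 8, 4     # port counts stated in claim 9
THREADS, ISSUE_PER_THREAD = 2, 2   # assumption: 2 threads x 2 instructions


def ports_suffice() -> bool:
    """Each in-flight instruction needs two reads and one write."""
    in_flight = THREADS * ISSUE_PER_THREAD
    return 2 * in_flight <= READ_PORTS and in_flight <= WRITE_PORTS


def registers_ok(a: Instr, b: Instr) -> bool:
    """Claim 9 placement rules for the two instructions of one thread."""
    sources_split = all(i.src_units[0] != i.src_units[1] for i in (a, b))
    return sources_split and a.dst_unit != b.dst_unit


def dispatch(a: Instr, b: Instr) -> List[str]:
    """Pick an execution unit label for each of a thread's two instructions."""
    if a.major is not b.major:
        # Different major types: each goes to the unit of its own type.
        return [f"{a.major.value}#0", f"{b.major.value}#0"]
    if a.minor == b.minor:
        raise ValueError("same major type and subtype: not covered by claim 10")
    if a.major is Major.JUMP:
        raise NotImplementedError("jump constraints are not given in this section")
    # Same major type, different subtypes: one unit of that type per instruction.
    return [f"{a.major.value}#0", f"{a.major.value}#1"]


add = Instr(Major.LOGIC, "add", (0, 1), 2)
xor = Instr(Major.LOGIC, "xor", (2, 3), 3)
load = Instr(Major.DATA, "load", (0, 2), 1)

print(ports_suffice())          # True: 4 instructions need 8 reads, 4 writes
print(registers_ok(add, xor))   # True
print(dispatch(add, load))      # ['logic#0', 'data#0']
print(dispatch(add, xor))       # ['logic#0', 'logic#1']
```

The port check shows why eight read ports and four write ports match four simultaneous instructions: 4 x 2 source reads and 4 x 1 destination writes.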
PCT/CN2016/080579 2015-07-29 2016-04-28 Parallel processing method and apparatus for multiple launch instructions of micro-engine, and storage medium WO2017016255A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510456059.6A CN106406820B (en) 2015-07-29 2015-07-29 A kind of multi-emitting parallel instructions processing method and processing device of network processor micro-engine
CN201510456059.6 2015-07-29

Publications (1)

Publication Number Publication Date
WO2017016255A1 (en) 2017-02-02

Family

ID=57884049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/080579 WO2017016255A1 (en) 2015-07-29 2016-04-28 Parallel processing method and apparatus for multiple launch instructions of micro-engine, and storage medium

Country Status (2)

Country Link
CN (1) CN106406820B (en)
WO (1) WO2017016255A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257280B (en) * 2017-07-14 2022-05-27 深圳市中兴微电子技术有限公司 Micro-engine and message processing method thereof
CN111240682A (en) * 2018-11-28 2020-06-05 深圳市中兴微电子技术有限公司 Instruction data processing method and device, equipment and storage medium
CN115657090B (en) * 2022-10-24 2023-04-28 上海时空奇点智能技术有限公司 Low-delay analysis processing method for interface data of GNSS Beidou positioning module

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999026132A2 (en) * 1997-11-17 1999-05-27 Advanced Micro Devices, Inc. Processor configured to generate lookahead results from collapsed moves, compares and simple arithmetic instructions
CN101706715A (en) * 2009-12-04 2010-05-12 北京龙芯中科技术服务中心有限公司 Device and method for scheduling instruction
CN101957743A (en) * 2010-10-12 2011-01-26 中国电子科技集团公司第三十八研究所 Parallel digital signal processor
CN102945148A (en) * 2012-09-26 2013-02-27 中国航天科技集团公司第九研究院第七七一研究所 Method for realizing parallel instruction set
CN103218207A (en) * 2012-01-18 2013-07-24 上海算芯微电子有限公司 Microprocessor instruction processing method and system based on single/dual transmitting instruction set

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113703841A (en) * 2021-09-10 2021-11-26 中国人民解放军国防科技大学 Optimization method, device and medium for reading register data
CN113703841B (en) * 2021-09-10 2023-09-26 中国人民解放军国防科技大学 Optimization method, device and medium for register data reading
CN117093270A (en) * 2023-08-18 2023-11-21 摩尔线程智能科技(北京)有限责任公司 Instruction sending method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106406820B (en) 2019-01-15
CN106406820A (en) 2017-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16829622

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16829622

Country of ref document: EP

Kind code of ref document: A1