WO2017016255A1

WO2017016255A1 - Parallel processing method and apparatus for multiple launch instructions of micro-engine, and storage medium

Info

Publication number: WO2017016255A1
Application number: PCT/CN2016/080579
Authority: WO
Inventors: 周峰; 安康; 王志忠; 刘衡祁
Original assignee: 深圳市中兴微电子技术有限公司
Priority date: 2015-07-29
Filing date: 2016-04-28
Publication date: 2017-02-02
Also published as: CN106406820B; CN106406820A

Abstract

Disclosed are a parallel processing method and apparatus for multiple launch instructions of a micro-engine, and a storage medium. The method comprises: judging and marking correlation between instructions, and judging, according to marks, whether to process launch instructions in parallel; when the launch instructions are processed in parallel, parsing the instructions by means of a parallel decoding unit to obtain an instruction type of each instruction and an address of a source operand; acquiring a source operand in a multi-port kernel register according to the address of the source operand of each instruction; allocating corresponding executable units for the instructions to process the source operand according to the instruction type of each instruction; and storing a processing result in the multi-port kernel register.

Description

Micro-engine multi-issue instruction parallel processing method and device, and storage medium

Technical field

The present invention relates to network processor technology, and in particular to a multi-issue instruction parallel processing method and device for a network processor micro-engine (ME, Micro Engine), and a storage medium.

Background technique

To meet the needs of future network development and improve router performance, the core routers in the Internet backbone have undergone one technological change after another. Especially in the high-end router market, network processors have become an irreplaceable part of the routing and forwarding engine thanks to their outstanding packet processing performance and programmability.

In a network processor system, the ME is the core component of the network processor and is responsible for parsing and processing packets according to microcode instructions. The processing performance of the micro-engine is therefore an important parameter that determines the overall performance of the network processor.

In existing micro-engine technology, the traditional single-issue instruction pipeline can process only one instruction at a time and completes only one type of operation among logic calculation, jump, and data movement, which leaves many other execution units idle. The kernel's resources are not fully utilized, i.e. the micro-engine's performance is not maximized.

Existing multi-issue instruction pipelines mainly use very long instruction word (VLIW) technology. When writing microcode, users must try to use as many different executable units as possible within one very long instruction to improve instruction parallelism. This kind of scheme relies mainly on the pre-compilation stage, and the user has to arrange the parallel use of the execution units, which increases programming complexity and therefore labor cost. In addition, storing very long instructions requires a larger instruction memory, which increases the cost of the chip.

Summary of the invention

To solve the above technical problem, embodiments of the present invention provide a multi-issue instruction parallel processing method and device for a micro-engine, and a storage medium.

The multi-issue instruction parallel processing method of the micro-engine provided by the embodiment of the present invention includes:

determining and marking the correlation between instructions, and determining, according to the marking, whether to issue the instructions in parallel;

when the instructions are issued in parallel, parsing the instructions by means of a parallel decoding unit to obtain the instruction type of each instruction and the address of its source operands;

obtaining the source operands from a multi-port kernel register according to the address of the source operands of each instruction;

allocating a corresponding executable unit to each instruction according to its instruction type, to process the source operands; and

storing the processing result in the multi-port kernel register.

In the embodiment of the present invention, determining and marking the correlation between instructions includes:

determining whether the destination registers of two consecutive instructions (the previous instruction and the latter instruction) are in the same area;

when the destination registers of the two instructions are not in the same area, determining whether there is a data hazard on the destination registers of the two instructions;

when there is no data hazard on the destination registers of the two instructions, determining whether the instruction types of the two instructions are different;

when the instruction types of the two instructions are different, determining whether the previous instruction is a jump instruction;

when the previous instruction is not a jump instruction, determining that the two instructions are irrelevant, and placing an irrelevant flag on the latter instruction.

In the embodiment of the present invention, determining, according to the marking, whether to issue the instructions in parallel includes:

when the latter instruction carries an irrelevant flag, one thread issues the previous instruction and the latter instruction in parallel.

In the embodiment of the present invention, the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units.

The multi-port kernel register has eight data read ports and four data write ports and supports access by four instructions, each of which accesses two source operands and one destination operand.

In the embodiment of the present invention, instruction types are divided into three major classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each major class includes a plurality of instruction subclasses; each thread corresponds to a set of executable units, including a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit.

Allocating the corresponding executable unit to an instruction according to its instruction type includes:

when the major classes of the two instructions of a thread differ, assigning each instruction to its corresponding executable unit;

when the two instructions of a thread belong to the same major class but to different subclasses, handling them according to the following three cases:

when the instruction type is a logical computing class instruction, allocating the respective logical computing class execution units within the thread;

when the instruction type is an upload/download class instruction, allocating the respective data upload/download class execution units within the thread;

when one of the instructions is a jump class instruction, allocating the respective executable units according to the constraints.

The multi-issue instruction parallel processing apparatus of the micro-engine provided by the embodiment of the present invention includes:

a compiling unit configured to determine and mark the correlation between instructions, and to determine, according to the marking, whether to issue the instructions in parallel;

a parallel decoding unit configured to parse the instructions in parallel when the instructions are issued in parallel, to obtain the instruction type of each instruction and the address of its source operands;

a read unit configured to obtain the source operands from the multi-port kernel register according to the address of the source operands of each instruction;

an instruction allocating unit configured to allocate, according to the instruction type of each instruction, a corresponding executable unit to process the source operands;

a write unit configured to store the processing result in the multi-port kernel register.

In the embodiment of the present invention, the compiling unit is further configured to: determine whether the destination registers of two consecutive instructions are in the same area; when the destination registers of the two instructions are not in the same area, determine whether there is a data hazard on the destination registers of the two instructions; when there is no data hazard on the destination registers of the two instructions, determine whether the instruction types of the two instructions are different; when the instruction types of the two instructions are different, determine whether the previous instruction is a jump instruction; and when the previous instruction is not a jump instruction, determine that the two instructions are irrelevant and place an irrelevant flag on the latter instruction.

In the embodiment of the present invention, the compiling unit is further configured so that, when the latter instruction carries an irrelevant flag, one thread issues the previous instruction and the latter instruction at the same time.

The instruction allocating unit is further configured to: assign each instruction to its corresponding executable unit when the major classes of the two instructions of one thread differ; and, when the two instructions of one thread belong to the same major class but to different subclasses, allocate the respective logical computing class execution units within the thread when the instructions are logical computing class instructions, allocate the respective data upload/download class execution units within the thread when the instructions are upload/download class instructions, and allocate the respective executable units according to the constraints when one of the instructions is a jump class instruction.

The storage medium provided by the embodiment of the present invention stores a computer program for executing the multi-issue instruction parallel processing method of the micro-engine.

In the technical solution of the embodiment of the present invention, the compiling unit first determines and marks the correlation between instructions, which reduces the programming complexity for microcode developers. Whether to issue instructions in parallel is determined according to the marking. When instructions are issued in parallel, the parallel decoding unit parses them to obtain the instruction type of each instruction and the address of its source operands, realizing parallel parsing of multi-issue instructions. The source operands are then obtained from the multi-port kernel register according to their addresses; a corresponding executable unit is allocated to each instruction according to its instruction type to process the source operands; and the processing result is stored in the multi-port kernel register. The multi-port kernel register structure supports parallel processing of multiple instructions well, and the corresponding executable units can also process the source operands in parallel, which greatly improves the performance of the micro-engine.

Description of the drawings

FIG. 1 is a schematic flowchart of a multi-issue instruction parallel processing method of a micro-engine according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of parallel processing of multi-issue instructions according to an embodiment of the present invention;

FIG. 3 is a flowchart of determining and marking the correlation between instructions according to an embodiment of the present invention;

FIG. 4 is a flowchart of a pipeline reading source operands and writing back a destination register according to an embodiment of the present invention;

FIG. 5 is a structural diagram of a multi-port kernel register according to an embodiment of the present invention;

FIG. 6 is a flowchart of a pipeline processing instructions in parallel according to an embodiment of the present invention;

FIG. 7 is a structural diagram of a parallel decoding unit and an instruction allocation unit according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a multi-issue instruction parallel processing apparatus of a micro-engine according to an embodiment of the present invention.

Detailed description

In the multi-issue instruction parallel processing method and apparatus for a micro-engine according to an embodiment of the present invention, a compiling unit determines and marks the correlation between instructions; a multi-port kernel register structure is designed; and a parallel decoding unit and executable units complete the parallel processing of multi-issue instructions. The embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart of a multi-issue instruction parallel processing method of a micro-engine according to an embodiment of the present invention. As shown in FIG. 1, the multi-issue instruction parallel processing method of the micro-engine includes the following steps:

Step 101: Determine and mark the correlation between the instructions, and determine, according to the marking, whether to issue the instructions in parallel.

In the embodiment of the present invention, the correlation between instructions covers the following:

whether there is a data hazard between two consecutive instructions of a thread, whether they share a source operand, whether they share a destination operand, and whether they share the same executable unit. If any one of these cases exists, the two instructions are determined to be correlated; otherwise they are not correlated. Whether two consecutive instructions are correlated determines whether they can be issued simultaneously in one thread and executed in parallel.

Embodiments of the present invention support simultaneous scheduling of two thread executions, namely, thread A and thread B.

At compile time, the compiling unit determines the correlation between two consecutive instructions and sets the irrelevant flag of the latter instruction to valid when the two instructions are irrelevant. When scheduling, each thread decides, according to the irrelevant flag, whether to issue one instruction or two instructions at the same time.

By exploiting the absence of correlation between instructions, the parallelism of instructions and the utilization of the execution units can be maximized, which reduces the performance loss caused by idle units and thus improves the overall performance of the ME.
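The scheduling decision above can be illustrated with a minimal Python sketch. It assumes the irrelevant flag is available as one boolean per instruction (true when the instruction is irrelevant to the one before it) and that a thread greedily pairs an instruction with its flagged successor. The function name `issue_groups`, the boolean encoding, and the greedy policy are illustrative assumptions and are not taken from the embodiment.

```python
from typing import List, Tuple


def issue_groups(irrelevant_flags: List[bool]) -> List[Tuple[int, ...]]:
    """Group one thread's instruction indices into issue slots.

    irrelevant_flags[i] is True when instruction i was marked by the
    compiling unit as irrelevant to instruction i - 1 (hypothetical encoding).
    """
    groups: List[Tuple[int, ...]] = []
    i = 0
    while i < len(irrelevant_flags):
        has_next = i + 1 < len(irrelevant_flags)
        if has_next and irrelevant_flags[i + 1]:
            # The latter instruction carries the flag: issue both together.
            groups.append((i, i + 1))
            i += 2
        else:
            # No flag on the latter instruction: issue one instruction only.
            groups.append((i,))
            i += 1
    return groups
```

For example, with flags `[False, True, False, False, True]` the sketch issues instructions 0 and 1 together, instruction 2 alone, and instructions 3 and 4 together.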

In an embodiment, the correlation between instructions is determined and marked according to the flow of FIG. 3, which is described below.

Step 102: When the instructions are issued in parallel, the instructions are parsed by the parallel decoding unit to obtain the instruction type of each instruction and the address of its source operands.

Referring to FIG. 2, the instruction enters the pipeline decoding stage, where instruction parsing is performed (201).

To support processing up to four instructions at the same time, the embodiment of the present invention provides four parallel decoding units. Each decoding unit decodes an instruction and parses out its instruction type.

In the embodiment of the present invention, the instruction types are as follows:

Instruction types are divided into three major classes: logical computing class instructions, data upload/download class instructions, and jump class instructions. Each major class includes multiple instruction subclasses; for example, logical computing class instructions include addition operations, subtraction operations, logical OR operations, and so on, and each subclass has its own separate instruction code. The instruction type described in the embodiments of the present invention mainly refers to the instruction subclass of each instruction.

At the same time, the parallel decoding unit also parses out the address, in the multi-port kernel register, of the source operands required by the instruction.

Step 103: Obtain a source operand in a multi-port kernel register according to an address of a source operand of the instruction.

As shown in FIG. 2, after the address of the source operands in the multi-port kernel register is obtained, the multi-port kernel register is accessed to obtain the source operands (202).

To support at most four instructions executing at the same time, and considering the accesses to source and destination operands, the kernel register must have a multi-port structure. The multi-port kernel register of the embodiment of the present invention provides eight data read ports and four data write ports and can support access by four instructions simultaneously, each of which can access two source operands and one destination operand.

In the embodiment of the present invention, the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units.

Step 104: Process the source operand by assigning a corresponding executable unit to the instruction according to the instruction type of the instruction.

As shown in FIG. 2, after the source operands are obtained from the multi-port kernel register, the instruction allocation unit allocates the executable units according to the instruction types so as to maximize processing performance (203).

In the embodiment of the present invention, the executable units include a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit. These three types of execution units carry out the execution functions of the three classes of instructions respectively. Embodiments of the present invention provide two sets of logical computing class execution units, two sets of data upload/download class execution units, and two sets of jump class execution units.

The pipeline of the embodiment of the present invention executes at most four instructions at the same time. The instruction allocation unit allocates the instructions to executable units according to their instruction types and ensures that instructions of the same type are allocated to different groups of executable units, so that no resource conflict arises to trigger a structural hazard.

The instruction types in the embodiments of the present invention are divided into three major classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each major class includes multiple instruction subclasses; each thread corresponds to a set of executable units, including a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit. How a corresponding executable unit is assigned to an instruction according to its instruction type is described below with reference to FIG. 6 and FIG. 7.

Step 105: Store the processing result in a multi-port kernel register.

As shown in FIG. 2, the instructions are allocated to their respective executable units and executed. The result of execution is written back to the specified destination register; for a jump class instruction, the instruction memory is addressed again with the new address (204).

The kernel register of the embodiment of the present invention provides four data write ports and supports up to four instructions completing data write-back at the same time. Once the operation result has been written back, the processing of the instruction is complete.

At compile time, the compiling unit determines the correlation between two consecutive instructions. The resulting flag determines whether a thread issues one instruction or two instructions at the same time. FIG. 3 is a flowchart of determining and marking the correlation between instructions according to an embodiment of the present invention, and the process includes the following steps:

Step 301: Determine whether the destination registers of the two consecutive instructions are in the same area.

In a specific embodiment, the areas are defined as follows:

The multi-port kernel register provides 32 registers for each thread, numbered in order from register 0 to register 31, and each register is 4 bytes wide. Register 0 to register 15 form one area, and register 16 to register 31 form the other area.

If the destination registers of the two instructions are in the same area, the two instructions are determined to be correlated; as shown in FIG. 3, the condition is not met and the compiling unit discards the irrelevant flag. If the destination registers of the two instructions are not in the same area, the determination of step 302 follows.

Step 302: Determine whether there is a data hazard on the destination registers of the two consecutive instructions.

In a specific embodiment, the data hazard mainly refers to whether a source operand register of the latter instruction is the destination register of the previous instruction.

If there is a data hazard between the two instructions, they are determined to be correlated; as shown in FIG. 3, the compiling unit discards the irrelevant flag. If there is no data hazard between the two instructions, the determination of step 303 follows.

Step 303: Determine whether the instruction types of the two consecutive instructions are different, so that they do not use the same executable unit.

The instruction type compared here is the instruction subclass, except for jump class instructions. If the subclasses of the two instructions are the same, the two instructions are determined to be correlated. For jump class instructions, only the major class is compared: if both instructions are jump class instructions, they are determined to be correlated. As shown in FIG. 3, when the condition is not met the compiling unit discards the irrelevant flag. If the instruction types of the two instructions are different, the determination of step 304 follows.

Step 304: Determine whether the previous instruction is a jump instruction.

Step 305: If the previous instruction is a jump instruction, the two instructions are determined to be correlated; the condition is not met and the compiling unit discards the irrelevant flag.

Step 306: If the previous instruction is not a jump instruction, it is determined that the two instructions are irrelevant, and an irrelevant flag is set in the latter instruction.
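Steps 301 to 306 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Instr` record, the class names `"logic"`, `"data"` and `"jump"`, and the treatment of an instruction without a destination register (it is assumed to cause no area conflict) are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    major: str             # "logic", "data" or "jump" (hypothetical names)
    sub: str               # instruction subclass, e.g. "add"
    dest: Optional[int]    # destination register 0..31, None if absent
    srcs: Tuple[int, ...]  # source operand register numbers


def area(reg: int) -> int:
    """Registers 0..15 form area 0, registers 16..31 form area 1."""
    if not 0 <= reg <= 31:
        raise ValueError("register number out of range")
    return 0 if reg <= 15 else 1


def same_type(prev: Instr, nxt: Instr) -> bool:
    """Compare subclasses, except that jumps are compared by major class."""
    if prev.major == "jump" or nxt.major == "jump":
        return prev.major == nxt.major
    return (prev.major, prev.sub) == (nxt.major, nxt.sub)


def is_irrelevant(prev: Instr, nxt: Instr) -> bool:
    # Step 301: destination registers must lie in different areas.
    if prev.dest is not None and nxt.dest is not None:
        if area(prev.dest) == area(nxt.dest):
            return False
    # Step 302: the latter must not read the previous destination register.
    if prev.dest is not None and prev.dest in nxt.srcs:
        return False
    # Step 303: the instruction types must differ.
    if same_type(prev, nxt):
        return False
    # Steps 304 to 306: the previous instruction must not be a jump.
    return prev.major != "jump"


def mark(program: List[Instr]) -> List[bool]:
    """Return one irrelevant flag per instruction (first one is never flagged)."""
    if not program:
        return []
    return [False] + [is_irrelevant(a, b) for a, b in zip(program, program[1:])]
```

Under these assumptions, an `add` writing register 3 followed by a `sub` writing register 20 and reading registers 2 and 18 passes every check, so the `sub` receives the irrelevant flag.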

In the embodiment of the present invention, an instruction needs to access the multi-port kernel register to obtain its source operands and to write back its destination register. FIG. 4 is a flowchart of a pipeline reading source operands and writing back a destination register according to an embodiment of the present invention, and the flow includes the following steps:

Step 401: The multi-port kernel register is divided into two groups according to the thread allocation.

As shown in FIG. 5, the multi-port kernel register module of the embodiment of the present invention is divided into two groups, one group of registers for thread A and one for thread B, and each group of registers provides four register units.

In one embodiment, the four register units of thread A are as follows: one set of registers 0 to 15 is register unit 0 and another set of registers 0 to 15 is register unit 2; one set of registers 16 to 31 is register unit 1 and another set of registers 16 to 31 is register unit 3. The four register units of thread B are divided by the same rule as those of thread A and are register units 4, 5, 6, and 7, respectively.
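The unit numbering just described can be written down as a small Python sketch. It assumes that thread B applies exactly the thread A rule with the unit numbers offset by four (units 4 and 6 for registers 0 to 15, units 5 and 7 for registers 16 to 31); the function name `register_units` is illustrative.

```python
from typing import Tuple


def register_units(thread: str, reg: int) -> Tuple[int, int]:
    """Return the two register units that hold a copy of register `reg`."""
    if thread not in ("A", "B"):
        raise ValueError("thread must be 'A' or 'B'")
    if not 0 <= reg <= 31:
        raise ValueError("register number out of range")
    base = 0 if thread == "A" else 4   # assumed offset for thread B
    area = 0 if reg <= 15 else 1       # registers 0..15 or 16..31
    return (base + area, base + area + 2)
```

With this mapping, register 5 of thread A is found in units 0 and 2, and register 20 of thread B in units 5 and 7.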

Step 402: Within the group, the source operands are read according to the constraints and the instruction.

According to the constraint, the two source operands of an instruction are taken from the two areas wherever possible, that is, one from register 0 to register 15 and the other from register 16 to register 31.

As shown in FIG. 5, read port 0 and read port 1 are provided to instruction 0 for reading its source operands; likewise, read port 2 and read port 3 are provided to instruction 1, read port 4 and read port 5 to instruction 2, and read port 6 and read port 7 to instruction 3. In this way one instruction can access all 32 registers and can obtain two different operands, the read ports of the kernel register are fully utilized, and up to four instructions can access the multi-port kernel register simultaneously.

Step 403: Within the group, the operation result is written back to the destination register according to the constraints and the instruction.

According to the constraint, the destination registers of the two instructions of a thread also use registers in the two different areas.

As shown in FIG. 5, write port 0 is provided to instruction 0 for writing the operation result back to its destination register; likewise, write port 1 is provided to instruction 1, write port 2 to instruction 2, and write port 3 to instruction 3. In this way the write ports of the kernel register are fully utilized, and up to four instructions can access the kernel register at the same time.
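The port assignment of FIG. 5 described above follows a simple pattern, sketched here in Python: instruction k uses read ports 2k and 2k+1 and write port k. The function name `ports_for_instruction` and the dictionary layout are illustrative assumptions.

```python
from typing import Dict, Tuple, Union

PortMap = Dict[str, Union[Tuple[int, int], int]]


def ports_for_instruction(slot: int) -> PortMap:
    """Return the read and write ports used by instruction `slot` (0..3)."""
    if not 0 <= slot <= 3:
        raise ValueError("at most four instructions are in flight")
    return {"read": (2 * slot, 2 * slot + 1), "write": slot}
```

Enumerating slots 0 to 3 covers read ports 0 to 7 and write ports 0 to 3 exactly once each, which matches the eight read ports and four write ports of the multi-port kernel register.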

FIG. 6 is a flowchart of a pipeline processing instructions in parallel according to an embodiment of the present invention, and the process includes the following steps:

Step 601: Decode the instructions in parallel and parse out the instruction types.

As shown in FIG. 7, four decoding units decode four instructions and parse out their respective instruction types. The embodiments of the present invention describe three major instruction types: logical computing class instructions, data upload/download class instructions, and jump class instructions.

Step 602: Group the executable units according to the type of the instruction.

As shown in FIG. 7, the embodiment of the present invention provides two sets of logical computing class execution units, two sets of data upload/download class execution units, and two sets of jump class execution units, so that thread A and thread B are each provided with one set consisting of a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit.

The allocation rules here mainly address the case where the two instructions of one thread belong to the same major class but to different subclasses. If the major classes of the two instructions differ, each instruction only needs to be assigned to its corresponding executable unit, and there is no conflict.

When two instructions belong to the same major class but to different subclasses, there are three cases, as follows:

The first case: both instructions are logical computing class instructions with different subclasses, and each is allocated its own calculation unit within the thread.

If the two instructions of a thread are both logical computing class instructions, their subclasses are guaranteed to differ by the constraints of the compiling unit, so within the thread each instruction only needs to be allocated to the calculation unit it requires according to its instruction type.

The second case: both instructions are upload/download class instructions with different subclasses, and each is allocated its own data upload/download unit within the thread.

If the two instructions of a thread are both upload/download class instructions, their subclasses are guaranteed to differ by the constraints of the compiling unit, so each instruction only needs to be assigned to its own execution unit according to its instruction type.

The third case: one of the instructions is a jump class instruction, and the execution units are allocated according to the constraints.

According to the constraint of the compiling unit, if the previous instruction is a jump instruction, the thread issues only one instruction, so only the jump execution unit needs to be assigned to that instruction. If the latter instruction is a jump instruction, the execution units are assigned according to the instruction types.
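The allocation cases above can be sketched in Python. The sketch models an execution resource as the triple (thread, major class, subclass), assumes that each subclass has its own sub-unit inside a thread's execution unit, and rejects combinations that the compiling unit's constraints are meant to exclude. The function name `allocate_units`, the class names, and this resource model are all illustrative assumptions rather than the structure of FIG. 7.

```python
from typing import List, Tuple

Instr = Tuple[str, str]        # (major class, subclass), hypothetical encoding
Unit = Tuple[str, str, str]    # (thread, major class, subclass sub-unit)

MAJOR_CLASSES = ("logic", "data", "jump")


def allocate_units(thread: str, instrs: List[Instr]) -> List[Unit]:
    """Assign one execution resource per instruction issued by a thread."""
    if not 1 <= len(instrs) <= 2:
        raise ValueError("a thread issues one or two instructions")
    if len(instrs) == 2 and instrs[0][0] == "jump":
        # Constraint: a leading jump instruction is issued alone.
        raise ValueError("jump as previous instruction must issue alone")
    units: List[Unit] = []
    for major, sub in instrs:
        if major not in MAJOR_CLASSES:
            raise ValueError("unknown instruction class: " + major)
        unit = (thread, major, sub)
        if unit in units:
            # Same subclass twice would contend for one resource.
            raise ValueError("structural hazard on " + str(unit))
        units.append(unit)
    return units
```

Two logical computing instructions with different subclasses, such as `add` and `or`, receive distinct resources within the same thread, while a pair with identical subclasses is rejected as a structural hazard.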

Step 603: The instruction allocating unit completes the effective allocation of the executable unit.

FIG. 8 is a schematic structural diagram of a multi-issue instruction parallel processing apparatus of a micro-engine according to an embodiment of the present invention. As shown in FIG. 8, the multi-issue instruction parallel processing apparatus of the micro-engine includes:

a compiling unit 81 configured to determine and mark the correlation between instructions, and to determine, according to the marking, whether to issue the instructions in parallel;

a parallel decoding unit 82 configured to parse the instructions in parallel when the instructions are issued in parallel, to obtain the instruction type of each instruction and the address of its source operands;

a reading unit 83 configured to obtain the source operands from the multi-port kernel register according to the address of the source operands of each instruction;

an instruction allocating unit 84 configured to allocate, according to the instruction type of each instruction, a corresponding executable unit to process the source operands;

a write unit 85 configured to store the processing result in the multi-port kernel register.

The compiling unit 81 is further configured to determine whether the destination registers of the two instructions are in the same area; when the destination registers of the two subsequent instructions are not in the same area, determine whether the destination registers of the two instructions have data risk; When the destination register of the last two instructions does not exist, the data type of the two instructions is different. When the instruction types of the two previous instructions are different, it is judged whether the previous instruction is a jump instruction; the current one is not When the instruction is jumped, it is determined that the two instructions are irrelevant, and an irrelevant flag is placed on the latter instruction.

The compiling unit 81 is further configured to: when the latter instruction is provided with an irrelevant flag, one thread transmits two instructions before and after in parallel.

The multi-port kernel registers are divided into two sets of registers according to threads, each set of registers including four register units; two source operands of one instruction are respectively in two different register units; the purpose operation of two instructions of one thread The numbers are in two different register units;

The instruction type of the instruction is divided into a logical calculation type instruction, a data upload/download type instruction, and a jump type instruction; each instruction type includes a plurality of instruction subclasses in a large class; each thread corresponds to a set of executables. The unit includes: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit;

The instruction allocating unit 84 is further configured to: when the two instructions of one thread belong to different major classes, allocate each instruction to its corresponding executable unit; and when the two instructions of one thread belong to the same major class but to different subclasses, process them according to the following three cases: when the instructions are logical calculation class instructions, allocate the respective logical calculation class execution units within the thread; when the instructions are data upload/download class instructions, allocate the respective data upload/download class execution units within the thread; and when one of the instructions is a jump class instruction, allocate the respective executable units according to the constraint.
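The allocation rules can be sketched in Python as follows. This is only one possible reading of the paragraph above: the encoding of an instruction as a `(major class, subclass)` pair, the class names, the `(class, slot)` naming of executable units, and the existence of two slots per class within a thread are assumptions. The constraint applied to jump class instructions is not spelled out here, so the sketch delegates that case to a caller-supplied `jump_policy` function rather than guessing it.

```python
from typing import Callable, List, Tuple

Instr = Tuple[str, str]   # (major class, subclass), hypothetical encoding
Unit = Tuple[str, int]    # (executable unit class, slot within the thread)

MAJOR_CLASSES = {"logic", "load_store", "jump"}


def allocate(a: Instr, b: Instr,
             jump_policy: Callable[[Instr, Instr], List[Unit]]) -> List[Unit]:
    """Allocate executable units to two instructions of one thread."""
    major_a, sub_a = a
    major_b, sub_b = b
    if major_a not in MAJOR_CLASSES or major_b not in MAJOR_CLASSES:
        raise ValueError("unknown major class")
    if major_a != major_b:
        # Different major classes: each goes to the unit of its own class.
        return [(major_a, 0), (major_b, 0)]
    if sub_a == sub_b:
        # Same major class and same subclass is not covered by the text.
        raise ValueError("case not described: same class and same subclass")
    if major_a == "jump":
        # Jump class: the constraint is left to the caller.
        return jump_policy(a, b)
    # Logic or upload/download: each instruction gets its own unit
    # of that class within the thread.
    return [(major_a, 0), (major_a, 1)]
```

A caller would pass, for instance, a policy that serializes the two jump instructions onto one jump unit; that policy is a placeholder for whatever constraint a real design imposes.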

Those skilled in the art should understand that the functions implemented by the units of the multi-transmission instruction parallel processing apparatus of the micro-engine shown in FIG. 8 can be understood with reference to the foregoing description of the multi-transmission instruction parallel processing method of the micro-engine.

In the multi-transmission instruction parallel processing method and device of the micro-engine according to the above embodiments of the present invention, the compiling unit judges and marks the correlation between instructions; a kernel register structure supporting multi-port access is provided; and the parallel decoding unit and the instruction allocating unit process the multi-transmission instructions in parallel. First, because the compiling unit judges and marks the correlation between instructions, the programming complexity for microcode software developers is reduced. In addition, the multi-port kernel register structure supports parallel processing of multiple instructions. Finally, the parallel decoding unit and the instruction allocating unit implement parallel processing of the multi-transmission instructions; the scheme is relatively simple to implement and greatly improves the performance of the micro-engine.

Each of the above units may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.

The multi-transmission instruction parallel processing apparatus of the micro-engine according to the embodiment of the present invention may also be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a separate product. Based on such understanding, the technical solution of the embodiments of the present invention may be embodied, in essence, in the form of a software product that is stored in a storage medium and includes a plurality of instructions causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the various embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read only memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Correspondingly, the embodiment of the present invention further provides a storage medium in which a computer program is stored, the computer program being configured to execute the multi-transmission instruction parallel processing method of the micro-engine according to the embodiment of the present invention.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.

The present invention has been described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, a special purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention.

Industrial applicability

In the embodiments of the present invention, the correlation between instructions is judged and marked, which reduces the programming complexity for microcode developers; whether to transmit instructions in parallel is determined according to the marks; when instructions are transmitted in parallel, the instructions are parsed to obtain the instruction type of each instruction and the address of its source operand, so that the multi-transmission instructions are parsed in parallel; the source operand is obtained from the multi-port kernel register according to the address of the source operand of each instruction; and, according to the instruction type of each instruction, a corresponding executable unit is allocated to the instruction to process the source operand. The multi-port kernel register structure supports parallel processing of multiple instructions, and the corresponding executable units can process the source operands in parallel, which greatly improves the performance of the micro-engine.

Claims

1. A multi-transmission instruction parallel processing method for a micro-engine, the method comprising:

judging and marking the correlation between instructions, and determining, according to the marks, whether to transmit the instructions in parallel;

when the instructions are transmitted in parallel, parsing the instructions by a parallel decoding unit to obtain an instruction type of each instruction and an address of a source operand;

obtaining the source operand from a multi-port kernel register according to the address of the source operand of each instruction;

allocating, according to the instruction type of each instruction, a corresponding executable unit to the instruction to process the source operand; and

storing the processing result in the multi-port kernel register.
2. The multi-transmission instruction parallel processing method for a micro-engine according to claim 1, wherein the judging and marking the correlation between instructions comprises:

determining whether the destination registers of two consecutive instructions are in the same area;

when the destination registers of the two instructions are not in the same area, determining whether the two instructions have a data hazard on their destination registers;

when the two instructions have no data hazard, determining whether the instruction types of the two instructions are different;

when the instruction types of the two instructions are different, determining whether the preceding instruction is a jump instruction; and

when the preceding instruction is not a jump instruction, determining that the two instructions are irrelevant, and placing an irrelevant flag on the latter instruction.
3. The multi-transmission instruction parallel processing method for a micro-engine according to claim 2, wherein the determining, according to the marks, whether to transmit the instructions in parallel comprises:

when the latter instruction carries an irrelevant flag, transmitting, by one thread, the preceding instruction and the latter instruction in parallel.
4. The multi-transmission instruction parallel processing method for a micro-engine according to claim 1, wherein

the multi-port kernel register is divided into two sets of registers according to threads, each set of registers including four register units; the two source operands of one instruction are located in two different register units; and the destination operands of the two instructions of one thread are located in two different register units; and

the multi-port kernel register has eight data read ports and four data write ports and supports access by four instructions, each of which accesses two source operands and one destination operand.
5. The multi-transmission instruction parallel processing method for a micro-engine according to any one of claims 1 to 4, wherein the instruction types are divided into three major classes: logical calculation class instructions, data upload/download class instructions, and jump class instructions; each major class includes a plurality of instruction subclasses; and each thread corresponds to a set of executable units, including a logical calculation class execution unit, a data upload/download class execution unit, and a jump class execution unit;

the allocating, according to the instruction type of each instruction, a corresponding executable unit to the instruction comprises:

when the two instructions of one thread belong to different major classes, allocating each instruction to its corresponding executable unit; and

when the two instructions of one thread belong to the same major class but to different subclasses, processing them according to the following three cases:

when the instructions are logical calculation class instructions, allocating the respective logical calculation class execution units within the thread;

when the instructions are data upload/download class instructions, allocating the respective data upload/download class execution units within the thread; and

when one of the instructions is a jump class instruction, allocating the respective executable units according to the constraint.
6. A multi-transmission instruction parallel processing apparatus for a micro-engine, the apparatus comprising:

a compiling unit configured to judge and mark the correlation between instructions, and to determine, according to the marks, whether to transmit the instructions in parallel;

a parallel decoding unit configured to parse the instructions in parallel when the instructions are transmitted in parallel, to obtain an instruction type of each instruction and an address of a source operand;

a reading unit configured to obtain the source operand from a multi-port kernel register according to the address of the source operand of each instruction;

an instruction allocating unit configured to allocate, according to the instruction type of each instruction, a corresponding executable unit to the instruction to process the source operand; and

a writing unit configured to store the processing result in the multi-port kernel register.
7. The multi-transmission instruction parallel processing apparatus for a micro-engine according to claim 6, wherein the compiling unit is further configured to: determine whether the destination registers of two consecutive instructions are in the same area; when the destination registers of the two instructions are not in the same area, determine whether the two instructions have a data hazard on their destination registers; when the two instructions have no data hazard, determine whether the instruction types of the two instructions are different; when the instruction types of the two instructions are different, determine whether the preceding instruction is a jump instruction; and when the preceding instruction is not a jump instruction, determine that the two instructions are irrelevant and place an irrelevant flag on the latter instruction.
8. The multi-transmission instruction parallel processing apparatus for a micro-engine according to claim 7, wherein the compiling unit is further configured to: when the latter instruction carries an irrelevant flag, cause one thread to transmit the preceding instruction and the latter instruction in parallel.
9. The multi-transmission instruction parallel processing apparatus for a micro-engine according to claim 6, wherein

the multi-port kernel register is divided into two sets of registers according to threads, each set of registers including four register units; the two source operands of one instruction are located in two different register units; and the destination operands of the two instructions of one thread are located in two different register units; and

the multi-port kernel register has eight data read ports and four data write ports and supports access by four instructions, each of which accesses two source operands and one destination operand.
10. The multi-transmission instruction parallel processing apparatus for a micro-engine according to any one of claims 6 to 9, wherein the instruction types are divided into three major classes: logical calculation class instructions, data upload/download class instructions, and jump class instructions; each major class includes a plurality of instruction subclasses; and each thread corresponds to a set of executable units, including a logical calculation class execution unit, a data upload/download class execution unit, and a jump class execution unit;

the instruction allocating unit is further configured to: when the two instructions of one thread belong to different major classes, allocate each instruction to its corresponding executable unit; and when the two instructions of one thread belong to the same major class but to different subclasses, process them as follows: when the instructions are logical calculation class instructions, allocate the respective logical calculation class execution units within the thread; when the instructions are data upload/download class instructions, allocate the respective data upload/download class execution units within the thread; and when one of the instructions is a jump class instruction, allocate the respective executable units according to the constraint.
11. A storage medium storing computer-executable instructions configured to perform the multi-transmission instruction parallel processing method for a micro-engine according to any one of claims 1 to 6.