WO2017088456A1

WO2017088456A1 - Pipeline data synchronization apparatus and method for multi-input multi-output processor

Info

Publication number: WO2017088456A1
Application number: PCT/CN2016/086100
Authority: WO
Inventors: 李震; 刘少礼; 张士锦; 罗韬; 钱诚; 陈云霁; 陈天石
Original assignee: 中国科学院计算技术研究所
Priority date: 2015-11-24
Filing date: 2016-06-17
Publication date: 2017-06-01
Also published as: CN105824604A; CN105824604B

Abstract

A pipeline data synchronization apparatus and method for a multi-input multi-output processor. The apparatus comprises: a multi-input multi-output functional unit (6) having multiple operating pipeline stages for executing an operation on input operands in response to an instruction; and a pipeline controller (5) for receiving an instruction, parsing the input operands required by the instruction, determining the validity of the input operands, sending the instruction to the multi-input multi-output functional unit if the input operands are all valid, and sending a null instruction to the multi-input multi-output functional unit if at least one input operand is invalid. The pipeline controller (5) also receives an output request of the multi-input multi-output functional unit (6) and determines its feasibility; if the output request is feasible, it accepts the request within one clock cycle of the chip and forwards it to memories (1, 2, 3, 4) within a certain time, and if the output request is infeasible, it blocks the output of the multi-input multi-output functional unit (6). The invention thus not only solves the pipeline synchronization problem of the multi-input multi-output functional unit (6), but also greatly reduces the memory access cost of the processor and improves its memory access efficiency.

Description

Multi-input multi-output processor pipeline data synchronization device and method

Technical field

The invention belongs to the field of computer electronics, and in particular relates to a multi-input multi-output processor pipeline data synchronization device and method.

Background technique

A data processing device that adopts a pipelined processor divided into multiple pipeline stages has the advantage of being able to execute multiple instructions at the same time, thereby increasing the processor clock frequency, improving data throughput, and improving computing performance. During execution, each instruction passes through the successive pipeline stages of the processor. When the instruction passes through the last pipeline stage, its execution ends, at which point the instruction modifies the processor's data storage to reflect its execution result.

Data synchronization is needed because the processor's functional components interact with memory during the instruction execution phase. For example, when a data read conflict prevents the functional component from obtaining all input operands within one clock cycle of the chip, the issue of the instruction may be delayed; or, when the output data of the functional component cannot be written to the target storage address or cache, the pipeline may be blocked.

Patent Document 1 (Publication No. CN 103383641 A) discloses a multi-pipeline synchronization device in which a control status register transfer unit completes control status register transfers between pipelines, including control register configuration operations of pipeline A on pipeline B and status register read operations of pipeline A on pipeline B. A pipeline register transfer unit completes register transfers between the normal registers in pipeline A and the pipeline configuration registers in pipeline B, so that pipeline B obtains the register information required for its operation. A synchronization logic unit is responsible for receiving scheduling instructions and blocking information, generating a pause signal for pipeline A according to the control status register, and generating scheduling enable and delivery scheduling information for pipeline B. However, that device only supports inter-core synchronization of a multi-core CPU and pipeline data synchronization of single-input single-output processors, and it synchronizes multiple pipelines within a core solely through the scheduling enable and pause signals generated by the synchronization logic unit. When the number of input operands or output operands of a functional component increases, the interaction between the functional component and memory becomes more frequent and data conflicts become more severe, so the problem of how to synchronize pipeline data efficiently for multi-input multi-output functional components becomes even more pronounced.

Invention disclosure

The object of the present invention is to solve at least the above problems and drawbacks, and to provide a multi-input multi-output processor pipeline data synchronization device and method that solves the pipeline-stage data synchronization control of multi-input multi-output functional components and improves processor performance.

The multi-input multi-output processor pipeline data synchronizing apparatus of the present invention is disposed in a chip having an instruction transmitter that communicates with one or more memories, wherein the synchronization apparatus includes:

a multi-input multi-output (MIMO) functional component having a plurality of operational pipeline stages, for responding to instructions by performing operations on a plurality of input operands;

a pipeline controller connected to the MIMO functional component, which:

receives an instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines the validity of the input operands; if the input operands are all valid, sends the instruction to the MIMO functional component within one clock cycle of the chip; if at least one of the input operands is invalid, sends a null instruction to the MIMO functional component; and

receives a plurality of output requests of the MIMO functional component and determines the feasibility of the output requests; if an output request is feasible, receives it within one clock cycle of the chip and forwards it to the memory within one clock cycle of the chip; if the output request is determined not to be feasible, blocks the output of the MIMO functional component; and

prefetch input operand logic coupled to the memory and to the MIMO functional component, for calculating the addresses of the input operands required by the instruction, sending input operand prefetch requests to the memory, and caching the prefetched data;

wherein the validity of an input operand means that the input operand required by the instruction that the pipeline controller is parsing has been prefetched by the prefetch input operand logic, or is being read and can be guaranteed to be sent to the MIMO functional component within the next clock cycle of the chip, in which case the input operand is valid; otherwise it is invalid; and

the feasibility of an output request means that the output operands output by the MIMO functional component can be temporarily stored in a buffer or can be written into the memory, in which case the output request is feasible; otherwise it is not feasible.

The multi-input multi-output processor pipeline data synchronization device of the present invention, wherein:

The pipeline controller is provided with input logic that parses the input operands required by the instruction and judges the validity of the input operands;

the pipeline controller is provided with output logic that receives the plurality of output requests of the MIMO functional component and determines the feasibility of the output requests.

If the output logic blocks the output of the MIMO functional component for more than one clock cycle of the chip, instructions are blocked from entering the MIMO functional component.

In addition, the present invention further provides a method for synchronizing pipeline data of a multiple input multiple output processor using the above-described multi-input multi-output processor pipeline data synchronization device, comprising the following steps:

Step 1: the pipeline controller receives an instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines the validity of the input operands; if the input operands are all valid, it sends the instruction to the MIMO functional component within one clock cycle of the chip; if at least one of the input operands is invalid, it sends a null instruction to the MIMO functional component;

Step 2: the MIMO functional component, according to the instruction, performs operations on a plurality of input operands through its computational pipeline stages and sends output requests to the pipeline controller;

Step 3: the pipeline controller receives the plurality of output requests of the MIMO functional component and determines the feasibility of the output requests; if a request is feasible, it receives the output request within one clock cycle of the chip and forwards the output request to the memory within one clock cycle of the chip; if the output request is not feasible, it blocks the output of the MIMO functional component;

wherein the validity of an input operand means that the input operand required by the instruction that the pipeline controller is parsing has been prefetched by the prefetch input operand logic, or is being read and can be guaranteed to be sent to the MIMO functional component within the next clock cycle of the chip, in which case the input operand is valid; otherwise it is invalid.

The method for synchronizing pipeline data of a multiple input multiple output processor of the present invention further includes the following steps:

Step 10: the input logic receives the instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines the validity of the input operands; if the input operands are all valid, it sends the instruction to the MIMO functional component within one clock cycle of the chip; if at least one of the input operands is invalid, it sends a null instruction to the MIMO functional component;

Step 20: the MIMO functional component, according to the instruction, performs operations on a plurality of input operands through its computational pipeline stages and sends output requests to the output logic;

Step 30: the output logic receives the plurality of output requests of the MIMO functional component and determines the feasibility of the output requests; if a request is feasible, it receives the output request within one clock cycle of the chip and forwards the output request to the memory within a certain time; if the output request is not feasible, it blocks the output of the MIMO functional component.

Step 40: the prefetch input operand logic calculates the required operand addresses, sends prefetch requests to the memory storing the operands, and caches the prefetched data.

In the method for synchronizing pipeline data of a multi-input multi-output processor according to the present invention, if the output logic blocks the output of the MIMO functional component for more than one clock cycle of the chip, instructions are blocked from entering the MIMO functional component.

In addition, the present invention also provides a data processing chip including the above-described multi-input multi-output processor pipeline data synchronizing apparatus.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the configuration of a multi-input multi-output processor pipeline data synchronization apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a multi-input multi-output processor pipeline data synchronization apparatus according to another embodiment of the present invention.

FIG. 3 is a block diagram showing the logic for parsing the validity of instruction input operands (i.e., input logic 7) in the multi-input multi-output processor pipeline data synchronization apparatus of FIG. 2.

FIG. 4 is a block diagram showing the functional component output request arbitration and processing logic (i.e., output logic 8) in the multi-input multi-output processor pipeline data synchronization apparatus of FIG. 2.

FIG. 5 is a flow chart showing the method of pipeline data synchronization of the multi-input multi-output processor of the present invention.

FIG. 6 is a flow chart showing a method of pipeline data synchronization of a multi-input multi-output processor according to another embodiment of the present invention.

FIG. 7 shows a data processing chip of an embodiment of the present invention.

Description of the reference numerals

1~4...RAM memory, 5...pipeline controller, 6...multi-input multi-output functional component, 7...input logic, 8...output logic, 9...two-input two-output functional component, 11...data prefetch logic of input logic 7, 12...internal control logic of input logic 7, 13...data queue of input logic 7, 14...data arbitration logic of input logic 7, 15...data bus interface of input logic 7, 16...internal control logic of output logic 8, 17...FIFO queue of output logic 8, 18...data arbitration logic of output logic 8, 19...data bus interface of output logic 8, 100, 300...input control signal, 200...output control signal, 400...multi-input multi-output processor pipeline data synchronization device, 500...data processing chip.

The best way to implement the invention

In order to make the objects, technical solutions and advantages of the present invention more comprehensible, the multi-input multi-output processor pipeline data synchronization device and method of the present invention will be further described in detail below with reference to the accompanying drawings. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

FIG. 1 is a block diagram showing the multi-input multi-output processor pipeline data synchronization device of the present invention. Multiple input means that: a. the input data comes from a plurality of different RAM memories; and b. the number of input operands of the functional component within one clock cycle of the chip may be zero or more. Multiple output means that: a. the destination addresses of the output data may lie in a plurality of different RAM memories; and b. the number of output operands of the functional component within one clock cycle of the chip may be zero or more.

Specifically, the synchronization device comprises a multi-input multi-output functional component 6 and a pipeline controller 5. The functional component 6 corresponds to four RAM memories 1, 2, 3, 4 and can execute a sequence of instructions; it can have multiple input operands, multiple output operands, and multiple pipeline stages, and it responds to an instruction by performing operations on the multiple input operands.

The pipeline controller 5 is disposed in the chip. It receives the instruction sent by the instruction transmitter in the chip, parses the input operands required by the instruction, and determines the validity of the input operands. If the required data is all valid, it sends the request to the functional component within one clock cycle of the chip; if at least one of the input operands is invalid, it sends a null instruction (bubble) to the functional component, and the null instruction does not affect the data registers inside the functional component. At the same time, the pipeline controller receives the multiple output requests of the functional component and determines their feasibility. If a request is feasible, the controller receives it within one clock cycle of the chip and then forwards it to memory within a certain period of time; if not, it blocks the pipeline of the functional component. However, while the pipeline of the functional component is blocked, the pipeline controller is not blocked from parsing the current instruction or from prefetching data.

The validity of an input operand means that the input operand required by the instruction that the pipeline controller is parsing has been prefetched, or is being read and can be guaranteed to be sent to the functional component within the next clock cycle of the chip; in that case the operand is valid, otherwise it is considered invalid. The feasibility of an output request means that the zero or more output operands output by the functional component can be temporarily held in a buffer or written to the corresponding RAM addresses; in that case the output request is considered feasible, otherwise it is considered not feasible. Both the validity of the input operands and the feasibility of the output request are determined within one clock cycle of the chip.

Additionally, the functional component 6, which has a plurality of pipeline stages, has a buffer configured to hold an input instruction and its data. Furthermore, while parsing the validity of the input data required by the current instruction, the pipeline controller also analyzes the input operands required by a number of subsequent instructions and prefetches their data.

FIG. 2 is a block diagram showing the configuration of a multi-input multi-output processor pipeline data synchronization apparatus according to another embodiment of the present invention. In this embodiment, a two-input two-output functional component 9 is employed as the MIMO functional component, and it likewise corresponds to four RAM memories 1, 2, 3, and 4. Its pipeline controller, however, includes: logic 7 for parsing the validity of instruction input operands (hereinafter "input logic"), functional component output request arbitration and processing logic 8 (hereinafter "output logic"), and logic 10 for prefetching instruction input operands (hereinafter "prefetch input operand logic").

The functional component 6 includes a plurality of pipeline stages and is configured to respond to instructions by operating on the input operands. In addition: A. it registers an instruction that enters after being parsed by the logic that parses the validity of instruction input operands; B. when the output request for its output data is rejected by the output request arbitration and processing logic, it stalls its pipeline and blocks instructions from the logic that parses the validity of instruction input operands from entering. The two-input two-output functional component 6 uses a pipelined structure for arithmetic processing to quantize data. This functional component can be used to process algorithms such as neural network algorithms and image processing algorithms.
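The behaviour of such a functional component can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions, not the patented circuit: the class name FunctionalComponent, the use of None to represent a null instruction, and the default depth of three stages are all hypothetical choices made for illustration.

```python
class FunctionalComponent:
    """Toy model of a pipelined functional component (illustrative only)."""

    def __init__(self, depth=3):
        self.stages = [None] * depth  # None marks a null instruction (bubble)
        self.buffered = None          # one-instruction input buffer
        self.retired = []             # real instructions that completed

    def can_accept(self):
        # While stalled, a new instruction fits only if the buffer is empty.
        return self.buffered is None

    def tick(self, instr=None, stalled=False):
        """Advance one clock cycle with an optional incoming instruction."""
        if stalled:
            # Output request rejected: all stages hold their contents.
            if instr is not None:
                if self.buffered is not None:
                    raise RuntimeError("input must be blocked: buffer is full")
                self.buffered = instr
            return
        # Normal cycle: the last stage retires and everything shifts by one.
        done = self.stages.pop()
        if done is not None:
            self.retired.append(done)
        # A buffered instruction enters first so program order is preserved.
        if self.buffered is not None:
            head, self.buffered = self.buffered, instr
        else:
            head = instr
        self.stages.insert(0, head)
```

In this model a null instruction simply travels through the stages without ever reaching the retired list, which mirrors the statement that it leaves the internal data registers untouched.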

Input logic 7 confirms the number of input operands required by the functional component, the location in the four RAMs from which each input operand comes, and whether the data is valid, that is, whether all input operands have been prefetched into the cache or can be input to the functional component within one clock cycle of the chip. Then: A. if all input operands have been prefetched into the cache or can be input to the functional component within one clock cycle of the chip, the instruction is sent to the functional component; B. otherwise, a null instruction that does not affect the state of the functional component is sent to the functional component. The purpose of sending a null instruction is to leave the internal data registers of the functional component unchanged, protecting them from data flow errors. The specific structure of the input logic 7 is explained below.
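The issue decision made by the input logic can be sketched as follows. The names OperandState, NULL_INSTRUCTION, operand_valid and select_issue are hypothetical, and the three operand states are an assumed encoding of "prefetched", "being read and guaranteed to arrive by the next cycle" and "neither".

```python
from enum import Enum


class OperandState(Enum):
    PREFETCHED = 1  # already held in the prefetch cache
    IN_FLIGHT = 2   # read issued; guaranteed to arrive by the next cycle
    MISSING = 3     # cannot be delivered by the next cycle


NULL_INSTRUCTION = "NULL"  # hypothetical encoding of the null instruction


def operand_valid(state):
    """An operand is valid if it is prefetched or will arrive next cycle."""
    return state in (OperandState.PREFETCHED, OperandState.IN_FLIGHT)


def select_issue(instruction, operand_states):
    """Return what the input logic sends to the functional component."""
    if all(operand_valid(s) for s in operand_states):
        return instruction
    return NULL_INSTRUCTION
```

Because all() of an empty sequence is true, an instruction with zero input operands is issued directly, which is consistent with the description allowing zero or more operands per cycle.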

Output logic 8 confirms the number of operands output by the functional component, the locations in the four RAMs to which the output operands should be written, and whether the corresponding RAM addresses or caches can accept data. Then: A. if the memory or cache corresponding to any of the output operands cannot accept data, the pipeline output of the functional component is blocked; B. otherwise, the output operands are written to their respective RAM addresses or caches. After the output logic 8 blocks the pipeline output of the functional component 6, the functional component 6 can still buffer one instruction that has been parsed by the input logic, so instructions from the input logic 7 are not immediately blocked from entering the functional component. If the output logic 8 blocks the pipeline output of the functional component for more than one clock cycle of the chip, instructions from the input logic 7 are blocked from entering the functional component. While the functional component 6 blocks instructions from the input logic 7, the logic for prefetching instruction input operands continues to operate normally. The specific structure of the output logic 8 is explained below.
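The feasibility check and the "more than one cycle" back-pressure rule can be sketched as below. The function output_feasible and the class OutputBackpressure are hypothetical names, and modelling the cache as a simple count of free buffer slots is an assumption made only to keep the example small.

```python
def output_feasible(target_rams, ram_writable, buffer_free_slots):
    """Check whether every output operand of this cycle can be accepted.

    target_rams: RAM id for each output operand produced this cycle.
    ram_writable: dict mapping RAM id to True if it can be written now.
    buffer_free_slots: assumed number of free entries in the output buffer.
    """
    slots = buffer_free_slots
    for ram in target_rams:
        if ram_writable.get(ram, False):
            continue      # operand goes straight to its RAM
        if slots > 0:
            slots -= 1    # operand is held in the buffer instead
            continue
        return False      # neither RAM nor buffer can take it
    return True


class OutputBackpressure:
    """Tracks how long the output has been blocked."""

    def __init__(self):
        self.blocked_cycles = 0

    def step(self, feasible):
        """Return (stall_output, block_input) for the current cycle."""
        if feasible:
            self.blocked_cycles = 0
        else:
            self.blocked_cycles += 1
        # Input is blocked only once the output stall exceeds one cycle.
        return (not feasible, self.blocked_cycles > 1)
```

The one-cycle grace period in step corresponds to the single instruction that the functional component can still buffer while its output is stalled.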

Prefetch input operand logic 10 is configured to calculate, from the instruction fields and register information, the operand addresses required by a number of subsequent instructions, to send prefetch requests to the corresponding RAMs, and to cache the prefetched data.
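As a rough illustration of this prefetching, the sketch below assumes a simple strided addressing scheme in which the k-th subsequent instruction reads from base register + offset field + k * stride. The source does not specify the address calculation, so future_addresses, PrefetchLogic and the strided formula are all assumptions.

```python
def future_addresses(base_reg, offset_field, stride, lookahead):
    """Addresses needed by the next `lookahead` instructions (assumed strided)."""
    return [base_reg + offset_field + k * stride for k in range(1, lookahead + 1)]


class PrefetchLogic:
    """Sends prefetch requests to RAMs and caches the returned data."""

    def __init__(self, rams):
        self.rams = rams    # dict: RAM id -> dict of address -> value
        self.cache = {}     # (ram_id, address) -> prefetched value
        self.requests = []  # log of prefetch requests actually sent

    def prefetch(self, ram_id, addresses):
        for addr in addresses:
            key = (ram_id, addr)
            if key in self.cache:
                continue    # already prefetched, no new request needed
            self.requests.append(key)
            self.cache[key] = self.rams[ram_id][addr]
```

For example, with a RAM whose word at address i holds i * 10, calling prefetch(1, future_addresses(4, 1, 2, 3)) requests addresses 7, 9 and 11 once, and a repeated call sends no further requests.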

FIG. 3 shows the structure of the input logic 7 in FIG. 2. The internal control logic unit 12 receives the external control signal, determines from the internal register state the control command to be sent to the arithmetic component, and determines the operands that the command requires. When the internal control logic 12 confirms that the data required by the arithmetic component is ready, the control command and the input operands are sent to the arithmetic component, achieving synchronization control of the pipeline input data. So that the pipeline of the arithmetic component does not stall while waiting for input data, the data prefetch logic 11, together with the internal control logic, determines the input operands required by the arithmetic component several clock cycles in advance and sends data prefetch requests to the corresponding storage blocks. Accordingly, in the input data synchronization processing logic, the FIFO queue 13 buffers the prefetched data. At the same time, an input data arbitration logic unit 14 determines whether the data input to the arithmetic component comes from the FIFO queue or directly from the data bus.
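The choice between the FIFO queue and the data bus can be sketched as a small selection function. The name select_operand, the (address, value) tuple layout, and the rule that the FIFO head is preferred over the bus are assumptions for illustration; the source only states that arbitration logic picks one of the two sources.

```python
from collections import deque


def select_operand(addr, fifo, bus):
    """Pick the source of one input operand for the current cycle.

    fifo: deque of (address, value) pairs holding prefetched data.
    bus:  (address, value) arriving on the data bus this cycle, or None.
    Returns (value, source); (None, None) means the operand is not ready.
    """
    if fifo and fifo[0][0] == addr:
        return fifo.popleft()[1], "fifo"  # prefetched data is consumed first
    if bus is not None and bus[0] == addr:
        return bus[1], "bus"              # data arriving directly from RAM
    return None, None
```

A result of (None, None) corresponds to the case in which the input logic would send a null instruction instead of the real one.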

FIG. 4 shows the structure of the output logic 8 of FIG. 2. The internal control logic unit 16 receives the control signal transmitted by the arithmetic component and, according to the internal register state, confirms the number of operands output by the arithmetic component, the memory block locations to which the output operands should be written, and whether the corresponding memory block addresses or caches can accept data. At the same time, the output logic 8 decides whether to block the pipeline of the arithmetic component according to whether the output data can be written or received. Similarly, to prevent the arithmetic component from being blocked by its output data, the output data synchronization processing logic 8 contains a FIFO queue 17 that buffers data that cannot be written to the memory block in time. Moreover, to maintain output data consistency, a data arbitration logic 18 determines whether the data written to storage in the current clock cycle comes from the FIFO queue or directly from the bus.
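One way to picture the interaction of the output FIFO and the arbitration logic is the sketch below. It assumes a single memory port that accepts at most one write per cycle and a policy of draining the oldest FIFO entry before any newer data; OutputPath, its capacity parameter and this ordering policy are illustrative assumptions rather than details taken from the source.

```python
from collections import deque


class OutputPath:
    """Toy model of output buffering with write-order arbitration."""

    def __init__(self, capacity=2):
        self.fifo = deque()       # writes that could not reach memory in time
        self.capacity = capacity  # assumed FIFO depth
        self.memory = {}          # address -> value

    def cycle(self, new_write, memory_ready):
        """Process one cycle; return True if the pipeline must be blocked.

        new_write: (address, value) produced this cycle, or None.
        memory_ready: True if the memory block accepts one write this cycle.
        """
        if memory_ready and self.fifo:
            addr, value = self.fifo.popleft()  # older data is written first
            self.memory[addr] = value
            memory_ready = False               # the write port is now used
        if new_write is None:
            return False
        if memory_ready:                       # FIFO was empty: direct write
            addr, value = new_write
            self.memory[addr] = value
            return False
        if len(self.fifo) < self.capacity:
            self.fifo.append(new_write)        # hold the data for later
            return False
        return True                            # nowhere to put it: block
```

Draining the FIFO before accepting a direct write keeps two writes to the same address in program order, which is one reading of the consistency requirement mentioned above.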

In addition, the present invention further provides a method for synchronizing pipeline data of a multiple input multiple output processor using the above-described multi-input multi-output processor pipeline data synchronization device. As shown in FIG. 5, the method includes the following steps:

Step 3: the pipeline controller receives a plurality of output requests of the MIMO functional component and determines the feasibility of the output requests; if a request is feasible, it receives the output request within one clock cycle of the chip and forwards the output request to the memory within a certain time; if not, it blocks the output of the MIMO functional component;

wherein the validity of an input operand means that the input operand required by the instruction that the pipeline controller is parsing has been prefetched, or is being read and can be guaranteed to be sent to the MIMO functional component within the next clock cycle of the chip, in which case the input operand is valid; otherwise it is invalid. The feasibility of an output request means that the output operands output by the MIMO functional component can be temporarily held in a buffer or can be written to memory, in which case the output request is feasible; otherwise it is not feasible.

FIG. 6 is a flow chart showing a method of pipeline data synchronization of a multi-input multi-output processor according to another embodiment of the present invention. The method includes the following steps:

Step 10: the input logic parses the input operands required by the instruction and determines whether all input operands have been prefetched into the cache or can be input to the MIMO functional component within one clock cycle of the chip. If yes, it sends the instruction to the MIMO functional component; if not, it sends a null instruction to the MIMO functional component;

Step 30: the output logic determines whether the memory addresses or caches corresponding to all the output operands of the output request can accept data; if so, it writes the output operands to the corresponding memory addresses or caches, respectively; if not, it blocks the output of the MIMO functional component.

Step 40: the prefetch input operand logic calculates the operand addresses required by the instruction from the instruction fields and the register information, sends prefetch requests to the memory storing the operands, and caches the prefetched data.

In the method for synchronizing pipeline data of a multi-input multi-output processor according to the present invention, if the output logic blocks the output of the MIMO functional component for more than one clock cycle of the chip, instructions are blocked from entering the MIMO functional component.

Hereinafter, several specific examples are described in detail to illustrate the method for synchronizing the input and output data of the MIMO processor of the present invention.

In one example, all data required by the instruction has been prefetched, and the output data of the functional component can be written to the corresponding RAM or buffer, so the functional component can accept a new instruction in every clock cycle of the chip. The functional component then runs at full speed. This is the ideal case, but it is not the only one the device handles.

In another example, one of the input operands required by the instruction has not been prefetched, and the logic that parses the validity of instruction input operands immediately sends a read request to the RAM where the operand is located. Provided that the RAM operates normally, the operand can be obtained by the functional component in the next clock cycle of the chip, so the instruction can enter the functional component without blocking.

In another example, all required input operands are valid, but at least one of the output data items of the functional component cannot be written to the corresponding RAM or buffer in the current clock cycle of the chip, although it can be written in the next clock cycle. Since the functional component can buffer one input instruction, it can still receive an instruction and data in the next clock cycle without causing blocking.

In another example, the logic for prefetching instruction input operands calculates the addresses of the operands required by subsequent instructions from specific fields of the current instruction and the register information. Provided that this does not conflict with the functional component reading the input operands of the current instruction or writing its output operands, the logic prefetches the input operands required by the instructions that follow. This is especially effective for neural network algorithms, because neural networks involve large volumes of data with extremely high spatial locality. In this manner, the multi-input multi-output functional component pipeline synchronization device of the present invention can greatly reduce processor memory access overhead and improve processor performance.

In addition, as shown in FIG. 7, the present invention also provides a data processing chip 500 including the above-described multiple input multiple output processor pipeline data synchronization device 400.

In addition, it should be noted that the "instructions" of the present invention refer to computer program instructions. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of a flowchart or in one or more blocks of a block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart or in one or more blocks of a block diagram.

Industrial applicability

According to the multi-input multi-output functional component pipeline synchronization device of the present invention, parsing and prefetching of input data and responding to output requests for a plurality of output operands of the functional component are realized, making it possible to solve the pipeline-stage data synchronization control of multi-input multi-output functional components and to improve processor performance.

Claims

1. A multi-input multi-output processor pipeline data synchronization device disposed in a chip having an instruction transmitter that communicates with one or more memories, wherein the synchronization device comprises:

a multi-input multi-output (MIMO) functional component having a plurality of computational pipeline stages, for responding to instructions by performing operations on a plurality of input operands;

a pipeline controller connected to the MIMO functional component, which:

receives an instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines the validity of the input operands; if the input operands are all valid, sends the instruction to the MIMO functional component within one clock cycle of the chip; if at least one of the input operands is invalid, sends a null instruction to the MIMO functional component; and

receives a plurality of output requests of the MIMO functional component and determines the feasibility of the output requests; if an output request is feasible, receives it within one clock cycle of the chip and forwards it to the memory within one clock cycle of the chip; if the output request is determined not to be feasible, blocks the output of the MIMO functional component; and

prefetch input operand logic coupled to the memory and to the MIMO functional component, for calculating the addresses of the input operands required by the instruction, sending input operand prefetch requests to the memory, and caching the prefetched data.
2. The multi-input multi-output processor pipeline data synchronization device according to claim 1, wherein:

the pipeline controller is provided with input logic that parses the input operands required by an instruction and determines the validity of the input operands;

wherein the validity of an input operand means that the input operand required by the instruction that the pipeline controller is parsing has been prefetched by the prefetch input operand logic, or is being read and can be guaranteed to be sent to the MIMO functional component within the next clock cycle of the chip, in which case the input operand is valid; otherwise it is invalid;

the pipeline controller is provided with output logic that receives the plurality of output requests of the MIMO functional component and determines the feasibility of the output requests, wherein the feasibility of an output request means that the output operands output by the MIMO functional component can be temporarily stored in a buffer or can be written into the memory, in which case the output request is feasible; otherwise it is not feasible.
The multi-input multi-output processor pipeline data synchronization device according to claim 2, wherein:

if the output logic blocks the output of the multi-input multi-output functional component for more than one clock cycle of the chip, instructions are blocked from entering the multi-input multi-output functional component.
A method for synchronizing pipeline data of a multi-input multi-output processor using the multi-input multi-output processor pipeline data synchronization device according to any one of claims 1 to 3, comprising the steps of:

Step 1: the pipeline controller receives an instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines the validity of the input operands; if all of the input operands are valid, the pipeline controller sends the instruction into the multi-input multi-output functional component within one clock cycle of the chip; if at least one of the input operands is invalid, the pipeline controller sends a null instruction into the multi-input multi-output functional component;

Step 2: the multi-input multi-output functional component performs operations on a plurality of input operands through its computation pipeline stages according to the instruction, and sends output requests to the pipeline controller;

Step 3: the pipeline controller receives the plurality of output requests of the multi-input multi-output functional component and determines the feasibility of the output requests; if an output request is feasible, the pipeline controller receives the output request within one clock cycle of the chip and forwards the output request to the memory within one clock cycle of the chip; if the output request is not feasible, the pipeline controller blocks the output of the multi-input multi-output functional component.
The method for synchronizing pipeline data of a multi-input multi-output processor according to claim 4, further comprising the steps of:

Step 10: the input logic receives the instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines the validity of the input operands; if all of the input operands are valid, the input logic sends the instruction into the multi-input multi-output functional component within one clock cycle of the chip; or, if at least one of the input operands is invalid, the input logic sends a null instruction into the multi-input multi-output functional component;

Step 20: the multi-input multi-output functional component performs operations on a plurality of input operands through its computation pipeline stages according to the instruction, and sends output requests to the output logic;

Step 30: the output logic receives the plurality of output requests of the multi-input multi-output functional component and determines the feasibility of the output requests; if an output request is feasible, the output logic receives the output request within one clock cycle of the chip and forwards the output request to the memory within a certain time; or, if the output request is not feasible, the output logic blocks the output of the multi-input multi-output functional component.
The method for synchronizing pipeline data of a multi-input multi-output processor according to claim 5, further comprising the step of:

Step 40: the prefetch input operand logic calculates the required operand address, sends a prefetch request to the memory storing the operand, and caches the prefetched instruction.
The method for synchronizing pipeline data of a multi-input multi-output processor according to claim 5, wherein:

if the output logic blocks the output of the multi-input multi-output functional component for more than one clock cycle of the chip, instructions are blocked from entering the multi-input multi-output functional component.
A data processing chip comprising the multi-input multi-output processor pipeline data synchronization device according to any one of claims 1 to 3.
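
For illustration only, and without limiting the claims, the output-side behaviour recited above (feasibility check, blocking, and blocking of instruction entry after more than one blocked cycle) can be modelled in software. The Python sketch below rests on assumptions that do not come from the disclosure: the class name `OutputLogic`, the fixed buffer capacity, the `memory_ready` flag, and the simplification that memory drains the whole buffer in any cycle in which it is ready.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Request = Tuple[int, int]  # (address, value): assumed shape of an output request


class OutputLogic:
    """Software stand-in for the output logic of the pipeline controller."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity              # assumed size of the temporary buffer
        self.buffer: Deque[Request] = deque()
        self.blocked_cycles = 0               # consecutive cycles with output blocked

    def cycle(self, requests: List[Request], memory: Dict[int, int],
              memory_ready: bool) -> bool:
        """Model one chip cycle; return True if the requests were accepted."""
        if memory_ready:
            # Forward everything already buffered to memory.
            while self.buffer:
                addr, value = self.buffer.popleft()
                memory[addr] = value
        free_slots = self.capacity - len(self.buffer)
        if len(requests) <= free_slots:
            # Feasible: the outputs can be stored temporarily in the buffer.
            self.buffer.extend(requests)
            self.blocked_cycles = 0
            return True
        # Not feasible: block the output of the functional component.
        self.blocked_cycles += 1
        return False

    @property
    def stall_input(self) -> bool:
        # Block instruction entry once output has been blocked for more than one cycle.
        return self.blocked_cycles > 1


memory: Dict[int, int] = {}
out = OutputLogic(capacity=2)
accepted_1 = out.cycle([(0, 1), (1, 2)], memory, memory_ready=False)  # fits in buffer
accepted_2 = out.cycle([(2, 3)], memory, memory_ready=False)          # buffer full: blocked
stall_after_one = out.stall_input
accepted_3 = out.cycle([(2, 3)], memory, memory_ready=False)          # blocked a second cycle
stall_after_two = out.stall_input
accepted_4 = out.cycle([(2, 3)], memory, memory_ready=True)           # drained, then accepted
```

In this model a single blocked cycle is absorbed without stopping instruction entry, and only a second consecutive blocked cycle raises `stall_input`, mirroring the "more than one clock cycle" condition.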