CN105824604B

CN105824604B - Multiple-input and multiple-output processor pipeline data synchronization unit and method

Info

Publication number: CN105824604B
Application number: CN201510824901.7A
Authority: CN
Inventors: 李震; 刘少礼; 张士锦; 罗韬; 钱诚; 陈云霁; 陈天石
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2015-11-24
Filing date: 2015-11-24
Publication date: 2017-08-29
Anticipated expiration: 2035-11-24
Also published as: WO2017088456A1; CN105824604A

Abstract

The present invention discloses a multiple-input multiple-output (MIMO) processor pipeline data synchronization device and method. The device includes a MIMO functional unit and a pipeline controller.

The MIMO functional unit has multiple computation pipeline stages and responds to instructions by performing operations on input operands.

The pipeline controller receives an instruction, parses the input operands it requires, and judges whether they are valid. If all of them are valid, the controller issues the instruction into the functional unit. If at least one is invalid, it issues a no-op (bubble) instruction into the functional unit instead.

The pipeline controller also receives the output requests of the functional unit and judges whether they are feasible. If they are, it accepts them within one chip clock cycle and forwards them to memory within a certain time. If they are not, it blocks the output of the functional unit.

This not only solves the pipeline synchronization problem of the MIMO functional unit but also substantially reduces the processor's memory-access overhead and improves its memory-access efficiency.

Description

Multiple-input and multiple-output processor pipeline data synchronization unit and method

Technical field

The invention belongs to the field of computer electronics, and in particular relates to a multiple-input multiple-output processor pipeline data synchronization device and method.

Background art

A data processing device that uses a pipelined processor divided into multiple pipeline stages can execute several instructions at once. This allows a higher processor clock frequency, increases data throughput, and improves processor performance. Nevertheless, every instruction passes through each pipeline stage of the processor in turn. Its execution ends when it leaves the last pipeline stage, and its results are then reflected in the processor state that it modifies.

The need for data synchronization arises because a processor's functional units must exchange data with memory during the execution stage. For example, when a data read conflict prevents a functional unit from obtaining all of its input operands within one chip clock cycle, instruction issue must be delayed. Likewise, when a functional unit's output data cannot be written to the target storage address or cache, the pipeline must be blocked.

Patent document 1 (Publication No. CN 103383641A) discloses a multi-pipeline synchronization device with three main parts:

- A control/status register transfer unit handles control and status register transfers between pipelines. This includes pipeline A configuring the control registers of pipeline B and pipeline A reading the status registers of pipeline B.
- A pipeline register transfer unit transfers the general-purpose registers and pipeline configuration registers of pipeline A to pipeline B, so that pipeline B obtains the register information it needs to operate.
- A synchronization logic unit receives scheduling instructions and blocking information. Based on the control/status registers, it generates a halt signal for pipeline A and a scheduling enable and schedule information for pipeline B.

However, that device supports only inter-core synchronization in a multi-core CPU, and it synchronizes data only for single-input single-output processor pipelines. It achieves multi-pipeline synchronization within a core solely through the scheduling-enable and halt signals generated by the synchronization logic unit. When multiple pipelines within a core must be synchronized and the number of input or output operands of a functional unit grows, the functional unit interacts with memory more often and data conflicts become more severe. Efficiently synchronizing pipeline data for MIMO functional units therefore becomes a more pressing problem.

Summary of the invention

An object of the present invention is to address at least the above problems and shortcomings by providing a multiple-input multiple-output processor pipeline data synchronization device and method. These solve the problem of controlling data synchronization across the pipeline stages of a MIMO functional unit and thereby improve processor performance.

The multiple-input multiple-output processor pipeline data synchronization device of the present invention is arranged in a chip that has an instruction issuer, and it communicates with one or more memories. The synchronization device includes:

A MIMO functional unit with multiple computation pipeline stages, which responds to instructions by performing operations on multiple input operands;

A pipeline controller connected to the MIMO functional unit, which is configured:

to receive the instruction sent by the instruction issuer, parse the input operands the instruction requires, and judge whether those input operands are valid. If all the input operands are valid, it issues the instruction into the MIMO functional unit within one chip clock cycle. If at least one input operand is invalid, it issues a no-op instruction into the MIMO functional unit instead; and

to receive multiple output requests from the MIMO functional unit and judge whether they are feasible. If they are feasible, it accepts the output requests within one chip clock cycle and forwards them to the memory within one chip clock cycle. If they are judged infeasible, it blocks the output of the MIMO functional unit;

Input-operand prefetch logic connected to the memory and the MIMO functional unit, which calculates the addresses of the input operands required by instructions, sends prefetch requests for those input operands to the memory, and caches the prefetched data;

Here, an input operand is valid if two conditions hold: the input operand required by the instruction that the pipeline controller is parsing has been prefetched, or is being read, by the input-operand prefetch logic; and it is guaranteed to reach the MIMO functional unit within the next chip clock cycle. Otherwise it is invalid;

and an output request is feasible if the output operands produced by the MIMO functional unit can be held temporarily in a buffer or written into the memory. Otherwise it is infeasible.
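The validity and feasibility tests just defined reduce to two per-cycle predicates. The Python sketch below is illustrative only and is not the patent's circuit. The `Operand` fields, the function names, and the rule that each unwritable output operand takes one free buffer slot are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operand:
    ram_id: int            # RAM bank holding the operand (illustrative field)
    prefetched: bool       # already held in the prefetch cache
    read_next_cycle: bool  # a read is in flight and will arrive next cycle


def operand_valid(op: Operand) -> bool:
    # Valid if the operand can reach the functional unit in the next cycle.
    return op.prefetched or op.read_next_cycle


def issue_decision(operands: List[Operand]) -> str:
    # Issue only when every input operand is valid; otherwise insert a bubble
    # (no-op) that leaves the functional unit's data registers untouched.
    # An instruction with zero input operands is always issuable.
    return "ISSUE" if all(operand_valid(op) for op in operands) else "BUBBLE"


def output_feasible(targets_writable: List[bool], free_buffer_slots: int) -> bool:
    # Each output operand is either written to its target memory now or,
    # if that target is not writable, parked in a free buffer slot (assumed policy).
    unwritable = sum(1 for w in targets_writable if not w)
    return unwritable <= free_buffer_slots
```

For example, an instruction whose first operand is prefetched and whose second is arriving next cycle is issued. If the second operand is neither prefetched nor in flight, a bubble is issued instead.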

In the multiple-input multiple-output processor pipeline data synchronization device of the present invention:

the pipeline controller provides an input logic that parses the input operands required by an instruction and judges whether they are valid;

the pipeline controller provides an output logic that receives the multiple output requests of the MIMO functional unit and judges whether they are feasible; and

if the output logic blocks the output of the MIMO functional unit for more than one chip clock cycle, instructions are blocked from entering the MIMO functional unit.

The present invention further provides a method for synchronizing multiple-input multiple-output processor pipeline data using the above synchronization device. It comprises the following steps:

Step 1: the pipeline controller receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and judges whether they are valid. If all the input operands are valid, it issues the instruction into the MIMO functional unit within one chip clock cycle. If at least one input operand is invalid, it issues a no-op instruction into the MIMO functional unit instead;

Step 2: the MIMO functional unit performs the operation on the multiple input operands through its computation pipeline stages according to the instruction, and sends output requests to the pipeline controller;

Step 3: the pipeline controller receives the multiple output requests of the MIMO functional unit and judges whether they are feasible. If they are feasible, it accepts the output requests within one chip clock cycle and forwards them to memory within that chip clock cycle. If they are infeasible, it blocks the output of the MIMO functional unit.

In the multiple-input multiple-output processor pipeline data synchronization method of the present invention, the method further comprises the following steps:

Step 10: the input logic receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and judges whether they are valid. If all the input operands are valid, it issues the instruction into the MIMO functional unit within one chip clock cycle. If at least one input operand is invalid, it issues a no-op instruction into the MIMO functional unit instead;

Step 20: the MIMO functional unit performs the operation on the multiple input operands through its computation pipeline stages according to the instruction, and sends output requests to the output logic;

Step 30: the output logic receives the multiple output requests of the MIMO functional unit and judges whether they are feasible. If they are feasible, it accepts the output requests within one chip clock cycle and forwards them to the memory within a certain time. If they are infeasible, it blocks the output of the MIMO functional unit;

Step 40: the input-operand prefetch logic calculates the operand addresses required by instructions, sends prefetch requests to the memory that stores the operands, and caches the prefetched data.

In the multiple-input multiple-output processor pipeline data synchronization method of the present invention, if the output logic blocks the output of the MIMO functional unit for more than one chip clock cycle, instructions are blocked from entering the MIMO functional unit.

The present invention further provides a data processing chip that includes the above multiple-input multiple-output processor pipeline data synchronization device.

The effects of the present invention are as follows:

The MIMO functional unit pipeline synchronization device of the present invention parses and prefetches input data, and it responds to the output requests of functional units that produce multiple output operands. It can therefore solve the problem of controlling data synchronization across the pipeline stages of a MIMO functional unit and improve processor performance.

Brief description of the drawings

Fig. 1 shows the structure of the multiple-input multiple-output processor pipeline data synchronization device according to an embodiment of the present invention.

Fig. 2 shows the structure of the multiple-input multiple-output processor pipeline data synchronization device according to another embodiment of the present invention.

Fig. 3 shows the structure of the logic that judges the validity of instruction input operands (namely input logic 7) in the synchronization device of Fig. 2.

Fig. 4 shows the structure of the functional unit output request arbitration and processing logic (namely output logic 8) in the synchronization device of Fig. 2.

Fig. 5 shows a flowchart of the multiple-input multiple-output processor pipeline data synchronization method of the present invention.

Fig. 6 shows a flowchart of the multiple-input multiple-output processor pipeline data synchronization method according to another embodiment of the present invention.

Fig. 7 shows a data processing chip according to an embodiment of the present invention.

Reference numerals

1~4 ... RAM memories
5 ... pipeline controller
6 ... MIMO functional unit
7 ... input logic
8 ... output logic
9 ... two-input two-output functional unit
11 ... data prefetch logic of input logic 7
12 ... inner control logic of input logic 7
13 ... FIFO queue of input logic 7
14 ... data arbitration logic of input logic 7
15 ... data bus interface of input logic 7
16 ... inner control logic of output logic 8
17 ... FIFO queue of output logic 8
18 ... data arbitration logic of output logic 8
19 ... data bus interface of output logic 8
100, 300 ... input control signals
200 ... output control signal
400 ... multiple-input multiple-output processor pipeline data synchronization device
500 ... data processing chip

Detailed description of embodiments

To make the objects, technical solutions, and advantages of the present invention clearer, the multiple-input multiple-output processor pipeline data synchronization device and method of the present invention are described in further detail below with reference to the accompanying drawings. The specific embodiments described here serve only to explain the present invention and are not intended to limit it.

Fig. 1 shows the structure of the multiple-input multiple-output processor pipeline data synchronization device of the present invention.

"Multiple-input" means that (a) the input data may come from several different RAM memories, and (b) the functional unit may take zero or more input operands in one chip clock cycle. "Multiple-output" means that (a) the output data may be destined for several different RAM memories, and (b) the functional unit may produce zero or more output operands in one chip clock cycle. Specifically, the synchronization device includes the following.

The MIMO functional unit 6 is associated with four RAM memories 1, 2, 3, and 4 and executes instruction sequences. The functional unit 6 can have multiple input operands and multiple output operands. It has multiple pipeline stages and responds to instructions by performing operations on multiple input operands.

The pipeline controller 5 is arranged in the chip and receives the instructions sent by the chip's instruction issuer. It parses the input operands that each instruction requires and judges whether they are valid. If all the required data are valid, it sends the request to the functional unit within one chip clock cycle. If at least one input operand is invalid, it sends a bubble (no-op) instruction to the functional unit; this no-op instruction does not affect the data registers inside the functional unit.

The pipeline controller also receives the multiple output requests of the functional unit simultaneously and judges whether they are feasible. If they are, it accepts the requests within one chip clock cycle and then forwards them to memory within a certain time. If they are not, it blocks the functional unit's pipeline. While the controller is blocking the functional unit's computation, its parsing of the current instruction and its data prefetch function are not blocked.

An input operand is valid when the input operand required by the instruction being parsed has already been prefetched or is being read, so that it is guaranteed to reach the functional unit within the next chip clock cycle. Otherwise it is invalid. An output request is feasible when each of the zero or more output operands produced by the functional unit can either be held temporarily in a buffer or be written to the appropriate address in its target RAM. Otherwise it is infeasible. Both judgments are made within one chip clock cycle.

In addition, the functional unit 6 with multiple pipeline stages has a buffer configured to hold one incoming instruction and its data. Furthermore, while judging the validity of the input data required by the current instruction, the pipeline controller also parses the input operands the functional unit will need for several subsequent instructions and prefetches that data.
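The one-entry buffer in functional unit 6 behaves much like what hardware designers call a skid buffer. The following Python model is a hypothetical sketch, not the patent's implementation. The class name `SkidInput` and the pass-through-when-empty policy are assumptions. The model shows why a one-cycle pipeline freeze can be absorbed while a second consecutive freeze cannot.

```python
from typing import Optional


class SkidInput:
    """Hypothetical model of the one-entry instruction buffer in functional unit 6."""

    def __init__(self) -> None:
        self.slot: Optional[str] = None  # holds at most one waiting instruction

    def can_accept(self, pipeline_frozen: bool) -> bool:
        # A new instruction fits unless the pipeline is frozen and the slot is full.
        return not (pipeline_frozen and self.slot is not None)

    def cycle(self, incoming: Optional[str], pipeline_frozen: bool) -> Optional[str]:
        """Advance one clock cycle; return the instruction entering stage 1, if any."""
        if pipeline_frozen:
            if incoming is not None:
                if self.slot is not None:
                    raise RuntimeError("buffer overflow: issue should have been blocked")
                self.slot = incoming  # park the instruction instead of stalling issue
            return None
        if self.slot is not None:
            # The older buffered instruction enters first; the newcomer takes the slot.
            entered, self.slot = self.slot, incoming
            return entered
        return incoming  # empty buffer: pass straight through
```

Suppose the buffer is still occupied and the pipeline stays frozen. Any further issue would overflow it, which is why issue must stop once an output stall lasts longer than one cycle.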

Fig. 2 shows the structure of the multiple-input multiple-output processor pipeline data synchronization device according to another embodiment of the present invention. In this embodiment, a two-input two-output functional unit 9 serves as the MIMO functional unit and is associated with four RAM memories 1, 2, 3, and 4. The pipeline controller includes three logic blocks:

- logic 7, which judges the validity of instruction input operands (hereinafter "input logic");
- logic 8, which arbitrates and processes the functional unit's output requests (hereinafter "output logic");
- logic 10, which prefetches instruction input operands (hereinafter "input-operand prefetch logic").

The functional unit 6 includes multiple pipeline stages and is configured to respond to instructions by performing operations on the input operands. In addition:

(A) it has a register that holds an instruction that has entered after passing through the logic that judges the validity of instruction input operands; and

(B) when an output request for output data is refused by the output request arbitration and processing logic, the functional unit's pipeline pauses and blocks further instructions from the input-operand validity logic from entering.

The two-input two-output functional unit processes quantized data using a pipeline structure. It can be used for neural network algorithms, image processing algorithms, and similar algorithms.

The input logic 7 determines how many input operands the functional unit needs, which of the four RAMs each input operand comes from, and whether the data are valid. In other words, it judges whether all input operands have been prefetched into the cache or can be delivered to the functional unit within one chip clock cycle.

(A) If all input operands have been prefetched into the cache or can be delivered within one chip clock cycle, it issues the instruction into the functional unit.

(B) Otherwise, it issues a no-op instruction that does not affect the state of the functional unit.

The no-op instruction leaves the data registers inside the functional unit unchanged, which avoids computation errors. The specific structure of the input logic 7 is described below.

The output logic 8 determines how many operands the functional unit outputs, where in the four RAMs each output operand should be written, and whether the corresponding RAM address or cache can accept data.

(A) If the memory or cache corresponding to any output operand cannot accept data, it blocks the output of the functional unit's pipeline stage.

(B) Otherwise, it writes the output operands to their respective RAM addresses or caches.

Blocking the output of a pipeline stage of functional unit 6 does not immediately stop input logic 7 from issuing instructions. This is because functional unit 6 can buffer one instruction that has passed through the input-operand validity logic. Only when output logic 8 has blocked the pipeline-stage output for more than one chip clock cycle does it block input logic 7 from issuing. While functional unit 6 is blocking instructions from input logic 7, the input-operand prefetch logic continues to operate normally. The specific structure of the output logic 8 is described below.
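The blocking rule for output logic 8 can be expressed as a counter of consecutive blocked cycles. In this hypothetical Python sketch, the names `OutputStallTracker` and `step` and the returned control flags are illustrative assumptions. The tolerance of one cycle reflects the one-instruction buffer in functional unit 6.

```python
class OutputStallTracker:
    """Hypothetical sketch of the blocking policy of output logic 8."""

    def __init__(self, tolerated_cycles: int = 1) -> None:
        # One blocked cycle is absorbed by the functional unit's one-entry buffer.
        self.tolerated_cycles = tolerated_cycles
        self.blocked_cycles = 0  # consecutive cycles with output blocked

    def step(self, output_blocked: bool) -> dict:
        # Update the stall count for this cycle, then derive the control signals.
        self.blocked_cycles = self.blocked_cycles + 1 if output_blocked else 0
        return {
            "freeze_unit": output_blocked,  # stop the unit's pipeline-stage output
            "block_issue": self.blocked_cycles > self.tolerated_cycles,
            "prefetch_enabled": True,       # prefetch logic keeps running regardless
        }
```

In the first blocked cycle, issue continues and the buffer absorbs the instruction. From the second consecutive blocked cycle onward, issue is held, and prefetching is never stopped.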

The input-operand prefetch logic 10 is configured to calculate, from instruction fields and register information, the operand addresses required by several subsequent instructions. It then sends prefetch requests to the corresponding RAMs and caches the prefetched data.
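One way to picture the address calculation of prefetch logic 10 is the Python sketch below. The instruction encoding is hypothetical: each input operand is given as a RAM number, a base register, and an immediate offset. The fixed look-ahead depth is also hypothetical; the patent only states that addresses are derived from instruction fields and register information. The sketch further assumes that register values do not change within the look-ahead window.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (RAM number, address)


@dataclass
class Instr:
    # Hypothetical encoding: each input is (ram_id, base_register, offset).
    inputs: List[Tuple[int, int, int]]


@dataclass
class Prefetcher:
    lookahead: int = 4                                   # later instructions scanned
    cache: Dict[Key, int] = field(default_factory=dict)  # data already returned
    pending: List[Key] = field(default_factory=list)     # requests sent, not yet back

    @staticmethod
    def addresses(instr: Instr, regs: List[int]) -> List[Key]:
        # Effective address = base register value + immediate offset.
        return [(ram, regs[base] + off) for ram, base, off in instr.inputs]

    def scan(self, queue: List[Instr], regs: List[int]) -> None:
        # Request every operand of the next `lookahead` instructions that is
        # neither cached nor already requested.
        for instr in queue[: self.lookahead]:
            for key in self.addresses(instr, regs):
                if key not in self.cache and key not in self.pending:
                    self.pending.append(key)

    def fill(self, key: Key, data: int) -> None:
        # A RAM returned prefetched data: move it from pending into the cache.
        self.pending.remove(key)
        self.cache[key] = data
```

Skipping keys that are already cached or pending avoids duplicate requests when consecutive instructions reuse the same operand. This situation is common in the neural-network workloads the description mentions.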

Fig. 3 shows the structure of the input logic 7 in Fig. 2.

Its inner control logic unit 12 receives external control signals. Based on the internal buffer state, it decides which control instruction to send to the arithmetic unit and identifies the operands that instruction requires. Once inner control logic 12 confirms that the data the arithmetic unit needs are ready, it sends the control instruction and the input operands to the arithmetic unit. This achieves synchronized control of the pipeline's input data.

To keep the arithmetic unit's pipeline stages from stalling while they wait for input data, data prefetch logic 11 works with the inner control logic. It parses the input operands the arithmetic unit will need several clock cycles in advance and sends data prefetch requests to the corresponding memory blocks. The input data synchronization logic therefore includes a FIFO queue 13 that caches the prefetched data. In addition, an input data arbitration logic unit 14 decides whether the data entering the arithmetic unit are read from the FIFO queue or taken directly from the data bus.
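The source selection made by arbitration logic 14 can be sketched as follows. This is an illustrative Python model with assumed names (`InputArbiter`, `select`). It assumes that prefetched data arrive in the FIFO in the order in which the arithmetic unit will consume them.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class InputArbiter:
    """Hypothetical sketch of FIFO queue 13 plus arbitration logic 14."""

    def __init__(self) -> None:
        self.fifo: Deque[Tuple[int, int]] = deque()  # (address, data) prefetched earlier

    def push_prefetched(self, addr: int, data: int) -> None:
        # Data returned by a prefetch request is queued in arrival order.
        self.fifo.append((addr, data))

    def select(self, addr: int, bus: Optional[Tuple[int, int]]) -> Tuple[str, int]:
        # Prefer the prefetched copy at the FIFO head; otherwise use data that
        # is on the data bus in this cycle.
        if self.fifo and self.fifo[0][0] == addr:
            return "fifo", self.fifo.popleft()[1]
        if bus is not None and bus[0] == addr:
            return "bus", bus[1]
        raise LookupError("operand not available: a bubble should have been issued")
```

The `LookupError` branch corresponds to the case in which the input logic should already have judged the operand invalid and issued a no-op.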

Fig. 4 shows the structure of the output logic 8 in Fig. 2.

Its inner control logic unit 16 receives the control signals sent by the arithmetic unit. Based on the internal register state, it determines how many operands the arithmetic unit outputs, the memory block location each output operand should be written to, and whether the corresponding memory block address or cache can accept data. Output logic 8 then decides whether to block the arithmetic unit's pipeline according to whether the output data can be accepted.

Likewise, to keep the arithmetic unit from being blocked because its output data cannot be written, output logic 8 contains a FIFO queue 17 that buffers data that cannot be written to a memory block in time. To maintain the consistency of the output data, a data arbitration logic 18 decides whether the data written to storage in the current clock cycle are read from the FIFO queue or taken directly from the bus.
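A minimal Python sketch of FIFO 17 and arbitration logic 18 is given below. It assumes one memory write per cycle and strict first-in, first-out draining, so older buffered outputs are always written before newer ones. The class name, the `depth` parameter, and the dictionary standing in for memory are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class OutputArbiter:
    """Hypothetical sketch of FIFO queue 17 plus arbitration logic 18."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth
        self.fifo: Deque[Tuple[int, int]] = deque()  # pending (address, data) writes
        self.memory: Dict[int, int] = {}             # stands in for the target RAM

    def cycle(self, new: Optional[Tuple[int, int]], writable: bool) -> bool:
        """Process one cycle; return True if the functional unit must be blocked."""
        if writable and self.fifo:
            # Older buffered data is written first to keep writes in order.
            addr, data = self.fifo.popleft()
            self.memory[addr] = data
            writable = False  # the single (assumed) write port is now used
        if new is None:
            return False
        if writable:
            self.memory[new[0]] = new[1]  # FIFO empty: write straight from the bus
            return False
        if len(self.fifo) < self.depth:
            self.fifo.append(new)  # park the output instead of stalling
            return False
        return True  # FIFO full: block the functional unit's output
```

With `depth=1`, one unwritable cycle is absorbed by the FIFO. A second consecutive unwritable cycle with new output returns `True`, which is the block signal.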

The present invention further provides a method for synchronizing multiple-input multiple-output processor pipeline data using the above synchronization device. As shown in Fig. 5, it comprises the following steps:

Step 1: the pipeline controller receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and judges whether they are valid. If all the input operands are valid, it issues the instruction into the MIMO functional unit within one chip clock cycle. If at least one input operand is invalid, it issues a no-op instruction into the MIMO functional unit instead;

Step 3: the pipeline controller receives the multiple output requests of the MIMO functional unit and judges whether they are feasible. If they are feasible, it accepts the output requests within one chip clock cycle and forwards them to memory within a certain time. If they are infeasible, it blocks the output of the MIMO functional unit;

Here, an input operand is valid if the input operand required by the instruction being parsed has been prefetched or is being read, and is guaranteed to reach the MIMO functional unit within the next chip clock cycle. Otherwise it is invalid. An output request is feasible if the output operands produced by the MIMO functional unit can be held temporarily in a buffer or written into memory. Otherwise it is infeasible.

Fig. 6 shows a flowchart of the multiple-input multiple-output processor pipeline data synchronization method according to another embodiment of the present invention. The method comprises the following steps:

Step 10: the input logic parses the input operands required by the instruction. It judges whether all of them have been prefetched into the cache or can be delivered to the MIMO functional unit within one chip clock cycle. If so, it issues the instruction into the MIMO functional unit. If not, it issues a no-op instruction into the MIMO functional unit instead;

Step 30: the output logic judges whether the storage addresses or caches corresponding to all output operands of the output request can accept data. If so, it writes the output operands to their respective storage addresses or caches. If not, it blocks the output of the MIMO functional unit.

Step 40: the input-operand prefetch logic calculates the operand addresses required by instructions from instruction fields and register information. It then sends prefetch requests to the memory that stores the input operands and caches the prefetched data.

In the multiple-input multiple-output processor pipeline data synchronization method of the present invention, if the output logic blocks the output of the MIMO functional unit for more than one chip clock cycle, instructions are blocked from entering the MIMO functional unit.
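Steps 10, 30, and 40 can be combined into a cycle-by-cycle trace. The Python function below is a deliberately simplified, hypothetical model that makes three assumptions:

- each instruction is reduced to the first cycle in which all of its operands are valid (`ready_at`);
- output feasibility depends only on the cycle number (`unwritable`);
- the functional unit's pipeline latency is ignored.

```python
from typing import List, Set


def simulate(ready_at: List[int], unwritable: Set[int], max_cycles: int = 50) -> List[str]:
    """Return one trace entry per cycle: 'I<n>' (issue), 'BUBBLE', or 'HOLD'."""
    trace: List[str] = []
    nxt = 0      # index of the next instruction to issue
    stalled = 0  # consecutive cycles in which the output is blocked
    for cycle in range(max_cycles):
        if nxt == len(ready_at):
            break
        # Step 30: output feasibility (simplified to depend only on the cycle).
        stalled = stalled + 1 if cycle in unwritable else 0
        if stalled > 1:
            # Output blocked for more than one cycle: hold instruction entry.
            # Step 40 (prefetch) would keep running in this cycle.
            trace.append("HOLD")
            continue
        # Step 10: issue the instruction if its operands are valid, else a bubble.
        if ready_at[nxt] <= cycle:
            trace.append(f"I{nxt}")
            nxt += 1
        else:
            trace.append("BUBBLE")
    return trace
```

For example, `simulate([0, 2, 2], set())` yields `["I0", "BUBBLE", "I1", "I2"]`. `simulate([0, 0, 0], {1, 2})` yields `["I0", "I1", "HOLD", "I2"]`: the first blocked cycle is absorbed, and issue is held only in the second consecutive blocked cycle.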

Several specific examples of the multiple-input multiple-output processor pipeline data synchronization method of the present invention are described in detail below.

In one example, all the data required by the instructions have been prefetched, and the functional unit's output data can be written to the corresponding RAM or to a buffer. The functional unit can then accept a new instruction every chip clock cycle and run at full speed. This is the ideal case, but it is not the only one.

In another example, one of the input operands required by an instruction has not been prefetched. The logic that judges the validity of instruction input operands immediately sends a read request to the RAM holding that operand. Provided that the read from this RAM does not fail, the operand can be delivered to the functional unit in the next chip clock cycle. The instruction can therefore enter the functional unit without causing a stall.
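The decision in this example can be sketched as follows: a missing operand is read on demand so that it still arrives in the next cycle. The Python function is hypothetical. It assumes that each RAM serves one read per cycle with a one-cycle latency, and that `ram_busy` lists the RAMs already committed to another access in this cycle.

```python
from collections import Counter
from typing import List, Set, Tuple

Key = Tuple[int, int]  # (RAM number, address)


def resolve_operands(needed: List[Key], prefetched: Set[Key],
                     ram_busy: Set[int]) -> Tuple[bool, List[Key]]:
    """Return (valid, candidate_reads) for one instruction in the current cycle."""
    missing = [k for k in needed if k not in prefetched]
    per_ram = Counter(ram for ram, _ in missing)
    # Valid only if every missing operand can be read now and arrive next cycle:
    # its RAM must be free and asked for at most one operand.
    valid = all(ram not in ram_busy and n == 1 for ram, n in per_ram.items())
    # Candidate reads for missing operands whose RAM is free; a RAM asked for
    # several operands would have to serve them over several cycles.
    reads = [k for k in missing if k[0] not in ram_busy]
    return valid, reads
```

If one operand is prefetched and the other sits in an idle RAM, the instruction is still valid and one on-demand read is sent. If the RAM is busy, a bubble is issued instead.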

In another example, all the input operands required by an instruction are valid. However, some or all of the functional unit's output data cannot be written to the corresponding RAM or buffer in the current chip clock cycle, although they can be written in the next one. Because the functional unit can buffer one incoming instruction, it can still accept an instruction and its data in the next chip clock cycle, and no stall occurs.

In another example, the logic that prefetches instruction input operands calculates the addresses of the operands required by several subsequent instructions, using the special fields of the current instruction and register information. It prefetches the input data those instructions need, provided this does not conflict with the functional unit reading the current instruction's input operands or writing its output data.

This approach is particularly effective for neural network algorithms, because neural networks use many kinds of data and their data exhibit high spatial locality. In this way, the MIMO functional unit pipeline synchronization device of the present invention can greatly reduce the processor's memory-access overhead and improve processor performance.

In addition, as shown in Fig. 7, the present invention further provides a data processing chip 500 that includes the above multiple-input multiple-output processor pipeline data synchronization device 400.

It should also be noted that an "instruction" in the present invention refers to a computer program instruction. Such instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in that memory then produce an article of manufacture that includes an instruction device. The instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art who grasp the basic inventive concept may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

The multiple-input multiple-output processor pipeline data synchronization device and method provided by the present invention have been described in detail above. Specific examples have been used to explain the principles and embodiments of the invention, and they are intended only to help the reader understand its method and core idea. Those of ordinary skill in the art may, following the idea of the present invention, change the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A multiple-input multiple-output processor pipeline data synchronization device, arranged in a chip having an instruction issuer and communicating with one or more memories, characterized in that the synchronization device comprises:

a MIMO functional unit having multiple computation pipeline stages, which responds to instructions by performing operations on multiple input operands;

a pipeline controller connected to the MIMO functional unit, configured:

to receive the instruction sent by the instruction issuer, parse the input operands required by the instruction, and judge whether the input operands are valid; if all the input operands are valid, to issue the instruction into the MIMO functional unit within one chip clock cycle; and if at least one input operand is invalid, to issue a no-op instruction into the MIMO functional unit; and

to receive a plurality of output requests from the MIMO functional unit and determine the feasibility of the output requests; if they are feasible, to accept the output requests within one chip clock cycle and forward them to the memory within one chip clock cycle; and if the output requests are determined to be infeasible, to block the output of the MIMO functional unit;

an input operand prefetch logic controller connected to the memory and to the MIMO functional unit, configured to calculate the addresses of the input operands required by an instruction, send input operand prefetch requests to the memory, and buffer the prefetched data that is returned;

wherein an input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has been prefetched or read by the input operand prefetch logic controller and is guaranteed to be sent to the MIMO functional unit in the next chip clock cycle; otherwise it is invalid;

wherein an output request is feasible if all of the output operands produced by the MIMO functional unit can be temporarily held in a buffer or written to the memory; otherwise it is infeasible.
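The two rules of claim 1 (issue an instruction only when all operands are valid, otherwise issue a no-operation; accept outputs only when all of them can be held, otherwise block) can be illustrated with a small cycle-level model. This is a hypothetical sketch, not the patented circuit: the class names, the set of "ready" operands, the fixed-size output buffer, and the `drain` step are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set

NOP = "NOP"  # hypothetical marker for the no-operation (bubble) instruction


@dataclass
class Instruction:
    opcode: str
    operands: List[str]  # names of the input operands the instruction needs


@dataclass
class PipelineController:
    """Hypothetical cycle-level model of the claim 1 pipeline controller."""
    # Operands already prefetched/read and deliverable in the next cycle.
    ready_operands: Set[str] = field(default_factory=set)
    buffer_capacity: int = 4  # assumed size of the output staging buffer
    buffer: List[str] = field(default_factory=list)
    memory: List[str] = field(default_factory=list)

    def operands_valid(self, instr: Instruction) -> bool:
        return all(op in self.ready_operands for op in instr.operands)

    def issue(self, instr: Instruction) -> str:
        # Issue the real instruction only when every operand is valid;
        # otherwise inject a no-op so the pipeline keeps advancing.
        return instr.opcode if self.operands_valid(instr) else NOP

    def output_feasible(self, outputs: List[str]) -> bool:
        # Feasible only if ALL outputs of the request fit in the buffer at once.
        return len(self.buffer) + len(outputs) <= self.buffer_capacity

    def accept_outputs(self, outputs: List[str]) -> bool:
        """Accept the request this cycle (True) or block the unit's output (False)."""
        if not self.output_feasible(outputs):
            return False
        self.buffer.extend(outputs)
        return True

    def drain(self) -> None:
        # Forward buffered outputs to memory (modelled as one cycle).
        self.memory.extend(self.buffer)
        self.buffer.clear()
```

With `ready_operands={"a", "b"}` and a buffer of two entries, an `ADD` on `a, b` is issued, a `MUL` on `a, c` becomes `NOP`, and a third output is blocked until `drain()` empties the buffer. Injecting a no-operation rather than holding the instruction keeps every pipeline stage advancing in lockstep.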

2. The multiple-input multiple-output processor pipeline data synchronization device according to claim 1, characterized in that:

the pipeline controller parses the input operands required by the instruction and determines their validity by means of an input logic controller provided therein;

the pipeline controller receives the plurality of output requests from the MIMO functional unit and determines their feasibility by means of an output logic controller provided therein.

3. The multiple-input multiple-output processor pipeline data synchronization device according to claim 2, characterized in that:

if the output logic controller blocks the output of the MIMO functional unit for more than one chip clock cycle, instructions are prevented from entering the MIMO functional unit.

4. A method for synchronizing multiple-input multiple-output processor pipeline data, implemented by a multiple-input multiple-output processor pipeline data synchronization device, characterized in that it comprises the following steps:

Step 1: a pipeline controller receives an instruction sent by an instruction issue unit, parses the input operands required by the instruction, and determines their validity; if all of the input operands are valid, it issues the instruction into a MIMO functional unit within one chip clock cycle; if at least one of the input operands is invalid, it issues a no-operation instruction into the MIMO functional unit;

Step 2: the MIMO functional unit performs, through its arithmetic pipeline stages, the operation specified by the instruction on the plurality of input operands, and sends output requests to the pipeline controller;

Step 3: the pipeline controller receives the plurality of output requests from the MIMO functional unit and determines their feasibility; if they are feasible, it accepts the output requests within one chip clock cycle and forwards them to a memory within one chip clock cycle; if they are infeasible, it blocks the output of the MIMO functional unit;

wherein an input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has been prefetched or read and is guaranteed to be sent to the MIMO functional unit in the next chip clock cycle; otherwise it is invalid.

5. The method for synchronizing multiple-input multiple-output processor pipeline data according to claim 4, characterized in that it further comprises the following steps:

Step 10: an input logic controller receives the instruction sent by the instruction issue unit, parses the input operands required by the instruction, and determines their validity; if all of the input operands are valid, it issues the instruction into the MIMO functional unit within one chip clock cycle; or, if at least one of the input operands is invalid, it issues a no-operation instruction into the MIMO functional unit;

Step 20: the MIMO functional unit performs, through its arithmetic pipeline stages, the operation specified by the instruction on the plurality of input operands, and sends output requests to an output logic controller;

Step 30: the output logic controller receives the plurality of output requests from the MIMO functional unit and determines their feasibility; if they are feasible, it accepts the output requests within one chip clock cycle and forwards them to the memory within a certain period of time; or, if the output requests are infeasible, it blocks the output of the MIMO functional unit.

6. The method for synchronizing multiple-input multiple-output processor pipeline data according to claim 5, characterized in that it further comprises the following step:

Step 40: an input operand prefetch logic controller calculates the operand addresses required by the instruction, sends prefetch requests to the memory in which the input operands are stored, and buffers the prefetched data that is returned.
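Step 40 can be pictured as a small request queue in front of the memory: compute the operand addresses, enqueue prefetch requests, and mark an operand valid once its data has been buffered. The sketch below is hypothetical throughout: the strided addressing mode, the one-response-per-cycle memory, and all names are assumptions, since the claims do not specify how addresses are computed.

```python
from collections import deque
from typing import Deque, Dict, List


class PrefetchController:
    """Hypothetical sketch of the input operand prefetch logic controller."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory                 # stands in for the external memory
        self.pending: Deque[int] = deque()   # addresses with outstanding requests
        self.cache: Dict[int, int] = {}      # prefetched data awaiting issue

    @staticmethod
    def operand_addresses(base: int, stride: int, count: int) -> List[int]:
        # Assumed addressing mode: count operands at base, base + stride, ...
        return [base + i * stride for i in range(count)]

    def request(self, addresses: List[int]) -> None:
        # Send a prefetch request for each address not already cached or pending.
        for addr in addresses:
            if addr not in self.cache and addr not in self.pending:
                self.pending.append(addr)

    def service_one(self) -> None:
        # Assume the memory returns one prefetch response per cycle.
        if self.pending:
            addr = self.pending.popleft()
            self.cache[addr] = self.memory[addr]

    def operands_valid(self, addresses: List[int]) -> bool:
        # Valid in the sense of claim 1: every operand is already buffered.
        return all(a in self.cache for a in addresses)
```

For example, with `memory = {100: 7, 104: 9, 108: 11}`, requesting `operand_addresses(100, 4, 3)` leaves the operands invalid until three responses have been serviced, after which `operands_valid` returns `True` and the pipeline controller could issue the instruction instead of a no-operation.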

7. The method for synchronizing multiple-input multiple-output processor pipeline data according to claim 5, characterized in that

8. A data processing chip comprising the multiple-input multiple-output processor pipeline data synchronization device according to any one of claims 1 to 3.