CN105824604B - Multiple-input and multiple-output processor pipeline data synchronization unit and method - Google Patents
- Publication number
- CN105824604B (application CN201510824901.7A)
- Authority
- CN
- China
- Prior art keywords
- input
- output
- instruction
- function part
- operand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
- Multi Processors (AREA)
Abstract
The present invention discloses a multiple-input multiple-output (MIMO) processor pipeline data synchronization device and method. The device includes: a MIMO functional unit with multiple operation pipeline stages, which responds to instructions by performing operations on input operands; and a pipeline controller, which receives an instruction, parses the input operands the instruction requires, and judges their validity. If all input operands are valid, the controller sends the instruction into the functional unit; if at least one is invalid, it sends a null instruction into the functional unit. The pipeline controller also receives the output requests of the functional unit and judges their feasibility. If a request is feasible, the controller accepts it within one clock beat of the chip and forwards it to memory within a certain time; if it is infeasible, the controller blocks the output of the functional unit. This not only solves the pipeline synchronization problem of the MIMO functional unit, but can also greatly reduce the processor's memory access overhead and improve its memory access efficiency.
Description
Technical field
The invention belongs to the field of computer electronics, and in particular relates to a multiple-input multiple-output processor pipeline data synchronization device and method.
Background art
Data processing devices use pipelined processors divided into multiple pipeline stages, which have the advantage of executing several instructions at the same time, thereby raising the processor clock frequency, data throughput, and computing performance. When any particular instruction is executed, however, it must pass through every pipeline stage of the processor. When the instruction passes the last pipeline stage, its execution ends, and the instruction modifies the processor's data storage to reflect its result.
The processor's need for data synchronization arises because the functional unit must exchange data with memory during the execution stage. For example, when a data read conflict prevents the functional unit from obtaining all input operands within one clock beat of the chip, the issue of the instruction can be delayed; or, when the output data of the functional unit cannot be written to the target memory address or cache, the pipeline can be blocked.
Patent Document 1 (Publication No. CN 103383641A) discloses a multi-pipeline synchronization device. In that device, a control/status register transfer unit handles control/status register transfers between pipelines, including control register configuration operations from pipeline A to pipeline B and status register read operations by pipeline A on pipeline B. A pipeline register transfer unit transfers the general-purpose registers of pipeline A to the pipeline configuration registers of pipeline B, so that pipeline B obtains the register information it needs to operate. A synchronization logic unit receives scheduling instructions and blocking information and, according to the control/status registers, generates a stall signal for pipeline A and generates a scheduling enable and transmits scheduling information for pipeline B. However, that device only supports inter-core synchronization of a multi-core CPU, synchronizes only single-input single-output processor pipeline data, and achieves synchronization of multiple pipelines within a core only by having the synchronization logic unit generate scheduling enable and stall signals. When multiple pipelines within a core are synchronized, as the number of input operands or output operands of a functional unit grows, the functional unit interacts with memory more frequently and data conflicts become more severe, so the question of how to solve the pipeline data synchronization problem of a MIMO functional unit efficiently becomes more prominent.
Summary of the invention
An object of the present invention is to solve at least the above problems and defects by providing a multiple-input multiple-output processor pipeline data synchronization device and method that solve the problem of controlling pipeline-stage data synchronization for a MIMO functional unit and improve processor performance.
The multiple-input multiple-output processor pipeline data synchronization device of the present invention is arranged in a chip having an instruction issuer and communicates with one or more memories. The synchronization device includes:
a MIMO functional unit with multiple operation pipeline stages, for responding to instructions by performing operations on multiple input operands;
a pipeline controller connected to the MIMO functional unit,
for receiving the instruction sent by the instruction issuer, parsing the input operands required by the instruction, and judging the validity of the input operands; if all input operands are valid, the instruction is sent into the MIMO functional unit within one clock beat of the chip; if at least one input operand is invalid, a null instruction is sent into the MIMO functional unit; and
for receiving multiple output requests of the MIMO functional unit and judging the feasibility of the output requests; if an output request is feasible, it is accepted within one clock beat of the chip and forwarded to the memory within one clock beat of the chip; if it is judged infeasible, the output of the MIMO functional unit is blocked;
prefetch input operand logic connected to the memory and to the MIMO functional unit, for calculating the addresses of the input operands required by instructions, sending input operand prefetch requests to the memory, and caching the prefetched data;
wherein an input operand is valid if the input operands required by the instruction that the pipeline controller is parsing have been prefetched, or are being read, by the prefetch input operand logic and are guaranteed to be delivered to the MIMO functional unit in the next clock beat of the chip; otherwise it is invalid;
wherein an output request is feasible if the output operands output by the MIMO functional unit can be held temporarily by a buffer or can be written to the memory; otherwise it is infeasible.
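The issue decision described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented circuit: `Instruction`, `BUBBLE`, `operand_is_valid` and `select_issue` are hypothetical names, and two sets of addresses stand in for the hardware state "already prefetched" and "readable in time for the next beat".

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Instruction:
    opcode: str
    operand_addrs: Tuple[int, ...]  # zero or more input operand addresses


# Null instruction ("bubble"): it carries no operands and is meant to leave
# the functional unit's internal data registers untouched.
BUBBLE = Instruction(opcode="nop", operand_addrs=())


def operand_is_valid(addr: int,
                     prefetched: FrozenSet[int],
                     readable_next_beat: FrozenSet[int]) -> bool:
    """An operand is valid if it was prefetched, or is being read and is
    guaranteed to reach the functional unit in the next beat (assumed model)."""
    return addr in prefetched or addr in readable_next_beat


def select_issue(instr: Instruction,
                 prefetched: FrozenSet[int],
                 readable_next_beat: FrozenSet[int]) -> Instruction:
    """Issue the instruction only if every operand is valid, else a bubble."""
    if all(operand_is_valid(a, prefetched, readable_next_beat)
           for a in instr.operand_addrs):
        return instr
    return BUBBLE
```

An instruction with zero input operands is always issued in this sketch, which matches the statement that the operand count per beat may be zero.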
In the multiple-input multiple-output processor pipeline data synchronization device of the present invention:
the pipeline controller is provided with an input logic, which parses the input operands required by the instruction and judges their validity;
the pipeline controller is provided with an output logic, which receives the multiple output requests of the MIMO functional unit and judges their feasibility.
In the multiple-input multiple-output processor pipeline data synchronization device of the present invention:
if the output logic blocks the output of the MIMO functional unit for more than one clock beat of the chip, instructions are blocked from entering the MIMO functional unit.
In addition, the present invention also provides a method of synchronizing multiple-input multiple-output processor pipeline data using the above multiple-input multiple-output processor pipeline data synchronization device, comprising the following steps:
Step 1: the pipeline controller receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and judges the validity of the input operands; if all input operands are valid, it sends the instruction into the MIMO functional unit within one clock beat of the chip; if at least one input operand is invalid, it sends a null instruction into the MIMO functional unit;
Step 2: the MIMO functional unit performs, through its operation pipeline stages and according to the instruction, operations on multiple input operands, and sends output requests to the pipeline controller;
Step 3: the pipeline controller receives the multiple output requests of the MIMO functional unit and judges the feasibility of the output requests; if an output request is feasible, it accepts the request within one clock beat of the chip and forwards it to the memory within one clock beat of the chip; if the output request is infeasible, it blocks the output of the MIMO functional unit;
wherein an input operand is valid if the input operands required by the instruction that the pipeline controller is parsing have been prefetched, or are being read, by the prefetch input operand logic and are guaranteed to be delivered to the MIMO functional unit in the next clock beat of the chip; otherwise it is invalid;
wherein an output request is feasible if the output operands output by the MIMO functional unit can be held temporarily by a buffer or can be written to the memory; otherwise it is infeasible.
The multiple-input multiple-output processor pipeline data synchronization method of the present invention further comprises the following steps:
Step 10: the input logic receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and judges the validity of the input operands; if all input operands are valid, it sends the instruction into the MIMO functional unit within one clock beat of the chip; if at least one input operand is invalid, it sends a null instruction into the MIMO functional unit;
Step 20: the MIMO functional unit performs, through its operation pipeline stages and according to the instruction, operations on multiple input operands, and sends output requests to the output logic;
Step 30: the output logic receives the multiple output requests of the MIMO functional unit and judges the feasibility of the output requests; if an output request is feasible, it accepts the request within one clock beat of the chip and forwards it to the memory within a certain time; if the output request is infeasible, it blocks the output of the MIMO functional unit.
The multiple-input multiple-output processor pipeline data synchronization method of the present invention further comprises the following step:
Step 40: the prefetch input operand logic calculates the operand addresses required by instructions, sends prefetch requests to the memory that stores the operands, and caches the instructions that are prefetched back.
In the multiple-input multiple-output processor pipeline data synchronization method of the present invention, if the output logic blocks the output of the MIMO functional unit for more than one clock beat of the chip, instructions are blocked from entering the MIMO functional unit.
In addition, the present invention also provides a data processing chip that includes the above multiple-input multiple-output processor pipeline data synchronization device.
The effects of the present invention are as follows:
The MIMO functional unit pipeline synchronization device of the present invention parses and prefetches input data and responds to output requests from functional units with multiple output operands. It can therefore solve the problem of controlling pipeline-stage data synchronization for a MIMO functional unit and improve processor performance.
Brief description of the drawings
Fig. 1 is a block diagram of the multiple-input multiple-output processor pipeline data synchronization device according to an embodiment of the present invention.
Fig. 2 is a block diagram of the multiple-input multiple-output processor pipeline data synchronization device according to another embodiment of the present invention.
Fig. 3 is a structural diagram of the logic that parses the validity of instruction input operands (i.e. input logic 7) in the device of Fig. 2.
Fig. 4 is a structural diagram of the functional unit output request arbitration and processing logic (i.e. output logic 8) in the device of Fig. 2.
Fig. 5 is a flow chart of the multiple-input multiple-output processor pipeline data synchronization method of the present invention.
Fig. 6 is a flow chart of the multiple-input multiple-output processor pipeline data synchronization method according to another embodiment of the present invention.
Fig. 7 shows a data processing chip according to an embodiment of the present invention.
Description of reference numerals
1~4 ... RAM memories, 5 ... pipeline controller, 6 ... MIMO functional unit, 7 ... input logic, 8 ... output logic, 9 ... two-input two-output functional unit, 11 ... data prefetch logic of input logic 7, 12 ... internal control logic of input logic 7, 13 ... FIFO queue of input logic 7, 14 ... data arbitration logic of input logic 7, 15 ... data bus interface of input logic 7, 16 ... internal control logic of output logic 8, 17 ... FIFO queue of output logic 8, 18 ... data arbitration logic of output logic 8, 19 ... data bus interface of output logic 8, 100, 300 ... input control signals, 200 ... output control signal, 400 ... multiple-input multiple-output processor pipeline data synchronization device, 500 ... data processing chip.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the multiple-input multiple-output processor pipeline data synchronization device and method of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
Fig. 1 is a block diagram of the multiple-input multiple-output processor pipeline data synchronization device of the present invention. "Multiple-input" means: a. the input data come from multiple different RAM memories; b. the number of input operands of the functional unit in one clock beat of the chip can be anywhere from zero to several. "Multiple-output" means: a. the destination addresses of the output data can be in multiple different RAM memories; b. the number of output operands of the functional unit in one clock beat of the chip can be anywhere from zero to several.
Specifically, the synchronization device includes a MIMO functional unit 6, which corresponds to four RAM memories 1, 2, 3, 4 and can execute instruction sequences. Functional unit 6 can have multiple input operands and multiple output operands, has multiple pipeline stages, and responds to instructions by performing operations on multiple input operands.
The device also includes a pipeline controller 5 arranged in the chip. The pipeline controller receives the instruction sent by the instruction issuer in the chip, parses the input operands required by the instruction, and judges their validity. If all required data are valid, it sends the request to the functional unit within one clock beat of the chip. If at least one input operand is invalid, the pipeline controller sends a bubble instruction to the functional unit; this null instruction does not affect the data registers inside the functional unit. The pipeline controller also receives the multiple output requests of the functional unit and judges their feasibility. If they are feasible, it accepts the requests within one clock beat of the chip and then forwards them to memory within a certain time. If they are infeasible, it blocks the functional unit pipeline; however, while the pipeline controller is blocking the computation of the functional unit, its parsing of the current instruction and its data prefetching are not blocked.
The validity of the input operands means: if the input operands required by the instruction that the pipeline controller is parsing have been prefetched, or are being read and are guaranteed to be delivered to the functional unit in the next clock beat of the chip, the input operands are considered valid; otherwise they are considered invalid. The feasibility of an output request means: if the zero to several output operands output by the functional unit can be held temporarily by a buffer or can be written to the corresponding addresses of the target RAM, the output request is considered feasible; otherwise it is considered infeasible. The judgments of input operand validity and output request feasibility are each completed within one clock beat of the chip.
In addition, the functional unit 6 with multiple pipeline stages has a buffer configured to cache one input instruction operation and its data. Furthermore, while the pipeline controller judges the validity of the input data required by the current instruction, it also parses the input operands that the functional unit needs for several subsequent instructions and prefetches the data.
Fig. 2 is a block diagram of a multiple-input multiple-output processor pipeline data synchronization device according to another embodiment of the present invention. In the device of this embodiment, a two-input two-output functional unit 9 is used as the MIMO functional unit and likewise corresponds to four RAM memories 1, 2, 3, 4, but its pipeline controller includes: logic 7 for parsing the validity of instruction input operands (hereinafter "input logic"), functional unit output request arbitration and processing logic 8 (hereinafter "output logic"), and logic 10 for prefetching instruction input operands (hereinafter "prefetch input operand logic").
The functional unit 6 includes multiple pipeline stages and is configured to respond to instructions by performing operations on the input operands, and: A. it registers the instruction that enters after being parsed by the logic that parses instruction input operand validity; B. when an output request for its output data is refused by the output request arbitration and processing logic, the functional unit pipeline is blocked and instructions from the logic that parses instruction input operand validity are suspended from entering. The two-input two-output functional unit 6 uses a pipelined structure to process quantized data, and can be used for neural network algorithms, image processing algorithms, and similar algorithms.
Input logic 7 determines the number of input operands required by the functional unit, the location of each input operand in the four RAMs, and whether the data are valid; that is, it judges whether all input operands have been prefetched into the cache or can be input to the functional unit within one clock beat of the chip, and: A. if all input operands have been prefetched into the cache or can be input to the functional unit within one clock beat of the chip, it sends the instruction into the functional unit; B. otherwise, it sends into the functional unit a null instruction that does not affect the functional unit's state. The purpose of sending a null instruction is to leave the data registers inside the functional unit unchanged and so avoid computation errors. The specific structure of input logic 7 is described below.
Output logic 8 determines the number of operands output by the functional unit, the locations in the four RAMs to which the output operands should be written, and whether the corresponding RAM addresses or caches can accept data, and: A. if the memory or cache corresponding to any output operand cannot accept data, it blocks the output of the functional unit's pipeline stage; B. otherwise, it writes the output operands to their respective RAM addresses or caches. After output logic 8 blocks the pipeline-stage output of functional unit 6, instructions from input logic 7 are not immediately blocked from entering the functional unit, because functional unit 6 caches an instruction that enters after being parsed by the logic that parses instruction input operand validity. When output logic 8 has blocked the functional unit's pipeline-stage output for more than one clock beat of the chip, instructions from input logic 7 are blocked from entering the functional unit. While functional unit 6 blocks the entry of instructions from input logic 7, the logic that prefetches instruction input operands continues to operate normally. The specific structure of output logic 8 is described below.
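The blocking rule of output logic 8 can be sketched as a small Python model. It is a hedged illustration only: `output_feasible` and `OutputGate` are hypothetical names, output requests are modelled as `(bank, value)` pairs, and a set of currently writable banks stands in for the state of the RAMs and caches.

```python
def output_feasible(requests, writable_banks):
    """Feasible only if every output operand's target can accept data.
    An empty request list (zero output operands) is trivially feasible."""
    return all(bank in writable_banks for bank, _value in requests)


class OutputGate:
    """Tracks how many consecutive beats the output has been blocked."""

    def __init__(self):
        self.blocked_beats = 0

    def step(self, requests, writable_banks):
        """Process one beat; returns (accepted, block_instruction_entry)."""
        if output_feasible(requests, writable_banks):
            self.blocked_beats = 0
            return True, False
        self.blocked_beats += 1
        # The functional unit buffers one instruction, so the first blocked
        # beat is absorbed; entry is blocked only after more than one beat.
        return False, self.blocked_beats > 1
```

In this model the first infeasible beat returns `(False, False)` and the second consecutive one returns `(False, True)`, reflecting the one-instruction buffer inside the functional unit.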
Prefetch input operand logic 10 is configured to calculate, from instruction fields and register information, the operand addresses required by several subsequent instructions, to send prefetch requests to the corresponding RAM, and to cache the instructions that are prefetched back.
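A minimal Python sketch of such a prefetcher follows. The patent does not specify the addressing scheme, so everything concrete here is an assumption made for illustration: the base-register-plus-offset-plus-stride formula, the mapping of addresses to RAM banks, and the names `prefetch_addresses`, `ram_bank` and `Prefetcher` are all hypothetical. The sketch caches operand data, following the summary's statement that the prefetched data are cached.

```python
def prefetch_addresses(base_reg, offset_field, stride, lookahead):
    """Operand addresses for the next `lookahead` instructions, assuming a
    hypothetical base + offset + k * stride addressing scheme."""
    return [base_reg + offset_field + k * stride
            for k in range(1, lookahead + 1)]


def ram_bank(addr, bank_size, num_banks=4):
    """Assumed mapping of a flat address onto one of the RAM memories."""
    return (addr // bank_size) % num_banks


class Prefetcher:
    def __init__(self, rams, bank_size):
        self.rams = rams            # list of lists modelling the RAMs
        self.bank_size = bank_size
        self.cache = {}             # address -> prefetched value

    def prefetch(self, addrs):
        """Fetch every address not yet cached and keep the result."""
        for addr in addrs:
            if addr not in self.cache:
                bank = ram_bank(addr, self.bank_size, len(self.rams))
                self.cache[addr] = self.rams[bank][addr % self.bank_size]
```

A real design would also have to avoid conflicts with the functional unit's own reads and writes, which this sketch omits.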
In addition, Fig. 3 shows the structure of input logic 7 in Fig. 2. Its internal control logic unit 12 receives external control signals, determines from the internal buffer state which control instruction to send to the arithmetic unit, and determines the corresponding required operands. Once internal control logic 12 confirms that the data required by the arithmetic unit are ready, it sends the control instruction and input operands to the arithmetic unit, thereby achieving synchronous control of the pipeline's input data. So that the arithmetic unit's pipeline stages do not stall waiting for input data, the data prefetch logic interacts with the internal control logic to parse, several clock cycles in advance, the input operands the arithmetic unit requires and sends data prefetch requests to the corresponding memory blocks. For this reason, the input data synchronization logic contains a FIFO queue 13 to cache the prefetched data. In addition, an input data arbitration logic unit 14 decides whether the data fed to the arithmetic unit are read from the FIFO queue or directly from the data bus.
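The source selection performed by arbitration logic unit 14 can be illustrated with the Python sketch below. The policy shown (drain the FIFO first so that older prefetched data keep their order, and bypass straight from the bus only when the FIFO is empty) is an assumption, as the text does not state the arbitration rule; `arbitrate_input` is a hypothetical name.

```python
from collections import deque


def arbitrate_input(fifo, bus_data):
    """Pick the operand for this cycle; returns (value, source).

    fifo     -- deque of previously prefetched operands (oldest on the left)
    bus_data -- operand arriving on the data bus this cycle, or None
    """
    if fifo:
        value = fifo.popleft()
        if bus_data is not None:
            fifo.append(bus_data)   # newer data waits behind queued data
        return value, "fifo"
    if bus_data is not None:
        return bus_data, "bus"      # FIFO empty: bypass directly from the bus
    return None, "none"             # nothing available this cycle
```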
Fig. 4 shows the structure of output logic 8 in Fig. 2. Its internal control logic unit 16 receives the control signals sent by the arithmetic unit and, according to the internal register state, determines the number of operands output by the arithmetic unit, the memory block locations to which the output operands should be written, and whether the corresponding memory block addresses or caches can accept data. Output logic 8 also decides whether to block the arithmetic unit pipeline according to whether the output data can be accepted for writing. Likewise, to prevent the arithmetic unit from being blocked by its output data, the output data synchronization logic 8 contains a FIFO queue 17 to cache data that cannot be written to the memory block in time. To maintain the consistency of the output data, a data arbitration logic 18 is needed to decide whether the data written to memory in the current clock cycle are read from the FIFO queue or directly from the bus.
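On the write side, the interplay of FIFO queue 17 and arbitration logic 18 might look like the following Python sketch. It assumes an in-order policy (queued data are always written before newer data) and an unbounded queue; a hardware FIFO is finite, and a full queue is where pipeline blocking would come in. `arbitrate_output` is a hypothetical name.

```python
from collections import deque


def arbitrate_output(fifo, new_data, memory_writable):
    """Return the value written to memory this cycle, or None.

    fifo            -- deque of results still waiting to be written
    new_data        -- result produced this cycle, or None
    memory_writable -- whether the target memory block accepts a write now
    """
    if not memory_writable:
        if new_data is not None:
            fifo.append(new_data)   # cannot write in time: buffer it
        return None
    if fifo:
        if new_data is not None:
            fifo.append(new_data)   # keep write order: older data first
        return fifo.popleft()
    return new_data                 # queue empty: write straight through
```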
In addition, the present invention also provides a method of synchronizing multiple-input multiple-output processor pipeline data using the above multiple-input multiple-output processor pipeline data synchronization device. As shown in Fig. 5, it comprises the following steps:
Step 1: the pipeline controller receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and judges the validity of the input operands; if all input operands are valid, it sends the instruction into the MIMO functional unit within one clock beat of the chip; if at least one input operand is invalid, it sends a null instruction into the MIMO functional unit;
Step 2: the MIMO functional unit performs, through its operation pipeline stages and according to the instruction, operations on multiple input operands, and sends output requests to the pipeline controller;
Step 3: the pipeline controller receives the multiple output requests of the MIMO functional unit and judges the feasibility of the output requests; if an output request is feasible, it accepts the request within one clock beat of the chip and forwards it to the memory within a certain time; if it is infeasible, it blocks the output of the MIMO functional unit;
wherein an input operand is valid if the input operands required by the instruction that the pipeline controller is parsing have been prefetched or are being read and are guaranteed to be delivered to the MIMO functional unit in the next clock beat of the chip; otherwise it is invalid; and an output request is feasible if the output operands output by the MIMO functional unit can be held temporarily by a buffer or can be written to the memory; otherwise it is infeasible.
Fig. 6 shows a flow chart of the multiple-input multiple-output processor pipeline data synchronization method according to another embodiment of the present invention. It comprises the following steps:
Step 10: the input logic parses the input operands required by the instruction and judges whether all input operands have been prefetched into the cache or can be input to the MIMO functional unit within one clock beat of the chip; if so, it sends the instruction into the MIMO functional unit; if not, it sends a null instruction into the MIMO functional unit;
Step 20: the MIMO functional unit performs, through its operation pipeline stages and according to the instruction, operations on multiple input operands, and sends output requests to the output logic;
Step 30: the output logic judges whether the memory addresses or caches corresponding to all output operands of the output request can accept data; if so, it writes the output operands to their corresponding memory addresses or caches; if not, it blocks the output of the MIMO functional unit.
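Steps 10 to 30 can be put together in a toy one-beat model in Python. This is a deliberately simplified sketch under stated assumptions: the pipeline is a plain list of stage contents, `None` plays the role of the null instruction, and the one-instruction buffer of the functional unit is omitted, so a blocked output freezes the whole pipeline at once. `beat` is a hypothetical name.

```python
def beat(instr, operands_ready, pipeline, out_writable):
    """Advance the toy pipeline by one beat.

    instr          -- instruction waiting to be issued, or None
    operands_ready -- True if all of its input operands are valid
    pipeline       -- list of stages, index 0 = first stage, None = bubble
    out_writable   -- True if every output target can accept data
    Returns (new_pipeline, issued, retired).
    """
    tail = pipeline[-1]
    # Step 30: an infeasible output request blocks the functional unit.
    if tail is not None and not out_writable:
        return list(pipeline), None, None
    # Step 10: issue the instruction only if its operands are valid,
    # otherwise a bubble (None) enters the first stage.
    issued = instr if (instr is not None and operands_ready) else None
    # Step 20: every stage advances by one; the last stage retires.
    new_pipeline = [issued] + list(pipeline[:-1])
    return new_pipeline, issued, tail
```

Calling `beat` repeatedly with a two-stage pipeline shows the three behaviours: normal issue, a bubble when operands are not ready, and a frozen pipeline while the output is not writable.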
The multiple-input multiple-output processor pipeline data synchronization method of the present invention further comprises the following step:
Step 40: the prefetch input operand logic calculates, from instruction fields and register information, the operand addresses required by instructions, sends prefetch requests to the memory that stores the input operands, and caches the instructions that are prefetched back.
In the multiple-input multiple-output processor pipeline data synchronization method of the present invention, if the output logic blocks the output of the MIMO functional unit for more than one clock beat of the chip, instructions are blocked from entering the MIMO functional unit.
Several specific examples are given below to describe in detail the multiple-input multiple-output processor pipeline data synchronization method of the present invention.
In one example, all data required by the instructions have been prefetched, and the output data of the functional unit can be written to the corresponding RAM or cached by the buffer. The functional unit can then receive a new instruction in every clock beat of the chip and runs at full speed. This is the ideal case, but the invention is not limited to it.
In another example, one of the input operands required by the instruction has not been prefetched. The logic that parses instruction input operand validity immediately sends a read request to the RAM holding that operand. Provided this RAM does not fail, the operand can be obtained by the functional unit in the next clock beat of the chip, so the instruction can enter the functional unit without causing a block.
In another example, all input operands required by the instruction are valid, but in the current clock beat of the chip one or all of the functional unit's output data cannot be written to the corresponding RAM or cached by the buffer, although they can be written in the next clock beat. Because the functional unit can cache one input instruction, it can still receive an instruction and data in the next clock beat, and no block occurs.
In another example, the logic that prefetches instruction input operands calculates, from the specific fields of the current instruction and register information, the addresses of the operands required by several subsequent instructions. Provided this does not conflict with the functional unit reading the input operands of the current instruction or writing output data, it prefetches the input operands required by the subsequent instructions. This approach is particularly effective for neural network algorithms, because neural network data come in many types and have high spatial locality. In this way, the MIMO functional unit pipeline synchronization device of the present invention can greatly reduce the processor's memory access overhead and improve processor performance.
In addition, as shown in Fig. 7, the present invention also provides a data processing chip 500 that includes the above multiple-input multiple-output processor pipeline data synchronization device 400.
It should also be noted that "instruction" in the present invention refers to computer program instructions. These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flow chart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flow chart and/or one or more blocks of a block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art can make other changes and modifications to these embodiments once they know the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the invention.
The multiple-input multiple-output processor pipeline data synchronization device and method provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (8)
1. A multiple-input multiple-output processor pipeline data synchronization device, arranged in a chip having an instruction issuer, the synchronization device communicating with one or more memories, characterized in that the synchronization device comprises:
a multiple-input multiple-output functional unit having multiple operation pipeline stages, configured to respond to an instruction by performing operations on multiple input operands;
a pipeline controller connected to the multiple-input multiple-output functional unit, configured:
to receive the instruction sent by the instruction issuer, parse the input operands required by the instruction, and judge the validity of the input operands; if the input operands are all valid, to send the instruction into the multiple-input multiple-output functional unit within one beat of the chip cycle; if at least one of the input operands is invalid, to send an empty instruction into the multiple-input multiple-output functional unit; and
to receive multiple output requests of the multiple-input multiple-output functional unit and judge the feasibility of the output requests; if feasible, to accept the output requests within one beat cycle of the chip and forward the output requests to the memory within one beat cycle of the chip; if the output requests are judged infeasible, to block the output of the multiple-input multiple-output functional unit;
a prefetch-input-operand logic controller connected to the memory and to the multiple-input multiple-output functional unit, configured to calculate the addresses of the input operands required by instructions, send input operand prefetch requests to the memory, and cache the data returned by the prefetch;
wherein the validity of the input operands means that, if the input operands required by the instruction being parsed by the pipeline controller have been prefetched or read by the prefetch-input-operand logic controller, and it can be guaranteed that they will be sent to the multiple-input multiple-output functional unit within the next chip cycle beat, the input operands are valid; otherwise they are invalid;
wherein the feasibility of the output requests means that, if the output operands output by the multiple-input multiple-output functional unit can all be held temporarily by a buffer or can be read into the memory, the output requests are feasible; otherwise they are infeasible.
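The validity and feasibility tests defined in claim 1 can be pictured with a short software sketch. This is only an illustration under stated assumptions, not the claimed circuit: the class name `PipelineController`, the `NOP` encoding of the empty instruction, the set of prefetched addresses, the count of free buffer slots, and the `memory_ready` flag are all hypothetical.

```python
from dataclasses import dataclass, field

NOP = "NOP"  # hypothetical encoding of the empty instruction


@dataclass
class PipelineController:
    prefetched: set = field(default_factory=set)  # operand addresses already fetched
    buffer_free: int = 2       # assumed number of free output-buffer slots
    memory_ready: bool = True  # assumed: memory can take the outputs this beat

    def issue(self, instr, operand_addrs):
        # Input side: forward the instruction only if every operand is valid,
        # otherwise send an empty instruction into the functional unit.
        if all(addr in self.prefetched for addr in operand_addrs):
            return instr
        return NOP

    def accept_outputs(self, outputs):
        # Output side: a request is feasible if memory can take the operands
        # or the buffer can hold all of them; otherwise the output is blocked.
        if self.memory_ready:
            return True
        if len(outputs) <= self.buffer_free:
            self.buffer_free -= len(outputs)
            return True
        return False


controller = PipelineController(prefetched={1, 2})
issued = controller.issue("ADD", [1, 2])   # all operands valid
bubble = controller.issue("MUL", [2, 3])   # operand 3 missing, so NOP is issued
```

Issuing an empty instruction instead of stalling keeps the pipeline stages advancing every beat, which is why only the input-side and output-side checks are needed to keep multiple inputs and outputs in step.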
2. The multiple-input multiple-output processor pipeline data synchronization device according to claim 1, characterized in that:
the pipeline controller is provided with an input logic controller, by which it parses the input operands required by the instruction and judges the validity of the input operands;
the pipeline controller is provided with an output logic controller, by which it receives the multiple output requests of the multiple-input multiple-output functional unit and judges the feasibility of the output requests.
3. The multiple-input multiple-output processor pipeline data synchronization device according to claim 2, characterized in that, if the output logic controller blocks the output of the multiple-input multiple-output functional unit for more than one chip cycle beat, instructions are blocked from entering the multiple-input multiple-output functional unit.
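The back-pressure rule of claim 3 can be sketched as a small counter. The sketch below is an assumption-laden illustration: the class name `OutputBackpressure` and the per-beat `tick` interface are hypothetical, and the one-beat tolerance is modelled as a simple comparison.

```python
class OutputBackpressure:
    """Counts consecutive beats during which the output is blocked (illustrative)."""

    def __init__(self):
        self.blocked_beats = 0

    def tick(self, output_blocked):
        # Advance one beat. Returns True when instruction entry must be blocked,
        # i.e. when the output has been blocked for more than one beat.
        self.blocked_beats = self.blocked_beats + 1 if output_blocked else 0
        return self.blocked_beats > 1


backpressure = OutputBackpressure()
stall_input = [backpressure.tick(blocked) for blocked in (True, True, False)]
```

A single blocked beat is tolerated in this model, which matches the description's statement that the functional unit can cache one input instruction; only a longer block propagates back to the input side.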
4. A method for synchronizing multiple-input multiple-output processor pipeline data by means of a multiple-input multiple-output processor pipeline data synchronization device, characterized by comprising the following steps:
Step 1, a pipeline controller receives the instruction sent by an instruction issuer, parses the input operands required by the instruction, and judges the validity of the input operands; if the input operands are all valid, it sends the instruction into a multiple-input multiple-output functional unit within one beat of the chip cycle; if at least one of the input operands is invalid, it sends an empty instruction into the multiple-input multiple-output functional unit;
Step 2, the multiple-input multiple-output functional unit performs, according to the instruction, operations on multiple input operands through operation pipeline stages, and sends output requests to the pipeline controller;
Step 3, the pipeline controller receives the multiple output requests of the multiple-input multiple-output functional unit and judges the feasibility of the output requests; if feasible, it accepts the output requests within one beat cycle of the chip and forwards the output requests to a memory within one beat cycle of the chip; if the output requests are infeasible, it blocks the output of the multiple-input multiple-output functional unit;
wherein the validity of the input operands means that, if the input operands required by the instruction being parsed by the pipeline controller have been prefetched or read, and it can be guaranteed that they will be sent to the multiple-input multiple-output functional unit within the next chip cycle beat, the input operands are valid; otherwise they are invalid;
wherein the feasibility of the output requests means that, if the output operands output by the multiple-input multiple-output functional unit can all be held temporarily by a buffer or can be read into the memory, the output requests are feasible; otherwise they are infeasible.
5. The method for synchronizing multiple-input multiple-output processor pipeline data according to claim 4, characterized by further comprising the following steps:
Step 10, an input logic controller receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and judges the validity of the input operands; if the input operands are all valid, it sends the instruction into the multiple-input multiple-output functional unit within one beat of the chip cycle; or, if at least one of the input operands is invalid, it sends an empty instruction into the multiple-input multiple-output functional unit;
Step 20, the multiple-input multiple-output functional unit performs, according to the instruction, operations on multiple input operands through operation pipeline stages, and sends output requests to an output logic controller;
Step 30, the output logic controller receives the multiple output requests of the multiple-input multiple-output functional unit and judges the feasibility of the output requests; if feasible, it accepts the output requests within one beat cycle of the chip and forwards the output requests to the memory within a certain time; or, if the output requests are infeasible, it blocks the output of the multiple-input multiple-output functional unit.
6. The method for synchronizing multiple-input multiple-output processor pipeline data according to claim 5, characterized by further comprising the following step:
Step 40, a prefetch-input-operand logic controller calculates the addresses of the operands required by instructions, sends prefetch requests to the memory in which the input operands are stored, and caches the instruction that is prefetched back.
7. The method for synchronizing multiple-input multiple-output processor pipeline data according to claim 5, characterized in that, if the output logic controller blocks the output of the multiple-input multiple-output functional unit for more than one chip cycle beat, instructions are blocked from entering the multiple-input multiple-output functional unit.
8. A data processing chip comprising the multiple-input multiple-output processor pipeline data synchronization device according to any one of claims 1 to 3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510824901.7A CN105824604B (en) | 2015-11-24 | 2015-11-24 | Multiple-input and multiple-output processor pipeline data synchronization unit and method |
PCT/CN2016/086100 WO2017088456A1 (en) | 2015-11-24 | 2016-06-17 | Pipeline data synchronization apparatus and method for multi-input multi-output processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510824901.7A CN105824604B (en) | 2015-11-24 | 2015-11-24 | Multiple-input and multiple-output processor pipeline data synchronization unit and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105824604A (en) | 2016-08-03 |
CN105824604B (en) | 2017-08-29 |
Family
ID=56513438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510824901.7A Active CN105824604B (en) | 2015-11-24 | 2015-11-24 | Multiple-input and multiple-output processor pipeline data synchronization unit and method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105824604B (en) |
WO (1) | WO2017088456A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108733585B (en) * | 2017-04-17 | 2022-05-13 | 伊姆西Ip控股有限责任公司 | Cache system and related method |
CN110134630B (en) * | 2019-05-27 | 2023-02-10 | 西安电子工程研究所 | Design method of multi-input single-output transmission cache controller |
CN111258657B (en) * | 2020-01-23 | 2020-11-20 | 上海燧原智能科技有限公司 | Pipeline control method and related equipment |
CN112559053B (en) * | 2020-12-21 | 2022-06-03 | 清华大学 | Data synchronization processing method and device for reconfigurable processor |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3147432B2 (en) * | 1991-10-09 | 2001-03-19 | 株式会社日立製作所 | Pipeline processing equipment |
US5987620A (en) * | 1997-09-19 | 1999-11-16 | Thang Tran | Method and apparatus for a self-timed and self-enabled distributed clock |
US8074053B2 (en) * | 2006-07-11 | 2011-12-06 | Harman International Industries, Incorporated | Dynamic instruction and data updating architecture |
CN103383641A (en) * | 2013-04-19 | 2013-11-06 | 中国科学院自动化研究所 | Synchronous device for multi-assembly lines |
CN104915322B (en) * | 2015-06-09 | 2018-05-01 | 中国人民解放军国防科学技术大学 | A kind of hardware-accelerated method of convolutional neural networks |
- 2015-11-24 CN CN201510824901.7A patent/CN105824604B/en active Active
- 2016-06-17 WO PCT/CN2016/086100 patent/WO2017088456A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2017088456A1 (en) | 2017-06-01 |
CN105824604A (en) | 2016-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102473112B (en) | The pre-filled method of cache, product and system about thread migration | |
CN105393240B (en) | Method and apparatus with the asynchronous processor for aiding in asynchronous vector processor | |
CN103647807B (en) | A kind of method for caching information, device and communication equipment | |
CN105824604B (en) | Multiple-input and multiple-output processor pipeline data synchronization unit and method | |
CN1316392C (en) | Prioritized bus request scheduling mechanism for processing devices | |
CN102822810B (en) | The integrated circuit passed on for the key word of adaptive prediction and method | |
US6594713B1 (en) | Hub interface unit and application unit interfaces for expanded direct memory access processor | |
CN103543954B (en) | A kind of data storage and management method and device | |
CN102934076B (en) | Instruction issue and control device and method | |
CN106155814B (en) | A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method | |
US20160275015A1 (en) | Computing architecture with peripherals | |
CN102207916A (en) | Instruction prefetch-based multi-core shared memory control equipment | |
US10884756B2 (en) | System and method for variable lane architecture | |
US6654834B1 (en) | Method and apparatus for data transfer employing closed loop of memory nodes | |
KR20140084155A (en) | Multi-core interconnect in a network processor | |
KR20130065957A (en) | Control apparatus of memory and control method thereof, and recording medium storing program for executing method of the same in computer | |
JP2021528929A (en) | Data transit gateway | |
CN106575220A (en) | Multiple clustered very long instruction word processing core | |
US8667199B2 (en) | Data processing apparatus and method for performing multi-cycle arbitration | |
EP1083487A2 (en) | Configuration bus reconfigurable/reprogrammable interface for expanded direct memory access processor | |
US6801985B1 (en) | Data bus using synchronous fixed latency loop including read address and data busses and write address and data busses | |
CN111653317A (en) | Gene comparison accelerating device, method and system | |
CN110235113B (en) | Memory controller and system and method for data processing | |
JPH09244951A (en) | Computer system for irregular execution of instructions | |
EP1069511B1 (en) | Data Transfer Controller with Plural Ports |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |