WO2017088456A1 - Pipeline data synchronization apparatus and method for multi-input multi-output processor - Google Patents

Pipeline data synchronization apparatus and method for multi-input multi-output processor

Info

Publication number
WO2017088456A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
input
instruction
pipeline
operand
Application number
PCT/CN2016/086100
Other languages
French (fr)
Chinese (zh)
Inventor
李震
刘少礼
张士锦
罗韬
钱诚
陈云霁
陈天石
Original Assignee
中国科学院计算技术研究所 (Institute of Computing Technology, Chinese Academy of Sciences)
Application filed by 中国科学院计算技术研究所 (Institute of Computing Technology, Chinese Academy of Sciences)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3871 Asynchronous instruction pipeline, e.g. using handshake signals between stages
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Definitions

  • The invention belongs to the field of computer electronics and relates in particular to a pipeline data synchronization device and method for a multi-input multi-output processor.
  • A data processing device that uses a pipelined processor divided into multiple pipeline stages can execute several instructions at the same time, which allows a higher processor clock frequency, higher data throughput, and better computing performance.
  • During execution of any particular instruction, the instruction passes through each pipeline stage of the processor.
  • When it passes the last pipeline stage, execution of the instruction ends, and at that point the instruction updates the processor's data devices to reflect its execution result.
  • The need for data synchronization arises because the processor's functional components must exchange data with memory during the instruction execution phase. For example, when a data read conflict prevents a functional component from obtaining all of its input operands within one clock cycle of the chip, issue of the instruction can be delayed; or, when the output data of a functional component cannot be written to the target storage address or cache, the pipeline can be stalled.
  • Patent Document 1 (Publication No. CN 103383641 A) discloses a multi-pipeline synchronization device in which a control and status register transfer unit performs control and status register transfers between pipelines, including configuration of pipeline B's control registers by pipeline A and reads of pipeline B's status registers by pipeline A.
  • A pipeline register transfer unit transfers the general-purpose registers of pipeline A to the pipeline configuration registers of pipeline B, so that pipeline B obtains the register information it needs to operate.
  • A synchronization logic unit receives scheduling instructions and blocking information and, based on the control and status registers, generates a stall signal for pipeline A and a scheduling enable and scheduling information for pipeline B.
  • However, that device supports only inter-core synchronization in a multi-core CPU, synchronizes pipeline data only for single-input single-output processors, and synchronizes multiple pipelines within a core solely through the scheduling-enable and stall signals generated by the synchronization logic unit.
  • When multiple pipelines within a core are synchronized and the number of input or output operands of a functional component increases, the functional component interacts with memory more frequently and data conflicts become more severe.
  • The problem of how to synchronize pipeline data efficiently for multi-input multi-output functional components therefore becomes even more pronounced.
  • The object of the present invention is to solve at least the above problems and drawbacks by providing a pipeline data synchronization device and method for a multi-input multi-output processor that achieves pipeline-stage data synchronization control for multi-input multi-output functional components and improves processor performance.
  • The pipeline data synchronization apparatus for a multi-input multi-output processor according to the present invention is disposed in a chip having an instruction transmitter that communicates with one or more memories, and comprises:
  • a multi-input multi-output functional component having a plurality of operational pipeline stages, which responds to instructions by performing operations on a plurality of input operands;
  • a pipeline controller connected to the multi-input multi-output functional component, configured for:
  • receiving an instruction sent by the instruction transmitter, parsing the input operands required by the instruction, and determining their validity; if all input operands are valid, sending the instruction into the multi-input multi-output functional component within one beat of the chip clock cycle; if at least one input operand is invalid, sending a null instruction into the multi-input multi-output functional component; and
  • prefetch input operand logic, connected to the memory and the multi-input multi-output functional component, which calculates the input operand addresses required by instructions, sends input operand prefetch requests to the memory, and caches the data returned by the prefetch.
  • An input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched by the prefetch input operand logic or is currently being read, so that it can be guaranteed to reach the multi-input multi-output functional component within the next beat of the chip clock cycle; otherwise it is invalid.
  • An output request is feasible if the output operands produced by the multi-input multi-output functional component can be temporarily held in a buffer or written into memory; otherwise it is infeasible.
  • The pipeline controller is provided with input logic that parses the input operands required by the instruction and determines their validity.
  • The pipeline controller is provided with output logic that receives a plurality of output requests from the multi-input multi-output functional component and determines their feasibility.
  • If the output logic blocks the output of the multi-input multi-output functional component for more than one beat of the chip clock cycle, instructions are blocked from entering the multi-input multi-output functional component.
  • The present invention further provides a pipeline data synchronization method for a multi-input multi-output processor that uses the above apparatus, comprising the following steps:
  • Step 1: the pipeline controller receives an instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines their validity. If all input operands are valid, it sends the instruction into the multi-input multi-output functional component within one beat of the chip clock cycle; if at least one input operand is invalid, it sends a null instruction into the functional component instead.
  • Step 2: the multi-input multi-output functional component performs operations on a plurality of input operands through its computational pipeline stages according to the instruction and sends output requests to the pipeline controller.
  • Step 3: the pipeline controller receives a plurality of output requests from the multi-input multi-output functional component and determines their feasibility. If a request is feasible, the controller accepts it within one beat of the chip clock cycle and forwards it to memory within one beat cycle of the chip; if it is infeasible, the controller blocks the output of the functional component.
  • An input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched by the prefetch input operand logic or is currently being read, so that it can be guaranteed to reach the multi-input multi-output functional component within the next beat of the chip clock cycle; otherwise it is invalid.
  • An output request is feasible if the output operands produced by the multi-input multi-output functional component can be temporarily held in a buffer or written into memory; otherwise it is infeasible.
  • The pipeline data synchronization method for a multi-input multi-output processor of the present invention further comprises the following steps:
  • Step 10: the input logic receives the instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines their validity. If all input operands are valid, it sends the instruction into the multi-input multi-output functional component within one beat of the chip clock cycle; if at least one input operand is invalid, it sends a null instruction into the functional component instead.
  • Step 20: the multi-input multi-output functional component performs operations on a plurality of input operands through its computational pipeline stages according to the instruction and sends output requests to the output logic.
  • Step 30: the output logic receives a plurality of output requests from the multi-input multi-output functional component and determines their feasibility. If a request is feasible, the output logic accepts it within one beat of the chip clock cycle and forwards it to memory within a certain time; if it is infeasible, the output logic blocks the output of the functional component.
  • The method further comprises the following step:
  • Step 40: the prefetch input operand logic calculates the required operand addresses, sends prefetch requests to the memory storing the operands, and caches the prefetched data.
  • In this method, if the output logic blocks the output of the multi-input multi-output functional component for more than one beat of the chip clock cycle, instructions are blocked from entering the multi-input multi-output functional component.
  • The present invention also provides a data processing chip that includes the above pipeline data synchronization apparatus for a multi-input multi-output processor.
  • FIG. 1 is a block diagram of a multi-input multi-output processor pipeline data synchronization apparatus according to an embodiment of the present invention.
  • FIG. 2 is a block diagram of a multi-input multi-output processor pipeline data synchronization apparatus according to another embodiment of the present invention.
  • FIG. 3 is a block diagram of the logic that parses the validity of instruction input operands (i.e., input logic 7) in the apparatus of FIG. 2.
  • FIG. 4 is a block diagram of the functional component output request arbitration and processing logic (i.e., output logic 8) in the apparatus of FIG. 2.
  • FIG. 5 is a flow chart of the pipeline data synchronization method for a multi-input multi-output processor of the present invention.
  • FIG. 6 is a flow chart of a pipeline data synchronization method for a multi-input multi-output processor according to another embodiment of the present invention.
  • FIG. 7 shows a data processing chip according to an embodiment of the present invention.
  • Reference numerals: 1 to 4, RAM memories; 5, pipeline controller; 6, multi-input multi-output functional component; 7, input logic; 8, output logic; 9, two-input two-output functional component; 11, data prefetch logic of input logic 7; 12, internal control logic of input logic 7; 13, data queue of input logic 7; 14, data arbitration logic of input logic 7; 15, data bus interface of input logic 7; 16, internal control logic of output logic 8; 17, FIFO queue of output logic 8; 18, data arbitration logic of output logic 8; 19, data bus interface of output logic 8; 100, 300, input control signals; 200, output control signal; 400, multi-input multi-output processor pipeline data synchronization apparatus; 500, data processing chip.
  • Fig. 1 is a block diagram showing a multi-input multi-output processor pipeline data synchronizing apparatus of the present invention.
  • Here, "multiple inputs" means that (a) the input data can come from several different RAM memories, and (b) a functional component can take zero or more input operands within one beat of the chip clock cycle. "Multiple outputs" means that (a) the destination addresses of the data can lie in several different RAM memories, and (b) a functional component can produce zero or more output operands within one beat of the chip clock cycle.
  • Specifically, the synchronization apparatus comprises a multi-input multi-output functional component 6, associated with four RAM memories 1, 2, 3, and 4, which can execute a sequence of instructions. Functional component 6 can have multiple input operands, multiple output operands, and multiple pipeline stages, and responds to an instruction by performing operations on multiple input operands.
  • The pipeline controller 5 is disposed in the chip. It receives an instruction sent by the instruction transmitter in the chip, parses the input operands required by the instruction, and determines their validity. If all required data are valid, it issues the instruction to the functional component within one beat of the chip clock cycle; if at least one input operand is invalid, the pipeline controller sends a bubble (null) instruction to the functional component, which does not affect the data registers inside the functional component.
  • At the same time, the pipeline controller receives multiple output requests from the functional component and determines their feasibility. If a request is feasible, the controller accepts it within one beat of the chip clock cycle and then forwards it to memory within a certain time; if not, it blocks the functional component's pipeline. Blocking the functional component does not stop the pipeline controller from parsing the current instruction or from prefetching data.
  • An input operand is considered valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched or is being read, so that it can be guaranteed to reach the functional component within the next beat of the chip clock cycle; otherwise it is considered invalid.
  • An output request is considered feasible if the zero or more output operands produced by the functional component can be temporarily held in the buffer or written to the corresponding addresses of the RAMs they target; otherwise it is considered infeasible. Both the validity of the input operands and the feasibility of the output requests are determined within one beat of the chip clock cycle.
  • The functional component 6, which has a plurality of pipeline stages, has a buffer configured to hold one input instruction and its data.
  • While parsing the validity of the input data required by the current instruction, the pipeline controller also analyzes the input operands required by several subsequent instructions and prefetches those data.
  • FIG. 2 is a block diagram of a multi-input multi-output processor pipeline data synchronization apparatus according to another embodiment of the present invention.
  • In this apparatus, a two-input two-output functional component 9 serves as the multi-input multi-output functional component and is associated with the four RAM memories 1, 2, 3, and 4. Its pipeline controller, however, comprises: logic 7 that parses the validity of instruction input operands (hereinafter "input logic"), functional component output request arbitration and processing logic 8 (hereinafter "output logic"), and logic 10 that prefetches instruction input operands (hereinafter "prefetch input operand logic").
  • The functional component 6 includes a plurality of pipeline stages configured to respond to instructions by operating on the input operands, and: A. a register holds one instruction that has entered after the input logic parsed the validity of its input operands; B. when an output request for its output data is rejected by the output request arbitration and processing logic, the functional component stalls its pipeline and blocks instructions coming from the logic that parses the validity of instruction input operands.
  • The two-input two-output functional component 6 uses a pipelined structure for arithmetic processing that quantizes data, and can be used for algorithms such as neural network algorithms and image processing algorithms.
  • Input logic 7 determines the number of input operands required by the functional component, which of the four RAMs each input operand comes from, and whether the data are valid, that is, whether every input operand has been prefetched into the cache or can be delivered to the functional component within one beat of the chip clock cycle. A. If all input operands have been prefetched into the cache or can be delivered within one beat, the instruction is sent to the functional component. B. Otherwise, a null instruction that does not affect the state of the functional component is sent instead. The purpose of sending a null instruction is to leave the internal data registers of the functional component unchanged, which prevents data-flow errors. The specific structure of input logic 7 is explained below.
  • Output logic 8 determines the number of operands output by the functional component, which of the four RAMs each output operand should be written to, and whether the corresponding RAM address or cache can accept data. A. If the memory or cache corresponding to any output operand cannot accept data, the pipeline output of the functional component is blocked. B. Otherwise, the output operands are written to their respective RAM addresses or caches.
  • When output logic 8 blocks the pipeline-stage output of functional component 6, functional component 6 can still buffer one instruction that has passed the input operand validity check, so instructions from input logic 7 are not blocked immediately.
  • Only when output logic 8 blocks the functional component's pipeline output for more than one beat of the chip clock cycle are instructions from input logic 7 blocked from entering the functional component. While functional component 6 is blocking instructions from input logic 7, the logic that prefetches instruction input operands continues to operate normally. The specific structure of output logic 8 is explained below.
  • The prefetch input operand logic 10 is configured to calculate, from the instruction fields and register information, the operand addresses required by several subsequent instructions, to send prefetch requests to the corresponding RAMs, and to cache the prefetched data.
  • FIG. 3 shows the structure of the input logic 7 in FIG. 2.
  • The internal control logic unit 12 receives external control signals, determines from its internal register state the control command to send to the arithmetic component, and determines the operands that command requires.
  • The control command and its input operands are then sent to the arithmetic component, which implements synchronization control of the pipeline input data.
  • The data prefetch logic 11 works several clock cycles ahead of the internal control logic to determine the input operands the arithmetic component will need, and sends data prefetch requests to the corresponding storage blocks.
  • The FIFO queue 13 buffers the prefetched data.
  • An input data arbitration logic unit 14 determines whether the data entering the arithmetic component come from the FIFO queue or directly from the data bus.
  • FIG. 4 shows the structure of output logic 8 in FIG. 2.
  • The internal control logic unit 16 receives the control signal sent by the arithmetic component and, based on its internal register state, determines the number of operands output by the arithmetic component, the memory block to which each output operand should be written, and whether the corresponding memory block address or cache can accept data.
  • Output logic 8 decides whether to block the arithmetic component's pipeline according to whether the output data can be written and accepted.
  • In this way the output data are synchronized.
  • Output logic 8 contains a FIFO queue 17 that buffers data that cannot be written to a memory block in time.
  • A data arbitration logic 18 determines whether the data written to storage in the current clock cycle come from the FIFO queue or directly from the bus.
  • The present invention further provides a pipeline data synchronization method for a multi-input multi-output processor using the above apparatus. As shown in FIG. 5, the method comprises the following steps:
  • Step 1: the pipeline controller receives an instruction sent by the instruction transmitter, parses the input operands required by the instruction, and determines their validity. If all input operands are valid, it sends the instruction into the multi-input multi-output functional component within one beat of the chip clock cycle; if at least one input operand is invalid, it sends a null instruction into the functional component instead.
  • Step 2: the multi-input multi-output functional component performs operations on a plurality of input operands through its computational pipeline stages according to the instruction and sends output requests to the pipeline controller.
  • Step 3: the pipeline controller receives a plurality of output requests from the multi-input multi-output functional component and determines their feasibility. If a request is feasible, the controller accepts it within one beat of the chip clock cycle and forwards it to memory within a certain time; if it is infeasible, the controller blocks the output of the functional component.
  • An input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched or is being read, so that it can be guaranteed to reach the multi-input multi-output functional component within the next beat of the chip clock cycle; otherwise it is invalid.
  • An output request is feasible if the output operands produced by the multi-input multi-output functional component can be temporarily buffered or written into memory; otherwise it is infeasible.
  • FIG. 6 is a flow chart of a pipeline data synchronization method for a multi-input multi-output processor according to another embodiment of the present invention. The method comprises the following steps:
  • Step 10: the input logic parses the input operands required by the instruction and determines whether all of them have been prefetched into the cache or can be delivered to the multi-input multi-output functional component within one beat of the chip clock cycle. If so, it sends the instruction into the functional component; if not, it sends a null instruction into the functional component.
  • Step 20: the multi-input multi-output functional component performs operations on a plurality of input operands through its computational pipeline stages according to the instruction and sends output requests to the output logic.
  • Step 30: the output logic determines whether the memory addresses or caches corresponding to all output operands of the output request can accept data. If so, it writes each output operand to its corresponding memory address or cache; if not, it blocks the output of the multi-input multi-output functional component.
  • The method further comprises the following step:
  • Step 40: the prefetch input operand logic calculates the operand addresses required by the instruction from the instruction fields and register information, sends prefetch requests to the memory storing the operands, and caches the prefetched data.
  • In this method, if the output logic blocks the output of the multi-input multi-output functional component for more than one beat of the chip clock cycle, instructions are blocked from entering the functional component.
  • When the output data of the functional component can be written to the corresponding RAM or buffer, the functional component can accept a new instruction every beat of the chip clock cycle.
  • In that case the functional component runs at full speed, which is the ideal situation; the apparatus also handles the following cases.
  • If one of the input operands required by an instruction has not been prefetched, the logic that parses the validity of instruction input operands immediately sends a read request to the RAM holding that operand. As long as the RAM serves the read normally, the functional component can obtain the operand in the next beat of the chip clock cycle, so the instruction can enter the functional component without blocking.
  • If all required input operands are valid but one or more of the functional component's output operands cannot be written to the corresponding RAM or buffer in the current beat of the chip clock cycle, yet can be written in the next beat, the functional component can still accept instructions and data in the next beat without blocking, because it can buffer one input instruction.
  • The logic that prefetches instruction input operands calculates the addresses of the operands required by subsequent instructions from specific fields of the current instruction and from register information.
  • When doing so does not conflict with reading the current instruction's input data or writing its output data, it prefetches the input data required by subsequent instructions. This is particularly effective for neural network algorithms, whose data are voluminous and have very high spatial locality. In this way, the multi-input multi-output pipeline synchronization apparatus of the present invention can greatly reduce processor memory access overhead and improve processor performance.
  • The present invention also provides a data processing chip 500 that includes the above multi-input multi-output processor pipeline data synchronization apparatus 400.
  • The "instructions" of the present invention are computer program instructions, which may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner.
  • The instructions stored in the computer-readable memory thus produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • With the multi-input multi-output functional component pipeline synchronization apparatus of the present invention, input data are parsed and prefetched, and the output requests for the multiple output operands of the functional component are answered, so that pipeline-stage data synchronization control of multi-input multi-output functional components is achieved and processor performance is improved.
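The input-side rule described above (issue the instruction only if every operand is already prefetched or can be read in time, otherwise send a null instruction) can be sketched at the level of single clock beats. The Python below is a minimal illustrative sketch, not the patented circuit: the `Instr`, `InputLogic`, `issue`, `prefetch`, and `end_beat` names, the `(bank, addr)` operand encoding, and the `busy_banks` read-conflict input are all hypothetical, and a Python set stands in for the prefetch FIFO queue 13.

```python
from dataclasses import dataclass

BUBBLE = None  # hypothetical marker for the null ("bubble") instruction


@dataclass
class Instr:
    name: str
    operands: list  # hypothetical encoding: list of (bank, addr) pairs


class InputLogic:
    """Beat-level sketch of the input-validity check (illustrative names)."""

    def __init__(self):
        self.cache = set()    # operands already prefetched (stands in for FIFO 13)
        self.pending = set()  # operands whose read request was sent this beat

    def prefetch(self, bank, addr):
        # Record that the prefetch logic has already fetched this operand.
        self.cache.add((bank, addr))

    def issue(self, instr, busy_banks):
        """Return instr if all operands are valid this beat, else BUBBLE.

        busy_banks: banks that cannot serve a read this beat (assumed input).
        """
        valid = True
        for bank, addr in instr.operands:
            key = (bank, addr)
            if key in self.cache or key in self.pending:
                continue  # prefetched, or already being read
            if bank in busy_banks:
                valid = False  # read conflict: data not guaranteed next beat
            else:
                self.pending.add(key)  # read now; data usable next beat
        return instr if valid else BUBBLE

    def end_beat(self):
        # Reads sent during this beat have returned and are now cached.
        self.cache |= self.pending
        self.pending.clear()


# Usage: operand (2, 0x20) is not prefetched and bank 2 is busy in beat 1.
il = InputLogic()
il.prefetch(1, 0x10)
mac = Instr("mac", [(1, 0x10), (2, 0x20)])
first = il.issue(mac, busy_banks={2})     # BUBBLE: the unit's registers stay untouched
il.end_beat()
second = il.issue(mac, busy_banks=set())  # mac: read sent now, data arrives next beat
```

Sending a bubble rather than a half-valid instruction keeps the functional component's internal registers unchanged while the same instruction is retried on a later beat. How the real hardware schedules that retry is not specified in the text, so the retry loop here is only an assumption.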

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Multi Processors (AREA)

Abstract

A pipeline data synchronization apparatus and method for a multi-input multi-output processor. The apparatus comprises: a multi-input multi-output function unit (6) having multiple operating pipeline stages for performing operations on input operands in response to an instruction; and a pipeline controller (5) for receiving an instruction, parsing the input operands required by the instruction, determining the validity of the input operands, sending the instruction to the multi-input multi-output function unit if all input operands are valid, and sending a dummy (null) instruction to the multi-input multi-output function unit if at least one input operand is invalid. The pipeline controller (5) also receives output requests from the multi-input multi-output function unit (6) and determines their feasibility; if an output request is feasible, it accepts the request within one beat (clock cycle) of the chip and forwards it to the memories (1, 2, 3, 4) within a certain time, and if the request is infeasible, it stalls the output of the multi-input multi-output function unit (6). The invention thus not only solves the pipeline synchronization problem of the multi-input multi-output function unit (6) but also greatly reduces the memory access cost of the processor and improves its memory access efficiency.

Description

Multi-input multi-output processor pipeline data synchronization device and method
Technical field
The invention belongs to the field of computer electronics and relates in particular to a multi-input multi-output processor pipeline data synchronization device and method.
Background art
Data processing devices use pipelined processors divided into multiple pipeline stages. Such a processor can execute several instructions at the same time, which allows its clock frequency to be raised and improves its data throughput and computing performance. During execution, however, every instruction passes through each pipeline stage of the processor. Execution of the instruction ends when it passes the last pipeline stage, and at that point the instruction updates the processor's data storage to reflect its execution result.
The need for data synchronization arises because the processor's functional units must exchange data with memory during the instruction execution stage. For example, when a data read conflict prevents a functional unit from obtaining all of its input operands within one clock beat of the chip, the issue of the instruction can be delayed; likewise, when the output data of a functional unit cannot be written to the target storage address or cache, the pipeline can be stalled.
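The read-conflict case can be made concrete with a small sketch. It assumes, purely for illustration, that each RAM has a single read port, so input operands held in the same RAM must be read in different beats; the function name `beats_to_fetch` is hypothetical and not part of the patent.

```python
from collections import Counter


def beats_to_fetch(operand_rams):
    """Beats needed to read one instruction's input operands.

    Assumes (hypothetically) one read port per RAM: operands held in different
    RAMs are read in parallel, while operands sharing a RAM conflict and must
    be read in successive beats.
    """
    if not operand_rams:
        return 0  # an instruction may need no input operands at all
    return max(Counter(operand_rams).values())


print(beats_to_fetch([0, 1]))     # 1: no conflict, the instruction can issue
print(beats_to_fetch([2, 2, 0]))  # 2: RAM 2 is read twice, issue is delayed
```

Under this assumption, any result above 1 is exactly the situation in which issue must be delayed.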
Patent Document 1 (Publication No. CN 103383641 A) discloses a multi-pipeline synchronization device. In that device, a control and status register transfer unit performs register transfers between pipelines, including configuration of pipeline B's control registers by pipeline A and reads of pipeline B's status registers by pipeline A. A pipeline register transfer unit transfers values from ordinary registers in pipeline A to pipeline configuration registers in pipeline B, so that pipeline B obtains the register information it needs to run. A synchronization logic unit receives scheduling instructions and blocking information and, according to the control and status registers, generates a stall signal for pipeline A and a scheduling enable and scheduling information for pipeline B. However, that device supports only inter-core synchronization of a multi-core CPU, synchronizes pipeline data only for single-input single-output processors, and synchronizes multiple pipelines within a core only by having the synchronization logic unit generate scheduling-enable and stall signals. When multiple pipelines within a core are synchronized and the number of input or output operands of a functional unit grows, the functional unit interacts with memory more often and data conflicts become more severe. The problem of efficiently synchronizing pipeline data for multi-input multi-output functional units therefore becomes even more pronounced.
Disclosure of the invention
The object of the present invention is to solve at least the above problems and drawbacks by providing a multi-input multi-output processor pipeline data synchronization device and method that achieve pipeline-stage data synchronization control for multi-input multi-output functional units and improve processor performance.
The multi-input multi-output processor pipeline data synchronization device of the present invention is disposed in a chip having an instruction issue unit and communicates with one or more memories. The synchronization device includes:
a multi-input multi-output functional unit having a plurality of arithmetic pipeline stages, which responds to an instruction by performing an operation on a plurality of input operands;
a pipeline controller connected to the multi-input multi-output functional unit, which is configured:
to receive the instruction sent by the instruction issue unit, parse the input operands required by the instruction, and judge the validity of those input operands; if all the input operands are valid, to send the instruction into the multi-input multi-output functional unit within one beat of the chip cycle; if at least one input operand is invalid, to send a null instruction into the multi-input multi-output functional unit; and
to receive a plurality of output requests from the multi-input multi-output functional unit and judge their feasibility; if an output request is feasible, to accept it within one beat cycle of the chip and forward it to the memory within one beat cycle of the chip; if the output request is judged infeasible, to block the output of the multi-input multi-output functional unit;
prefetch input operand logic connected to the memories and the multi-input multi-output functional unit, which calculates the addresses of the input operands required by instructions, sends input operand prefetch requests to the memories, and caches the prefetched data;
wherein an input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched by the prefetch input operand logic or is being read, so that it is guaranteed to be sent to the multi-input multi-output functional unit within the next chip cycle beat; otherwise it is invalid;
wherein an output request is feasible if all output operands output by the multi-input multi-output functional unit can be temporarily stored in a buffer or written into the memory; otherwise it is infeasible.
In the multi-input multi-output processor pipeline data synchronization device of the present invention:
the pipeline controller is provided with an input logic, which parses the input operands required by the instruction and judges their validity;
the pipeline controller is provided with an output logic, which receives the plurality of output requests from the multi-input multi-output functional unit and judges their feasibility.
In the multi-input multi-output processor pipeline data synchronization device of the present invention:
if the output logic blocks the output of the multi-input multi-output functional unit for more than one beat of the chip cycle, instructions are blocked from entering the multi-input multi-output functional unit.
In addition, the present invention provides a method for synchronizing pipeline data of a multi-input multi-output processor using the above device, comprising the following steps:
Step 1: the pipeline controller receives the instruction sent by the instruction issue unit, parses the input operands required by the instruction, and judges their validity; if all the input operands are valid, it sends the instruction into the multi-input multi-output functional unit within one beat of the chip cycle; if at least one input operand is invalid, it sends a null instruction into the multi-input multi-output functional unit;
Step 2: the multi-input multi-output functional unit performs the operation on the plurality of input operands through its arithmetic pipeline stages as directed by the instruction, and sends output requests to the pipeline controller;
Step 3: the pipeline controller receives the plurality of output requests from the multi-input multi-output functional unit and judges their feasibility; if an output request is feasible, it accepts the request within one beat cycle of the chip and forwards it to the memory within one beat cycle of the chip; if it is infeasible, it blocks the output of the multi-input multi-output functional unit;
wherein an input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched by the prefetch input operand logic or is being read, so that it is guaranteed to be sent to the multi-input multi-output functional unit within the next chip cycle beat; otherwise it is invalid;
wherein an output request is feasible if all output operands output by the multi-input multi-output functional unit can be temporarily stored in a buffer or written into the memory; otherwise it is infeasible.
The method for synchronizing pipeline data of a multi-input multi-output processor of the present invention further includes the following steps:
Step 10: the input logic receives the instruction sent by the instruction issue unit, parses the input operands required by the instruction, and judges their validity; if all the input operands are valid, it sends the instruction into the multi-input multi-output functional unit within one beat of the chip cycle; or, if at least one input operand is invalid, it sends a null instruction into the multi-input multi-output functional unit;
Step 20: the multi-input multi-output functional unit performs the operation on the plurality of input operands through its arithmetic pipeline stages as directed by the instruction, and sends output requests to the output logic;
Step 30: the output logic receives the plurality of output requests from the multi-input multi-output functional unit and judges their feasibility; if an output request is feasible, it accepts the request within one beat cycle of the chip and forwards it to the memory within a certain time; or, if the output request is infeasible, it blocks the output of the multi-input multi-output functional unit.
The method for synchronizing pipeline data of a multi-input multi-output processor of the present invention further includes the following step:
Step 40: the prefetch input operand logic calculates the operand addresses required by instructions, sends prefetch requests to the memories holding those operands, and caches the prefetched data.
In the method for synchronizing pipeline data of a multi-input multi-output processor of the present invention, if the output logic blocks the output of the multi-input multi-output functional unit for more than one beat of the chip cycle, instructions are blocked from entering the multi-input multi-output functional unit.
In addition, the present invention provides a data processing chip including the above multi-input multi-output processor pipeline data synchronization device.
Brief description of the drawings
Fig. 1 is a block diagram of the multi-input multi-output processor pipeline data synchronization device according to an embodiment of the present invention.
Fig. 2 is a block diagram of the multi-input multi-output processor pipeline data synchronization device according to another embodiment of the present invention.
Fig. 3 is a structural diagram of the logic that parses the validity of instruction input operands (i.e., input logic 7) in the device of Fig. 2.
Fig. 4 is a structural diagram of the functional unit output request arbitration and processing logic (i.e., output logic 8) in the device of Fig. 2.
Fig. 5 is a flowchart of the multi-input multi-output processor pipeline data synchronization method of the present invention.
Fig. 6 is a flowchart of the multi-input multi-output processor pipeline data synchronization method according to another embodiment of the present invention.
Fig. 7 shows a data processing chip according to an embodiment of the present invention.
Description of reference numerals
1 to 4: RAM memories; 5: pipeline controller; 6: multi-input multi-output functional unit; 7: input logic; 8: output logic; 9: two-input two-output functional unit; 11: data prefetch logic of input logic 7; 12: internal control logic of input logic 7; 13: FIFO queue of input logic 7; 14: data arbitration logic of input logic 7; 15: data bus interface of input logic 7; 16: internal control logic of output logic 8; 17: FIFO queue of output logic 8; 18: data arbitration logic of output logic 8; 19: data bus interface of output logic 8; 100, 300: input control signals; 200: output control signal; 400: multi-input multi-output processor pipeline data synchronization device; 500: data processing chip.
Best mode for carrying out the invention
To make the objects, technical solutions and advantages of the present invention clearer, the multi-input multi-output processor pipeline data synchronization device and method of the present invention are described in further detail below with reference to the accompanying drawings. The specific embodiments described here only explain the invention and are not intended to limit it.
Fig. 1 is a block diagram of the multi-input multi-output processor pipeline data synchronization device of the present invention. "Multi-input" means that (a) input data come from several different RAM memories, and (b) the functional unit may take zero or more input operands within one chip cycle beat. "Multi-output" means that (a) output data may be destined for several different RAM memories, and (b) the functional unit may produce zero or more output operands within one chip cycle beat.
Specifically, the synchronization device includes the multi-input multi-output functional unit 6, which corresponds to the four RAM memories 1, 2, 3 and 4 and can execute an instruction sequence. The functional unit can have multiple input operands, multiple output operands and multiple pipeline stages, and responds to instructions by performing operations on its input operands. The pipeline controller 5 is disposed in the chip. It receives instructions from the chip's instruction issue unit, parses the input operands each instruction requires, and judges their validity. If all required data are valid, it sends the request to the functional unit within one beat cycle of the chip; if at least one input operand is invalid, it sends the functional unit a bubble instruction, a null instruction that does not affect the data registers inside the functional unit. At the same time, the pipeline controller receives the functional unit's multiple output requests and judges their feasibility. If they are feasible, it accepts the requests within one beat cycle of the chip and then forwards them to memory within a certain time; if not, it stalls the functional unit's pipeline. While the pipeline controller is stalling the functional unit's computation, however, the controller's own parsing of the current instruction and its data prefetching are not blocked.
An input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched or is being read, so that it is guaranteed to reach the functional unit within the next chip cycle beat; otherwise it is invalid. An output request is feasible if the zero or more output operands produced by the functional unit can all be held in a buffer or written to the corresponding addresses of the target RAMs; otherwise it is infeasible. Both the validity check and the feasibility check complete within one chip cycle beat. In addition, the functional unit 6 with its multiple pipeline stages has a buffer configured to hold one input instruction operation together with its data. Furthermore, while parsing the validity of the input data needed by the current instruction, the pipeline controller also parses the input operands needed by several subsequent instructions and prefetches that data.
Fig. 2 is a block diagram of the multi-input multi-output processor pipeline data synchronization device according to another embodiment of the present invention. In this embodiment, a two-input two-output functional unit 9 is used as the multi-input multi-output functional unit and corresponds to the four RAM memories 1, 2, 3 and 4. The pipeline controller includes: logic 7 that parses the validity of instruction input operands (hereinafter "input logic"), functional unit output request arbitration and processing logic 8 (hereinafter "output logic"), and logic 10 that prefetches instruction input operands (hereinafter "prefetch input operand logic").
The functional unit 6 includes multiple pipeline stages and is configured to execute instructions using the input operands. In addition: (A) a register holds one instruction that has entered after being parsed by the input-operand validity logic; (B) when an output request for output data is rejected by the output request arbitration and processing logic, the functional unit stalls its pipeline and blocks further instructions from the input-operand validity logic from entering. The two-input two-output functional unit 6 uses a pipeline structure to process vector data and can be used to run neural network algorithms, image processing algorithms, and similar algorithms.
The input logic 7 determines how many input operands the functional unit needs, where in the four RAMs each input operand comes from, and whether the data are valid, that is, whether all input operands have been prefetched into the cache or can be delivered to the functional unit within one beat of the chip cycle. (A) If so, it sends the instruction into the functional unit. (B) Otherwise, it sends into the functional unit a null instruction that does not affect the functional unit's state. The null instruction is sent so that the data registers inside the functional unit are left unchanged and no data-flow error occurs. The specific structure of the input logic 7 is described below.
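Why a null instruction is harmless can be shown with a toy model of a pipelined functional unit as a fixed-depth shift register. This is a sketch under assumptions: `PipelinedUnit`, `NOP`, and the single `result` field standing in for the unit's data registers are all hypothetical. A bubble travels through the stages like any instruction but writes nothing when it leaves.

```python
NOP = "NOP"  # hypothetical encoding of the null (bubble) instruction


class PipelinedUnit:
    """Toy functional unit: a fixed-depth shift register of pipeline stages."""

    def __init__(self, depth):
        self.stages = [NOP] * depth  # an empty pipeline is full of bubbles
        self.result = None           # stands in for the unit's data registers
        self.retired = []            # real instructions that completed

    def beat(self, incoming):
        """Advance one beat: every stage shifts and the last one retires."""
        leaving = self.stages[-1]
        self.stages = [incoming] + self.stages[:-1]
        if leaving != NOP:           # a bubble must not touch the registers
            self.result = leaving
            self.retired.append(leaving)
        return leaving


unit = PipelinedUnit(depth=2)
for instr in ["A", NOP, "B", NOP, NOP]:
    unit.beat(instr)
print(unit.retired)  # ['A', 'B']
```

The bubble between `A` and `B` costs one beat of throughput but never overwrites `result`.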
The output logic 8 determines how many operands the functional unit outputs, which of the four RAMs each output operand should be written to, and whether the corresponding RAM address or cache can accept data. (A) If the memory or cache corresponding to any of the output operands cannot accept data, it blocks the output of the functional unit's pipeline stage. (B) Otherwise, it writes the output operands to their respective RAM addresses or caches. After the output logic 8 blocks the pipeline-stage output of the functional unit 6, it does not immediately block instructions from the input logic 7 from entering the functional unit, because the functional unit 6 can buffer one instruction that has passed the input-operand validity check. Only when the output logic 8 blocks the functional unit's pipeline-stage output for more than one beat of the chip cycle are instructions from the input logic 7 blocked from entering the functional unit. While the functional unit 6 blocks instructions from the input logic 7, the prefetching of instruction input operands continues normally. The specific structure of the output logic 8 is described below.
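The one-instruction slack described above can be sketched as a per-beat gate. The model is a simplification under assumptions: `input_gate` is a hypothetical name, each list entry records whether output logic 8 accepted the unit's output request in that beat, and prefetching is not modeled because it continues regardless.

```python
def input_gate(output_accepted):
    """For each beat, may input logic 7 send a new instruction into the unit?

    output_accepted[t] is True when output logic 8 accepts the unit's output
    in beat t. The unit can hold one instruction that already passed the
    input check, so one blocked beat is absorbed; a block lasting more than
    one beat also blocks new instructions.
    """
    gates = []
    blocked_beats = 0  # consecutive beats in which the output was blocked
    for accepted in output_accepted:
        blocked_beats = 0 if accepted else blocked_beats + 1
        gates.append(blocked_beats <= 1)
    return gates


print(input_gate([True, False, False, True]))  # [True, True, False, True]
```

The first blocked beat is absorbed by the unit's one-entry buffer; only the second consecutive blocked beat closes the input.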
The prefetch input operand logic 10 is configured to calculate, from the instruction fields and register information, the operand addresses needed by several subsequent instructions, to send prefetch requests to the corresponding RAMs, and to cache the prefetched data.
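A possible sketch of the prefetch address calculation. The patent only says that addresses are computed from instruction fields and register information; the operand encoding assumed here (a RAM id, a base-register index and an immediate offset, giving `regs[base_reg] + offset`) and all names are hypothetical. The sketch also assumes the base registers do not change before the looked-ahead instructions execute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperandField:
    ram_id: int    # RAM that holds the operand
    base_reg: int  # index of a base-address register (hypothetical encoding)
    offset: int    # immediate offset taken from the instruction field


def prefetch_requests(queue, regs, lookahead, already_cached):
    """(ram_id, addr) prefetch requests for the next `lookahead` instructions.

    queue holds upcoming instructions, each a list of OperandField; regs holds
    register values. Operands already cached, or already requested earlier in
    this window, are not requested again.
    """
    requests = []
    for instr in queue[:lookahead]:
        for op in instr:
            req = (op.ram_id, regs[op.base_reg] + op.offset)
            if req not in already_cached and req not in requests:
                requests.append(req)
    return requests


regs = [100, 200]
queue = [
    [OperandField(0, 0, 4), OperandField(1, 1, 0)],
    [OperandField(0, 0, 4), OperandField(2, 1, 8)],
    [OperandField(3, 0, 0)],  # outside a lookahead window of 2
]
print(prefetch_requests(queue, regs, lookahead=2, already_cached={(1, 200)}))
# [(0, 104), (2, 208)]
```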
Fig. 3 shows the structure of the input logic 7 in Fig. 2. Its internal control logic unit 12 receives external control signals and, according to its internal register state, determines the control instruction to be sent to the arithmetic unit and the operands that instruction requires. Once the internal control logic 12 confirms that the data needed by the arithmetic unit are ready, it sends the control instruction and the input operands to the arithmetic unit, achieving synchronized control of pipeline input data. So that the arithmetic unit's pipeline stages do not stall while waiting for input data, the data prefetch logic 11 works with the internal control logic to determine, several clock cycles in advance, the input operands the arithmetic unit will need, and sends data prefetch requests to the corresponding memory blocks. The input data synchronization logic therefore contains a FIFO queue 13 that buffers the prefetched data. In addition, an input data arbitration logic unit 14 decides whether the data fed to the arithmetic unit come from the FIFO queue or are read directly from the data bus.
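A toy model of FIFO queue 13 and arbitration logic 14, under the assumption that prefetched operands are consumed in the same order they were prefetched. `InputArbiter` and its methods are hypothetical names; returning `None` marks the case where the input logic would send a null instruction instead.

```python
from collections import deque


class InputArbiter:
    """Toy model of FIFO queue 13 plus data arbitration logic 14."""

    def __init__(self):
        self.fifo = deque()  # (operand_key, value) pairs, in prefetch order

    def push_prefetched(self, key, value):
        """Data prefetch logic 11 deposits a prefetched operand."""
        self.fifo.append((key, value))

    def select(self, key, bus):
        """Pick the source for operand `key` this beat: FIFO, data bus, or none.

        bus maps operand keys to values arriving on the data bus this beat.
        """
        if self.fifo and self.fifo[0][0] == key:
            return ("fifo", self.fifo.popleft()[1])  # it was prefetched earlier
        if key in bus:
            return ("bus", bus[key])                 # read directly from the bus
        return None  # unavailable: the input logic would send a null instruction


arb = InputArbiter()
arb.push_prefetched((0, 16), 7)
print(arb.select((0, 16), bus={}))          # ('fifo', 7)
print(arb.select((2, 8), bus={(2, 8): 5}))  # ('bus', 5)
```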
Fig. 4 shows the structure of the output logic 8 in Fig. 2. Its internal control logic unit 16 receives the control signals passed from the arithmetic unit and, according to its internal register state, determines how many operands the arithmetic unit outputs, which memory block each output operand should be written to, and whether the corresponding memory block address or cache can accept data. The output logic 8 then decides whether to stall the arithmetic unit's pipeline according to whether the output data can be written. Likewise, to keep the arithmetic unit from being stalled by blocked output data, the output data synchronization logic 8 contains a FIFO queue 17 that buffers data that cannot be written to the memory blocks in time. To maintain output data consistency, a data arbitration logic 18 decides whether the data written to memory in the current clock cycle come from the FIFO queue or directly from the bus.
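A sketch of FIFO queue 17 and data arbitration logic 18 for a single memory block. Assumptions: one write per beat, a fixed FIFO capacity, and all names (`OutputArbiter`, `beat`, the returned status strings) are hypothetical. Draining the oldest buffered write before accepting bus data directly is one way to keep writes in order, which is the consistency goal stated above.

```python
from collections import deque


class OutputArbiter:
    """Toy model of FIFO queue 17 plus data arbitration logic 18 for one block."""

    def __init__(self, capacity):
        self.capacity = capacity  # FIFO slots for writes that could not land
        self.fifo = deque()
        self.memory = {}

    def beat(self, new_write, port_free):
        """Advance one beat; new_write is an (addr, value) pair or None.

        Returns what happened: "fifo" (oldest buffered write drained, new data
        queued behind it), "bus" (written directly), "buffered", "stall", "idle".
        """
        if port_free and self.fifo:
            addr, value = self.fifo.popleft()  # older data first keeps writes in order
            self.memory[addr] = value
            if new_write is not None:
                self.fifo.append(new_write)    # a slot was just freed
            return "fifo"
        if port_free and new_write is not None:
            addr, value = new_write            # nothing pending: write from the bus
            self.memory[addr] = value
            return "bus"
        if new_write is None:
            return "idle"
        if len(self.fifo) < self.capacity:
            self.fifo.append(new_write)        # port busy: park the data
            return "buffered"
        return "stall"                         # cannot write or buffer: block the unit


arb = OutputArbiter(capacity=1)
events = [arb.beat((10, "a"), port_free=False),
          arb.beat((11, "b"), port_free=False),
          arb.beat((11, "b"), port_free=True),
          arb.beat(None, port_free=True),
          arb.beat((12, "c"), port_free=True)]
print(events)  # ['buffered', 'stall', 'fifo', 'fifo', 'bus']
```

The "stall" result corresponds to the case where output logic 8 must block the arithmetic unit's pipeline; the unit then presents the same write again in a later beat.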
In addition, the present invention provides a method for synchronizing pipeline data of a multi-input multi-output processor using the above device. As shown in Fig. 5, it comprises the following steps:
Step 1: the pipeline controller receives the instruction sent by the instruction issue unit, parses the input operands required by the instruction, and judges their validity; if all the input operands are valid, it sends the instruction into the multi-input multi-output functional unit within one beat of the chip cycle; if at least one input operand is invalid, it sends a null instruction into the multi-input multi-output functional unit;
Step 2: the multi-input multi-output functional unit performs the operation on the plurality of input operands through its arithmetic pipeline stages as directed by the instruction, and sends output requests to the pipeline controller;
Step 3: the pipeline controller receives the plurality of output requests from the multi-input multi-output functional unit and judges their feasibility; if an output request is feasible, it accepts the request within one beat cycle of the chip and forwards it to the memory within a certain time; if it is infeasible, it blocks the output of the multi-input multi-output functional unit;
wherein an input operand is valid if the input operand required by the instruction being parsed by the pipeline controller has already been prefetched or is being read, so that it is guaranteed to be sent to the multi-input multi-output functional unit within the next chip cycle beat; otherwise it is invalid; and an output request is feasible if all output operands output by the multi-input multi-output functional unit can be temporarily stored in a buffer or written into the memory; otherwise it is infeasible.
图6示出本发明的另一实施方式的多输入多输出处理器流水线数据同步方法的流程图。包括如下步骤:6 is a flow chart showing a method of pipeline data synchronization of a multiple input multiple output processor according to another embodiment of the present invention. Including the following steps:
步骤10,输入逻辑解析指令所需的输入操作数,并判断所有输入操作数是否已被预取到缓存中或者是否能在芯片周期的一个节拍内被输入到所述多输入多输出功能部件,若是,则发送指令进入所述多输入多输出功能部件,若否,则发送空指令进入所述多输入多输出功能部件;Step 10: input an input operand required by the logic parsing instruction, and determine whether all input operands have been prefetched into the cache or can be input to the MIMO function within one beat of the chip cycle. If yes, sending an instruction to enter the multiple input multiple output function component; if not, sending an empty instruction to enter the multiple input multiple output function component;
步骤20,所述多输入多输出功能部件,按照指令通过运算流水级执行对多个输入操作数的运算,并向输出逻辑发送输出请求; Step 20: The multi-input multi-output function component performs an operation on a plurality of input operands through a computational pipeline level according to an instruction, and sends an output request to the output logic;
步骤30,输出逻辑判断所述输出请求的所有输出操作数所对应的存储器地址或者是缓存是否可写入数据,若是,则将所述输出操作数分别写入所对应的存储器地址或缓存,若否,则阻塞所述多输入多输出功能部件的输出。Step 30, the output logic determines whether the memory address corresponding to all the output operands of the output request or whether the cache can write data, and if so, writes the output operand to the corresponding memory address or cache, respectively. No, the output of the MIMO function is blocked.
The pipeline data synchronization method for a multi-input multi-output processor of the present invention further includes the following step:
Step 40: the prefetch-input-operand logic calculates the operand addresses required by an instruction from the instruction fields and register information, sends prefetch requests to the memory storing those operands, and caches the prefetched data.
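Step 40 does not fix the address calculation beyond "instruction fields and register information". The sketch below assumes, purely for illustration, a base-register-plus-immediate encoding with an operand length field; `VecInstr`, `PrefetchLogic` and their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VecInstr:
    base_reg: int  # index of a register holding a base address (assumed)
    offset: int    # immediate offset taken from the instruction field
    length: int    # number of consecutive operand words needed


class PrefetchLogic:
    """Hypothetical prefetch-input-operand logic."""

    def __init__(self, regs, memory):
        self.regs = regs      # register file: index -> value
        self.memory = memory  # backing memory: address -> word
        self.cache = {}       # prefetched operands: address -> word

    def operand_addresses(self, ins):
        # Address = register value + instruction-field offset.
        start = self.regs[ins.base_reg] + ins.offset
        return range(start, start + ins.length)

    def prefetch(self, ins):
        for addr in self.operand_addresses(ins):
            if addr not in self.cache:  # request only words not yet cached
                self.cache[addr] = self.memory[addr]


logic = PrefetchLogic(regs={1: 100}, memory={a: 2 * a for a in range(100, 110)})
logic.prefetch(VecInstr(base_reg=1, offset=4, length=3))
print(logic.cache)  # {104: 208, 105: 210, 106: 212}
```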
In the pipeline data synchronization method of the present invention, if the output logic blocks the output of the multi-input multi-output functional unit for more than one clock beat, instructions are blocked from entering the multi-input multi-output functional unit.
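The stall rule above can be sketched as a counter of consecutive blocked beats. This is a hypothetical model (`IssueGate` is an illustrative name); it assumes the one-beat tolerance corresponds to the functional unit's ability to buffer one incoming instruction, as in the third example of this section.

```python
class IssueGate:
    """Hypothetical issue gate for the stall rule above."""

    def __init__(self, tolerance=1):
        self.tolerance = tolerance  # blocked beats absorbed without stalling
        self.blocked_beats = 0      # consecutive beats with output blocked

    def update(self, output_blocked):
        """Call once per beat; return True if a new instruction may issue."""
        self.blocked_beats = self.blocked_beats + 1 if output_blocked else 0
        return self.blocked_beats <= self.tolerance


gate = IssueGate()
print([gate.update(b) for b in [False, True, True, False]])
# [True, True, False, True]
```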
Several specific embodiments are described below to explain the pipeline data synchronization method for a multi-input multi-output processor of the present invention in detail.
In one example, all data required by an instruction has already been prefetched, and all output data of the functional unit can be written into the corresponding RAM or buffer. The functional unit can then accept a new instruction every clock beat and runs at full speed. This is the ideal case, but the invention is not limited to it.
In another example, one of the input operands required by an instruction has not been prefetched. The logic that checks the validity of the instruction's input operands immediately sends a read request to the RAM holding that operand. Provided that RAM does not fail, the functional unit obtains the operand in the next clock beat, so the instruction can enter the functional unit without causing a stall.
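The on-demand read in this example can be sketched as one beat of input logic. The address-to-bank mapping, the set of idle read ports, and the limit of one read per bank per beat are assumptions made for illustration; `resolve_inputs` is a hypothetical name.

```python
def resolve_inputs(needed, prefetched, ram_of, free_ports):
    """One beat of hypothetical input logic.

    needed:     addresses of the operands the decoded instruction requires
    prefetched: addresses already held by the prefetch buffer
    ram_of:     maps each address to the RAM bank that stores it
    free_ports: banks whose read port is idle in this beat
    Returns (issue, reads): whether the instruction may enter the unit and
    which read requests are sent immediately.
    """
    missing = [a for a in needed if a not in prefetched]
    banks = [ram_of[a] for a in missing]
    # Assumed: each bank serves one read per beat, so every missing operand
    # must sit in a distinct, currently idle bank to arrive next beat.
    ok = len(set(banks)) == len(banks) and set(banks) <= set(free_ports)
    return ok, (missing if ok else [])


print(resolve_inputs(["a", "b"], {"a"}, {"a": 0, "b": 1}, {1}))
# (True, ['b'])
```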
In another example, all input operands required by an instruction are valid, but one or all of the functional unit's output data cannot be written into the corresponding RAM or buffer in the current clock beat, although they can be written in the next beat. Because the functional unit can buffer one incoming instruction, it can still accept an instruction and its data in the next clock beat without stalling.
In another example, the logic that prefetches instruction input operands calculates, from specific fields of the current instruction and register information, the addresses of the operands required by several subsequent instructions. Whenever doing so does not conflict with the functional unit reading the current instruction's input operands or writing output operands, it prefetches the input operands required by those subsequent instructions. This approach is particularly effective for neural network algorithms, because neural networks involve many kinds of data and that data has very high spatial locality. In this way, the multi-input multi-output functional unit pipeline synchronization device of the present invention can greatly reduce the processor's memory access overhead and improve processor performance.
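The look-ahead prefetch can be sketched as a scheduler that only uses beats in which the memory port is idle. The strided address pattern (`base + k * stride`) is an assumption meant to reflect the spatial locality mentioned above, not an encoding given in the patent, and `plan_prefetch` is a hypothetical name.

```python
def plan_prefetch(base, stride, length, lookahead, port_busy):
    """Schedule look-ahead prefetches into idle memory beats.

    Assumes the k-th following instruction reads `length` consecutive words
    starting at base + k * stride, that one instruction's block can be
    fetched per idle beat, and that port_busy[t] is True when beat t is
    taken by the current instruction's reads or writes.
    Returns {beat: [addresses prefetched in that beat]}.
    """
    pending = [list(range(base + k * stride, base + k * stride + length))
               for k in range(1, lookahead + 1)]
    schedule = {}
    for beat, busy in enumerate(port_busy):
        if not pending:
            break
        if not busy:  # never compete with current operand traffic
            schedule[beat] = pending.pop(0)
    return schedule


print(plan_prefetch(base=0, stride=4, length=4, lookahead=2,
                    port_busy=[True, False, True, False]))
# {1: [4, 5, 6, 7], 3: [8, 9, 10, 11]}
```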
In addition, as shown in FIG. 7, the present invention further provides a data processing chip 500 that includes the above-described multi-input multi-output processor pipeline data synchronization device 400.
It should also be noted that an "instruction" in the present invention refers to a computer program instruction. Such computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Industrial Applicability
The multi-input multi-output functional unit pipeline synchronization device of the present invention parses and prefetches input data and responds to output requests from the functional unit that contain multiple output operands. It therefore provides data synchronization control across the pipeline stages of a multi-input multi-output functional unit and improves processor performance.

Claims (8)

  1. A multi-input multi-output processor pipeline data synchronization device, arranged in a chip having an instruction issuer, the synchronization device communicating with one or more memories, characterized in that the synchronization device comprises:
    a multi-input multi-output functional unit having a plurality of computational pipeline stages, configured to execute operations on a plurality of input operands in response to instructions;
    a pipeline controller connected to the multi-input multi-output functional unit, configured:
    to receive an instruction sent by the instruction issuer, parse the input operands required by the instruction, and determine the validity of the input operands; if all input operands are valid, to send the instruction into the multi-input multi-output functional unit within one clock beat; and if at least one input operand is invalid, to send an empty instruction into the multi-input multi-output functional unit; and
    to receive a plurality of output requests from the multi-input multi-output functional unit and determine the feasibility of the output requests; if an output request is feasible, to accept it within one clock beat and forward it to the memory within one clock beat; and if the output request is determined to be infeasible, to block the output of the multi-input multi-output functional unit; and
    prefetch-input-operand logic connected to the memory and the multi-input multi-output functional unit, configured to calculate the input operand addresses required by an instruction, send input operand prefetch requests to the memory, and cache the prefetched data.
  2. The multi-input multi-output processor pipeline data synchronization device according to claim 1, characterized in that:
    the pipeline controller is provided with input logic that parses the input operands required by an instruction and determines the validity of the input operands;
    wherein the validity of the input operands means that, if the input operands required by the instruction being parsed by the pipeline controller have already been prefetched by the prefetch-input-operand logic or are being read, so that they are guaranteed to be sent to the multi-input multi-output functional unit within the next clock beat, the input operands are valid; otherwise they are invalid;
    the pipeline controller is provided with output logic that receives the plurality of output requests from the multi-input multi-output functional unit and determines the feasibility of the output requests, wherein the feasibility of an output request means that, if all output operands output by the multi-input multi-output functional unit can be temporarily stored in a buffer or can all be written into the memory, the output request is feasible; otherwise it is infeasible.
  3. The multi-input multi-output processor pipeline data synchronization device according to claim 2, characterized in that
    if the output logic blocks the output of the multi-input multi-output functional unit for more than one clock beat, instructions are blocked from entering the multi-input multi-output functional unit.
  4. A method for synchronizing pipeline data of a multi-input multi-output processor using the multi-input multi-output processor pipeline data synchronization device according to any one of claims 1 to 3, characterized by comprising the following steps:
    Step 1: the pipeline controller receives an instruction sent by the instruction issuer, parses the input operands required by the instruction, and determines the validity of the input operands; if all input operands are valid, it sends the instruction into the multi-input multi-output functional unit within one clock beat; if at least one input operand is invalid, it sends an empty instruction into the multi-input multi-output functional unit;
    Step 2: the multi-input multi-output functional unit executes the operation on the plurality of input operands through its computational pipeline stages as directed by the instruction, and sends an output request to the pipeline controller;
    Step 3: the pipeline controller receives a plurality of output requests from the multi-input multi-output functional unit and determines the feasibility of the output requests; if an output request is feasible, it accepts the request within one clock beat and forwards it to the memory within one clock beat; if the output request is infeasible, it blocks the output of the multi-input multi-output functional unit.
  5. The method for synchronizing pipeline data of a multi-input multi-output processor according to claim 4, characterized by further comprising the following steps:
    Step 10: input logic receives the instruction sent by the instruction issuer, parses the input operands required by the instruction, and determines the validity of the input operands; if all input operands are valid, it sends the instruction into the multi-input multi-output functional unit within one clock beat; or, if at least one input operand is invalid, it sends an empty instruction into the multi-input multi-output functional unit;
    Step 20: the multi-input multi-output functional unit executes the operation on the plurality of input operands through its computational pipeline stages as directed by the instruction, and sends an output request to the output logic;
    Step 30: output logic receives the plurality of output requests from the multi-input multi-output functional unit and determines the feasibility of the output requests; if an output request is feasible, it accepts the request within one clock beat and forwards it to the memory within a certain time; or, if the output request is infeasible, it blocks the output of the multi-input multi-output functional unit.
  6. The method for synchronizing pipeline data of a multi-input multi-output processor according to claim 5, characterized by further comprising the following step:
    Step 40: the prefetch-input-operand logic calculates the operand addresses required by an instruction, sends prefetch requests to the memory storing the operands, and caches the prefetched data.
  7. The method for synchronizing pipeline data of a multi-input multi-output processor according to claim 5, characterized in that
    if the output logic blocks the output of the multi-input multi-output functional unit for more than one clock beat, instructions are blocked from entering the multi-input multi-output functional unit.
  8. A data processing chip comprising the multi-input multi-output processor pipeline data synchronization device according to any one of claims 1 to 3.
PCT/CN2016/086100 2015-11-24 2016-06-17 Pipeline data synchronization apparatus and method for multi-input multi-output processor WO2017088456A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510824901.7A CN105824604B (en) 2015-11-24 2015-11-24 Multiple-input and multiple-output processor pipeline data synchronization unit and method
CN201510824901.7 2015-11-24

Publications (1)

Publication Number Publication Date
WO2017088456A1 true WO2017088456A1 (en) 2017-06-01

Family

ID=56513438

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/086100 WO2017088456A1 (en) 2015-11-24 2016-06-17 Pipeline data synchronization apparatus and method for multi-input multi-output processor

Country Status (2)

Country Link
CN (1) CN105824604B (en)
WO (1) WO2017088456A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134630A (en) * 2019-05-27 2019-08-16 西安电子工程研究所 A kind of transmission cache controller design method of multiple input single output

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733585B (en) * 2017-04-17 2022-05-13 伊姆西Ip控股有限责任公司 Cache system and related method
CN111258657B (en) * 2020-01-23 2020-11-20 上海燧原智能科技有限公司 Pipeline control method and related equipment
CN112559053B (en) * 2020-12-21 2022-06-03 清华大学 Data synchronization processing method and device for reconfigurable processor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5987620A (en) * 1997-09-19 1999-11-16 Thang Tran Method and apparatus for a self-timed and self-enabled distributed clock
US6467004B1 (en) * 1991-10-09 2002-10-15 Hitachi, Ltd. Pipelined semiconductor devices suitable for ultra large scale integration
US20080016290A1 (en) * 2006-07-11 2008-01-17 Pennock James D Dynamic instruction and data updating architecture
CN103383641A (en) * 2013-04-19 2013-11-06 中国科学院自动化研究所 Synchronous device for multi-assembly lines
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134630A (en) * 2019-05-27 2019-08-16 西安电子工程研究所 A kind of transmission cache controller design method of multiple input single output
CN110134630B (en) * 2019-05-27 2023-02-10 西安电子工程研究所 Design method of multi-input single-output transmission cache controller

Also Published As

Publication number Publication date
CN105824604A (en) 2016-08-03
CN105824604B (en) 2017-08-29

Similar Documents

Publication Publication Date Title
US10754658B2 (en) Information processing apparatus, information processing method, and computer-readable recording medium
US8004533B2 (en) Graphics input command stream scheduling method and apparatus
JP5764265B2 Circuit devices, integrated circuit devices, program products and methods that utilize low-latency variable propagation networks for parallel processing of virtual threads across multiple hardware threads
JP4678623B2 Memory barrier primitive in an asymmetric heterogeneous multiprocessor environment
US7313673B2 (en) Fine grained multi-thread dispatch block mechanism
KR101012744B1 (en) Pipeline accelerator for improved computing architecture and related system and method
US8892824B2 (en) Store-operate-coherence-on-value
US20110119469A1 (en) Balancing workload in a multiprocessor system responsive to programmable adjustments in a syncronization instruction
WO2017088456A1 (en) Pipeline data synchronization apparatus and method for multi-input multi-output processor
JP2779044B2 (en) Buffer storage control method
JP6450705B2 (en) Persistent commit processor, method, system and instructions
CN102822810A (en) Critical word forwarding with adaptive prediction
US4967338A (en) Loosely coupled pipeline processor
US20050044128A1 (en) Decoupled store address and data in a multiprocessor system
WO2005020088A2 (en) Decoupled store address and data in a multiprocessor system
US20150268985A1 (en) Low Latency Data Delivery
US9286129B2 (en) Termination of requests in a distributed coprocessor system
US6738837B1 (en) Digital system with split transaction memory access
CN112949847A (en) Neural network algorithm acceleration system, scheduling system and scheduling method
CN116243983A (en) Processor, integrated circuit chip, instruction processing method, electronic device, and medium
US20070156960A1 (en) Ordered combination of uncacheable writes
CN117501254A (en) Providing atomicity for complex operations using near-memory computation
US8307141B2 (en) Multi-core processor, control method thereof, and information processing apparatus
US10394653B1 (en) Computing in parallel processing environments
US11119149B2 (en) Debug command execution using existing datapath circuitry

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16867683

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16867683

Country of ref document: EP

Kind code of ref document: A1