WO2012151822A1 - Loopback structure and data loopback processing method for processor - Google Patents

Info

Publication number
WO2012151822A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
register file
storage unit
read
Prior art date
Application number
PCT/CN2011/079663
Other languages
English (en)
French (fr)
Inventor
黎立煌
李炜
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
2011-05-12
Filing date
2011-09-15
Publication date
2012-11-15
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Priority to US14/117,244 (US20140156685A1)
Priority to EP11865214.8A (EP2709003B1)
Publication of WO2012151822A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/25 Integrating or interfacing systems involving database management systems
                • G06F16/258 Data format conversion from or to a database
          • G06F9/00 Arrangements for program control, e.g. control units
            • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
              • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
                • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
                  • G06F9/3824 Operand accessing
                    • G06F9/3826 Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage
                  • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
                    • G06F9/3873 Variable length pipelines, e.g. elastic pipeline
                  • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
                    • G06F9/3893 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
                • G06F9/30003 Arrangements for executing specific machine instructions
                  • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
                    • G06F9/30025 Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion
                  • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
                    • G06F9/30043 LOAD or STORE instructions; Clear instruction

Definitions

  • The present invention relates to the field of processor architecture design, and in particular to a loopback structure of a processor and a data loopback processing method.
  • The processor is the core component of a chip.
  • The efficiency and power consumption of the processor largely determine the efficiency and power consumption of the whole chip. Therefore, processor architecture design must consider how to improve the efficiency and reduce the power consumption of the processor.
  • As shown in FIG. 1, a conventional processor architecture provides the following three data channels:
  • Channel 1: the data channel via "memory → data reading unit → register file unit";
  • Channel 2: the data channel via "register file unit → arithmetic unit → register file unit", also called the front door channel;
  • Channel 3: the data channel via "register file unit → data storage unit → memory".
  • Before an operation starts, the data reading unit first reads the operands from memory into the register file unit; the arithmetic unit then reads the operands from the register file unit, performs the operation, and writes the result back to the register file unit; finally, the data storage unit reads the result from the register file unit and stores it in memory.
  • Data operations can loop repeatedly within the front door channel formed by "register file unit → arithmetic unit → register file unit", but the arithmetic unit can only perform arithmetic and logic operations; it cannot perform the special data transformations (for example, rotational shifting of data) provided by the data reading unit and the data storage unit.
  • To apply such transformations, the processor must therefore write the data back to memory and then read it back from memory. Because memory operations cost processor power and time, frequent memory reads and writes in this situation greatly affect the efficiency and power consumption of the whole processor.
  • The main object of the present invention is to provide a loopback structure of a processor and a data loopback processing method that improve the efficiency and reduce the power consumption of the processor.
  • The present invention provides a loopback structure of a processor, including a register file unit, a data storage unit, and a data reading unit, wherein:
  • the register file unit is configured to provide data read and write services for the data storage unit and the data reading unit;
  • the data storage unit is connected to the register file unit and is configured to read data from a read port of the register file unit, perform data transformation on the read data, and feed the transformed data back to the data reading unit;
  • the data reading unit is connected to the register file unit and the data storage unit and is configured to transform the data fed back by the data storage unit and write it into the register file unit through a write port of the register file unit.
  • A data operation and transformation unit is connected between the data storage unit and the data reading unit; it is configured to further perform operation and transformation processing on the data fed back by the data storage unit and then provide the processed data to the data reading unit.
  • The data storage unit is further configured to mask its own memory operations while processing data read from the read port.
  • The loopback structure further includes an arithmetic unit, connected to the register file unit, configured to read source operands from the register file unit, perform a data operation on the source operands, and write the obtained operation result into the register file unit.
  • The data storage unit is further configured to read the operation result of the source operands from a read port of the register file unit, perform data transformation on the read operation result, and feed it back to the data reading unit;
  • correspondingly, the data reading unit is further configured to transform the operation result fed back by the data storage unit and write it into the register file unit through a write port of the register file unit.
  • The data transformation is a rotational shift operation on the data.
  • The present invention also provides a data loopback processing method for a processor, the method comprising: the data storage unit reads data from a read port of a register file unit, performs data transformation on the read data, and feeds it back to the data reading unit;
  • the data reading unit transforms the data fed back by the data storage unit and writes it into the register file unit through a write port of the register file unit.
  • The method further includes:
  • a data operation and transformation unit is connected between the data storage unit and the data reading unit, and the data operation and transformation unit further performs operation and transformation processing on the data fed back by the data storage unit and then provides the processed data to the data reading unit.
  • The method further includes:
  • the data storage unit masks its own memory operations while processing data read from the read port.
  • The method further includes:
  • an arithmetic unit connected to the register file unit reads source operands from the register file unit, performs a data operation on the source operands, and writes the obtained operation result into the register file unit.
  • The method further includes:
  • the data storage unit reads the operation result of the source operands from the read port of the register file unit, performs data transformation on the read operation result, and feeds it back to the data reading unit;
  • the data reading unit transforms the operation result fed back by the data storage unit and writes it into the register file unit through the write port of the register file unit.
  • The data transformation is a rotational shift operation on the data.
  • The loopback structure and data loopback processing method of a processor provide an instruction and a channel directly from the data storage unit to the data reading unit. With this instruction and channel, after the arithmetic unit completes an operation and the data storage unit transforms the data, the data is not written directly into memory but is looped back to the data reading unit.
  • The channel reuses the special data transformation functions of the data storage unit and the data reading unit (including rotational shifting of data, etc.) and their read and write ports to the register file unit, and other data operation and transformation units can be added between the data storage unit and the data reading unit when needed. The channel is independent of the "register file unit → arithmetic unit → register file unit" channel: the two can perform operations in parallel and work independently without affecting each other.
  • The invention avoids processor access to memory and the read/write conflicts caused by reading and writing memory, thereby effectively improving processor efficiency and reducing processor power consumption.
  • FIG. 1 is a schematic diagram of an existing processor architecture;
  • FIG. 2 is a first schematic diagram of a processor loopback structure in an embodiment of the present invention;
  • FIG. 3 is a second schematic diagram of a processor loopback structure in an embodiment of the present invention;
  • FIG. 4 is a timing diagram of data loopback processing of a processor in an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of independent front door and back door channels in a processor loopback structure according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of the front door channel and the back door channel forming a closed loop body in a processor loopback structure according to an embodiment of the present invention.
  • The loopback structure of a processor mainly includes a register file unit, a data storage unit, and a data reading unit.
  • The register file unit is configured to provide data read and write services for the data storage unit and the data reading unit; the data storage unit, connected to the register file unit, is configured to read data from the read port of the register file unit, perform data transformation on it, and feed it back to the data reading unit; the data reading unit, connected to the register file unit and the data storage unit, is configured to transform the data fed back by the data storage unit and write it into the register file unit through the write port of the register file unit.
  • A data operation and transformation unit may further be connected between the data storage unit and the data reading unit; it is configured to further perform operation and transformation processing on the data fed back by the data storage unit and then provide the processed data to the data reading unit.
  • The data storage unit needs to mask its own memory operations when processing data read from the read port.
  • The loopback structure may further include an arithmetic unit, connected to the register file unit, configured to read source operands from the register file unit, perform a data operation on them, and write the obtained operation result into the register file unit.
  • The data storage unit is further configured to read the operation result of the source operands from the read port of the register file unit, perform data transformation on it, and feed it back to the data reading unit;
  • correspondingly, the data reading unit is further configured to transform the operation result fed back by the data storage unit and write it into the register file unit through the write port of the register file unit.
  • The data loopback processing method of the processor provided by the present invention mainly includes:
  • the data storage unit reads data from the read port of the register file unit, performs data transformation on the read data, and feeds it back to the data reading unit; the data reading unit transforms the data fed back by the data storage unit and writes it into the register file unit through the write port of the register file unit.
  • The method further comprises: the arithmetic unit connected to the register file unit reads source operands from the register file unit, performs a data operation on them, and writes the obtained operation result into the register file unit.
  • The data storage unit can read the operation result of the source operands from the read port of the register file unit, perform data transformation on it, and feed it back to the data reading unit; the data reading unit transforms the operation result fed back by the data storage unit and writes it into the register file unit through the write port of the register file unit.
  • The data that the data storage unit reads from the register file unit may or may not be the operation result of the arithmetic unit. If, in a specific implementation, one only wants to use the special data transformation functions of the data storage unit and the data reading unit to avoid memory operations, the data storage unit need not read the arithmetic unit's operation result from the register file unit.
  • The present invention provides an instruction and a channel directly from the data storage unit to the data reading unit.
  • With this instruction and channel, after the arithmetic unit completes an operation and the data storage unit transforms the data, the data is not written directly into memory but is looped back to the data reading unit.
  • The channel reuses the special data transformation functions of the data storage unit and the data reading unit (including rotational shifting of data, etc.) and their read and write ports to the register file unit; this channel and the "register file unit → arithmetic unit → register file unit" channel are independent of each other, can operate in parallel, and can work without affecting each other.
  • The "register file unit → data storage unit → data reading unit → register file unit" channel and the "register file unit → arithmetic unit → register file unit" channel can also work in coordination to form a closed loop body. This is described below with reference to specific embodiments.
  • A processor loopback structure provided by the embodiment of the present invention mainly includes a data reading unit, a register file unit, an arithmetic unit, and a data storage unit. The data channel via the first read port of the register file unit (read port 1 in the figure), the arithmetic unit, and the first write port of the register file unit (write port 1 in the figure) constitutes the front door channel; the data channel via the second read port of the register file unit (read port 2 in the figure), the data storage unit, the data reading unit, and the second write port of the register file unit (write port 2 in the figure) constitutes the back door channel.
  • The dashed arrows in FIG. 2 show the data loopback path.
  • The arithmetic unit is configured to read source operands from read port 1 of the register file unit, perform a data operation on them, and write the obtained operation result into the register file unit through write port 1 of the register file unit;
  • the data storage unit is configured to read the operation result through read port 2 of the register file unit, perform data transformation on it, and feed it back to the data reading unit;
  • the data reading unit is configured to transform the data fed back by the data storage unit and write it into the register file unit through write port 2 of the register file unit;
  • the register file unit is configured to provide data read and write services for the arithmetic unit, the data storage unit, and the data reading unit.
  • The present invention provides an instruction and a channel directly from the data storage unit to the data reading unit (i.e., the back door channel). That is, with this instruction and channel, after the arithmetic unit completes an operation and the data storage unit transforms the data, the data is not written directly into memory but is looped back to the data reading unit.
  • The back door channel reuses the special data transformation functions (such as rotational shifting of data) of the data storage unit and the data reading unit, and their read and write ports to the register file unit. This data feedback strategy avoids processor access to memory and the read/write conflicts caused by reading and writing memory.
  • The processor loopback structure of this embodiment is shown in FIG. 3: the data channel via the first read port of the register file unit (read port 1 in the figure), the arithmetic unit, and the first write port of the register file unit (write port 1 in the figure) constitutes the front door channel; the data channel via the second read port of the register file unit (read port 2 in the figure), the data storage unit, the data operation and transformation unit, the data reading unit, and the second write port of the register file unit (write port 2 in the figure) constitutes the back door channel.
  • FIG. 4 shows the instruction pipeline of the data loopback processing of the processor.
  • The data loopback instruction pipeline starts when the arithmetic unit reads data from the register file unit and ends when the data reading unit writes the data back to the register file unit. It requires N clock cycles in total, each cycle corresponding to one pipeline stage. The function of each stage is as follows:
  • Stage 1 (pipeline stage 1): the arithmetic unit reads source operands from read port 1 of the register file unit;
  • Stages 2 to N-4: the arithmetic unit performs the data operation on the source operands;
  • Stage N-3: the arithmetic unit writes the obtained operation result into the register file unit through write port 1 of the register file unit;
  • Stage N-2: the data storage unit reads the operation result through read port 2 of the register file unit, performs data transformation on it (such as rotational shifting), and places it on the data storage bus;
  • Stage N-1: the data operation and transformation unit takes the data from the data storage bus, performs further operation and transformation processing, and copies the processed data onto the data read bus; at the same time, the data storage unit must mask its memory operation;
  • Stage N: the data reading unit takes the data from the data read bus, performs data transformation on it (such as rotational shifting), and writes it into the register file unit through write port 2 of the register file unit.
  • The front door channel (register file unit → arithmetic unit → register file unit) and the back door channel (register file unit → data storage unit → data reading unit → register file unit) are at different stages of the processor pipeline, so their operations are parallel and independent, and they can operate on the same or different registers. That is, the registers that the back door channel and the front door channel use in the register file unit may be the same or different.
  • When the front door channel and the back door channel operate on the same registers (that is, they use the same registers in the register file unit), a closed loop body is formed between them, as shown in FIG. 6.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

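The present invention discloses a loopback structure of a processor and a data loopback processing method. The loopback structure includes a register file unit, a data storage unit, and a data reading unit. The register file unit is configured to provide data read and write services for the data storage unit and the data reading unit. The data storage unit, connected to the register file unit, is configured to read data from a read port of the register file unit, perform data transformation on the read data, and feed the transformed data back to the data reading unit. The data reading unit, connected to the register file unit and the data storage unit, is configured to transform the data fed back by the data storage unit and write it into the register file unit through a write port of the register file unit. The present invention improves the efficiency and reduces the power consumption of the processor.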

Description

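Loopback structure and data loopback processing method for processor
Technical Field
The present invention relates to the field of processor architecture design, and in particular to a loopback structure of a processor and a data loopback processing method.
Background
The processor is the core component of a chip, and its efficiency and power consumption largely determine the efficiency and power consumption of the whole chip. Therefore, processor architecture design must consider how to improve the efficiency and reduce the power consumption of the processor.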
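As shown in FIG. 1, a conventional processor architecture provides the following three data channels:
Channel 1: the data channel via "memory → data reading unit → register file unit";
Channel 2: the data channel via "register file unit → arithmetic unit → register file unit", also called the front door channel;
Channel 3: the data channel via "register file unit → data storage unit → memory".
In a conventional processor architecture, before an operation starts, the data reading unit first reads the operands from memory into the register file unit; the arithmetic unit then reads the operands from the register file unit, performs the operation, and writes the result back to the register file unit; finally, the data storage unit reads the result from the register file unit and stores it in memory.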
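As an illustration only, the conventional flow can be modeled in a few lines of Python. The class `ConventionalCore`, its dictionary-based memory, the 8-entry register file, the 32-bit word width, and the use of rotate-left as the load/store transformation are all assumptions made for this sketch, not details taken from the patent. It shows that a transformation available only in the data storage or data reading unit forces a store-then-reload round trip through memory.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def rotl(value: int, amount: int) -> int:
    """Rotate a WIDTH-bit word left by `amount` bits (assumed transformation)."""
    amount %= WIDTH
    value &= MASK
    return ((value << amount) | (value >> (WIDTH - amount))) & MASK


class ConventionalCore:
    """Toy model of the three conventional data channels (names hypothetical)."""

    def __init__(self) -> None:
        self.memory: dict[int, int] = {}   # address -> word
        self.regs = [0] * 8                # register file unit
        self.mem_accesses = 0              # memory reads plus writes

    def load(self, rd: int, addr: int, rotate: int = 0) -> None:
        """Channel 1: memory -> data reading unit -> register file unit."""
        self.mem_accesses += 1
        self.regs[rd] = rotl(self.memory[addr], rotate)

    def add(self, rd: int, rs1: int, rs2: int) -> None:
        """Channel 2 (front door): register file -> arithmetic unit -> register file."""
        self.regs[rd] = (self.regs[rs1] + self.regs[rs2]) & MASK

    def store(self, rs: int, addr: int, rotate: int = 0) -> None:
        """Channel 3: register file unit -> data storage unit -> memory."""
        self.mem_accesses += 1
        self.memory[addr] = rotl(self.regs[rs], rotate)


core = ConventionalCore()
core.memory.update({0x10: 1, 0x14: 2})
core.load(0, 0x10)
core.load(1, 0x14)
core.add(2, 0, 1)                # r2 = 3 via the front door channel
core.store(2, 0x20, rotate=4)    # the rotate is only available on the store path...
core.load(3, 0x20)               # ...so the value must be read back from memory
print(hex(core.regs[3]), core.mem_accesses)  # 0x30 4
```

Two of the four memory accesses exist only to reach the store unit's rotate.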
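In the conventional processor architecture, although data operations can loop repeatedly within the front door channel formed by "register file unit → arithmetic unit → register file unit", the arithmetic unit can only perform arithmetic and logic operations and cannot perform the special data transformations (for example, rotational shifting of data) provided by the data reading unit and the data storage unit. Therefore, to apply those special transformations, the processor must write the data back to memory and then read it back from memory. Because memory operations cost processor power and time, frequent memory reads and writes in this situation greatly affect the efficiency and power consumption of the whole processor.
Summary of the Invention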
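In view of this, the main object of the present invention is to provide a loopback structure of a processor and a data loopback processing method, so as to improve the efficiency and reduce the power consumption of the processor.
To achieve the above object, the technical solutions of the present invention are implemented as follows:
The present invention provides a loopback structure of a processor, including a register file unit, a data storage unit, and a data reading unit, wherein:
the register file unit is configured to provide data read and write services for the data storage unit and the data reading unit;
the data storage unit, connected to the register file unit, is configured to read data from a read port of the register file unit, perform data transformation on the read data, and feed the transformed data back to the data reading unit;
the data reading unit, connected to the register file unit and the data storage unit, is configured to transform the data fed back by the data storage unit and write it into the register file unit through a write port of the register file unit.
A data operation and transformation unit is connected between the data storage unit and the data reading unit; the data operation and transformation unit is configured to further perform operation and transformation processing on the data fed back by the data storage unit and then provide the processed data to the data reading unit.
The data storage unit is further configured to mask its own memory operations while processing data read from the read port.
The loopback structure further includes an arithmetic unit, connected to the register file unit, configured to read source operands from the register file unit, perform a data operation on the source operands, and write the obtained operation result into the register file unit.
The data storage unit is further configured to read the operation result of the source operands from the read port of the register file unit, perform data transformation on the read operation result, and feed it back to the data reading unit;
correspondingly, the data reading unit is further configured to transform the operation result fed back by the data storage unit and write it into the register file unit through the write port of the register file unit.
The data transformation is a rotational shift operation on the data.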
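A rotational shift (rotate) moves the bits that fall off one end of a word back in at the other end, so no information is lost. The sketch below is a generic Python rotate that assumes an unsigned word of configurable width (32 bits by default). The function names are illustrative, and the patent does not specify the word width or rotate direction used by its data storage and data reading units.

```python
def rotate_left(value: int, amount: int, width: int = 32) -> int:
    """Rotate `value` left by `amount` bits within a `width`-bit word."""
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value << amount) | (value >> (width - amount))) & mask


def rotate_right(value: int, amount: int, width: int = 32) -> int:
    """A right rotation equals a left rotation by the complementary amount."""
    return rotate_left(value, width - (amount % width), width)


print(hex(rotate_left(0x80000001, 1)))      # 0x3
print(hex(rotate_right(0x3, 1)))            # 0x80000001
print(hex(rotate_left(0xAB, 4, width=8)))   # 0xba
```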
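The present invention further provides a data loopback processing method for a processor, the method including: a data storage unit reads data from a read port of a register file unit, performs data transformation on the read data, and feeds it back to a data reading unit;
the data reading unit transforms the data fed back by the data storage unit and writes it into the register file unit through a write port of the register file unit.
The method further includes:
a data operation and transformation unit is connected between the data storage unit and the data reading unit, and the data operation and transformation unit further performs operation and transformation processing on the data fed back by the data storage unit and then provides the processed data to the data reading unit.
The method further includes:
while processing the data read from the read port, the data storage unit masks its own memory operations.
The method further includes:
an arithmetic unit connected to the register file unit reads source operands from the register file unit, performs a data operation on the source operands, and writes the obtained operation result into the register file unit.
The method further includes:
the data storage unit reads the operation result of the source operands from the read port of the register file unit, performs data transformation on the read operation result, and feeds it back to the data reading unit; the data reading unit transforms the operation result fed back by the data storage unit and writes it into the register file unit through the write port of the register file unit.
The data transformation is a rotational shift operation on the data.
The loopback structure and data loopback processing method of a processor provided by the present invention provide an instruction and a channel directly from the data storage unit to the data reading unit. With this instruction and channel, after the arithmetic unit completes an operation and the data storage unit transforms the data, the data is not written directly into memory; instead, it is looped back to the data reading unit. The channel reuses the special data transformation functions of the data storage unit and the data reading unit (including rotational shifting of data, etc.) as well as their read and write ports to the register file unit, and, when needed, other data operation and transformation units can be added between the data storage unit and the data reading unit. This channel and the "register file unit → arithmetic unit → register file unit" channel are independent of each other: they can execute operations in parallel and work without affecting each other.
The present invention avoids processor access to memory and the read/write conflicts caused by reading and writing memory, thereby effectively improving processor efficiency and reducing processor power consumption.
Brief Description of the Drawings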
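FIG. 1 is a schematic diagram of an existing processor architecture;
FIG. 2 is a first schematic diagram of a processor loopback structure according to an embodiment of the present invention;
FIG. 3 is a second schematic diagram of a processor loopback structure according to an embodiment of the present invention;
FIG. 4 is a timing diagram of data loopback processing of a processor according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of independent front door and back door channels in a processor loopback structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the front door channel and the back door channel forming a closed loop body in a processor loopback structure according to an embodiment of the present invention.
Detailed Description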
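The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. The loopback structure of a processor provided by the present invention mainly includes a register file unit, a data storage unit, and a data reading unit. The register file unit is configured to provide data read and write services for the data storage unit and the data reading unit; the data storage unit, connected to the register file unit, is configured to read data from a read port of the register file unit, perform data transformation on the read data, and feed it back to the data reading unit; the data reading unit, connected to the register file unit and the data storage unit, is configured to transform the data fed back by the data storage unit and write it into the register file unit through a write port of the register file unit.
Preferably, a data operation and transformation unit may further be connected between the data storage unit and the data reading unit; it is configured to further perform operation and transformation processing on the data fed back by the data storage unit and then provide the processed data to the data reading unit.
In addition, when processing data read from the read port, the data storage unit needs to mask its own memory operations.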
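The masking requirement can be pictured as a mode flag on the data storage unit. In this hypothetical Python sketch, `DataStorageUnit`, its `loopback` flag, the rotate-by-8 transformation, and the dictionary standing in for memory are all assumptions: in loopback mode the transformed value is returned for the feedback path and the memory write is suppressed; in normal mode it is written to memory as usual.

```python
from dataclasses import dataclass, field
from typing import Optional

MASK32 = 0xFFFFFFFF


@dataclass
class DataStorageUnit:
    """Hypothetical data storage unit with a loopback (memory-masked) mode."""
    memory: dict = field(default_factory=dict)  # stands in for external memory
    writes: int = 0                             # memory writes actually issued

    def process(self, value: int, addr: int, loopback: bool) -> Optional[int]:
        value &= MASK32
        # Assumed transformation: rotate left by 8 bits.
        transformed = ((value << 8) | (value >> 24)) & MASK32
        if loopback:
            # Memory operation masked: the value goes to the feedback path instead.
            return transformed
        self.memory[addr] = transformed
        self.writes += 1
        return None


dsu = DataStorageUnit()
fed_back = dsu.process(0x12345678, addr=0x40, loopback=True)
print(hex(fed_back), dsu.writes)            # 0x34567812 0
dsu.process(0x12345678, addr=0x40, loopback=False)
print(dsu.writes, hex(dsu.memory[0x40]))    # 1 0x34567812
```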
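Further, the loopback structure may also include an arithmetic unit, connected to the register file unit, configured to read source operands from the register file unit, perform a data operation on them, and write the obtained operation result into the register file unit.
In that case, the data storage unit may be further configured to read the operation result of the source operands from the read port of the register file unit, perform data transformation on the read operation result, and feed it back to the data reading unit;
correspondingly, the data reading unit is further configured to transform the operation result fed back by the data storage unit and write it into the register file unit through the write port of the register file unit.
The data loopback processing method of a processor provided by the present invention mainly includes:
the data storage unit reads data from a read port of the register file unit, performs data transformation on the read data, and feeds it back to the data reading unit; the data reading unit transforms the data fed back by the data storage unit and writes it into the register file unit through a write port of the register file unit.
Preferably, the method further includes: an arithmetic unit connected to the register file unit reads source operands from the register file unit, performs a data operation on them, and writes the obtained operation result into the register file unit.
Correspondingly, the data storage unit may read the operation result of the source operands from the read port of the register file unit, perform data transformation on the read operation result, and feed it back to the data reading unit; the data reading unit transforms the operation result fed back by the data storage unit and writes it into the register file unit through the write port of the register file unit.
The data that the data storage unit reads from the register file unit may be the operation result of the arithmetic unit, but it need not be. If, in a specific implementation, one only wants to use the special data transformation functions of the data storage unit and the data reading unit to avoid memory operations, the data that the data storage unit reads from the register file unit need not be the arithmetic unit's operation result.
It can thus be seen that the present invention provides an instruction and a channel directly from the data storage unit to the data reading unit. With this instruction and channel, after the arithmetic unit completes an operation and the data storage unit transforms the data, the data is not written directly into memory but is looped back to the data reading unit. The channel reuses the special data transformation functions of the data storage unit and the data reading unit (including rotational shifting of data, etc.) and their read and write ports to the register file unit; this channel and the "register file unit → arithmetic unit → register file unit" channel are independent of each other, can execute operations in parallel, and can work without affecting each other.
It should be noted that the "register file unit → data storage unit → data reading unit → register file unit" channel of the present invention and the "register file unit → arithmetic unit → register file unit" channel can also work in coordination to form a closed loop body. This is described below with reference to specific embodiments.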
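As shown in FIG. 2, a processor loopback structure provided by an embodiment of the present invention mainly includes a data reading unit, a register file unit, an arithmetic unit, and a data storage unit. The data channel via the first read port of the register file unit (read port 1 in the figure), the arithmetic unit, and the first write port of the register file unit (write port 1 in the figure) constitutes the front door channel; the data channel via the second read port of the register file unit (read port 2 in the figure), the data storage unit, the data reading unit, and the second write port of the register file unit (write port 2 in the figure) constitutes the back door channel. The dashed arrows in FIG. 2 show the data loopback path.
The arithmetic unit is configured to read source operands from read port 1 of the register file unit, perform a data operation on them, and write the obtained operation result into the register file unit through write port 1 of the register file unit;
the data storage unit is configured to read the operation result through read port 2 of the register file unit, perform data transformation on it, and feed it back to the data reading unit;
the data reading unit is configured to transform the data fed back by the data storage unit and write it into the register file unit through write port 2 of the register file unit;
the register file unit is configured to provide data read and write services for the arithmetic unit, the data storage unit, and the data reading unit.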
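The FIG. 2 structure can be sketched as a toy register-level model in Python. Everything named here (`LoopbackCore`, `alu_add`, `loopback`, the rotate amounts, the 8-entry register file) is an assumption for illustration: an addition stands in for the arithmetic unit's operation, and a 32-bit rotate stands in for the special transformations of the data storage and data reading units. The optional `middle` callable represents the data operation and transformation unit that the FIG. 3 embodiment inserts between them.

```python
from typing import Callable, Optional

MASK32 = 0xFFFFFFFF


def rotl32(v: int, n: int) -> int:
    """32-bit rotate left, standing in for the units' special transformation."""
    n %= 32
    v &= MASK32
    return ((v << n) | (v >> (32 - n))) & MASK32


class LoopbackCore:
    """Toy model of the FIG. 2 / FIG. 3 loopback structure (hypothetical names)."""

    def __init__(self, nregs: int = 8) -> None:
        self.regs = [0] * nregs   # register file unit
        self.mem_accesses = 0     # the back door path never touches memory

    def alu_add(self, rd: int, rs1: int, rs2: int) -> None:
        """Front door: read port 1 -> arithmetic unit -> write port 1."""
        self.regs[rd] = (self.regs[rs1] + self.regs[rs2]) & MASK32

    def loopback(self, rd: int, rs: int, store_rot: int = 0, load_rot: int = 0,
                 middle: Optional[Callable[[int], int]] = None) -> None:
        """Back door: read port 2 -> data storage unit -> [optional unit] ->
        data reading unit -> write port 2. The memory write is masked."""
        store_bus = rotl32(self.regs[rs], store_rot)                     # data storage unit
        read_bus = middle(store_bus) & MASK32 if middle else store_bus  # FIG. 3 only
        self.regs[rd] = rotl32(read_bus, load_rot)                       # data reading unit


core = LoopbackCore()
core.regs[0], core.regs[1] = 1, 2
core.alu_add(2, 0, 1)                                         # r2 = 3
core.loopback(3, 2, store_rot=4)                              # FIG. 2 path
core.loopback(4, 2, store_rot=4, middle=lambda x: x ^ 0xFF)   # FIG. 3 path
print(hex(core.regs[3]), hex(core.regs[4]), core.mem_accesses)  # 0x30 0xcf 0
```

Because the back door path never reaches memory, `mem_accesses` stays at zero for the same rotate that a conventional store-and-reload sequence would need two memory accesses to perform.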
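As can be seen from the processor loopback structure shown in FIG. 2, to improve processor efficiency and reduce processor power consumption, the present invention provides an instruction and a channel directly from the data storage unit to the data reading unit (the back door channel). With this instruction and channel, after the arithmetic unit completes an operation and the data storage unit transforms the data, the data is not written directly into memory but is looped back to the data reading unit. The back door channel reuses the special data transformation functions of the data storage unit and the data reading unit (for example, rotational shifting of data) and their read and write ports to the register file unit. This data feedback strategy avoids processor access to memory and the read/write conflicts caused by reading and writing memory.
In addition, as another embodiment of the present invention, other components (such as a data operation and transformation unit) may be added between the data storage unit and the data reading unit to perform additional data operations and transformations. The processor loopback structure of this embodiment is shown in FIG. 3: the data channel via the first read port of the register file unit (read port 1 in the figure), the arithmetic unit, and the first write port of the register file unit (write port 1 in the figure) constitutes the front door channel; the data channel via the second read port of the register file unit (read port 2 in the figure), the data storage unit, the data operation and transformation unit, the data reading unit, and the second write port of the register file unit (write port 2 in the figure) constitutes the back door channel. The dashed arrows in FIG. 3 show the data loopback path.
FIG. 4 shows the instruction pipeline of the data loopback processing of the processor. The data loopback instruction pipeline starts when the arithmetic unit reads data from the register file unit and ends when the data reading unit writes the data back to the register file unit. It takes N clock cycles in total, each cycle corresponding to one pipeline stage. The function of each stage is as follows: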
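Stage 1 (pipeline stage 1): the arithmetic unit reads source operands from read port 1 of the register file unit;
Stages 2 to N-4: the arithmetic unit performs the data operation on the source operands;
Stage N-3: the arithmetic unit writes the obtained operation result into the register file unit through write port 1 of the register file unit;
Stage N-2: the data storage unit reads the operation result through read port 2 of the register file unit, performs data transformation on it (such as rotational shifting of data), and places it on the data storage bus;
Stage N-1: the data operation and transformation unit takes the data from the data storage bus, performs further operation and transformation processing, and copies the processed data onto the data read bus; at the same time, the data storage unit must mask its memory operation;
Stage N: the data reading unit takes the data from the data read bus, performs data transformation on it (such as rotational shifting of data), and writes it into the register file unit through write port 2 of the register file unit.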
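The stage assignment above can be written as a function of N. This Python sketch only tabulates which unit is active in each stage as listed for FIG. 4. The short unit labels (ALU, DSU, DRU) are abbreviations introduced here, and requiring N of at least 6 (so that Stages 2 to N-4 contain at least one compute stage) is an assumption, since the patent does not state a minimum depth.

```python
def loopback_stages(n: int) -> list[str]:
    """List the activity of each stage 1..n of the FIG. 4 loopback pipeline."""
    if n < 6:
        raise ValueError("assumed minimum: one compute stage, so n >= 6")
    stages = ["ALU reads source operands (read port 1)"]           # Stage 1
    stages += ["ALU computes"] * (n - 5)                            # Stages 2 .. n-4
    stages += [
        "ALU writes result (write port 1)",                         # Stage n-3
        "DSU reads (read port 2), transforms, drives store bus",    # Stage n-2
        "Op/transform unit: store bus -> read bus; DSU memory op masked",  # Stage n-1
        "DRU transforms, writes (write port 2)",                    # Stage n
    ]
    return stages


for i, activity in enumerate(loopback_stages(9), start=1):
    print(f"Stage {i}: {activity}")
```

With N = 9, Stages 2 to 5 are the four compute stages.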
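Assume N = 9, so that a loopback instruction takes 9 cycles to complete. Without the loopback instruction, performing the same function requires extra cycles for memory access. Assuming a memory write takes 1 cycle and a memory read takes 3 cycles, a total of 9 + 1 + 3 = 13 cycles is needed. In this case, therefore, using the data loopback instruction and loopback structure saves 4 of the 13 cycles, improving processor efficiency by about 30%. In other words, the loopback structure of the present invention allows all data to circulate inside the processor core, which effectively improves processor performance and reduces processor power consumption.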
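The cycle comparison is simple arithmetic, reproduced here under the same assumptions as the text (N = 9, with a 1-cycle memory write and a 3-cycle memory read added to the non-loopback path). The helper name `loopback_savings` is hypothetical. The "about 30%" figure corresponds to the fraction of cycles saved, (13 - 9)/13, which is roughly 30.8%.

```python
def loopback_savings(n_stages: int, mem_write: int, mem_read: int) -> tuple[int, int, float]:
    """Cycles with and without the loopback path, and the fraction of cycles saved."""
    with_loopback = n_stages
    without_loopback = n_stages + mem_write + mem_read  # extra store plus reload
    saved = (without_loopback - with_loopback) / without_loopback
    return with_loopback, without_loopback, saved


loop, no_loop, saved = loopback_savings(n_stages=9, mem_write=1, mem_read=3)
print(loop, no_loop, f"{saved:.1%}")  # 9 13 30.8%
```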
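It should be noted that, as shown in FIG. 5, the front door channel (register file unit → arithmetic unit → register file unit) and the back door channel (register file unit → data storage unit → data reading unit → register file unit) are at different stages of the processor pipeline, so their operations are parallel and independent, and they can operate on the same or different registers. That is, the registers that the back door channel and the front door channel use in the register file unit may be the same or different. When the front door channel and the back door channel operate on the same registers (that is, they use the same registers in the register file unit), a closed loop body is formed between them, as shown in FIG. 6.
If one only wants to use the special data transformation functions of the data storage unit and the data reading unit to avoid memory operations, the closed loop body shown in FIG. 6 need not be formed. However, in certain operations on small amounts of data, the closed loop body formed by the front door and back door channels lets the data circulate entirely inside the processing core while using very few register file resources. Multiple independent operations can then be packed together to fill the pipeline of the entire loopback structure. In this case, performance can be further improved and power consumption further reduced: throughput can be six to seven times that before packing, and utilization of the arithmetic unit approaches 100%.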
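Why packing independent operations helps can be seen in an idealized cycle model. In this hypothetical Python sketch, each chain is a sequence of dependent loopback operations, so a chain's next operation may enter Stage 1 only after its previous one leaves Stage N, and at most one operation enters the pipeline per cycle. The model ignores register pressure, hazards between chains, and the memory system; it is not meant to reproduce the six- to sevenfold figure quoted above, only the trend.

```python
def simulate(n_stages: int, chains: int, iterations: int) -> tuple[int, float]:
    """Return (total cycles, fraction of cycles in which an op entered Stage 1)."""
    if chains <= 0 or iterations <= 0:
        raise ValueError("need at least one chain and one iteration")
    ready_at = [0] * chains            # earliest cycle each chain may issue again
    remaining = [iterations] * chains  # operations left in each chain
    cycle = issued = last_done = 0
    while any(remaining):
        for c in range(chains):        # issue at most one op per cycle
            if remaining[c] and ready_at[c] <= cycle:
                ready_at[c] = cycle + n_stages   # result written back after N stages
                remaining[c] -= 1
                issued += 1
                last_done = max(last_done, cycle + n_stages)
                break
        cycle += 1
    return last_done, issued / last_done


for k in (1, 9):
    total, util = simulate(n_stages=9, chains=k, iterations=10)
    print(f"{k} chain(s): {total} cycles, Stage-1 utilization {util:.0%}")
# 1 chain(s): 90 cycles, Stage-1 utilization 11%
# 9 chain(s): 98 cycles, Stage-1 utilization 92%
```

With one chain the pipeline accepts a new operation only every N cycles; with N interleaved chains it accepts one almost every cycle, producing nine times the results in only slightly more time.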
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.

Claims

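1. A loopback structure of a processor, characterized by comprising: a register file unit, a data storage unit, and a data reading unit, wherein
the register file unit is configured to provide data read and write services for the data storage unit and the data reading unit;
the data storage unit, connected to the register file unit, is configured to read data from a read port of the register file unit, perform data transformation on the read data, and feed the transformed data back to the data reading unit;
the data reading unit, connected to the register file unit and the data storage unit, is configured to transform the data fed back by the data storage unit and write it into the register file unit through a write port of the register file unit.
2. The loopback structure of a processor according to claim 1, characterized in that a data operation and transformation unit is connected between the data storage unit and the data reading unit,
the data operation and transformation unit being configured to further perform operation and transformation processing on the data fed back by the data storage unit and then provide the processed data to the data reading unit.
3. The loopback structure of a processor according to claim 1 or 2, characterized in that the data storage unit is further configured to mask its own memory operations while processing the data read from the read port.
4. The loopback structure of a processor according to claim 1 or 2, characterized in that the loopback structure further comprises an arithmetic unit, connected to the register file unit, configured to read source operands from the register file unit, perform a data operation on the source operands, and write the obtained operation result into the register file unit.
5. The loopback structure of a processor according to claim 4, characterized in that the data storage unit is further configured to read the operation result of the source operands from the read port of the register file unit, perform data transformation on the read operation result, and feed it back to the data reading unit; correspondingly, the data reading unit is further configured to transform the operation result fed back by the data storage unit and write it into the register file unit through the write port of the register file unit.
6. The loopback structure of a processor according to claim 1 or 2, characterized in that the data transformation is a rotational shift operation on the data.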
7、 一种处理器的数据环回处理方法, 其特征在于, 该方法包括: 数据存储单元从寄存器文件单元的读口读取数据, 并对读取的数据进 行数据变换后反馈到数据读取单元;
数据读取单元对数据存储单元反馈的数据进行变换后, 通过寄存器文 件单元的写口写入到寄存器文件单元中。
8、 根据权利要求 7所述处理器的数据环回处理方法, 其特征在于, 该 方法进一步包括:
所述数据存储单元和数据读取单元之间连接有数据运算和变换单元, 所述数据运算和变换单元对所述数据存储单元反馈的数据进一步进行运算 和变换处理后提供给所述数据读取单元。
9、根据权利要求 7或 8所述处理器的数据环回处理方法,其特征在于, 该方法进一步包括:
所述数据存储单元在处理从所述读口读取的数据时, 屏蔽自身对处理 器的内存操作。
10、 根据权利要求 7或 8所述处理器的数据环回处理方法, 其特征在 于, 该方法进一步包括:
连接所述寄存器文件单元的运算单元从所述寄存器文件单元读取源操 作数, 并根据源操作数进行数据运算, 将得到的运算结果写入所述寄存器 文件单元中。
11、 根据权利要求 10所述处理器的数据环回处理方法, 其特征在于, 该方法进一步包括:
所述数据存储单元从寄存器文件单元的读口读取所述源操作数的运算 结果, 并对读取的运算结果进行数据变换后反馈到所述数据读取单元; 所述数据读取单元对数据存储单元反馈的运算结果进行变换后通过寄 存器文件单元的写口写入到所述寄存器文件单元中。
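12. The data loopback processing method for a processor according to claim 7 or 8, characterized in that the data transformation is a rotate-shift operation on the data.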
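The method of claims 7 to 9 and 12 can be contrasted with an ordinary store-then-load round trip in a small sketch. The following are all hypothetical stand-ins, not the patent's:

* the function names `store_then_load` and `loopback`,
* the dictionary-based register file and memory,
* the 32-bit width and the 8-bit rotate,
* the XOR used for the optional intermediate unit of claim 8.

"Masking" the storage unit's memory operation (claims 3 and 9) is modelled simply as never touching the memory dictionary.

```python
from typing import Callable, Dict, Optional

MASK32 = 0xFFFF_FFFF
Transform = Callable[[int], int]


def rotl32(x: int, k: int) -> int:
    """Rotate-shift transformation: rotate a 32-bit value left by k bits."""
    x &= MASK32
    k %= 32
    return ((x << k) | (x >> (32 - k))) & MASK32


def store_then_load(regs: Dict[str, int], memory: Dict[int, int], src: str,
                    dst: str, addr: int, store_xform: Transform,
                    load_xform: Transform) -> None:
    """Conventional path: the store transforms and writes memory, then a load reads it back."""
    memory[addr] = store_xform(regs[src]) & MASK32
    regs[dst] = load_xform(memory[addr]) & MASK32


def loopback(regs: Dict[str, int], src: str, dst: str, store_xform: Transform,
             load_xform: Transform, middle: Optional[Transform] = None) -> None:
    """Loopback path: the storage unit transforms the read-port value and feeds it
    straight to the data read unit (optionally via an intermediate unit). No memory access."""
    fed_back = store_xform(regs[src]) & MASK32
    if middle is not None:
        fed_back = middle(fed_back) & MASK32
    regs[dst] = load_xform(fed_back) & MASK32


identity: Transform = lambda v: v
rot8: Transform = lambda v: rotl32(v, 8)

regs = {"r0": 0x12345678}
memory: Dict[int, int] = {}

store_then_load(regs, memory, "r0", "r1", 0x100, rot8, identity)
loopback(regs, "r0", "r2", rot8, identity)
loopback(regs, "r0", "r3", rot8, identity, middle=lambda v: v ^ 0xFF)

print({k: hex(v) for k, v in regs.items()})
# {'r0': '0x12345678', 'r1': '0x34567812', 'r2': '0x34567812', 'r3': '0x345678ed'}
print({hex(a): hex(v) for a, v in memory.items()})  # {'0x100': '0x34567812'}
```

Both paths deliver the same transformed value to the register file. Only the conventional path leaves an entry in memory and would pay the extra write and read cycles.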
PCT/CN2011/079663 2011-05-12 2011-09-15 Loopback structure and data loopback processing method for a processor WO2012151822A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/117,244 US20140156685A1 (en) 2011-05-12 2011-09-15 Loopback structure and data loopback processing method of processor
EP11865214.8A EP2709003B1 (en) 2011-05-12 2011-09-15 Loopback structure and data loopback processing method for processor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2011101224025A CN102779023A (zh) 2011-05-12 2011-05-12 Loopback structure and data loopback processing method for a processor
CN201110122402.5 2011-05-12

Publications (1)

Publication Number Publication Date
WO2012151822A1 (zh) 2012-11-15

Family

ID=47123946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079663 WO2012151822A1 (zh) 2011-05-12 2011-09-15 Loopback structure and data loopback processing method for a processor

Country Status (4)

Country Link
US (1) US20140156685A1 (zh)
EP (1) EP2709003B1 (zh)
CN (1) CN102779023A (zh)
WO (1) WO2012151822A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE112013005188B4 (de) 2012-12-27 2023-08-03 Intel Corporation Processor and method for vectorizing merged, multiply nested loops

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107682446B (zh) * 2017-10-24 2020-12-11 新华三信息安全技术有限公司 Packet mirroring method, apparatus, and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN85101348A (zh) * 1985-04-01 1987-01-31 坦德姆计算机有限公司 Central processing unit (CPU) architecture with multiple data paths
US5293499A (en) * 1989-12-29 1994-03-08 Sun Microsystems, Inc. Apparatus for executing a RISC store and RI instruction pair in two clock cycles
CN1584824A (zh) * 2003-08-18 2005-02-23 上海海尔集成电路有限公司 Microprocessor architecture based on a CISC structure and instruction implementation method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781790A (en) * 1995-12-29 1998-07-14 Intel Corporation Method and apparatus for performing floating point to integer transfers and vice versa
US6393452B1 (en) * 1999-05-21 2002-05-21 Hewlett-Packard Company Method and apparatus for performing load bypasses in a floating-point unit
US6970996B1 (en) * 2000-01-04 2005-11-29 National Semiconductor Corporation Operand queue for use in a floating point unit to reduce read-after-write latency and method of operation
JP2003044273A (ja) * 2001-08-01 2003-02-14 Nec Corp Data processing apparatus and data processing method
WO2006018822A1 (en) * 2004-08-20 2006-02-23 Koninklijke Philips Electronics, N.V. Combined load and computation execution unit
US20070106883A1 (en) * 2005-11-07 2007-05-10 Choquette Jack H Efficient Streaming of Un-Aligned Load/Store Instructions that Save Unused Non-Aligned Data in a Scratch Register for the Next Instruction
KR101360812B1 (ko) * 2008-06-05 2014-02-11 삼성전자주식회사 Semiconductor device and semiconductor system including the same
US9501286B2 (en) * 2009-08-07 2016-11-22 Via Technologies, Inc. Microprocessor with ALU integrated into load unit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2709003A4 *

Also Published As

Publication number Publication date
EP2709003A4 (en) 2017-06-07
EP2709003A1 (en) 2014-03-19
US20140156685A1 (en) 2014-06-05
EP2709003B1 (en) 2018-08-01
CN102779023A (zh) 2012-11-14

Similar Documents

Publication Publication Date Title
US10515046B2 (en) Processors, methods, and systems with a configurable spatial accelerator
US10564980B2 (en) Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator
US11307873B2 (en) Apparatus, methods, and systems for unstructured data flow in a configurable spatial accelerator with predicate propagation and merging
US10469397B2 (en) Processors and methods with configurable network-based dataflow operator circuits
CN109213723B (zh) Processor, method, and device for dataflow graph processing, and a non-transitory machine-readable medium
US10445451B2 (en) Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features
US10417175B2 (en) Apparatus, methods, and systems for memory consistency in a configurable spatial accelerator
EP3726389B1 (en) Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator
US10565134B2 (en) Apparatus, methods, and systems for multicast in a configurable spatial accelerator
US10416999B2 (en) Processors, methods, and systems with a configurable spatial accelerator
US10558575B2 (en) Processors, methods, and systems with a configurable spatial accelerator
US20190303297A1 (en) Apparatus, methods, and systems for remote memory access in a configurable spatial accelerator
EP3719654A1 (en) Apparatuses, methods, and systems for operations in a configurable spatial accelerator
US10678724B1 (en) Apparatuses, methods, and systems for in-network storage in a configurable spatial accelerator
US20190095383A1 (en) Processors, methods, and systems for debugging a configurable spatial accelerator
US10853073B2 (en) Apparatuses, methods, and systems for conditional operations in a configurable spatial accelerator
GB2503438A (en) Method and system for pipelining out of order instructions by combining short latency instructions to match long latency instructions
WO2020005447A1 (en) Apparatuses, methods, and systems for integrated control and data processing in a configurable spatial accelerator
EP3757814A1 (en) Apparatuses, methods, and systems for time-multiplexing in a configurable spatial accelerator
EP1422617A2 (en) Coprocessor architecture based on a split-instruction transaction model
US11907713B2 (en) Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator
Gschwind et al. A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor
US20130159666A1 (en) Reducing issue-to-issue latency by reversing processing order in half-pumped simd execution units
WO2012151822A1 (zh) Loopback structure and data loopback processing method for a processor
US7613905B2 (en) Partial register forwarding for CPUs with unequal delay functional units

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11865214

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 14117244

Country of ref document: US

WWE WIPO information: entry into national phase

Ref document number: 2011865214

Country of ref document: EP