WO2013097793A1 - On-chip multi-core data transmission method and device - Google Patents

On-chip multi-core data transmission method and device

Info

Publication number
WO2013097793A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data transmission
destination
spm
module
Prior art date
Application number
PCT/CN2012/087985
Other languages
English (en)
French (fr)
Inventor
张帅
焦帅
张�浩
范东睿
李海忠
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2013097793A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

Definitions

  • The present invention relates to the field of multi-core processor design, and in particular to an on-chip data transmission method and device for a multi-core processor.
  • Background art
  • In a traditional multi-core processor, the storage hierarchy is divided into a level 1 cache, a level 2 cache, possibly further cache levels, and off-chip storage.
  • The level 1 cache is typically located inside the processor core and is directly connected to the core's memory access module.
  • The level 2 cache and further cache levels are generally shared by several or all processor cores.
  • All of these are on-chip caches; they have no separate address space and are invisible to the programmer.
  • This design is common in traditional single-core processors, where the hardware cache gives quick access to the data it maps. The cache of a traditional single-core processor therefore has no address space of its own, whereas the SPM (Scratch-pad Memory) in this design is a cache with its own address space.
  • At present, part of the level 1 cache can be configured through a software interface as an address space visible to the programmer.
  • In traditional multi-core processor designs, however, access requests to the level 2 cache and off-chip storage must be issued by the memory access unit.
  • The programmer cannot issue a memory access request directly; instead, the memory access unit fetches the data from the cache levels, and the maximum length of data transferred this way is generally the line width of the level 2 cache.
  • Common parallel applications today often require large-scale data transfers, for example FFT (Fast Fourier Transform) and matrix multiplication. The traditional on-chip cache data transfer method has therefore become a bottleneck that limits computation speed.
  • Existing on-chip caches cannot adjust the address allocation of data in the cache according to the algorithm being run.
  • For multi-core processors with local caches, the traditional cache has poor spatial locality. This design lets programmers implement controllable data transfer between local and remote locations according to their needs, which improves cache utilization and spatial locality.
  • Summary of the invention
  • An object of the present invention is to provide an on-chip data transmission method and device that greatly reduce the pressure on the on-chip network and give programs control over data size and location.
  • Step 100: Configure the data transmission device. An instruction stream for controlling the data transmission device is generated through a software interface, the processor core sends the instruction stream to the data transmission device located inside the processor core, and the data transmission device is configured through the software interface as follows, in order to determine the data transmission type;
  • Step 200: The data transmission device receives the instruction stream, combines operations destined for the same SPM or level 2 cache, and encapsulates them into data packets that can be transmitted on the on-chip network;
  • Step 300: The sending module of the data transmission device queries the on-chip network and parses the data address to obtain the coordinates of the destination SPM or level 2 cache. When the router indicates that transmission is possible, the sending module sends the data packets in sequence;
  • Step 400: The data transmission device receives the data returned by the destination SPM or level 2 cache, or receives synchronization signals and passes them to the control module, until the number of returned data items or synchronization signals equals the number of requests sent; the control module of the device then returns the operation completion signal to the processor core.
  • The on-chip multi-core data transmission method is characterized in that step 100 further includes the following steps:
  • Step 110: Set the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, and the number of two-dimensional data items;
  • Step 120: Configure the control registers and data registers of the data transmission device according to the instruction stream.
  • The on-chip multi-core data transmission method is characterized in that step 200 further includes the following steps:
  • Step 210: The control register determines the data transmission type, and operations destined for the same remote SPM or level 2 cache are combined;
  • Step 220: The sending module encapsulates the combined operations into data packets that can be transmitted on the on-chip network.
  • The on-chip multi-core data transmission method is characterized in that step 300 further includes the following steps:
  • Step 310: The sending module sends data packets whose destination coordinate is the local processor core directly to the local SPM, without going through the on-chip network;
  • Step 320: The control module of the data transmission device records the number of data packets sent.
  • The on-chip multi-core data transmission method is characterized in that step 400 further includes the following steps:
  • Step 410: The receiving module receives the data returned by the destination SPM or level 2 cache in response to the instruction in the data packet and writes it to the local SPM;
  • Step 420: The receiving module receives the synchronization signal returned by the destination SPM or level 2 cache and passes it to the control module of the data transmission device;
  • Step 430: Determine whether the number of returned data items or synchronization signals equals the number of requests sent; if so, execute step 440; otherwise, return to step 410;
  • Step 440: The control module returns the operation completion signal to the processor core.
  • The on-chip multi-core data transmission method is characterized in that the data packet in step 200 carries the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, the number of two-dimensional data items, the register module it belongs to, and the routing coordinate information.
  • The on-chip multi-core data transmission method is characterized in that, in step 310, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed to perform the operation directly, and the packet does not need to be sent onto the network;
  • If the destination coordinate is a remote processor core or the level 2 cache, the destination coordinates are recorded in the packet, and the packet is sent onto the network via the router and finally delivered to the destination processor core or level 2 cache.
  • The invention discloses an on-chip multi-core data transmission device, which comprises:
  • An instruction stream generating module, used to configure the data transmission device: an instruction stream for controlling the data transmission device is generated through a software interface, the processor core sends the instruction stream to the data transmission device located inside the processor core, and the data transmission device is configured through the software interface as follows, in order to determine the data transmission type;
  • An instruction stream receiving module, used by the data transmission device to receive the instruction stream; operations destined for the same SPM or level 2 cache are combined and encapsulated by the data transmission device into data packets that can be transmitted on the on-chip network;
  • A sending module, used by the data transmission device to query the on-chip network and parse the data address to obtain the coordinates of the destination SPM or level 2 cache; when the router indicates that transmission is possible, the sending module sends the data packets in sequence;
  • A receiving module, used by the data transmission device to receive the data returned by the destination SPM or level 2 cache until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns the operation completion signal to the processor core;
  • A control module, used to receive the synchronization signals until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns the operation completion signal to the processor core.
  • The on-chip multi-core data transmission device is characterized in that the instruction stream generating module further includes:
  • A data setting module, used to set the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, and the number of two-dimensional data items;
  • A register module, used to configure the control registers and data registers of the data transmission device according to the instruction stream.
  • The on-chip multi-core data transmission device is characterized in that the instruction stream receiving module further includes:
  • An operation module, used so that the control register determines the data transmission type and operations destined for the same SPM or level 2 cache are combined;
  • The on-chip multi-core data transmission device is characterized in that the sending module further includes:
  • A packet sending module, used by the sending module to send data packets whose destination coordinate is the local processor core directly to the local SPM, without going through the on-chip network;
  • A packet recording module, used by the control module of the data transmission device to record the number of data packets sent.
  • The receiving module further includes:
  • A data writing module, used by the receiving module to receive the data returned by the destination SPM or level 2 cache in response to the instruction in the data packet and write it to the local SPM;
  • A signal returning module, used by the receiving module to receive the synchronization signal returned by the destination SPM or level 2 cache and pass it to the control module of the data transmission device;
  • A determining module, used to determine whether the number of returned data items or synchronization signals equals the number of requests sent; the control module then returns the operation completion signal to the processor core.
  • The on-chip multi-core data transmission device is characterized in that the data packet in the instruction stream receiving module carries the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, the number of two-dimensional data items, the register module it belongs to, and the routing coordinate information.
  • The on-chip multi-core data transmission device is characterized in that, in the packet sending module, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed to perform the operation directly, and the packet does not need to be sent onto the network;
  • If the destination coordinate is a remote processor core or the level 2 cache, the destination coordinates are recorded in the packet, and the packet is sent onto the network via the router and finally delivered to the destination processor core or level 2 cache.
  • The beneficial effects of the invention are as follows: a multi-core processor using the invention can adopt a programming style in which computation and communication overlap, so that on-chip communication latency is hidden behind computation. The invention also alleviates the negative effect of increased network latency caused by bursts of large-scale data requests.
  • Brief description of the drawings
  • FIG. 1 is a flow chart of a data transmission method according to the present invention.
  • FIG. 2 is a state transition diagram of a data transmission device according to the present invention.
  • FIG. 3 is a basic structural diagram of a data transmission device of the present invention.
  • FIG. 4 is a flow chart of the operation of a specific embodiment of the present invention.
  • Detailed description
  • The present invention provides the programmer with a programmable on-chip data transmission method, so that data moving between the level 1 cache and the level 2 cache can be transferred in parallel and at large scale, and data can also be transferred between level 1 caches.
  • The present invention requires the level 1 cache to provide a programmer-visible address space in which data can be explicitly placed. A cache of this kind is usually called a scratch-pad memory (SPM).
  • By consolidating read and write requests destined for the same level 2 cache into one or a few requests, the present invention greatly reduces the pressure on the on-chip network and gives programs control over data size and location.
  • This new data transmission technique, which controls the width and number of data blocks through a programming interface, can transfer data along two strides at once, and can therefore also be called a two-dimensional data transmission technique.
  • Step 100: Configure the data transmission device. An instruction stream for controlling the data transmission device is generated through a software interface, the processor core sends the instruction stream to the data transmission device located inside the processor core, the data transmission device receives the instruction stream, and the device is configured through the software interface as follows;
  • Step 110: Set the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, and the number of two-dimensional data items;
  • Step 120: Configure the control registers and data registers of the data transmission device according to the instruction stream.
  • The control registers include an operation type register, a read completion register, a write completion register, an operation completion register, a return value register, and an idle status register.
  • The read completion register indicates whether all operations that read the local SPM, a remote SPM, or the level 2 cache have been sent.
  • The write completion register indicates whether all operations that write the local SPM, a remote SPM, or the level 2 cache have been sent.
  • The operation completion register indicates whether all of the read and write operations have been sent.
  • The return value register indicates whether all operations have fully completed (both sending and return), and the idle status register indicates that the data transmission device is currently available.
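One way to read the control register description is as a set of status bits derived from counters: the read, write, and operation completion bits track requests that have been sent, while the return value bit additionally waits for every response. The sketch below illustrates that reading only; the function name, counter names, and dictionary keys are hypothetical and do not come from the patent.

```python
def control_flags(reads_total, reads_sent, writes_total, writes_sent, responses):
    """Derive the status bits from simple counters (all names are hypothetical)."""
    read_sent_done = reads_sent == reads_total
    write_sent_done = writes_sent == writes_total
    operation_sent_done = read_sent_done and write_sent_done
    # The return value bit also requires every request to have been answered.
    all_complete = operation_sent_done and responses == reads_total + writes_total
    return {
        "read_complete": read_sent_done,
        "write_complete": write_sent_done,
        "operation_complete": operation_sent_done,
        "return_value": all_complete,
    }
```

With two reads and one write sent but only two responses received, `operation_complete` is set while `return_value` is still clear.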
  • The data registers include a data block width register, a source data address register, a source data one-dimensional stride register, a source data two-dimensional stride register, a destination data address register, a destination data one-dimensional stride register, a destination data two-dimensional stride register, a one-dimensional data count register, and a two-dimensional data count register.
  • The data block width register holds the bit width that the data occupies on the communication link.
  • The source data address register holds the start address of the data before the transfer, the source data one-dimensional stride register holds the column interval address of the data matrix to be transferred, and the source data two-dimensional stride register holds the row interval address of the data matrix to be transferred.
  • The destination data address register holds the start address of the data after the transfer, the destination data one-dimensional stride register holds the column interval address of the data matrix after the transfer, and the destination data two-dimensional stride register holds the row interval address of the data matrix after the transfer.
  • The one-dimensional data count register holds the number of columns of the data matrix, and the two-dimensional data count register holds the number of rows of the data matrix.
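The register set above amounts to a descriptor for a two-dimensional strided copy. The following is a minimal sketch of how such a descriptor could be expanded into source and destination addresses, assuming the one-dimensional stride is the address gap between columns and the two-dimensional stride is the gap between rows, as the text states. The class and function names are hypothetical, and the block width is taken in bytes for simplicity, which is an assumption (the text describes it as a bit width on the link).

```python
from dataclasses import dataclass


@dataclass
class TransferDescriptor:
    block_width: int     # size of one data block, in bytes here (assumption)
    src_addr: int        # start address of the data before the transfer
    src_stride_1d: int   # address gap between columns in the source
    src_stride_2d: int   # address gap between rows in the source
    dst_addr: int        # start address of the data after the transfer
    dst_stride_1d: int   # address gap between columns in the destination
    dst_stride_2d: int   # address gap between rows in the destination
    count_1d: int        # number of columns of the data matrix
    count_2d: int        # number of rows of the data matrix


def address_pairs(d):
    """Yield (source, destination) addresses for every block of the transfer."""
    for row in range(d.count_2d):
        for col in range(d.count_1d):
            src = d.src_addr + row * d.src_stride_2d + col * d.src_stride_1d
            dst = d.dst_addr + row * d.dst_stride_2d + col * d.dst_stride_1d
            yield src, dst


def copy_2d(memory, d):
    """Apply the transfer to a flat bytearray standing in for SPM contents."""
    for src, dst in address_pairs(d):
        memory[dst:dst + d.block_width] = memory[src:src + d.block_width]
```

For example, a descriptor with a source row stride of 16 bytes and a destination row stride of 12 bytes gathers the first three 4-byte words of each 4-word row into a densely packed destination.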
  • There are three identical groups of these data registers (the number is not limited to three and can in fact be larger), so three data transfer requests can be processed simultaneously.
  • Each group of registers (register module) stores the information for one data transfer task until that task is completed.
  • FIG. 2 shows the state transitions of the data transmission device after it receives an instruction stream.
  • In the idle state, a free data and control register module is available.
  • In the configuring state, the device is receiving the instruction stream.
  • The configuration-complete state is the working state: the data transmission device starts to work and performs the task configured by the instruction stream.
  • In the busy state, no register module is currently free, and the instruction stream from the processor core must wait.
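The four states and the register groups can be tied together in a small model. FIG. 2 is not reproduced in this text, so the exact transitions below are an assumption based only on the state descriptions: a free register group lets configuration begin, a full register file puts the device in the busy state, and completing a task frees its group. The class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"                # a free register group is available
    CONFIGURING = "configuring"  # receiving the instruction stream
    WORKING = "working"          # configuration complete, task running
    BUSY = "busy"                # no free register group, the core must wait


class TransferDevice:
    def __init__(self, groups=3):
        self.groups = [None] * groups  # None marks a free register group
        self.state = State.IDLE

    def begin_config(self):
        """Reserve a free register group; return None if the core must wait."""
        for index, task in enumerate(self.groups):
            if task is None:
                self.groups[index] = {}
                self.state = State.CONFIGURING
                return index
        self.state = State.BUSY
        return None

    def finish_config(self, index, params):
        """Store the task parameters and start working."""
        self.groups[index] = dict(params)
        self.state = State.WORKING

    def complete(self, index):
        """Free the register group once its transfer task is done."""
        self.groups[index] = None
        still_running = any(task is not None for task in self.groups)
        self.state = State.WORKING if still_running else State.IDLE
```

A single state variable is a simplification; a real device with several register groups could be working on one task while still accepting another.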
  • Step 200: Determine the data transmission type, combine operations destined for the same SPM or level 2 cache, and have the data transmission device encapsulate them into data packets that can be transmitted on the on-chip network;
  • Step 200 includes the following steps:
  • Step 210: The control register determines the data transmission type, and operations destined for the same SPM or level 2 cache are combined;
  • Step 220: The sending module encapsulates the combined operations into data packets that can be transmitted on the on-chip network.
  • The data packet carries the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, the number of two-dimensional data items, the register module it belongs to, and the routing coordinate information.
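The combining step can be pictured as grouping pending operations by destination and cutting each group into packets of bounded size. The sketch below assumes a fixed per-packet capacity and a simple dictionary packet layout; neither is specified in the text, and the function name is hypothetical.

```python
from collections import defaultdict


def build_packets(requests, capacity=4):
    """Group (destination, operation) pairs into per-destination packets.

    `capacity` is a hypothetical limit on operations per network packet.
    """
    by_destination = defaultdict(list)
    for destination, operation in requests:
        by_destination[destination].append(operation)

    packets = []
    for destination, operations in by_destination.items():
        # A full packet is closed and a new one is started for the same target.
        for start in range(0, len(operations), capacity):
            packets.append({"dest": destination,
                            "ops": operations[start:start + capacity]})
    return packets
```

Four operations to three distinct targets would cost four packets if sent one by one; grouping sends fewer packets whenever several operations share a destination.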
  • Step 300: The sending module of the data transmission device queries the on-chip network and parses the data address to obtain the coordinates of the destination SPM or level 2 cache. When the router indicates that transmission is possible, the sending module sends the data packets in sequence.
  • Step 300 includes the following steps:
  • Step 310: The sending module sends data packets whose destination coordinate is the local processor core directly to the local SPM, without going through the on-chip network. In this step, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed to perform the operation directly, and the packet does not need to be sent onto the network. If the destination coordinate is a remote processor core or the level 2 cache, the destination coordinates are recorded in the packet, and the packet is sent onto the network via the router and finally delivered to the destination processor core or level 2 cache.
  • Step 320: The control module of the data transmission device records the number of data packets sent.
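The local versus remote decision in step 310 can be sketched as follows. The text does not say how an address is mapped to coordinates, so the mapping here (a fixed-size SPM window per core on a mesh of fixed width) is purely an assumption, as are the constants and function names; the level 2 cache is left out for brevity.

```python
SPM_SIZE = 0x1000   # hypothetical: bytes of SPM address space per core
MESH_WIDTH = 4      # hypothetical: number of cores per mesh row


def coords_of(address):
    """Map a global SPM address to (x, y) mesh coordinates (assumed layout)."""
    core = address // SPM_SIZE
    return (core % MESH_WIDTH, core // MESH_WIDTH)


def dispatch(address, local_coord, local_queue, network_queue):
    """Route one request either to the local SPM or onto the on-chip network."""
    destination = coords_of(address)
    if destination == local_coord:
        # Local SPM to local SPM: handled directly, never enters the network.
        local_queue.append(address)
        return "local"
    # Remote target: the destination coordinates travel with the packet.
    network_queue.append((destination, address))
    return "remote"
```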
  • Step 400: The data transmission device receives the data or synchronization signals returned by the destination SPM or level 2 cache until the number of returned data items or synchronization signals equals the number of requests sent; the control module of the device then returns the operation completion signal to the processor core. The programmer can learn through a software query that the operation has completed.
  • Step 400 includes the following steps:
  • Step 410: The receiving module receives the data returned by the destination SPM or level 2 cache in response to the instruction in the data packet and writes it to the local SPM;
  • Step 420: The receiving module receives the synchronization signal returned by the destination SPM or level 2 cache and passes it to the control module of the data transmission device.
  • For a remote read request, the receiving module of the data transmission device receives the data packet returned by the remote SPM or level 2 cache, parses it, and writes the data to the local SPM.
  • The local SPM is the SPM directly connected to the device, and a remote SPM is an SPM connected to another core.
  • The destination SPM is the SPM that holds the data to be read in a read operation, or the SPM to be written in a write operation; it can be either a remote SPM or the local SPM.
  • The receiving module of the data transmission device also receives the synchronization signals returned by the remote SPM or level 2 cache.
  • Step 430: Determine whether the number of returned data items or synchronization signals equals the number of requests sent; if so, execute step 440; otherwise, return to step 410;
  • Step 440: The control module returns the operation completion signal to the processor core.
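Steps 320 and 410 to 440 describe a simple counting protocol: count what was sent, count what came back, and signal completion when the two match. A minimal sketch of that bookkeeping, with hypothetical class and method names:

```python
class CompletionTracker:
    """Counts sent requests and returned responses for one transfer task."""

    def __init__(self):
        self.sent = 0
        self.returned = 0

    def record_sent(self, count=1):
        # Step 320: the control module records how many packets were sent.
        self.sent += count

    def record_return(self):
        # Steps 410/420: one data item or one synchronization signal arrived.
        self.returned += 1
        return self.done

    @property
    def done(self):
        # Step 430: complete only when every request has been answered.
        return self.sent > 0 and self.returned == self.sent
```

When `done` becomes true, the control module would return the completion signal to the processor core (step 440).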
  • The device and the memory access unit of the processor core share the on-chip network ports.
  • The following operations can be included:
  • The device writes data in the local SPM to a remote SPM;
  • The device reads data from a remote SPM into the local SPM;
  • The device reads data in the local SPM into another location of the local SPM;
  • (6) The device reads data in the level 2 cache into the local SPM;
  • Void put 12: data transfer from SPM to L2 cache; unsigned bw: data width; writes data from the local SPM to the L2 cache
  • Step 1: As shown in FIG. 4, if the device has a free register module, it saves the parameters carried by the instruction stream in the register module of the data transmission device shown in FIG. 3. If all register modules are occupied, the processor core is told to stop sending instructions and enters a wait state.
  • Step 2: The device parses the parameters in the register module and sends the address and width of each piece of data to the control module of the data transmission device shown in FIG. 3.
  • Step 3: The control module determines whether the operation is a local operation or a remote operation.
  • Local operations comprise operation types (2) and (5) above; remote operations comprise operation types (1), (3), (4), and (6).
  • Step 4: If the operation is a local operation, it is one of the two types (2) and (5). Both are local data moves and are sent directly to the local SPM for processing. After the local SPM finishes, the corresponding register module of the device is cleared and the next data transfer request can be processed.
  • Step 5: If the operation is a remote operation, it is one of the types (1), (3), (4), and (6).
  • The control module analyzes the operation parameters. The data required by a single operation is often spread over several SPMs (the device can read and write not only the local SPM but also remote SPMs) or the level 2 cache. The control module therefore examines the operations issued by the register module and merges those destined for the same SPM (packets sent to the local SPM do not go onto the network) or the same level 2 cache into the same network packet.
  • Step 6: The control module fills each packet with the network coordinates of the destination SPM or level 2 cache. When a network packet is full and cannot accept more requests, the control module passes it to the sending module of the data transmission device shown in FIG. 3.
  • Step 7: The sending module of the data transmission device checks the network status. If no component with a higher priority than the device is sending a data packet to the network, the sending module immediately sends the prepared network packet onto the on-chip network.
  • Step 8: After receiving a data packet sent by the device, the remote SPM or level 2 cache determines the packet type. For a read operation, it returns the data to the device. For a write operation, it writes the data in the packet to the SPM or level 2 cache and then returns a synchronization signal to the device. The device then clears the register module and can process the next data transfer request.
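Step 8 describes the behaviour of the remote side. The sketch below models a remote SPM or level 2 cache as a flat bytearray and shows the two responses: data for a read, a synchronization signal for a write. The packet fields (`kind`, `addr`, `length`, `payload`) and the response layout are hypothetical and chosen only for illustration.

```python
def handle_packet(storage, packet):
    """Serve one packet at a remote SPM or level 2 cache (illustrative model)."""
    kind = packet["kind"]
    addr = packet["addr"]
    if kind == "read":
        # Read: return the requested bytes to the requesting device.
        data = bytes(storage[addr:addr + packet["length"]])
        return {"type": "data", "payload": data}
    if kind == "write":
        # Write: store the payload, then acknowledge with a synchronization signal.
        payload = packet["payload"]
        storage[addr:addr + len(payload)] = payload
        return {"type": "sync"}
    raise ValueError(f"unknown packet kind: {kind!r}")
```

Each response, whether data or a synchronization signal, counts as one return toward the completion check made by the control module.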
  • A multi-core processor using the present invention can adopt a programming style in which computation and communication overlap, so that on-chip communication latency is hidden behind computation. The present invention also alleviates the negative effect of increased network latency caused by bursts of large-scale data requests.
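Overlapping computation with communication is commonly done by double buffering: the transfer of the next tile is started before the current tile is processed. The sketch below imitates this pattern in software with a worker thread standing in for the data transmission device; it is an analogy under that assumption, not the patent's programming interface, and `fetch` and `process_tiles` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(tile):
    """Stand-in for a programmed transfer of one tile into the local SPM."""
    return list(tile)


def process_tiles(tiles, compute):
    """Compute on tile i while tile i+1 is being fetched in the background."""
    results = []
    if not tiles:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, tiles[0])
        for next_tile in tiles[1:]:
            current = pending.result()              # wait for the tile in flight
            pending = pool.submit(fetch, next_tile)  # start the next transfer
            results.append(compute(current))         # overlaps with that transfer
        results.append(compute(pending.result()))
    return results
```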

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

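The invention discloses an on-chip multi-core data transmission method and device, characterized in that: in step one, a data transmission device is configured, an instruction stream for controlling the data transmission device is generated through a software interface, and the processor core sends the instruction stream to the data transmission device located inside the processor core; in step two, the data transmission device receives the instruction stream, combines operations destined for the same SPM or level 2 cache, and encapsulates them into data packets that can be transmitted on the on-chip network; in step three, the sending module of the data transmission device queries the on-chip network and parses the data address to obtain the coordinates of the destination SPM or level 2 cache; in step four, the data transmission device receives the data returned by the destination SPM or level 2 cache, or receives synchronization signals and passes them to the control module, until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns the operation completion signal to the processor core.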

Description

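On-chip multi-core data transmission method and device
This application claims priority to Chinese patent application No. 201110451374.1, filed with the Chinese Patent Office on December 29, 2011 and entitled "On-chip multi-core data transmission method and device", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of multi-core processor design, and in particular to an on-chip data transmission method and device for a multi-core processor.
Background art
In multi-core processor design, access to storage resources is the main factor limiting performance improvement. Simply raising the clock frequency and improving the caching strategy can no longer meet the memory access bandwidth required when running large-scale parallel programs.
In a traditional multi-core processor, the storage hierarchy is divided into a level 1 cache, a level 2 cache, possibly further cache levels, and off-chip storage. The level 1 cache is typically located inside the processor core and is directly connected to the core's memory access module. The level 2 cache and further cache levels are generally shared by several or all processor cores. All of these are on-chip caches; they have no separate address space and are invisible to the programmer. This design is common in traditional single-core processors, where the hardware cache gives quick access to the data it maps. The cache of a traditional single-core processor therefore has no address space of its own, whereas the SPM (Scratch-pad Memory) in this design is a cache with its own address space.
At present, part of the level 1 cache can be configured through a software interface as an address space visible to the programmer. In traditional multi-core processor designs, however, access requests to the level 2 cache and off-chip storage must be issued by the memory access unit. The programmer cannot issue a memory access request directly; instead, the memory access unit fetches the data from the cache levels, and the maximum length of data transferred this way is generally the line width of the level 2 cache. Yet common parallel applications today often require large-scale data transfers, for example FFT (Fast Fourier Transform) and matrix multiplication. The traditional on-chip cache data transfer method has therefore become a bottleneck that limits computation speed. Existing on-chip caches cannot adjust the address allocation of data in the cache according to the algorithm being run, and for multi-core processors with local caches, the traditional cache has poor spatial locality. This design lets programmers implement controllable data transfer between local and remote locations according to their needs, which improves cache utilization and spatial locality.
Summary of the invention
To solve the above problems, the present invention provides an on-chip multi-core data transmission method and device.
An object of the present invention is to provide an on-chip data transmission method and device that greatly reduce the pressure on the on-chip network and give programs control over data size and location.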
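To achieve the object of the present invention, an on-chip multi-core data transmission method is provided, characterized in that it includes the following steps:
Step 100: Configure the data transmission device. An instruction stream for controlling the data transmission device is generated through a software interface, the processor core sends the instruction stream to the data transmission device located inside the processor core, and the data transmission device is configured through the software interface as follows, in order to determine the data transmission type;
Step 200: The data transmission device receives the instruction stream, combines operations destined for the same SPM or level 2 cache, and encapsulates them into data packets that can be transmitted on the on-chip network;
Step 300: The sending module of the data transmission device queries the on-chip network and parses the data address to obtain the coordinates of the destination SPM or level 2 cache; when the router indicates that transmission is possible, the sending module sends the data packets in sequence;
Step 400: The data transmission device receives the data returned by the destination SPM or level 2 cache, or receives synchronization signals and passes them to the control module, until the number of returned data items or synchronization signals equals the number of requests sent; the control module of the device then returns the operation completion signal to the processor core.
The on-chip multi-core data transmission method is characterized in that step 100 further includes the following steps:
Step 110: Set the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, and the number of two-dimensional data items;
Step 120: Configure the control registers and data registers of the data transmission device according to the instruction stream.
The on-chip multi-core data transmission method is characterized in that step 200 further includes the following steps:
Step 210: The control register determines the data transmission type, and operations destined for the same remote SPM or level 2 cache are combined;
Step 220: The sending module encapsulates the combined operations into data packets that can be transmitted on the on-chip network.
The on-chip multi-core data transmission method is characterized in that step 300 further includes the following steps:
Step 310: The sending module sends data packets whose destination coordinate is the local processor core directly to the local SPM, without going through the on-chip network;
Step 320: The control module of the data transmission device records the number of data packets sent.
The on-chip multi-core data transmission method is characterized in that step 400 further includes the following steps:
Step 410: The receiving module receives the data returned by the destination SPM or level 2 cache in response to the instruction in the data packet and writes it to the local SPM;
Step 420: The receiving module receives the synchronization signal returned by the destination SPM or level 2 cache and passes it to the control module of the data transmission device;
Step 430: Determine whether the number of returned data items or synchronization signals equals the number of requests sent; if so, execute step 440; otherwise, return to step 410;
Step 440: The control module returns the operation completion signal to the processor core.
The on-chip multi-core data transmission method is characterized in that the data packet in step 200 carries the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, the number of two-dimensional data items, the register module it belongs to, and the routing coordinate information.
The on-chip multi-core data transmission method is characterized in that, in step 310, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed to perform the operation directly, and the packet does not need to be sent onto the network;
If the destination coordinate is a remote processor core or the level 2 cache, the destination coordinates are recorded in the packet, and the packet is sent onto the network via the router and finally delivered to the destination processor core or level 2 cache.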
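The invention discloses an on-chip multi-core data transmission device, characterized in that it comprises:
An instruction stream generating module, used to configure the data transmission device: an instruction stream for controlling the data transmission device is generated through a software interface, the processor core sends the instruction stream to the data transmission device located inside the processor core, and the data transmission device is configured through the software interface as follows, in order to determine the data transmission type;
An instruction stream receiving module, used by the data transmission device to receive the instruction stream; operations destined for the same SPM or level 2 cache are combined and encapsulated by the data transmission device into data packets that can be transmitted on the on-chip network;
A sending module, used by the data transmission device to query the on-chip network and parse the data address to obtain the coordinates of the destination SPM or level 2 cache; when the router indicates that transmission is possible, the sending module sends the data packets in sequence;
A receiving module, used by the data transmission device to receive the data returned by the destination SPM or level 2 cache until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns the operation completion signal to the processor core;
A control module, used to receive the synchronization signals until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns the operation completion signal to the processor core.
The on-chip multi-core data transmission device is characterized in that the instruction stream generating module further includes:
A data setting module, used to set the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, and the number of two-dimensional data items;
A register module, used to configure the control registers and data registers of the data transmission device according to the instruction stream.
The on-chip multi-core data transmission device is characterized in that the instruction stream receiving module further includes:
An operation module, used so that the control register determines the data transmission type and operations destined for the same SPM or level 2 cache are combined;
A packet encapsulating module, used so that the sending module encapsulates the combined operations into data packets that can be transmitted on the on-chip network.
The on-chip multi-core data transmission device is characterized in that the sending module further includes:
A packet sending module, used by the sending module to send data packets whose destination coordinate is the local processor core directly to the local SPM, without going through the on-chip network;
A packet recording module, used by the control module of the data transmission device to record the number of data packets sent.
The on-chip multi-core data transmission method is characterized in that the receiving module further includes:
A data writing module, used by the receiving module to receive the data returned by the destination SPM or level 2 cache in response to the instruction in the data packet and write it to the local SPM;
A signal returning module, used by the receiving module to receive the synchronization signal returned by the destination SPM or level 2 cache and pass it to the control module of the data transmission device;
A determining module, used to determine whether the number of returned data items or synchronization signals equals the number of requests sent; the control module then returns the operation completion signal to the processor core.
The on-chip multi-core data transmission device is characterized in that the data packet in the instruction stream receiving module carries the data block width, the source data address, the source data one-dimensional stride, the source data two-dimensional stride, the destination data address, the destination data one-dimensional stride, the destination data two-dimensional stride, the number of one-dimensional data items, the number of two-dimensional data items, the register module it belongs to, and the routing coordinate information.
The on-chip multi-core data transmission device is characterized in that, in the packet sending module, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed to perform the operation directly, and the packet does not need to be sent onto the network;
If the destination coordinate is a remote processor core or the level 2 cache, the destination coordinates are recorded in the packet, and the packet is sent onto the network via the router and finally delivered to the destination processor core or level 2 cache.
The beneficial effects of the invention are as follows: a multi-core processor using the invention can adopt a programming style in which computation and communication overlap, so that on-chip communication latency is hidden behind computation. The invention also alleviates the negative effect of increased network latency caused by bursts of large-scale data requests.
Brief description of the drawings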
图 1为本发明数据传输方法流程图;
图 2为本发明数据传输装置状态转换图;
图 3为本发明数据传输装置基本结构图;
图 4为本发明具体实施方式工作流程图。 具体实施方式
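To make the objectives, technical solutions and advantages of the present invention clearer, the on-chip data transmission method and device of the invention are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
To improve memory access bandwidth, the data required by parallel applications that perform large-scale data transfers should be strongly contiguous and regular, which spares the programmer from scheduling data across the memory hierarchy. The present invention therefore provides programmers with a programmable on-chip data transmission method that allows data to be transferred between the L1 cache and the L2 cache in parallel and on a large scale, and also allows data to be transferred between L1 caches.
The invention requires the L1 cache to provide an address space that is visible to the programmer and in which data can be explicitly placed; such a cache is commonly called a scratchpad memory (Scratch-pad Memory, SPM for short). By consolidating the read and write requests destined for the same L2 cache into one or a few requests, the invention greatly reduces the load on the on-chip network and places the size and location of the data under program control. This new data transfer technique, in which the width and number of data blocks are controlled through a programming interface, can transfer data according to two strides at the same time and may therefore also be called a two-dimensional data transfer technique.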
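The on-chip data transmission method of the present invention is described in detail below in light of the above objectives. The method comprises the following steps:
Step 100. Configure the data transmission device: an instruction stream that controls the data transmission device is generated through a software interface; the processor core sends the instruction stream to the data transmission device located inside the processor core; the data transmission device receives the instruction stream and is configured through the software interface as follows;
Step 110. Set the data block width, the source data address, the source first-dimension stride, the source second-dimension stride, the destination data address, the destination first-dimension stride, the destination second-dimension stride, the first-dimension data count and the second-dimension data count;
Step 120. Configure the control registers and data registers of the data transmission device according to the instruction stream.
The control registers comprise an operation type register, a read-complete register, a write-complete register, an operation-complete register, a return-value register and an idle-state register. The read-complete register indicates whether all operations of the current read from the local SPM, or from a remote SPM or the L2 cache, have been sent; the write-complete register indicates whether all operations of the current write to the local SPM, or to a remote SPM or the L2 cache, have been sent; the operation-complete register indicates whether all of the current read and write operations have been sent; the return-value register indicates whether all operations have fully completed (both sending and return); the idle-state register indicates that the data transmission device is currently available.
The data registers comprise: the data block width register, source data address register, source first-dimension stride register, source second-dimension stride register, destination data address register, destination first-dimension stride register, destination second-dimension stride register, first-dimension data count register and second-dimension data count register. The data block width register holds the bit width that the data occupies on the communication link; the source data address register holds the first address at which the data resides before the transfer; the source first-dimension stride register holds the column-interval address of the data matrix to be transferred; the source second-dimension stride register holds the row-interval address of the data matrix to be transferred; the destination data address register holds the first address at which the data resides after the transfer; the destination first-dimension stride register holds the column-interval address of the data matrix after the transfer; the destination second-dimension stride register holds the row-interval address of the data matrix after the transfer; the first-dimension data count register holds the number of columns of the data matrix; and the second-dimension data count register holds the number of rows of the data matrix.
There are three identical groups of these data registers (the number is not limited to three and may be larger), so that three data transfer requests can be handled at the same time. Each register group can hold the information of one data transfer task until that task completes.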
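The addressing implied by the data registers can be sketched in software as follows. This is only a host-side model under stated assumptions: the struct `transfer_desc` and the function `transfer_2d` are hypothetical names, strides are taken to be byte offsets, and `bw` is treated as a byte count for simplicity, whereas the text defines the data block width as a bit width on the communication link.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical software view of one data-register group. */
typedef struct {
    size_t bw;                    /* data block width (assumed: bytes) */
    const unsigned char *src_addr;/* first address of the source data */
    size_t src_rec_stride;        /* source first-dimension (column) stride */
    size_t src_blk_stride;        /* source second-dimension (row) stride */
    unsigned char *dst_addr;      /* first address of the destination data */
    size_t dst_rec_stride;        /* destination first-dimension stride */
    size_t dst_blk_stride;        /* destination second-dimension stride */
    size_t rec_cnt;               /* first-dimension count (columns) */
    size_t blk_cnt;               /* second-dimension count (rows) */
} transfer_desc;

/* Copy every block described by d and return the number of blocks moved. */
static size_t transfer_2d(const transfer_desc *d)
{
    size_t moved = 0;
    assert(d != NULL && d->src_addr != NULL && d->dst_addr != NULL);
    for (size_t row = 0; row < d->blk_cnt; ++row) {
        for (size_t col = 0; col < d->rec_cnt; ++col) {
            const unsigned char *s = d->src_addr
                + row * d->src_blk_stride + col * d->src_rec_stride;
            unsigned char *t = d->dst_addr
                + row * d->dst_blk_stride + col * d->dst_rec_stride;
            memcpy(t, s, d->bw);   /* move one data block */
            ++moved;
        }
    }
    return moved;
}
```

With a source row stride of 4 and a destination row stride of 2, for example, the model extracts the top-left 2x2 submatrix of a 4x4 byte matrix into a packed 2x2 destination.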
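Figure 2 shows the state transitions of the data transmission device after it receives an instruction stream.
Idle state: an available data and control register module exists.
Register configuration state: the instruction stream is being received.
Configuration-complete state: this is the working state, in which the data transmission device starts working and carries out the task configured by the instruction stream.
Busy state: no register module is currently free, and the instruction stream from the processor core has to wait.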
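The states above can be modelled per register group, as in the minimal sketch below. Figure 2 is not reproduced in the text, so the exact transitions are an assumption: a group is taken to move from free to configuring to working and back to free, and the device is treated as busy when no group is free. All names (`dt_device`, `dt_begin_config`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_GROUPS 3  /* the text describes three register groups, possibly more */

typedef enum { GROUP_FREE, GROUP_CONFIGURING, GROUP_WORKING } group_state;

typedef struct {
    group_state group[NUM_GROUPS];
} dt_device;

/* Busy state: no register group is free, so the core must wait. */
static bool dt_is_busy(const dt_device *dev)
{
    assert(dev != NULL);
    for (int i = 0; i < NUM_GROUPS; ++i)
        if (dev->group[i] == GROUP_FREE)
            return false;
    return true;
}

/* Idle -> configuring: claim a free group; returns its index, or -1 if busy. */
static int dt_begin_config(dt_device *dev)
{
    assert(dev != NULL);
    for (int i = 0; i < NUM_GROUPS; ++i) {
        if (dev->group[i] == GROUP_FREE) {
            dev->group[i] = GROUP_CONFIGURING;
            return i;
        }
    }
    return -1;
}

/* Configuring -> working: the instruction stream has been fully received. */
static void dt_finish_config(dt_device *dev, int g)
{
    assert(dev != NULL && g >= 0 && g < NUM_GROUPS);
    assert(dev->group[g] == GROUP_CONFIGURING);
    dev->group[g] = GROUP_WORKING;
}

/* Working -> free: the transfer task held by this group has completed. */
static void dt_complete(dt_device *dev, int g)
{
    assert(dev != NULL && g >= 0 && g < NUM_GROUPS);
    assert(dev->group[g] == GROUP_WORKING);
    dev->group[g] = GROUP_FREE;
}
```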
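Step 200. Determine the data transmission type, combine operations destined for the same SPM or L2 cache, and have the data transmission device encapsulate them into data packets that can be transmitted on the on-chip network;
Step 200 comprises the following steps:
Step 210. The control registers determine the data transmission type, and operations destined for the same SPM or L2 cache are combined;
Step 220. The sending module encapsulates them into data packets that can be transmitted on the on-chip network.
The data packet carries the data block width, source data address, source first-dimension stride, source second-dimension stride, destination data address, destination first-dimension stride, destination second-dimension stride, first-dimension data count, second-dimension data count, the register module to which it belongs, and routing coordinate information.
Step 300. The sending module of the data transmission device queries the on-chip network, parses the data addresses to obtain the coordinates of the destination SPM or L2 cache, and, when the router indicates that transmission is possible, sends the data packets one after another until all have been sent.
Step 300 comprises the following steps:
Step 310. The sending module delivers data packets whose destination coordinate is the local processor core directly to the local SPM, without transmission over the on-chip network;
In this step, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed directly to perform the operation, and nothing needs to be sent onto the network. If the destination coordinate is a remote processor core or the L2 cache, the destination coordinate is recorded in the data packet, and the packet is sent through the router onto the network and finally delivered to the destination processor core or L2 cache.
Step 320. The control module of the data transmission device records the number of data packets sent.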
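Step 400. The data transmission device receives the data or synchronization signals returned by the destination SPM or L2 cache until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns an operation-complete signal to the processor core. The programmer can learn that the operation has completed by polling in software.
Step 400 comprises the following steps:
Step 410. The receiving module receives the data returned by the destination SPM or L2 cache according to the instruction in the data packet, and writes it into the local SPM;
Step 420. The receiving module receives the synchronization signal returned by the destination SPM or L2 cache and passes it to the control module of the data transmission device.
The receiving module behaves as follows. For a remote read request, the receiving module of the data transmission device receives the data packet returned by the remote SPM or L2 cache, parses it and writes it into the local SPM. (The local SPM is the SPM directly connected to this device; a remote SPM is an SPM attached to another core; the destination SPM is the SPM holding the data to be read in a read operation, or the SPM to be written in a write operation, and may be either remote or local.) For a remote write request, the receiving module of the data transmission device receives the synchronization signal returned by the remote SPM or L2 cache.
Step 430. Determine whether the number of returned data items or synchronization signals equals the number of requests sent; if so, go to step 440; otherwise, return to step 410;
Step 440. The control module returns an operation-complete signal to the processor core.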
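The counting in steps 320 and 410 to 440 can be sketched as a small tracker. This is an illustrative model, not the device's actual logic: the type `completion_tracker` and its functions are hypothetical, and the `all_sent` flag is an assumption borrowed from the operation-complete register described earlier, added so that an early response cannot signal completion before the last request has left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned sent;      /* requests recorded by the control module (step 320) */
    unsigned returned;  /* data items or sync signals received (steps 410/420) */
    bool all_sent;      /* assumed to mirror the operation-complete register */
} completion_tracker;

/* Record one request leaving the sending module. */
static void record_sent(completion_tracker *t)
{
    assert(t != NULL && !t->all_sent);
    t->sent++;
}

/* Mark that no further requests will be sent for this transfer. */
static void mark_all_sent(completion_tracker *t)
{
    assert(t != NULL);
    t->all_sent = true;
}

/* Record one returned data item or synchronization signal. */
static void record_return(completion_tracker *t)
{
    assert(t != NULL && t->returned < t->sent);
    t->returned++;
}

/* Step 430: completion is signalled only when the counts match (step 440). */
static bool transfer_done(const completion_tracker *t)
{
    assert(t != NULL);
    return t->all_sent && t->returned == t->sent;
}
```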
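The device of the present invention is connected to the memory access unit of the processor core and to the on-chip network port, and it shares the on-chip network port with the memory access unit. When instructions arrive from the processor core, the following operations are possible:
(1) The device writes data from the local SPM to a remote SPM;
(2) The device writes data from the local SPM to another location in the local SPM;
(3) The device writes data from the local SPM to the L2 cache;
(4) The device reads data from a remote SPM into the local SPM;
(5) The device reads data from the local SPM into another location in the local SPM;
(6) The device reads data from the L2 cache into the local SPM.
A specific implementation of the invention is as follows. After the programmer supplies the parameters to the programming interface, the compiler generates assembly instructions. When program execution reaches this point, the instructions pass the parameters to the device through the processor's memory access unit.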
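Interface | Description

bool get_done()
    Queries the device status; returns 1 if all operations have completed, otherwise 0.

void get_spm(
    unsigned bw,              // data width
    void* src_addr,           // first address of the source data
    unsigned src_rec_stride,  // column stride of the source data
    unsigned src_blk_stride,  // row stride of the source data
    void* dst_addr,           // first address of the destination data
    unsigned dst_rec_stride,  // column stride of the destination data
    unsigned dst_blk_stride,  // row stride of the destination data
    unsigned rec_cnt,         // number of columns of the 2D data transferred
    unsigned blk_cnt          // number of rows of the 2D data transferred
)
    SPM-to-SPM data transfer: reads data from a remote or local SPM into the local SPM.

void put_spm(
    unsigned bw,              // data width
    void* src_addr,           // first address of the source data
    unsigned src_rec_stride,  // column stride of the source data
    unsigned src_blk_stride,  // row stride of the source data
    void* dst_addr,           // first address of the destination data
    unsigned dst_rec_stride,  // column stride of the destination data
    unsigned dst_blk_stride,  // row stride of the destination data
    unsigned rec_cnt,         // number of columns of the 2D data transferred
    unsigned blk_cnt          // number of rows of the 2D data transferred
)
    SPM-to-SPM data transfer: writes data from the local SPM to a remote SPM or to another address in the local SPM.

void put_l2(
    unsigned bw,              // data width
    void* src_addr,           // first address of the source data
    unsigned src_rec_stride,  // column stride of the source data
    unsigned src_blk_stride,  // row stride of the source data
    void* dst_addr,           // first address of the destination data
    unsigned dst_rec_stride,  // column stride of the destination data
    unsigned dst_blk_stride,  // row stride of the destination data
    unsigned rec_cnt,         // number of columns of the 2D data transferred
    unsigned blk_cnt          // number of rows of the 2D data transferred
)
    SPM-to-L2-cache data transfer: writes data from the local SPM to the L2 cache.

void get_l2(
    unsigned bw,              // data width
    void* src_addr,           // first address of the source data
    unsigned src_rec_stride,  // column stride of the source data
    unsigned src_blk_stride,  // row stride of the source data
    void* dst_addr,           // first address of the destination data
    unsigned dst_rec_stride,  // column stride of the destination data
    unsigned dst_blk_stride,  // row stride of the destination data
    unsigned rec_cnt,         // number of columns of the 2D data transferred
    unsigned blk_cnt          // number of rows of the 2D data transferred
)
    L2-cache-to-SPM data transfer: reads data from the L2 cache into the local SPM.

Step 1: As shown in Figure 4, if the device has a free register module, it stores the parameters carried by the instruction stream in the register module of the data transmission device shown in Figure 3. If all register modules are occupied, the device instructs the processor core to stop sending instructions and wait.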
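Step 2: The device parses the parameters held in the register module of the data transmission device and sends the address and width of each piece of data to the control module of the data transmission device shown in Figure 3.
Step 3: The control module determines whether the operation is a local operation or a remote operation. Local operations are operation types (2) and (5) above; remote operations are operation types (1), (3), (4) and (6) above.
Step 4: If the operation is local, that is, of type (2) or (5), it is a local data move and is sent directly to the local SPM for processing. Once the local SPM has finished, the corresponding register module of the device is cleared and the next data transfer request can be handled.
Step 5: If the operation is remote, that is, one of the four types (1), (3), (4) and (6), the control module analyses the operation parameters. Because the data needed by a single operation often resides on several SPMs (the device may read and write only the local SPM, or may also read and write remote SPMs) or in the L2 cache, the control module examines the operations dispatched by the register module and merges those going to the same SPM (packets for the local SPM do not go through the network) or L2 cache into the same network packet.
Step 6: The control module fills in, for each data packet, the network coordinates of the destination SPM or L2 cache. When a network packet is full and cannot accept further requests, the control module passes it to the sending module of the data transmission device shown in Figure 3.
Step 7: The sending module of the data transmission device checks the state of the network; if no component with a higher priority than this device is sending packets onto the network, the sending module immediately sends the prepared network packet onto the on-chip network.
Step 8: After receiving a packet from the device, the remote SPM or L2 cache determines the type of the packet. For a read operation, it returns the data to the device; for a write operation, it writes the data in the packet into the SPM or L2 cache and then returns a synchronization signal to the device. The device then clears the register module and can handle the next data transfer request.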
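Steps 3 and 5 can be illustrated with the sketch below, which classifies the six operation types and merges requests by destination. It rests on several assumptions that are not in the text: the packet capacity of 4, the use of a plain integer node id in place of network coordinates, and the names `op_type`, `packet` and `merge_by_dest` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Operation types (1) to (6) as listed in the text. */
typedef enum {
    OP_WRITE_REMOTE_SPM = 1,
    OP_WRITE_LOCAL_SPM  = 2,
    OP_WRITE_L2         = 3,
    OP_READ_REMOTE_SPM  = 4,
    OP_READ_LOCAL_SPM   = 5,
    OP_READ_L2          = 6
} op_type;

/* Step 3: types (2) and (5) are local; the other four are remote. */
static bool op_is_local(op_type t)
{
    return t == OP_WRITE_LOCAL_SPM || t == OP_READ_LOCAL_SPM;
}

#define PACKET_CAPACITY 4  /* hypothetical number of requests per packet */

typedef struct {
    int dest;      /* hypothetical node id standing in for network coordinates */
    size_t count;  /* requests merged into this packet so far */
} packet;

/* Step 5: place each request in an open packet for its destination,
 * opening a new packet when none exists or the existing one is full.
 * Returns the number of packets used. */
static size_t merge_by_dest(const int *dests, size_t n,
                            packet *pkts, size_t max_pkts)
{
    size_t used = 0;
    assert(dests != NULL && pkts != NULL);
    for (size_t i = 0; i < n; ++i) {
        size_t p = 0;
        while (p < used &&
               (pkts[p].dest != dests[i] || pkts[p].count == PACKET_CAPACITY))
            ++p;
        if (p == used) {           /* no open packet for this destination */
            assert(used < max_pkts);
            pkts[used].dest = dests[i];
            pkts[used].count = 0;
            ++used;
        }
        ++pkts[p].count;
    }
    return used;
}
```

Seven requests with destinations 1, 9, 1, 1, 9, 1, 1 therefore collapse into three packets instead of seven, the third being opened only because the first packet for destination 1 has filled up.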
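Beneficial effects: a multi-core processor using the invention can adopt a programming style that overlaps computation with communication, so that on-chip communication latency is hidden within the computation; the invention also mitigates the increase in network latency caused by bursts of large-scale data requests.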
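The overlap of computation and communication can be sketched with double buffering on top of the `get_l2` and `get_done` interfaces. The two functions below are host-side stand-ins written for illustration only: they copy synchronously with `memcpy` and always report completion, whereas the real device would work asynchronously. The helper `sum_tiles`, the tile size and the treatment of `bw` and strides as byte quantities are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the device interface: performs the 2D copy immediately. */
static void get_l2(unsigned bw, void *src_addr,
                   unsigned src_rec_stride, unsigned src_blk_stride,
                   void *dst_addr,
                   unsigned dst_rec_stride, unsigned dst_blk_stride,
                   unsigned rec_cnt, unsigned blk_cnt)
{
    assert(src_addr != NULL && dst_addr != NULL);
    for (unsigned r = 0; r < blk_cnt; ++r)
        for (unsigned c = 0; c < rec_cnt; ++c)
            memcpy((char *)dst_addr + r * dst_blk_stride + c * dst_rec_stride,
                   (char *)src_addr + r * src_blk_stride + c * src_rec_stride,
                   bw);
}

/* Stand-in: the synchronous model above is always done. */
static bool get_done(void) { return true; }

#define TILE 4  /* hypothetical number of ints per tile */

/* Sum n_tiles tiles, requesting tile i+1 before computing on tile i. */
static long sum_tiles(int *l2_data, unsigned n_tiles)
{
    int spm[2][TILE];   /* two buffers standing in for local SPM space */
    long total = 0;
    const unsigned w = (unsigned)sizeof(int);
    if (n_tiles == 0)
        return 0;
    get_l2(w, l2_data, w, 0, spm[0], w, 0, TILE, 1);
    for (unsigned i = 0; i < n_tiles; ++i) {
        while (!get_done()) { }      /* poll until tile i has arrived */
        if (i + 1 < n_tiles)         /* start fetching the next tile */
            get_l2(w, l2_data + (i + 1) * TILE, w, 0,
                   spm[(i + 1) & 1], w, 0, TILE, 1);
        for (unsigned k = 0; k < TILE; ++k)  /* compute on the current tile */
            total += spm[i & 1][k];
    }
    return total;
}
```

On hardware, the inner summation would run while the device fetches the next tile, which is where the latency hiding described above would come from.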
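From the description of the specific embodiments of the present invention in conjunction with the drawings, other aspects and features of the invention will be apparent to those skilled in the art.
Specific embodiments of the present invention have been described and illustrated above. These embodiments should be regarded as merely exemplary and are not intended to limit the invention, which should be interpreted according to the appended claims.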

Claims

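1. An on-chip multi-core data transmission method, characterized by comprising the following steps:
Step 100: configure the data transmission device: an instruction stream that controls the data transmission device is generated through a software interface, the processor core sends the instruction stream to the data transmission device located inside the processor core, and the data transmission device is configured through the software interface so as to determine the data transmission type;
Step 200: the data transmission device receives the instruction stream, combines operations destined for the same SPM or L2 cache, and encapsulates them into data packets that can be transmitted on the on-chip network;
Step 300: the sending module of the data transmission device queries the on-chip network, parses the data addresses to obtain the coordinates of the destination SPM or L2 cache, and, when the router indicates that transmission is possible, sends the data packets one after another until all have been sent;
Step 400: the data transmission device receives the data returned by the destination SPM or L2 cache, or receives synchronization signals and passes them to the control module, until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns an operation-complete signal to the processor core.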
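2. The on-chip multi-core data transmission method according to claim 1, characterized in that step 100 further comprises the following steps:
Step 110. Set the data block width, the source data address, the source first-dimension stride, the source second-dimension stride, the destination data address, the destination first-dimension stride, the destination second-dimension stride, the first-dimension data count and the second-dimension data count;
Step 120. Configure the control registers and data registers of the data transmission device according to the instruction stream.
3. The on-chip multi-core data transmission method according to claim 1, characterized in that step 200 further comprises the following steps:
Step 210. The control registers determine the data transmission type, and operations destined for the same SPM or L2 cache are combined;
Step 220. The sending module encapsulates them into data packets that can be transmitted on the on-chip network.
4. The on-chip multi-core data transmission method according to claim 1, characterized in that step 300 further comprises the following steps:
Step 310. The sending module delivers data packets whose destination coordinate is the local processor core directly to the local SPM, without transmission over the on-chip network;
Step 320. The control module of the data transmission device records the number of data packets sent.
5. The on-chip multi-core data transmission method according to claim 1, characterized in that step 400 further comprises the following steps:
Step 410. The receiving module receives the data returned by the destination SPM or L2 cache according to the instruction in the data packet, and writes it into the local SPM;
Step 420. The receiving module receives the synchronization signal returned by the destination SPM or L2 cache and passes it to the control module of the data transmission device;
Step 430. Determine whether the number of returned data items or synchronization signals equals the number of requests sent; if so, go to step 440; otherwise, return to step 410;
Step 440. The control module returns an operation-complete signal to the processor core.
6. The on-chip multi-core data transmission method according to claim 1, characterized in that the data packet in step 200 carries the data block width, source data address, source first-dimension stride, source second-dimension stride, destination data address, destination first-dimension stride, destination second-dimension stride, first-dimension data count, second-dimension data count, the register module to which it belongs, and routing coordinate information.
7. The on-chip multi-core data transmission method according to claim 4, characterized in that, in step 310, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed directly to perform the operation, and nothing needs to be sent onto the network;
if the destination coordinate is a remote processor core or the L2 cache, the destination coordinate is recorded in the data packet, and the packet is sent through the router onto the network and finally delivered to the destination processor core or L2 cache.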
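8. An on-chip multi-core data transmission device, characterized by comprising:
an instruction stream generation module, configured to configure the data transmission device: an instruction stream that controls the data transmission device is generated through a software interface, the processor core sends the instruction stream to the data transmission device located inside the processor core, and the data transmission device is configured through the software interface so that the data transmission type can be determined;
an instruction stream receiving module, by which the data transmission device receives the instruction stream, combines operations destined for the same SPM or L2 cache, and encapsulates them into data packets that can be transmitted on the on-chip network;
a sending module, by which the data transmission device queries the on-chip network, parses the data addresses to obtain the coordinates of the destination SPM or L2 cache, and, when the router indicates that transmission is possible, sends the data packets one after another until all have been sent;
a receiving module, by which the data transmission device receives the data returned by the destination SPM or L2 cache until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns an operation-complete signal to the processor core;
a control module, configured to receive the synchronization signals until the number of returned data items or synchronization signals equals the number of requests sent, whereupon the control module of the device returns an operation-complete signal to the processor core.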
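9. The on-chip multi-core data transmission device according to claim 8, characterized in that the instruction stream generation module further comprises:
a data setting module, configured to set the data block width, the source data address, the source first-dimension stride, the source second-dimension stride, the destination data address, the destination first-dimension stride, the destination second-dimension stride, the first-dimension data count and the second-dimension data count;
a register module, configured to configure the control registers and data registers of the data transmission device according to the instruction stream.
10. The on-chip multi-core data transmission device according to claim 8, characterized in that the instruction stream receiving module further comprises:
an operation module, configured to determine the data transmission type from the control registers and to combine operations destined for the same SPM or L2 cache;
a packet encapsulation module, by which the sending module encapsulates the operations into data packets that can be transmitted on the on-chip network.
11. The on-chip multi-core data transmission device according to claim 8, characterized in that the sending module further comprises:
a packet sending module, by which the sending module delivers data packets whose destination coordinate is the local processor core directly to the local SPM, without transmission over the on-chip network;
a packet recording module, by which the control module of the data transmission device records the number of data packets sent.
12. The on-chip multi-core data transmission device according to claim 8, characterized in that the receiving module further comprises:
a data writing module, by which the receiving module receives the data returned by the destination SPM or L2 cache according to the instruction in the data packet and writes it into the local SPM;
a signal return module, by which the receiving module receives the synchronization signal returned by the destination SPM or L2 cache and passes it to the control module of the data transmission device;
a judging module, configured to determine whether the number of returned data items or synchronization signals equals the number of requests sent; the control module then returns an operation-complete signal to the processor core.
13. The on-chip multi-core data transmission device according to claim 8, characterized in that the data packet in the instruction stream receiving module carries the data block width, source data address, source first-dimension stride, source second-dimension stride, destination data address, destination first-dimension stride, destination second-dimension stride, first-dimension data count, second-dimension data count, the register module to which it belongs, and routing coordinate information.
14. The on-chip multi-core data transmission device according to claim 8, characterized in that, in the packet sending module, if the destination coordinate is the local processor core, the data packet is a transfer from the local SPM to the local SPM; the SPM is instructed directly to perform the operation, and nothing needs to be sent onto the network;
if the destination coordinate is a remote processor core or the L2 cache, the destination coordinate is recorded in the data packet, and the packet is sent through the router onto the network and finally delivered to the destination processor core or L2 cache.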
PCT/CN2012/087985 2011-12-29 2012-12-31 On-chip multi-core data transmission method and device WO2013097793A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110451374.1 2011-12-29
CN2011104513741A CN102567278A (zh) 2011-12-29 2011-12-29 On-chip multi-core data transmission method and device

Publications (1)

Publication Number Publication Date
WO2013097793A1 true WO2013097793A1 (zh) 2013-07-04

Family

ID=46412724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/087985 WO2013097793A1 (zh) 2011-12-29 2012-12-31 On-chip multi-core data transmission method and device

Country Status (2)

Country Link
CN (1) CN102567278A (zh)
WO (1) WO2013097793A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
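CN102567278A (zh) * 2011-12-29 2012-07-11 中国科学院计算技术研究所 On-chip multi-core data transmission method and device
CN105095147B (zh) 2014-05-21 2018-03-13 华为技术有限公司 Flit transmission method and device for a network-on-chip
CN104933009A (zh) * 2015-04-29 2015-09-23 中国人民解放军国防科学技术大学 On-chip communication method and data communication device for use between multi-core DSPs
CN110413562B (zh) * 2019-06-26 2021-09-14 北京全路通信信号研究设计院集团有限公司 Synchronization system and method with adaptive function
WO2021134521A1 (zh) * 2019-12-31 2021-07-08 北京希姆计算科技有限公司 Storage management device and chip
CN113138711B (zh) * 2020-01-20 2023-11-17 北京希姆计算科技有限公司 Storage management device and chip
CN111506541B (zh) * 2020-06-30 2020-09-22 翱捷科技(上海)有限公司 Method and system for accelerating network data packet processing in an embedded network device
CN112052944A (zh) * 2020-08-13 2020-12-08 厦门壹普智慧科技有限公司 Neural network computing module and artificial intelligence processing system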

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0142626A1 (en) * 1983-08-26 1985-05-29 WILLI STUDER AG Fabrik für elektronische Apparate Apparatus for recording of digital data
CN1904868A (zh) * 2005-07-11 2007-01-31 商辉达股份有限公司 Combining packets for a packetized bus
US20100058024A1 (en) * 2008-09-01 2010-03-04 Sony Computer Entertainment Inc. Data Transfer Apparatus, Data Transfer Method And Processor
CN102207916A (zh) * 2011-05-30 2011-10-05 西安电子科技大学 Multi-core shared memory control device based on instruction prefetching
CN102567278A (zh) * 2011-12-29 2012-07-11 中国科学院计算技术研究所 On-chip multi-core data transmission method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7099983B2 (en) * 2002-11-25 2006-08-29 Lsi Logic Corporation Multi-core communications module, data communications system incorporating a multi-core communications module, and data communications process
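CN101290592B (zh) * 2008-06-03 2010-10-13 浙江大学 Method for sharing an SPM among multiple programs on an MPSoC
CN102262608A (zh) * 2011-07-28 2011-11-30 中国人民解放军国防科学技术大学 Processor-core-based method and device for controlling coprocessor read and write operations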

Also Published As

Publication number Publication date
CN102567278A (zh) 2012-07-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12863575

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12863575

Country of ref document: EP

Kind code of ref document: A1