CN105373492A - Task flow-oriented register file-based fast data exchange structure - Google Patents

Task flow-oriented register file-based fast data exchange structure

Publication number
CN105373492A
Authority
CN
China
Prior art keywords
register file
computing node
task flow
data
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201410406915.2A
Other languages
Chinese (zh)
Inventor
何阳
米奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Huize Intellectual Property Operation Management Co Ltd
Original Assignee
Xi'an Huize Intellectual Property Operation Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Huize Intellectual Property Operation Management Co Ltd
Priority to CN201410406915.2A
Publication of CN105373492A
Withdrawn

Abstract

The invention discloses a task flow-oriented register file-based fast data exchange structure. The structure contains a global register file and n computational nodes, where n is a natural number. Each computational node is connected with the global register file through a dedicated bus. Each computational node contains a local register file and task flow processing components at different levels; the task flow processing components are connected with one another, and with the local register file, through crossed buses. The local register file is connected with the crossed buses through register file logic ports and provides an addressable register file logic port for each task flow processing component in the computational node. The structure has low latency, is easy to expand, and is convenient to control and manage; it allows storage and communication to be converted into each other and effectively improves the price-performance ratio of system expansion.

Description

A task-flow-oriented fast data exchange structure based on register files
Technical Field
The present invention relates to a task-flow-oriented fast data exchange structure based on register files.
Background Art
The three pillars of a computer system are processing components, storage components, and communication components (including I/O and the interconnection network). On the surface, computation, storage, and communication are independent of one another. In fact, under certain conditions they can be converted into one another. In many high-performance parallel computing applications, data communication capability is often the bottleneck that limits performance improvement; that is, the final overall performance of a system is often determined by the data exchange capability between system nodes. However, current high-performance computing systems take a very narrow approach to this problem: essentially all of them rely on various methods of optimizing and improving the performance of the interconnection network between computing nodes.
MPP systems solve the whole-machine interconnection problem by using SMP, CC-NUMA, and Cluster-NUMA multithreaded architectures and various hybrid combinations of them, but their network design complexity, network diameter, and communication latency increase rapidly as the machine scale grows. Cluster systems, which adopt a cluster architecture, offer many advantages such as scalability, high manageability, high availability, and a high price-performance ratio; however, their communication overhead and latency are large, and as the number of computing nodes increases, the requirements on the number and performance of switches become higher and higher.
For scalable, high-bandwidth, low-latency, high-efficiency parallel computing systems, the impact of the sharp increase in network diameter and latency as the system scale grows is very pronounced, and reconfigurable, partitionable, and configurable characteristics are also becoming increasingly important in high-efficiency computing systems. How to break away from the interconnection network as the sole data exchange mode, and achieve higher efficiency than the coupled MPP network structure and the loosely coupled cluster network structure, is a problem that urgently needs to be solved.
Summary of the Invention
To overcome the above shortcomings of the prior art, the main purpose of the present invention is to provide a practical task-flow-oriented fast data exchange structure based on register files. The structure has low latency, is easy to expand, and is convenient to configure and manage; it allows storage and communication to be converted into each other and effectively improves the price-performance ratio of system expansion.
To achieve the above purpose, the task-flow-oriented fast data exchange structure based on register files of the present invention adopts the following technical solution:
A task-flow-oriented fast data exchange structure based on register files, characterized in that it contains a global register file and n computing nodes, where n is a natural number. Each computing node is connected to the global register file through its own private bus. Each computing node contains a local register file and task flow processing components at different levels; the task flow processing components are connected to one another, and to the local register file, through a crossover bus. The local register file is connected to the crossover bus through register file logic ports and, according to the hierarchical structure of the computing node, provides an addressable register file logic port for each task flow processing component in the computing node. The global register file is connected to the private buses through register file logic ports. It provides different register file logic ports for different computing nodes and a single physical port for the different task flow processing components within the same computing node, and uses time slicing to present those components with register file logic ports that can be accessed simultaneously. The global register file is a register file group that stores the intermediate results of each computing node; the local register file is a register file group that stores all operands and intermediate results used when a computing node processes a task. The task flow processing components include an FPGA, a CPU array, a GPU array, a shared memory accelerator, a cache-shared multi-core processor, main memory, and storage. The register file group contains m register files; each register file contains a register file controller and k storage clusters, and each storage cluster is composed of registers, where m and k are natural numbers. The register file controllers are connected through a unified crossover bus, the storage clusters are connected to one another through a data bus, and a centralized serial connection is used between each register file controller and its storage clusters. Every computing node can share access to the global register file, so that task flow data reuse between computing nodes can be exploited. The size of the global register file can be dynamically allocated and adjusted according to the resource scale of the computing nodes. When the task flow processing components within a computing node communicate data with one another, the local register file supports the corresponding data exchange.
A unified data format is used for data interaction between the computing nodes and the global register file. According to the amount of data required by the task running on each computing node, a corresponding register space is dynamically partitioned from the global register file for data exchange between computing nodes. Access to the register file controllers and storage clusters is controlled by the physical address of the register file: when a register file is used, it is accessed directly over the crossover bus according to its physical address, and the interface used between a register file controller and its storage clusters is the physical address of the storage cluster, that is, the physical address of the register file. Read and write operations cannot be performed on the same region of the global register file at the same time; however, after one computing node has completed a write, other computing nodes can read the data simultaneously, and read/write operations on different regions of the shared register file can proceed simultaneously. When multiple computing nodes use the global register file at the same time to execute tasks in a task flow, global register file space is allocated preferentially to the computing nodes whose tasks have higher priority. If the amount of data a computing node needs to exchange changes considerably during task execution, the allocation can also be adjusted dynamically according to the remaining space in the global register file, so that the data volume demands of all computing nodes are met as far as possible in order of priority. When data is exchanged between computing nodes, the data to be exchanged is first stored in a designated region of the global register file and is then read from that region by the computing node that needs to receive it. The local register file temporarily stores all operands and intermediate results of task operations; each task flow processing component transmits them over the crossover bus within its computing node and caches them in the local register file. During computation, the task flow processing components of a computing node do not need to access external memory or the global register file; results are written back to the global register file or external memory only when data interaction between computing nodes is required.
The present invention, adopting the above technical solution, has the following beneficial effects:
The present invention has low latency, is easy to expand, and is convenient to configure and manage; it allows storage and communication to be converted into each other and effectively improves the price-performance ratio of system expansion.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the fast data exchange structure based on register files.
Fig. 2 is a detailed schematic diagram of the internal structure of the fast data exchange structure based on register files.
Detailed Description
To further illustrate the present invention, it is described below with reference to the accompanying drawings:
Referring to Fig. 1 and Fig. 2, a fast data exchange structure based on register files contains a global register file and n computing nodes, where n is a natural number. Each computing node is connected to the global register file through its own private bus, and the private buses of different computing nodes neither cross nor are shared. The bandwidth of the global register file is far higher than memory bandwidth, so the global register file enhances data locality between computing nodes and provides fast data exchange capability.
Each computing node contains a local register file and task flow processing components at different levels; the task flow processing components are connected to one another, and to the local register file, through a crossover bus.
By function, buses can be divided into private buses and non-private buses, both of which are data buses. A private bus is a bus that connects only one pair of physical components and is physically a kind of external bus. Here, the private bus connects a computing node to the global register file and serves as the channel for data transmission between them.
The crossover bus is physically an internal bus. The task flow processing components within a computing node are interconnected with one another, and with the local register file, through the crossover bus. The crossover bus provides the transmission channel for data interaction among the task flow processing components and between the task flow processing components and the local register file.
The local register file is connected to the crossover bus through register file logic ports and, according to the hierarchical structure of the computing node, provides an addressable register file logic port for each task flow processing component in the computing node. The global register file is connected to the private buses through register file logic ports. It provides different register file logic ports for different computing nodes and a single physical port for the different task flow processing components within the same computing node, and uses time slicing to present those components with register file logic ports that can be accessed simultaneously.
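The time slicing of a single physical port among several task flow processing components can be illustrated with a minimal sketch. The text does not specify the arbitration policy; round-robin is assumed here, and the function name `time_slice` and the component and register names are hypothetical.

```python
from collections import deque

def time_slice(requests):
    """Serialize per-component request queues onto one physical port.

    requests maps a component name to its list of pending accesses.
    Returns the (slot, component, access) order in which the single
    physical port serves them. Round-robin is an assumed policy; the
    text only states that time slicing is used.
    """
    queues = {name: deque(pending) for name, pending in requests.items()}
    schedule = []
    slot = 0
    # Keep cycling over the components until every queue is drained.
    while any(queues.values()):
        for name, queue in queues.items():
            if queue:
                schedule.append((slot, name, queue.popleft()))
                slot += 1
    return schedule

# Three components of one computing node share the node's physical port.
schedule = time_slice({"cpu": ["r0", "r1"], "gpu": ["r8"], "fpga": ["r4", "r5"]})
```

Each component sees its own logic port and simply issues requests; only the slot order on the shared physical port reveals that the accesses are interleaved.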
The global register file is a register file group that stores the intermediate results of each computing node; the local register file is a register file group that stores all operands and intermediate results used when a computing node processes a task. The task flow processing components include an FPGA, a CPU array, a GPU array, a shared memory accelerator, a cache-shared multi-core processor, main memory, and storage.
A register file group contains m register files; each register file contains a register file controller and k storage clusters, and each storage cluster is composed of registers, where m and k are natural numbers. The register file controllers are connected through a unified crossover bus, the storage clusters are connected to one another through a data bus, and a centralized serial connection is used between each register file controller and its storage clusters.
The fast data exchange structure based on register files offers flexible exchange modes and good scalability. Register file resources can be dynamically allocated during task execution, which allows storage resources to be traded for computing resources and data exchange resources and fully embodies the idea that storage and communication can be converted into each other.
A management method for the above fast data exchange structure based on register files is as follows. Every computing node can share access to the global register file, so that task flow data reuse between computing nodes can be exploited. The size of the global register file can be dynamically allocated and adjusted according to the resource scale of the computing nodes. When the task flow processing components within a computing node communicate data with one another, the local register file supports the corresponding data exchange.
A unified data format is used for data interaction between the computing nodes and the global register file. According to the amount of data required by the task running on each computing node, a corresponding register space is dynamically partitioned from the global register file for data exchange between computing nodes. Access to the register file controllers and storage clusters is controlled by the physical address of the register file: when a register file is used, it is accessed directly over the crossover bus according to its physical address, and the interface used between a register file controller and its storage clusters is the physical address of the storage cluster, that is, the physical address of the register file. For example, when storage cluster No. 1 in register file 1 is accessed, the interface address is 000001000001; when storage cluster No. 6 in register file 6 is accessed, the interface address is 000110000110.
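The two example addresses are consistent with a 12-bit value formed by concatenating a 6-bit register file number with a 6-bit storage cluster (bunch) number. The field widths are an inference from those two examples, not something the text states, and the names `interface_address` and `decode` are hypothetical. A minimal sketch under that assumption:

```python
FIELD_BITS = 6  # assumed width, inferred from the two 12-bit examples in the text

def interface_address(register_file, cluster):
    """Concatenate the register file number and the storage cluster number."""
    for value in (register_file, cluster):
        if not 0 <= value < 2 ** FIELD_BITS:
            raise ValueError(f"field does not fit in {FIELD_BITS} bits")
    # Each field is rendered as a zero-padded binary string of FIELD_BITS digits.
    return format(register_file, f"0{FIELD_BITS}b") + format(cluster, f"0{FIELD_BITS}b")

def decode(address):
    """Split an interface address back into (register_file, cluster)."""
    return int(address[:FIELD_BITS], 2), int(address[FIELD_BITS:], 2)

addr_1_1 = interface_address(1, 1)
addr_6_6 = interface_address(6, 6)
```

Under this assumed layout, `addr_1_1` and `addr_6_6` reproduce the two addresses given in the example.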
Read and write operations cannot be performed on the same region of the global register file at the same time. However, after one computing node has completed a write, other computing nodes can read the data simultaneously, and read/write operations on different regions of the shared register file can proceed simultaneously.
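These access rules resemble a per-region readers-writer discipline. The following sketch models them as simple bookkeeping rather than real hardware arbitration; the class `Region` and its methods are hypothetical, and the exclusion of two simultaneous writers on one region is an added assumption that the text does not state.

```python
class Region:
    """Tracks which nodes are accessing one region of the global register file."""

    def __init__(self):
        self.writer = None     # node currently writing, if any
        self.readers = set()   # nodes currently reading

    def begin_write(self, node):
        # A write needs the region to itself: no readers and no other
        # writer (writer-writer exclusion is an assumption).
        if self.writer is not None or self.readers:
            return False
        self.writer = node
        return True

    def end_write(self, node):
        if self.writer == node:
            self.writer = None

    def begin_read(self, node):
        # Reads may overlap one another but never an in-progress write.
        if self.writer is not None:
            return False
        self.readers.add(node)
        return True

    def end_read(self, node):
        self.readers.discard(node)

# Two independent regions: accesses to different regions never conflict.
a, b = Region(), Region()
results = [a.begin_write("node0"), a.begin_read("node1"), b.begin_write("node2")]
a.end_write("node0")
results += [a.begin_read("node1"), a.begin_read("node3"), a.begin_write("node0")]
```

In this run the read on region `a` is refused while `node0` is writing, the write to region `b` proceeds in parallel, and once the write to `a` completes two nodes read it at the same time while a new write is held off.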
When multiple computing nodes use the global register file at the same time to execute tasks in a task flow, global register file space is allocated preferentially to the computing nodes whose tasks have higher priority. If the amount of data a computing node needs to exchange changes considerably during task execution, the allocation can also be adjusted dynamically according to the remaining space in the global register file, so that the data volume demands of all computing nodes are met as far as possible in order of priority.
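One way to read this policy is as a greedy allocation in descending priority order that is re-run whenever demand changes. The sketch below assumes that a larger number means higher priority and that a node may be granted less than it requested when space runs out; neither detail is given in the text, and the function name `allocate` is hypothetical.

```python
def allocate(capacity, demands):
    """Grant global register file space in descending priority order.

    demands is a list of (node, priority, requested_registers). A larger
    priority number means a more urgent task (an assumed convention).
    Returns {node: granted_registers}; a node may receive less than it
    requested, or nothing, once the space is exhausted (also an assumption).
    """
    grants = {}
    remaining = capacity
    for node, _priority, requested in sorted(demands, key=lambda d: -d[1]):
        granted = min(requested, remaining)
        grants[node] = granted
        remaining -= granted
    return grants

# Initial allocation of a 100-register global register file.
grants = allocate(100, [("node0", 1, 50), ("node1", 3, 40), ("node2", 2, 30)])

# node1's demand grows during execution, so the allocation is recomputed.
regrants = allocate(100, [("node0", 1, 50), ("node1", 3, 70), ("node2", 2, 30)])
```

Recomputing from scratch keeps the sketch short; a real controller would more likely adjust only the affected regions so that data already deposited by other nodes is not disturbed.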
When data is exchanged between computing nodes, the data to be exchanged is first stored in a designated region of the global register file and is then read from that region by the computing node that needs to receive it.
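The deposit-then-read exchange can be sketched as a small software model. The class `GlobalRegisterFile`, its methods, and the region name are hypothetical illustrations; the text does not define how regions are named or sized.

```python
class GlobalRegisterFile:
    """Toy model of a global register file with named exchange regions."""

    def __init__(self, size):
        self.registers = [0] * size
        self.regions = {}  # region name -> (start, length)

    def designate(self, name, start, length):
        """Reserve registers [start, start + length) as an exchange region."""
        if start < 0 or start + length > len(self.registers):
            raise ValueError("region outside the register file")
        self.regions[name] = (start, length)

    def deposit(self, name, data):
        """Sending node stores its data at the start of the region."""
        start, length = self.regions[name]
        if len(data) > length:
            raise ValueError("data larger than the designated region")
        self.registers[start:start + len(data)] = data

    def collect(self, name, count):
        """Receiving node reads up to count registers from the region."""
        start, length = self.regions[name]
        return self.registers[start:start + min(count, length)]

grf = GlobalRegisterFile(16)
grf.designate("node0_to_node1", start=4, length=4)
grf.deposit("node0_to_node1", [7, 8, 9])          # node0 writes
received = grf.collect("node0_to_node1", 3)       # node1 reads
```

No message passes over an interconnection network in this model: the sender and receiver only touch the same designated region, which is the sense in which storage stands in for communication.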
A computing node is the basic unit that executes the different tasks in a task flow. It contains task flow processing components at different levels, and data interaction between the computation layers inside a computing node is carried out through the local register file. The local register file temporarily stores all operands and intermediate results of task operations; each task flow processing component transmits them over the crossover bus within its computing node and caches them in the local register file. Its operating mechanism is the same as the data interaction mode of the global register file.
During computation, the task flow processing components of a computing node do not need to access external memory or the global register file. Results are written back to the global register file or external memory only when data interaction between computing nodes is required; for example, the result is written back to external memory only after all operations of the task flow have completed. The local register file enhances data locality among the task flow processing components and provides fast data exchange capability within the computing node.

Claims (2)

1. A task-flow-oriented fast data exchange structure based on register files, characterized in that it contains a global register file and n computing nodes, n being a natural number; each computing node is connected to the global register file through its own private bus; each computing node contains a local register file and task flow processing components at different levels; the task flow processing components are connected to one another, and to the local register file, through a crossover bus; the local register file is connected to the crossover bus through register file logic ports and, according to the hierarchical structure of the computing node, provides an addressable register file logic port for each task flow processing component in the computing node; the global register file is connected to the private buses through register file logic ports, provides different register file logic ports for different computing nodes and a single physical port for the different task flow processing components within the same computing node, and uses time slicing to present those components with register file logic ports that can be accessed simultaneously; the global register file is a register file group that stores the intermediate results of each computing node; the local register file is a register file group that stores all operands and intermediate results used when a computing node processes a task; the task flow processing components include an FPGA, a CPU array, a GPU array, a shared memory accelerator, a cache-shared multi-core processor, main memory, and storage; the register file group contains m register files, each register file contains a register file controller and k storage clusters, and each storage cluster is composed of registers, m and k being natural numbers; the register file controllers are connected through a unified crossover bus, the storage clusters are connected to one another through a data bus, and a centralized serial connection is used between each register file controller and its storage clusters; every computing node can share access to the global register file, so that task flow data reuse between computing nodes can be exploited; the size of the global register file can be dynamically allocated and adjusted according to the resource scale of the computing nodes; and when the task flow processing components within a computing node communicate data with one another, the local register file supports the corresponding data exchange.
2. The task-flow-oriented fast data exchange structure based on register files according to claim 1, characterized in that a unified data format is used for data interaction between the computing nodes and the global register file; according to the amount of data required by the task running on each computing node, a corresponding register space is dynamically partitioned from the global register file for data exchange between computing nodes; access to the register file controllers and storage clusters is controlled by the physical address of the register file; when a register file is used, it is accessed directly over the crossover bus according to its physical address, and the interface used between a register file controller and its storage clusters is the physical address of the storage cluster, that is, the physical address of the register file; read and write operations cannot be performed on the same region of the global register file at the same time, but after one computing node has completed a write, other computing nodes can read the data simultaneously, and read/write operations on different regions of the shared register file can proceed simultaneously; when multiple computing nodes use the global register file at the same time to execute tasks in a task flow, global register file space is allocated preferentially to the computing nodes whose tasks have higher priority; if the amount of data a computing node needs to exchange changes considerably during task execution, the allocation can also be adjusted dynamically according to the remaining space in the global register file, so that the data volume demands of all computing nodes are met as far as possible in order of priority; when data is exchanged between computing nodes, the data to be exchanged is first stored in a designated region of the global register file and is then read from that region by the computing node that needs to receive it; the local register file temporarily stores all operands and intermediate results of task operations, and each task flow processing component transmits them over the crossover bus within its computing node and caches them in the local register file; and during computation, the task flow processing components of a computing node do not need to access external memory or the global register file, results being written back to the global register file or external memory only when data interaction between computing nodes is required.
CN201410406915.2A 2014-08-19 2014-08-19 Task flow-oriented register file-based fast data exchange structure Withdrawn CN105373492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410406915.2A CN105373492A (en) 2014-08-19 2014-08-19 Task flow-oriented register file-based fast data exchange structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410406915.2A CN105373492A (en) 2014-08-19 2014-08-19 Task flow-oriented register file-based fast data exchange structure

Publications (1)

Publication Number Publication Date
CN105373492A (en) 2016-03-02

Family

ID=55375708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410406915.2A Withdrawn CN105373492A (en) 2014-08-19 2014-08-19 Task flow-oriented register file-based fast data exchange structure

Country Status (1)

Country Link
CN (1) CN105373492A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107493485A (en) * 2016-06-13 2017-12-19 中兴通讯股份有限公司 A kind of resource control method, device and IPTV server
CN108595258A (en) * 2018-05-02 2018-09-28 北京航空航天大学 A kind of GPGPU register files dynamic expansion method
CN115001895A (en) * 2022-05-25 2022-09-02 西安微电子技术研究所 Data sharing device, system and method of satellite-borne heterogeneous system based on SPACEWIRE bus
CN116400982A (en) * 2023-05-26 2023-07-07 摩尔线程智能科技(北京)有限责任公司 Method and apparatus for configuring relay register module, computing device and readable medium
CN116400982B (en) * 2023-05-26 2023-08-08 摩尔线程智能科技(北京)有限责任公司 Method and apparatus for configuring relay register module, computing device and readable medium

Similar Documents

Publication Publication Date Title
CN103761215B (en) Matrix transpose optimization method based on graphic process unit
CN107590085B (en) A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN102497411B (en) Intensive operation-oriented hierarchical heterogeneous multi-core on-chip network architecture
CN102135949B (en) Computing network system, method and device based on graphic processing unit
CN104699631A (en) Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor)
CN103218208A (en) System and method for performing shaped memory access operations
CN108139882B (en) Implement the system and method for stratum's distribution lists of links for network equipment
CN101441616B (en) Rapid data exchange structure based on register document and management method thereof
CN102446159B (en) Method and device for managing data of multi-core processor
CN111124675A (en) Heterogeneous memory computing device for graph computing and operation method thereof
CN111433758A (en) Programmable operation and control chip, design method and device thereof
CN105874758B (en) Memory pool access method, interchanger and multicomputer system
CN105373492A (en) Task flow-oriented register file-based fast data exchange structure
US20220179823A1 (en) Reconfigurable reduced instruction set computer processor architecture with fractured cores
CN103440246A (en) Intermediate result data sequencing method and system for MapReduce
US9892042B2 (en) Method and system for implementing directory structure of host system
CN103984677A (en) Embedded reconfigurable system based on large-scale coarseness and processing method thereof
Han et al. A novel ReRAM-based processing-in-memory architecture for graph computing
CN111630487A (en) Centralized-distributed hybrid organization of shared memory for neural network processing
CN106844263B (en) Configurable multiprocessor-based computer system and implementation method
CN104679670A (en) Shared data caching structure and management method for FFT (fast Fourier transform) and FIR (finite impulse response) algorithms
CN105553646B (en) Reconfigurable S-box circuit structure towards block cipher parallel computation
Kobus et al. Gossip: Efficient communication primitives for multi-gpu systems
CN103761072A (en) Coarse granularity reconfigurable hierarchical array register file structure
CN104239520A (en) Historical-information-based HDFS (hadoop distributed file system) data block placement strategy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C04 Withdrawal of patent application after publication (patent law 2001)
WW01 Invention patent application withdrawn after publication

Application publication date: 20160302