CN100341014C - Scaleable interconnect structure for parallel computing and parallel memory access - Google Patents

Publication number
CN100341014C
Authority
CN
China
Prior art keywords
data
node
ring
computing
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB018208878A
Other languages
Chinese (zh)
Other versions
CN1489732A (en
Inventor
John Hesse
Coke S. Reed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Interactic Holdings LLC
Original Assignee
Interactic Holdings LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interactic Holdings LLC
Publication of CN1489732A
Application granted
Publication of CN100341014C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17368 Indirect interconnection networks non hierarchical topologies
    • G06F15/17375 One dimensional, e.g. linear array, ring

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

Multiple processors can access the same data in parallel using several innovative techniques. First, several remote processors can request to read from the same data location, and the requests can be fulfilled in overlapping time periods. Second, several processors can access a data item located at the same position and can read, write, or perform multiple operations on that data item at overlapping times. Third, one data packet can be multicast to several locations, and a plurality of packets can be multicast to a plurality of sets of target locations.

Description

Scaleable interconnect structure for parallel computing and parallel memory access
Background
A persistent problem in massively parallel computing systems is providing processors with a sufficient flow of data. U.S. Patent Nos. 5,996,020 and 6,289,021 describe high-bandwidth, low-latency interconnect structures that greatly improve the flow of data in a network. What is needed is a system that fully exploits such high-bandwidth, low-latency interconnect structures by supporting parallel memory access and computation within the network.
Summary of the invention
Several innovative techniques allow multiple processors to access the same data in parallel. First, several remote processors can request to read from the same data location, and the requests can be fulfilled in overlapping time periods. Second, several processors can access a data item located at the same position and can read, write, or perform multiple operations on that data item at overlapping times. Third, one data packet can be multicast to several locations, and a plurality of packets can be multicast to a plurality of sets of target locations.
In the following description, the term "packet" denotes a unit of data, preferably in serial form. Examples of packets include Internet Protocol (IP) packets, Ethernet frames, ATM cells, switch-fabric segments that contain part of a larger frame or packet, supercomputer inter-processor messages, and other types of data message that have an upper limit on message length.
The invention provides a parallel data processing apparatus comprising: an interconnect structure that interconnects a plurality of locations; one or more storage units coupled to the interconnect structure at the plurality of locations and accessible through the interconnect structure, the storage units including a storage unit W at a location L, the storage unit W having a plurality of memory regions connected to paired, synchronized FIFO storage rings; and a plurality of computing units coupled to the interconnect structure at the plurality of locations, the computing units being able to access data, through the interconnect structure, from the storage units of one or more of the synchronously circulating paired FIFO storage rings, the computing units including a first computing unit and a second computing unit that can simultaneously read data from different memory regions of the storage unit W, wherein the data contents of the memory regions of the storage unit W can be sent to different target locations.
The invention provides a parallel data processing apparatus comprising: an interconnect structure that interconnects a plurality of locations; a plurality of storage units connected to paired, synchronized FIFO storage rings, coupled to the interconnect structure, and accessible as locations through the interconnect structure, the storage units including a first storage unit and a second storage unit located at a first location and a second location, respectively; and a plurality of computing units coupled to the interconnect structure at the plurality of locations, the computing units being able to access data, through the interconnect structure, from one or more of the storage units that circulate synchronously in the form of the paired FIFO storage rings, the computing units including a first computing unit and a second computing unit, wherein the first computing unit can read data from the first and second storage units simultaneously and then operate on the data read, and the second computing unit can read and operate on the data of the first and second storage units at times that overlap the read and operate times of the first computing unit.
The invention provides a parallel data processing apparatus comprising: an interconnect structure that interconnects a plurality of locations; a plurality of storage units coupled to the interconnect structure at the plurality of locations and accessible through the interconnect structure, the storage units being connected to paired, synchronized FIFO storage rings, each of the plurality of storage units including a first circulating shift register that stores a first word divided into a plurality of memory regions; and a plurality of computing units coupled to the interconnect structure at the plurality of locations, the computing units being able to operate simultaneously on separate memory regions of the first word.
The invention provides a parallel data processing apparatus comprising: an interconnect structure for carrying data, the interconnect structure comprising a plurality of hierarchically interconnected nodes and including logic that anticipates data collisions at the nodes and resolves the collisions according to a priority determined by level; a first switch coupled to the interconnect structure that distributes data to the interconnect structure according to communication information contained in the data; a plurality of computing modules coupled to the interconnect structure through paired, synchronized FIFO storage rings, the modules being able to perform operations on the data; and a second switch coupled to the plurality of computing modules that receives data from the plurality of computing modules.
The invention provides a multiple-access storage and computing device comprising: a plurality of logic devices, the logic devices including memory devices connected to paired, synchronized FIFO storage rings; and an interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising: a plurality of nodes; a plurality of logic elements associated with the plurality of nodes; a plurality of message interconnect paths, each selectively coupling nodes of the plurality of nodes so that data is sent from a node acting as a sending node to a node acting as a receiving node; and a plurality of control-signal interconnect paths, each selectively coupling nodes of the plurality of nodes so that a control signal is sent from a sending node to the logic element associated with a receiving node; the plurality of nodes including: distinct nodes A, B, and X; a logic L_B associated with node B that makes routing decisions for node B; a message interconnect path from node B as sending node to node X as receiving node; a message interconnect path from node A as sending node to node X as receiving node; and a control-signal interconnect path from node A as sending node to the logic L_B, the control signal enforcing that sending data from node A to node X has higher priority than sending data from node B to node X.
The invention provides a multiple-access storage and computing device comprising: a plurality of logic devices, the logic devices including memory devices connected to paired, synchronized FIFO storage rings; and an interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising: a plurality of nodes including distinct nodes A, B, X, and Y; and a plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the interconnect paths including control-signal interconnect paths for sending control signals from a node that sends a control signal to the computing module associated with a node that uses the control signal, and data interconnect paths for sending data from a data-sending node to a data-receiving node; node B including data interconnect paths for sending data to node X and to node Y; node A including a control interconnect path for sending a control signal to the logic L_B associated with node B; the logic L_B being operable such that, for a message M arriving at node B, node A sends a control signal C to the logic L_B, and the logic L_B uses the control signal C to decide whether to send the message M to node X or to node Y.
The invention provides a multiple-access storage and computing device comprising: a plurality of logic devices, the logic devices including memory devices connected to paired, synchronized FIFO storage rings; and an interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising: a plurality of nodes including a node A, a node B, and a node set P, nodes A and B being distinct nodes outside the node set P, and node B being able to send data to every node in the node set P; and a plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the coupled nodes being selected in pairs that each comprise a sending node and a receiving node, the sending node sending data to the receiving node, the plurality of interconnect paths including data interconnect paths and control interconnect paths; the control interconnect paths selectively coupling nodes of the plurality of nodes that act as control-signal sending nodes, for sending control signals to the logic associated with the nodes that use the control signals; the control interconnect paths including a control interconnect path from node A to a logic L_B associated with node B, the logic L_B using the control signal from node A to determine to which node of the node set P node B sends data.
The invention provides a multiple-access storage and computing device comprising: a plurality of logic devices, the logic devices including memory devices connected to paired, synchronized FIFO storage rings; and an interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising: a plurality of nodes including a node A, a node B, and a node set P, nodes A and B being distinct nodes outside the node set P, and node B being able to send data to every node in the node set P; a plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the coupled nodes being selected in pairs that each comprise a sending node and a receiving node, the sending node sending data to the receiving node; a logic L_A associated with node A that can determine where data is routed from node A; and a logic L_B associated with node B that can determine where data is routed from node B; the logic L_A being different from the logic L_B, and the logic L_B using information determined by the logic L_A to determine to which node of the node set P node B sends data.
The invention provides a multiple-access storage and computing device comprising: a plurality of logic devices, the logic devices including memory devices connected to paired, synchronized FIFO storage rings; and an interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising: a plurality of nodes, each node including a plurality of data input ports, a plurality of data output ports, and a logic element that controls the flow of data through the node, the plurality of nodes including mutually distinct nodes A, B, X, and Y; and a plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the interconnect paths including control interconnect paths for sending control signals from a node that sends a control signal to the logic associated with a node that uses the control signal, and data interconnect paths for sending data from a data-sending node to a data-receiving node, the data interconnect paths being selectively coupled to the data input ports and data output ports, and the control interconnect paths coupling nodes and logic elements so that a control signal is sent from a control-signal sending node to the logic element associated with a node whose data flow depends on the control signal; node B being associated with a logic L_B that uses a control signal from node A to determine the routing of a message M through node B, wherein a control signal C received from node A causes the message M to be sent from node B to node X, and a control signal C' received from node A causes the message M to be sent from node B to node Y.
The invention provides a multiple-access storage and computing device comprising: a plurality of logic devices, the logic devices including memory devices connected to paired, synchronized FIFO storage rings; and an interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising: a plurality of nodes including a node X and a node set P, the node set P including a plurality of nodes that can send data to node X; and a plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the interconnect paths including data interconnect paths for sending data from a sending node to a receiving node; the nodes in the node set P having a priority relationship for sending data to node X, wherein the node with the highest priority for sending data to node X is never blocked when sending data to node X.
The invention provides a computing device for use in a computing system, comprising: first and second synchronized FIFO rings; and at least one computing module coupled to the first and second synchronized FIFO rings, the computing module being able to access at least one location of each FIFO ring simultaneously.
The system of the invention also solves an analogous problem in communications, in which multiple packets arriving at a switch access data at the same location.
In addition, multiple-level, minimum-logic network topologies can be used as basic building blocks in many very useful devices and systems, including logic devices, memory devices of all types and characteristics, and computers and processors. Specific examples of such devices and systems are parallel random access memory (PRAM) and parallel computation engines. These devices and systems include a network interconnect structure as a basic building block, together with embedded memory and logic. Data storage can take the form of first-in-first-out (FIFO) rings.
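The priority relationship described in this summary, in which node A signals the logic associated with node B so that A's traffic to node X is never blocked, can be pictured with a small sketch. This is an illustrative model only, not the circuit of the invention: the function names `logic_lb` and `one_period`, the use of a boolean as the control signal, and the choice of node Y as B's alternative target are all assumptions made for the example.

```python
def logic_lb(b_has_message, control_from_a):
    """Hypothetical model of logic L_B: decide where node B sends message M.

    control_from_a is True when node A is sending to node X in this period,
    so node B must yield node X and use its other path (node Y here).
    """
    if not b_has_message:
        return None
    return "Y" if control_from_a else "X"


def one_period(a_sends_to_x, b_has_message):
    """Return (target of A, target of B) for one transfer period."""
    control = a_sends_to_x  # control signal from node A to logic L_B
    a_target = "X" if a_sends_to_x else None
    b_target = logic_lb(b_has_message, control)
    return a_target, b_target


# Node A always reaches X; node B reaches X only when A is idle.
contended = one_period(a_sends_to_x=True, b_has_message=True)
uncontended = one_period(a_sends_to_x=False, b_has_message=True)
```

In this sketch node A never consults node B, which mirrors the statement that the highest-priority sender to node X is never blocked.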
Brief description of the drawings
The features of the described embodiments that are believed to be novel are set forth in the appended claims. The embodiments of the invention, as to both structure and method of operation, may be better understood by referring to the following description and accompanying drawings.
Fig. 1 is a schematic block diagram of a general-purpose system built from components that include a plurality of network interconnect structures.
Fig. 2 is a schematic block diagram of a parallel memory structure, such as a parallel random access memory (PRAM), that uses network interconnect structures as basic elements.
Fig. 3 is a schematic diagram of the bottom level of the top switch, showing the connections of the communication ring, a plurality of logic modules, and the circulating FIFO data storage ring, and the connection to the top of the bottom switch.
Fig. 4A, Fig. 4B, and Fig. 4C are schematic block diagrams illustrating the movement of data through the communication ring and the circulating FIFO data storage ring. Fig. 4A applies to READ and WRITE requests; Fig. 4B and Fig. 4C apply to a READ request in progress.
Fig. 5 shows a portion of the interconnect structure while two read operations read from the same circulating data storage ring in overlapping time intervals and enter a second switch, where the data read is directed to different targets.
Fig. 6 shows a portion of the interconnect structure while a WRITE operation is performed.
Fig. 7 is a schematic block diagram of a structure and technique for implementing multicast using indirect addressing.
Detailed description
Referring to Fig. 1, a schematic block diagram shows a general-purpose system 100 built from components that include one or more network interconnect structures. In this embodiment, general-purpose system 100 includes a top switch 110 and a bottom switch 112, each formed from a network interconnect structure. The term "network interconnect structure" may also refer to other interconnect structures. Other systems may include additional components formed from network interconnect structures. General-purpose system 100 illustrates various components that can serve as the core elements of a basic example system. Some embodiments include other elements in addition to these core elements, such as: 1) shared memory; 2) a direct connection 130 between the top switch and the bottom switch; 3) a direct connection 140 between the bottom switch and the I/O; and 4) a concentrator connected between logic units 114 and bottom switch 112.
General-purpose system 100 has a top switch 110 that acts as an input terminal: it receives input data packets from external sources, and possibly from the bottom switch, over input lines 136 or bus 130, and distributes the packets to dynamic processor-in-memory (DPIM) logic modules 114. Top switch 110 routes packets within general-purpose system 100 according to communication information contained in the packet headers. Packets are sent from top switch 110 to DPIM modules 114. Control signals from DPIM modules 114 to top switch 110 control the timing of packet injection to avoid collisions. This prevents collisions that could otherwise occur with data in the DPIM or with data in the bottom switch. The system can also use output lines and buses 130, 132, 134, and 136 to send information to other computation, communication, storage, and other components (not shown).
Data packets enter top switch 110 and proceed to the target DPIM 114 according to the address field in each packet. The information contained in a packet can be used by DPIM logic 114, possibly in combination with other information, to determine the operations performed on the data contained in the packet and in the DPIM memory. For example, information in the packet can modify data stored in the DPIM memory, cause information in the DPIM memory to be sent through bottom switch 112, or cause other data generated by the DPIM logic modules to exit from the bottom switch. Packets from the DPIMs are sent to the bottom switch. As another option, general-purpose system 100 may include computing units, memory units, or both. Computing unit 126 can be used to send data packets out of system 100 through I/O unit 124, to top switch 110, or to both. When the bottom switch sends packets to the top switch, the packets can be sent directly or through one or more interconnect modules (not shown) that handle timing and control between integrated circuits that are subcomponents of system 100.
In one example of the system, data storage takes the form of first-in-first-out (FIFO) data storage rings R in DPIMs 114 and of conventional data storage associated with computing units (CU) 126. A FIFO ring is a set of 1-bit shift registers connected in a circle. A FIFO ring includes two kinds of components. The first, conventional kind is a 1-bit shift register connected only to the next 1-bit shift register, forming a simple FIFO, as in ring C of DPIM 114. The second kind is a 1-bit or multi-bit register contained in another element of the system, such as a logic module 114. Taken together, both kinds of components are connected serially to form the ring. For example, the total length F_L of a FIFO ring may be 200 bits, of which 64 bits are stored in a plurality of logic modules L and the remaining 136 bits are stored in the serially connected registers of the FIFO. A system-wide clock is connected to the FIFO elements and shift registers and advances the data bits to the next position in "bucket-brigade" fashion. The cycle period is defined as the time, in clock cycles, that data takes to complete exactly one circuit of the FIFO ring. The integer value of the cycle period equals the length of the FIFO ring in components. For example, a ring with 200 components (length 200) has a cycle period of 200 system clock cycles. The system may also include local timing sources or clocks that step at different rates. In some embodiments, all FIFO rings in the system have the same length, or have lengths that are integer multiples of a predetermined minimum length. In other embodiments, a ring is a bus structure with a plurality of parallel paths, and the amount of data held in the ring is an integer multiple of the ring length F_L.
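The relation between ring length and cycle period can be illustrated with a short software model. The sketch below is an assumption-laden illustration, not the hardware described: the class name `FifoRing` is hypothetical, an 8-bit ring stands in for the 200-bit example, and no distinction is made between bits held in plain shift registers and bits held inside logic modules.

```python
from collections import deque


class FifoRing:
    """Illustrative model of a ring of 1-bit shift registers."""

    def __init__(self, bits):
        self.cells = deque(bits)

    @property
    def length(self):
        # Ring length F_L, which is also the cycle period in clock ticks.
        return len(self.cells)

    def step(self):
        # One system clock tick: every bit advances to the next register,
        # bucket-brigade style, and the last register feeds the first.
        self.cells.rotate(1)

    def snapshot(self):
        return list(self.cells)


ring = FifoRing([1, 0, 1, 1, 0, 0, 0, 1])
start = ring.snapshot()
for _ in range(ring.length):
    ring.step()
# After exactly one cycle period every bit is back where it started.
back_home = ring.snapshot() == start
```

Because all rings advance on the same clock, two rings of equal length stay bit-aligned indefinitely, which is the property the later discussion of synchronized rings relies on.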
In general-purpose system 100, the top switch can handle packets of various lengths up to the system maximum length. In some applications, all packets have the same length. More commonly, packets of different lengths are input to the top switch. The length of a given packet is P_L, where P_L is no greater than F_L.
Similarly, the bottom switch can handle packets of various lengths. In the illustrative embodiment of general-purpose system 100, data of different bit lengths is produced according to the functions and operations of DPIM logic modules 114 and CUs 126. The DPIMs can operate independently, or a plurality of systems (not shown) can collect data from the DPIMs and send data to the DPIMs or to other elements inside or outside system 100.
Referring to Fig. 2, a schematic block diagram shows a parallel random access memory (PRAM) system 200 built from fewer components than the system shown in Fig. 1. The PRAM system includes a top switch 110 formed from a network interconnect structure, a concentrator 150, and a bottom switch 112. The system also includes DPIMs 114 that store data. The DPIM units can typically perform READ and WRITE functions, so the system can be used as a parallel random access memory.
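As a rough picture of how DPIM units that perform READ and WRITE can behave as an addressable memory, the following sketch models each DPIM as one stored word selected by an address. It is a simplification under stated assumptions: the names `Dpim`, `ToyPram`, and `send` and the string opcodes are hypothetical, the top switch is reduced to a list index, and the concentrator, bottom switch, timing, and overlapping accesses are not modeled.

```python
READ, WRITE = "READ", "WRITE"  # hypothetical opcode values for OP1


class Dpim:
    """One DPIM holding a single data word (the contents of ring R)."""

    def __init__(self):
        self.word = 0

    def handle(self, op1, payload=None):
        if op1 == WRITE:
            self.word = payload
            return None
        if op1 == READ:
            return self.word
        raise ValueError(f"unsupported operation code: {op1!r}")


class ToyPram:
    """Address AD1 selects the target DPIM, standing in for the top switch."""

    def __init__(self, n_dpims):
        self.dpims = [Dpim() for _ in range(n_dpims)]

    def send(self, ad1, op1, payload=None):
        return self.dpims[ad1].handle(op1, payload)


pram = ToyPram(4)
pram.send(2, WRITE, 0xBEEF)
value = pram.send(2, READ)
```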
In the illustrated embodiment, a data packet entering top switch 110 has the format: Payload | Operation code 2 | Address 2 | Operation code 1 | Address 1 | Timing bit
Abbreviated as:
PAYLOAD|OP2|AD2|OP1|AD1|BIT
The number of bits in the PAYLOAD field is defined as PayL. The numbers of bits in OP2 and OP1 are defined as OP2L and OP1L, respectively. The numbers of bits in AD2 and AD1 are defined as AD2L and AD1L, respectively. In a preferred embodiment, the BIT field is 1 bit long.
The following table briefly describes the packet fields:
Field      Description
BIT        A value of 1 indicates that a packet is present; a value of 0 indicates that no packet is present.
AD1        Address used by top switch 110 to route the packet to the target DPIM 114 at address AD1.
OP1        Operation code used by the target DPIM 114. It specifies the action or process performed by the DPIM; the action or process operates on the PAYLOAD field and on the data contents stored in one or more storage rings R located in the target DPIM.
AD2        Address used by bottom switch 112 to route the DPIM output over output connection 132 to an external device, or to computing unit 126. In some operations the AD2 field is not used. When it is used, the AD2 field includes a leading BIT2 field set to 1.
OP2        Operation code used by computing unit 126 or by an external device located at bottom switch 124 at the output port with address AD2. In some operations the OP2 field is not used.
PAYLOAD    Data content, or "payload", of the packet routed by top switch 110 to the target DPIM 114 at address AD1. In some operations the PAYLOAD field can be changed by DPIM 114 and sent on by bottom switch 112 to the output port specified by AD2. In some operations the PAYLOAD field is not used.
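The field layout in the table above can be made concrete with a small packing routine. This is a sketch under assumptions, not a format taken from the patent: the field widths in `WIDTHS` are invented for illustration (the text fixes only the BIT field at 1 bit), and the helper names `pack` and `unpack` are hypothetical.

```python
# Field order as written in the text; BIT is last in the string and is the
# field that enters the switch first.
FIELD_ORDER = ("PAYLOAD", "OP2", "AD2", "OP1", "AD1", "BIT")

# Hypothetical widths standing in for PayL, OP2L, AD2L, OP1L, AD1L, and BIT.
WIDTHS = {"PAYLOAD": 8, "OP2": 2, "AD2": 3, "OP1": 2, "AD1": 3, "BIT": 1}


def pack(values, widths=WIDTHS):
    """Concatenate the fields into one bit string, PAYLOAD first, BIT last."""
    parts = []
    for name in FIELD_ORDER:
        width, value = widths[name], values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        parts.append(format(value, "b").zfill(width))
    return "".join(parts)


def unpack(bits, widths=WIDTHS):
    """Split a packed bit string back into its named fields."""
    values, pos = {}, 0
    for name in FIELD_ORDER:
        width = widths[name]
        values[name] = int(bits[pos:pos + width], 2)
        pos += width
    return values


packet = {"PAYLOAD": 0xA5, "OP2": 1, "AD2": 5, "OP1": 2, "AD1": 3, "BIT": 1}
bits = pack(packet)
```

With these assumed widths the packet length P_L is 19 bits, the sum of the field widths.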
The BIT field enters the switch first and is always set to 1 to indicate that a packet is present. The BIT field is also called the "traffic bit". The AD1 field is used to route the packet through the top switch to the packet's target DPIM. Top switch 110 can be arranged in multiple levels and columns, and packets pass through the levels. Each time a packet enters a new level of top switch 110, one bit is removed from the AD1 field, so the field is shortened. System 200 uses the same technique. When the packet exits the top switch, no bits remain in the AD1 field. A packet leaving the top switch therefore has the following format:
PAYLOAD|OP2|AD2|OP1|BIT
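The shortening of the AD1 field level by level can be traced with a brief sketch. It assumes, purely for illustration, that each level consumes the leading remaining address bit and makes a binary choice with it; the function name `route_through_top_switch` is hypothetical, and deflection or contention inside the switch is ignored.

```python
def route_through_top_switch(ad1_bits):
    """Consume one AD1 bit per level and return (output index, trace).

    ad1_bits is a string such as "011". The trace records, for each level,
    the bit consumed and the AD1 bits still carried by the packet.
    """
    remaining = ad1_bits
    port = 0
    trace = []
    while remaining:
        bit, remaining = remaining[0], remaining[1:]
        port = (port << 1) | int(bit)  # binary choice made at this level
        trace.append((bit, remaining))
    return port, trace


port, trace = route_through_top_switch("011")
```

After the last level the remaining AD1 string is empty, which corresponds to the packet format above with the AD1 field gone.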
System 100 and 200 comprises the DPIM unit.Fig. 3 is the schematic block diagram of an example of explanation DPIM unit 114, and shows data and control linkage path between DPIM and top switch 110 and the bottom switch 112.Fig. 3 shows four interconnection structure Z, C, R, B.Interconnection structure Z can be the FIFO ring that is positioned at top switch 110.Interconnection structure C and R are the FIFO ring that is positioned at the DPIM module, for example FIFO ring C310.In some embodiments, DPIM directly sends data to bottom switch.In these embodiments, if bottom switch is an interconnection structure, then interconnection structure B is a FIFO ring.In other embodiments, DPIM sends data to concentrator, by concentrator data is sent to bottom switch then.In these embodiments, if concentrator is an interconnection structure, then B is ring-type or acyclic data FIFO.Fig. 1 and Fig. 7 show the system that does not comprise concentrator.Fig. 2,3,4A and Fig. 5 show the system that comprises concentrator.
Data are passed top switch 110, arrive target output ring Z J, wherein, J=AD1.Z=Z JRing have a plurality of nodes 330 that are connected to output line 326.The DPIM module comprises that one is called as the branch group of received ring C 302 and one or more " data storage ring " R 304 of " data communication ring ".Fig. 3 shows the DPIM that has forms data storage ring R.Each structure Z, C, R and B are the FIFO that comprises 1 FIFO node of interconnection.Some nodes in the structure have single data-in port and single data-out port, and are interconnected to form simple multinode FIFO.Other node in the structure also has the another one data-in port, or another data-out port, or both.These nodes can also comprise control signal output ends mouth or signal input end mouth.Ring Z receives the control signal from ring C, and sends data to logic module L314.Ring C and R receive and send data to logic module L314.FIFO B 308 transmits control signal to logic module L, and receives data from logic module L.DPIM can comprise a plurality of logic modules that can send data in interconnection structure or among the FIFO B to a plurality of input ports.Data from DPIM can be injected into the top a plurality of row of the B of system.The number of DPIM can be identical with the number of storage unit, and wherein, each DPIM has a monocycle storage ring R who comprises a data word.Select as another kind, the DPIM unit further comprises a plurality of storage ring R.A specific memory ring can be identified by the part of address field AD1 or the part of opcode field OP1.
The sequential that grouping is moved is synchronous in all four rings.Circulation time in being grouped in ring is alignd with respect to the BIT field.The result with advantage of alignment is that ring C sends permission or stops the node among the ring Z to send the control signal 328 of dividing into groups to ring C to ring Z.When the node 312 on the ring Z receives the permission of the node 330 on the ring C, can send grouping to logic module L, so that logic module L is positioned to handle grouping in the mode of bit serial immediately.Similarly, round-robin grouping and ring C are synchronous in data storage ring R, so that logic module L circulation time in being grouped in each ring can advantageously be handled everybody respectively.Data storage ring R plays the effect of the following storage unit that can be used for some new application that will describe.Can when not being positioned on the same chip, DPIM and top switch be used for the sequential and the control of chip at the ring node of Z and the data communication ring (not shown) that separates between the logic module L.
Data in a storage ring R can be accessed from the top switch 110 by multiple packets. The accesses are aligned with, and overlap, portions of packets in the Z ring 306 of the top switch, and share the same cycle period. Multiple logic modules 314 are associated with the data communication ring C and the data storage ring R. A logic module L can read data from rings C and R, perform operations on the data under certain conditions, and write data to rings C and R. A logic module L can also send a packet to a node 320 on the FIFO 308 of the bottom switch 112 or of the concentrator. When the DPIM and the bottom switch are not on the same chip, a separate data communication ring (not shown) between the logic modules L 314 and the nodes 320 of interconnect structure B can be used for timing and control between chips. A separate data communication ring can also be used for timing and control when a single device needs to access the communication ring several times within a single cycle period.
Packets enter the communication ring C through the logic modules L 314. Packets exit the logic modules L and enter the bottom switch through input channels at different angles.
In some examples of the general system 100, all logic modules along the rings C and R of a DPIM 114 are of the same type and perform similar logic functions. Other examples use several different logic module types, allowing multiple logic functions to be applied to the data stored in the ring R of a particular DPIM. A logic module L 314 can modify data as the data circulates around ring R. The logic modules operate on data bits that pass serially through the module from nodes on ring C, ring R, and ring Z. Typical logic functions include: (1) data transfer operations such as load, store, read, and write; (2) logical operations such as AND, OR, NOR, NAND, EXCLUSIVE OR, and bit tests; and (3) arithmetic operations such as add, subtract, multiply, divide, and transcendental functions. Many other types of logic operations can be included. Logic module functions can be implemented in hardware within the logic module, or can be software-based and loaded into the logic module by packets sent to it. In some embodiments, the logic modules associated with a particular data storage ring R operate independently. In other embodiments, groups of logic modules are controlled by a separate system (not shown) that can receive data from a group of logic modules. In still other embodiments, groups of logic modules are controlled by a logic module control system. In other embodiments, the logic module control system executes control instructions on data received from the logic modules.
In Figs. 1 and 2, each DPIM includes one ring R and one ring C. In an alternative embodiment of system 100, a particular DPIM 114 includes multiple rings R. In the multiple-ring embodiment, a logic module 314 can access data from ring C and from all rings R simultaneously. Simultaneous access allows a logic module to modify data on one or more rings R based on the contents of the rings R, and also on the contents of a received packet and of the associated communication ring C.
A typical function of a logic module is to perform the operation specified in field OP1 on the data held in the packet's PAYLOAD field in combination with the data held in ring R. In one specific example, operation OP1 can specify that the data in the packet's PAYLOAD field be added to the data contained in the ring R located at address AD1. The resulting sum is sent to the target port of the bottom switch at address AD2. A logic module can perform several operations as specified by instructions held as data in the OP1 field. For example, the logic module can leave the data in ring R 304 unchanged. The logic module can replace the data in ring R 304 with the contents of the PAYLOAD field. The logic module L can also replace the data held in the PAYLOAD field with the result of a function applied to the previous contents of the PAYLOAD field and the contents of ring R 304. In other examples, the storage FIFO can store program instructions and data.
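The operations just listed can be illustrated with a small word-level sketch. It is not the bit-serial hardware described here: the opcode names "KEEP", "STORE", and "ADD", the 16-bit word mask, and the choice to leave ring R unchanged on an add are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

WORD_MASK = 0xFFFF  # assumed 16-bit data word; the text does not fix a width


@dataclass
class Packet:
    payload: int
    op1: str      # hypothetical opcode names: "KEEP", "STORE", "ADD"
    ad2: int      # bottom-switch target address
    op2: int = 0  # operation for the device at AD2


def apply_op1(ring_data: int, pkt: Packet) -> Tuple[int, Optional[Packet]]:
    """Return (new contents of ring R, packet forwarded toward AD2 or None)."""
    if pkt.op1 == "KEEP":
        # Leave the data in ring R unchanged and forward nothing.
        return ring_data, None
    if pkt.op1 == "STORE":
        # Replace the data in ring R with the PAYLOAD contents.
        return pkt.payload & WORD_MASK, None
    if pkt.op1 == "ADD":
        # Replace PAYLOAD with PAYLOAD + ring data; the sum travels to AD2.
        # Assumption: ring R itself is left unchanged by the add.
        total = (ring_data + pkt.payload) & WORD_MASK
        return ring_data, Packet(total, pkt.op1, pkt.ad2, pkt.op2)
    raise ValueError(f"unknown OP1 {pkt.op1!r}")
```

For instance, calling apply_op1(5, Packet(payload=7, op1="ADD", ad2=3)) leaves the ring value at 5 and forwards a packet whose payload is the sum, addressed to port 3.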
A general system 100 that has more than one type of logic module 314 associated with a communication ring C and a storage ring R can use one or more bits of the OP1 field to specify the particular logic module that performs the operation. In some embodiments, multiple logic modules operate on the same data. The set of logic modules at address AD1 = x can perform different operations than the set of logic modules at address AD1 = y.
Efficient movement of data packets through the general system 100 depends on the timing of the data flow. In some systems, buffers (not shown) associated with the logic modules help maintain the timing of data transfers. In many embodiments, timing is maintained without buffering data. The interconnect structure of the general system 100 advantageously has operational timing that results in efficient parallel computation, generation, and access of data.
The general system 100, built from multiple components including at least one switch, data storage rings 304, and associated logic 314, can be used to construct various computing and communication switches. Examples of computing and communication switches include an IP packet router or switch for an Internet switching system, a special-purpose classification engine, a general-purpose computer, or many parallel computing systems having general-purpose or special-purpose functions.
Referring to Fig. 2, a schematic block diagram shows a parallel random access memory (PRAM) that uses the network interconnect structure as its basic building block. The PRAM stores data that can be accessed simultaneously from multiple sources and sent simultaneously to multiple targets. The PRAM has a top switch 110 and may or may not have communication rings that receive packets from the target rings of the top switch 110. In an interconnect structure without communication rings, the rings Z pass through the logic modules. The top switch 110 has T output ports 210 from each target ring. In a typical PRAM system 200, the number of address locations is greater than the number of system I/O ports. For example, a PRAM system can have 128 I/O ports that access 64K words of data stored in the DPIMs. The AD1 field is 16 bits long to accommodate the 64K DPIM addresses 114. The AD2 field is 8 bits long to accommodate the 128 output ports 204: 7 bits are address bits and 1 bit is the BIT2 portion of the address. The top switch 110 has 128 input ports 202 and 64K Z rings (not shown), each connected to a DPIM unit through multiple connections at output ports 206. The concentrator 150 has 64K (65,536) input ports 208 and 128 output ports 210. The bottom switch 112 has 128 input ports and 128 output ports 204. The concentrator follows the same control timing and signaling rules for input and output as the top switch, the bottom switch, and the logic modules.
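The field widths in this example follow directly from the port and word counts. The short calculation below reproduces them; the helper name address_bits is hypothetical, and the extra AD2 bit is the BIT2 portion mentioned above.

```python
def address_bits(count: int) -> int:
    """Number of bits needed to address `count` distinct locations."""
    if count < 1:
        raise ValueError("count must be positive")
    return max(1, (count - 1).bit_length())


NUM_WORDS = 64 * 1024   # 64K DPIM addresses
NUM_PORTS = 128         # I/O ports of the PRAM

ad1_bits = address_bits(NUM_WORDS)       # width of the AD1 field
ad2_bits = address_bits(NUM_PORTS) + 1   # 7 address bits plus the BIT2 bit

print(f"AD1 is {ad1_bits} bits, AD2 is {ad2_bits} bits")
```

With these counts the sketch gives a 16-bit AD1 and an 8-bit AD2, matching the figures stated in the text.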
Alternatively, the top switch can have fewer output Z rings and associated DPIM units. The DPIM units can then contain multiple R rings so that the total data size is unchanged.
The PRAM shown in Fig. 2 includes DPIM units 114, which contain logic modules 314 connected directly to the communication ring C 302 and the storage ring R 304. The DPIM units 114 are connected to a packet concentrator 150 whose output feeds the bottom switch 112.
Referring to Fig. 3, a node 330 on ring C sends a control signal to a node 312 on the top switch ring Z, permitting each node 312 on ring Z to send a packet to a logic module L. When a logic module L receives a packet from ring Z, it can take one of several actions. First, the logic module L can begin placing the packet on the C ring. Second, the logic module L can begin using the data in the packet immediately. Third, the logic module L can immediately begin sending a generated packet into the concentrator 150 without placing the packet on the C ring. A logic module L_i can begin placing a packet P on the C ring. After the logic module L_i has placed some bits on the C ring, another logic module L_k (where k > i) can begin processing and removing those bits. In some cases, the entire packet P is never on the C ring. A logic module can insert data into ring C or ring R, or send data to the concentrator 150. Entry of packets into the concentrator is controlled by control signals on lines 324 from the concentrator. The logic modules 314 associated with ring R 304 can include additional send and receive interconnections to auxiliary devices (not shown) that may be associated with ring R. Auxiliary devices can have various structures and perform various functions, depending on the purpose and function of the system. One example of an auxiliary device is a system controller.
In some embodiments, the PRAM 200 has DPIMs containing logic modules 314 that are all of the same logic type and perform the same function.
In other embodiments, a first DPIM S at a particular address can have logic modules of different types and functions. A second DPIM T can have logic modules of the same or different types as those in the first DPIM S. In one example of a PRAM application, one data word is stored in a single storage ring R. The logic modules can modify the data as it circulates in ring R. In the PRAM, logic modules alter the contents of storage rings R, which store program instructions and data.
The PRAM stores and retrieves data using packets that are defined to include the following fields:
PAYLOAD|OP2|AD2|OP1|AD1|BIT
A BIT field set to 1 indicates that a packet has entered the general system 100. The AD1 field specifies the address of the particular DPIM whose data storage ring R contains the desired data. The top switch routes the packet to the DPIM specified by address AD1, DPIM(AD1). In the illustrated example, the OP1 field is a one-bit field that specifies the operation to be performed. For example, a logic value of 1 specifies a READ request and a logic value of 0 specifies a WRITE request.
In a READ request, the receiving logic module in the DPIM at AD1 sends the data stored on ring R to address AD2 of the bottom switch 112. In a WRITE request, the PAYLOAD field of the packet is placed on the ring R at address AD1. AD2 is the address used to route data through the bottom switch 112; it is used only in READ requests and specifies where the memory contents are to be sent. OP2 optionally describes an operation that the device at address AD2 is to perform on the data sent to it. If operation OP1 is a READ request, the logic module performing the READ does not use the PAYLOAD field.
In the illustrated implementation, the PRAM includes only one type of logic module, a type that performs both READ and WRITE operations. Other PRAM implementations use other types of logic modules, including separate READ-type units and separate WRITE-type units.
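The READ and WRITE semantics described above can be summarized in a word-level model. This is only a sketch: the class name PramModel and the use of dictionaries for packets are assumptions, and the model ignores the switches, the concentrator, and all bit-serial timing. The OP1 encoding (1 for READ, 0 for WRITE) is taken from the text.

```python
READ, WRITE = 1, 0  # OP1 encoding from the text: 1 = READ, 0 = WRITE


class PramModel:
    """One storage ring R (one data word) per DPIM address AD1."""

    def __init__(self, num_words: int) -> None:
        self.rings = [0] * num_words

    def handle(self, pkt: dict):
        """Process one request; return the reply packet for a READ, else None."""
        if pkt.get("BIT") != 1:
            return None  # BIT = 0 means no packet is present
        ad1 = pkt["AD1"]
        if pkt["OP1"] == READ:
            # PAYLOAD is ignored; a copy of DATA is forwarded toward AD2
            # and ring R is left unchanged.
            return {"DATA": self.rings[ad1], "OP2": pkt.get("OP2", 0),
                    "AD2": pkt["AD2"], "BIT": 1}
        if pkt["OP1"] == WRITE:
            # PAYLOAD is placed on ring R; nothing goes to the concentrator.
            self.rings[ad1] = pkt["PAYLOAD"]
            return None
        raise ValueError("OP1 must be READ (1) or WRITE (0)")
```

A WRITE of 99 to address 2 followed by a READ of address 2 with AD2 = 5 yields a reply carrying DATA = 99 addressed to output port 5, with the stored word unchanged.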
Referring to Figs. 2 and 3, the illustrated PRAM 200 begins operation by receiving a packet into the top switch 110 at a suitable time. The packet P is routed through the top switch and arrives at the target ring Z located at address AD1. The AD1 field of the packet designates the target ring Z_J 306 of the top switch, where J = AD1. A node S (not shown) and a node T (not shown) are defined to describe message timing. Node S is defined as a node 330 of ring C_J and node T as a node 312 of ring Z_J, such that node S is positioned to send a control signal to node T on control line 328. Based on a global clock signal, node S 330 determines when the timing bit of a packet on ring C_J arrives at node S. If a timing bit with value 1 arrives at node S at the timing-bit arrival time, node S sends a blocking signal on line 328 to node T 312 on ring Z, preventing node T from sending a packet down line 326 to the logic module L. If node S does not receive a timing bit with value 1 at the timing-bit arrival time, no message is entering node S, and node S sends an unblocking control signal to node T. Global timing is such that the control signal arrives at node T at the same time as a message arrives at node T, either from the Z ring or from a node U one level above the Z ring in the top switch. The packet exits the top switch 110 from node 312 on path 326 to the logic module. The logic module can place the packet on the communication ring C 302, or can process the packet immediately without placing it on ring C. At this point the packet P has the form:
PAYLOAD|OP2|AD2|OP1|BIT
The packet P travels from the Z ring down line 326 to the logic module L. When the packet P begins transferring to the logic module L, node N_Z on the Z ring sends a control signal that informs a node W, one level higher in the top switch, that node N_Z is unblocked. This control signal gives node W the right to route a packet to node N_X, the node positioned to receive data from node N_Z. The logic module L operates identically, with respect to timing, on packets received on line 326 and on packets arriving on ring C. The packet P enters the logic module L, which parses and executes the command in the OP1 field.
In the illustrated embodiment, the length of the communication ring C equals the length of the storage ring R. Bits move through rings C and R in bit-serial fashion at a rate set by a common clock. The first bit of the packet's PAYLOAD field is aligned with the first bit of the DATA field of ring R. Therefore, in a READ request the data in ring R can be copied into the payload section of the packet, and in a WRITE request the data in the payload section of the packet can be transferred from the packet to the storage ring R.
The READ request
In a READ request, the packet P has the form:
PAYLOAD|OP2|AD2|OP1|AD1|BIT
The packet enters the top switch. In general, a logic module of the DPIM at address AD1 identifies a READ request by examining the operation in the OP1 field. The logic module replaces the PAYLOAD field of the packet with the DATA field of ring R. The updated packet is then sent through the concentrator into the bottom switch, which directs the packet to a computing unit (CU) 126 or other device at address AD2. The CU or other device can execute the instruction specified by operation code 2 (OP2) in conjunction with the data in the PAYLOAD field.
The packet P enters node T 312 on ring Z. In response to the timing bit of the packet P entering node T, and to an unblocking control signal from node 330 of ring C, node T begins sending the packet P down data path 326 to a logic module L. After the BIT and OP1 fields have entered the logic module L, a control signal on line 324 also arrives at the logic module L, indicating whether the concentrator 150, or the bottom switch if the structure has no concentrator, can accept a message. If the control signal indicates that the concentrator cannot accept a message, the logic module L begins sending the packet P onto ring C. The packet P moves to the next logic module on ring C.
At some point, one of the logic modules L on ring C receives a not-busy control signal from the level below. At that time, the logic module L begins sending the packet P to an input node 320 of interconnect structure B.
In a READ request, the logic module strips the OP1 field from the packet and begins sending the packet on path 322 to an input node 320 of the concentrator. The logic module first sends the BIT field, followed by the AD2 field, followed by the OP2 field. Timing is arranged so that the last bit of the OP2 field leaves the logic module at the same time as the first bit of the DATA field of storage ring R arrives at the logic module. The logic module leaves the DATA field in storage ring R unchanged, places a copy of DATA in the PAYLOAD field of the packet being sent down, and continues sending the packet bit-serially into the concentrator. The data in ring R remains unchanged.
The packet passes through the concentrator unchanged and enters the bottom switch 112 in the form:
DATA|OP2|AD2|BIT
The PAYLOAD field now contains the DATA field of ring R. The AD2 field is removed as the packet is routed through the bottom switch. The packet exits the bottom switch at the output port 204 at address AD2. On exit, the packet has the form:
DATA|OP2|BIT
The OP2 field is a code that can serve various purposes. One use is to indicate the operation that the bottom-switch output device is to perform using the data contained in the PAYLOAD field.
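The packet formats quoted for the READ path differ only in which header fields have been consumed at each stage. The sketch below traces that sequence; the function names are hypothetical, and the fields are treated as opaque labels rather than bit strings.

```python
def without(fields, name):
    """Return the field list with one named field removed."""
    return [f for f in fields if f != name]


def read_path_formats():
    """Packet layout at each stage of a READ, as described in the text."""
    at_input = ["PAYLOAD", "OP2", "AD2", "OP1", "AD1", "BIT"]
    # The top switch consumes AD1 while routing to the target DPIM.
    after_top = without(at_input, "AD1")
    # The logic module strips OP1 and substitutes DATA from ring R for PAYLOAD.
    after_logic = ["DATA" if f == "PAYLOAD" else f
                   for f in without(after_top, "OP1")]
    # The bottom switch consumes AD2 while routing to the output port.
    after_bottom = without(after_logic, "AD2")
    stages = (at_input, after_top, after_logic, after_bottom)
    return ["|".join(stage) for stage in stages]


for layout in read_path_formats():
    print(layout)
```

Each printed line corresponds to one of the format lines given above, from the packet entering the top switch to the packet leaving the bottom switch.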
The interconnect structure of the PRAM inherently has a cyclic timing that results in efficient parallel generation and access of data. For example, multiple external sources at different input ports 202 can request READ operations on the same DATA field of a particular DPIM 114. Multiple READ requests can enter a particular target ring Z 306 of the top switch at different nodes 312 and then enter different logic modules L of the target DPIM. READ requests can enter different logic modules of ring C in the same cycle period. The communication ring C 302 and the storage ring R 304 are always synchronized with respect to packet movement in the target ring Z of the top switch and in the input interconnect structure B of the concentrator.
A READ request always arrives at a logic module at the correct time for data from ring R to be appended at the proper PAYLOAD position of the forwarded packet. The advantageous result is that multiple requests for the same data in ring R can be issued simultaneously. The same DATA field is accessed by multiple requests, and the data from ring R is sent to multiple final targets. Multiple READ operations can execute in parallel, and the forwarded packets arrive at multiple output ports 204 at the same time. The multiple READ requests execute in overlapped fashion, with different logic modules reading simultaneously from different positions on ring R. In addition, other multiple READ requests execute at different addresses of the PRAM memory during the same cycle period.
Because of the system timing, READ requests are carried out in an overlapped, efficient, and parallel manner. Figs. 4A, 4B, and 4C show the timing of a single READ. The length of the storage ring R equals the length of the communication ring C. Ring R contains circulating data 414 of length PayL. The remaining storage elements of ring R contain zeros or "blanks", or are ignored and can hold any value. The BLANK field 412 is the set of bits not included in the DATA field 414.
Referring to Fig. 4A, a portion of each of rings C and R passes through the logic modules of a particular DPIM. A logic module contains at least two bits of the shift register that forms ring C and at least two bits of the shift register that forms ring R. In some embodiments, a DPIM 114 contains multiple logic modules 314. A logic module is positioned to read two bits of the communication ring 302 in one clock cycle. At a time indicated by a global signal (not shown), the logic module examines the BIT field and the OP1 field. In the illustrated embodiment, the logic module reads the entire OP1 field and the BIT field together. In other embodiments, the OP1 and BIT fields can be read in multiple operations. In a READ request, an unblocked logic module 314 sends the packet into the concentrator or bottom switch at the correct time to align the packet with other bits at the input of the concentrator or bottom switch.
In a READ request, a blocked logic module places the packet on ring C, where the packet moves to the next logic module. The next logic module may or may not be blocked. If the next logic module is blocked, it likewise sends the packet on ring C to the following logic module. If the packet enters the rightmost logic module LR and that module is blocked, LR sends the packet through the FIFO on ring C. On exiting the FIFO, the packet enters the leftmost logic module. The packet circulates in this way until it encounters an unblocked logic module. The length of ring C is set so that a circulating packet always fits entirely on the ring. In other words, the packet length P_L is never greater than the ring length F_L.
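The circulation of a blocked READ packet around ring C can be sketched as a search for the first unblocked logic module, wrapping from the rightmost module back to the leftmost through the FIFO. The model below is a simplification under stated assumptions: the blocked state of each module is held fixed during the search, whereas in the structure described here it is re-evaluated as the packet arrives, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple


def packet_fits(packet_len: int, ring_len: int) -> bool:
    """Ring C must hold a whole circulating packet: P_L <= F_L."""
    return packet_len <= ring_len


def first_unblocked(blocked: List[bool], start: int) -> Tuple[Optional[int], int]:
    """Walk ring C from module `start` until an unblocked module is found.

    Returns (index of the module that sends the packet down, modules passed).
    If every module is blocked, returns (None, number of modules): the packet
    has completed one full circuit and keeps circulating.
    """
    count = len(blocked)
    for hops in range(count):
        index = (start + hops) % count  # wrap rightmost -> FIFO -> leftmost
        if not blocked[index]:
            return index, hops
    return None, count
```

With four modules of which only module 2 is unblocked, a packet entering at module 3 passes modules 3, 0, and 1 before module 2 sends it down, so first_unblocked([True, True, False, True], 3) returns (2, 3).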
In a READ operation, the packet has the form:
|PAYLOAD|OP2|AD2|OP1|AD1|BIT|
The packet is inserted into the top switch. The address field AD1 gives the target address of the ring R 304 that contains the desired data. The operation field OP1 indicates a READ request. The address field AD2 is the target address of the bottom-switch output port 204 to which the result is sent. The operation code OP2 specifies the function to be performed by the output device.
In a typical embodiment, the output device is the same as the input device, so a single device is connected to both an input port 202 and an output port 204 of the PRAM. In a READ request, the logic module ignores the PAYLOAD field, which can have any value. In a WRITE operation, by contrast, the PAYLOAD field contains the data to be placed on the ring R 304 associated with the DPIM at address AD1. The modified packet leaving the logic module has the form:
|DATA|OP2|AD2|BIT|
The data entering the bottom switch has the form:
|DATA|OP2|BIT|
The data leaves the bottom switch through the output port specified by the address field AD2, where DATA is the data field 414 of ring R.
Figs. 4A, 4B, and 4C show the timing coordination among the communication ring C, the data storage ring R, and the concentrator B. In one embodiment, these rings comprise multiple parallel FIFOs in a bus structure, and a logic module 314 can read multiple bits at a time. In the present example, the logic module L receives only one bit per clock cycle. The concentrator B includes multiple input nodes 320 on FIFO 308 that can accept a packet from a logic module. The logic modules are positioned to inject data into the top of the concentrator through input ports 322.
Referring to Fig. 4A, the BIT field 402, set to 1, arrives at the logic module at the same time as the first bit B_0 408 of the BLANK field on data ring R. The relative timing of the circulating data is arranged so that the first bit of DATA in ring R is aligned with the first bit of the payload field of the request packet in ring C (illustrated by line 410).
Data already in the concentrator B that enters node 316 from another node of the concentrator has priority over data entering node 316 from above on path 322. A global packet-arrival timing signal (not shown) informs node 316 when a packet may enter. If a packet already in the concentrator enters node 316, node 316 sends a blocking signal on path 324 to the logic module connected to it. In response to the blocking signal, the logic module L sends the READ request packet onto the communication ring C, as described above. If no blocking signal arrives from the level below, the logic module L sends the packet on line 322 to the input node 320 downstream of node 316 in concentrator B.
Fig. 4A shows a READ request at time T = 0, the time at which the logic module that receives the request begins to process it. At this point the logic module has enough information to determine that it has received a READ request and that the request is not blocked from the level below. Specifically, the logic module examines the BIT and OP1 fields and responds to three conditions:
A not-busy signal is received on line 324 from the level below,
BIT=1, and
OP1 = READ request.
When all three conditions are met, the logic module is ready to begin READ processing at the next time step. If OP1 = WRITE, the logic module begins WRITE processing at the next time step.
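The decision a logic module makes at T = 0 can be written as a small function of the three inputs it examines. The sketch below is an illustration only: the action labels are hypothetical names, the OP1 encoding is the one-bit example from the text, and the blocked-READ and WRITE branches follow the behavior described elsewhere in this description (a blocked READ is forwarded on ring C, and a WRITE ignores the concentrator signal).

```python
READ, WRITE = 1, 0  # one-bit OP1 encoding from the text


def next_action(busy_from_below: bool, bit: int, op1: int) -> str:
    """Choose what a logic module does at the next time step."""
    if bit != 1:
        return "IDLE"               # no packet is present
    if op1 == WRITE:
        return "START_WRITE"        # the concentrator signal is ignored
    if busy_from_below:
        return "FORWARD_ON_RING_C"  # blocked READ goes to the next module
    return "START_READ"             # not busy, BIT = 1, OP1 = READ
```

Only the combination of a not-busy signal, BIT = 1, and OP1 = READ yields "START_READ"; any change to one of the three inputs selects a different branch.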
Figs. 4B and 4C show READ request processing when node 316 sends no blocking signal to the logic module.
Fig. 4B shows the READ request at time T = 1. All data bits in rings Z, C, and R have shifted one position to the right. The rightmost bit of a ring enters a FIFO, and the FIFO supplies one bit to the leftmost element. The logic module L sends the BIT field down line 322 to an input port of the concentrator. After the shift, the C ring register contains the second and third bits of the packet: the one-bit OP1 field and the first bit of the AD2 field, respectively. The logic module also holds the second and third bits of the BLANK field of ring R, B_1 and B_2. In typical operation of the PRAM 200, a packet from ring Z may enter a logic module (not shown) to the left of the logic module illustrated, so the packet is not wholly contained in ring C. The remainder of the packet may still be in the top switch 110 as the packet enters logic module L 314, or may still be in the wormhole process of entering the top switch at an input port and exiting from ring Z. For ease of understanding, Figs. 4A, 4B, and 4C show the READ request packet wholly contained in ring C.
In the next AD2L + OP2L steps, the logic module L reads the AD2 and OP2 fields and copies them to the input port 320. At this point the concentrator has received the BIT, AD2, and OP2 fields in bit-serial fashion. The concentrator receives and processes this sequence in wormhole fashion before the first bit of the DATA field 414 arrives at the logic module L. While the logic module L reads AD2 and OP2 from ring C, the BLANK field of ring R passes through the logic module L and is ignored. The logic module L is positioned to read the first bit of the packet's PAYLOAD section in communication ring C when the first bit of the DATA field of ring R arrives (illustrated by line 410).
The logic module L sends output in two directions. First, the logic module L returns a zeroed packet to ring C. Second, the logic module L sends the DATA field down. All bits sent to ring C are set to zero 430 so that subsequent logic modules on ring C do not repeat the READ operation. In other words, once a logic module L has successfully processed a request, the request packet is cleared from the communication ring C, advantageously giving other logic modules on the same ring the opportunity to accept other request packets in the same cycle period. Ideally, packets are processed by the logic modules in wormhole fashion, and a particular DPIM can process multiple different request packets in one cycle period.
At time K+3, the first bit of the payload is positioned to be replaced with a zero by L, and the first data bit D_1 on ring R is positioned to be sent to the bottom switch, or to the concentrator that carries data to the bottom switch. Processing continues as shown in Fig. 4C. The logic module sends the second DATA bit D_2 to the concentrator while reading the third DATA bit D_3 from data ring R. At the end of processing, the entire packet has been removed from the communication ring C, and the packet has the form:
|DATA|OP2|AD2|BIT|
The packet is sent to the input port 320 of the concentrator or to the bottom switch. DATA is copied from the DATA field of ring R to the concentrator. The DATA field 414 in data ring R remains unchanged.
Referring to Fig. 5, logic modules L1 504 and L2 502 execute READ requests concurrently. In general, different request packets P1 and P2 are sent from different input ports 202 and enter the top switch, resulting in multiple READ requests being processed in wormhole fashion within one DPIM. All requests in the illustrated example are directed to the same PRAM address, specified by the AD1 field of each packet. Packets P1 and P2 arrive at different logic modules L1 and L2, respectively, of the target DPIM. Each logic module processes its packet independently. In the illustrated example, the first-arriving READ request P2 is processed by module L2 502. Module L2 has already read and processed the BIT field, the OP1 field, and five bits of the AD2 field. Module L2 has sent the BIT field and the first four bits of the AD2 field to input node 512 of the concentrator. Similarly, module L1 has read and processed two bits of the AD2 field of packet P1 and has sent the first bit of AD2 down to node 514. The AD2 fields of the two packets differ, so the DATA field 414 is sent to two different output ports of the bottom switch. The two requests are processed in overlapped fashion, with the second request occurring only a few clock cycles after the first. The DPIM has T logic modules and can potentially process T READ requests in the same cycle period. As a result of processing a READ request, a logic module always places zeros 430 on the C ring.
Wormhole routing of requests and responses through the top switch and bottom switch, respectively, allows any input port to send request packets at the same time as other input ports. In general, any input port 202 can send a READ request to any DPIM independently of requests sent simultaneously from other input ports. The PRAM 200 supports parallel, overlapped access to a single database by multiple requesters, and supports multiple requests to the same data location.
The WRITE request
In the WRITE request, the AD1 field of grouping is used for the grouping route through top switch.Be grouped in the position that enters ring C and leave the node 312 of top switch.The OP1 field is specified WRITE request.In this WRITE request, there are not data to be sent to concentrator.Therefore, logic module is ignored the control signal from concentrator.Logic module sends " 0 " input port 320 to concentrator, to pass on the information that does not have grouping sending.Always be allowed to enter first logic module that on the C ring, runs in the WRITE of Z ring request.
For making explanation simple and clear, the request grouping has been shown in ring C.In more typical application, this request will enter logic module by wormhole through top switch.For the WRITE request, logic module is ignored the information of the field beyond field OP1 and the PAYLOAD.
The WRITE request that Fig. 6 shows at time T=K+5.WRITE grouping on the ring C and the data sync among the ring R are rotated together and are passed through logic module.Last of OP2 field abandoned by logic module when last of the BLANK field of logic module and storage ring R aligns.When first of the PAYLOAD field of grouping arrived logic module L, logic module L removed this first from ring C, and with in this first the DATA field that is placed on ring R.Processing is proceeded, and is transferred to the DATA field of ring R from communication loop until whole PAYLOAD field.The logic module L zero clearing of will dividing into groups is that grouping is removed from ring C ideally, so that other logic module can not repeat the WRITE operation.
For visual, Fig. 6 show from ring C to the grouping of ring R moving.Typically, data arrive from top switch.More particularly, data scatter is on top switch.
In another embodiment, a plurality of R rings are arranged in a DPIM, the address of DPIM module is stored in the AD1 field, and the address that a given R encircles in the DPIM module is stored as the part of the OP1 field of expansion.In a DPIM memory module, have in the example of eight R ring, the OP1 field be four long, wherein, which R ring the first bit representation READ or WRITE operation, back three bit representation requests will be directed to.When each DPIM comprised a plurality of R ring, the progression of top switch and the progression of concentrator all were reduced.
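The four-bit extended OP1 field in the eight-ring example can be decoded as shown below. The sketch assumes that the "first bit" is the most significant bit of a four-bit integer and that 1 means READ, as in the one-bit example given earlier; neither the bit ordering nor the function name decode_op1 comes from the text.

```python
from typing import Tuple


def decode_op1(op1: int) -> Tuple[bool, int]:
    """Split a 4-bit extended OP1 into (is_read, index of the R ring).

    Assumption: the first bit is the most significant bit, with 1 = READ
    and 0 = WRITE; the remaining three bits select one of eight R rings.
    """
    if not 0 <= op1 <= 0b1111:
        raise ValueError("extended OP1 must fit in four bits")
    is_read = bool(op1 >> 3)
    ring_index = op1 & 0b111
    return is_read, ring_index
```

Under these assumptions, the value 0b1101 decodes as a READ directed to ring 5, and 0b0010 decodes as a WRITE directed to ring 2.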
Including a plurality of R rings in a DPIM also allows more complex operations that require more data and more logic in the module, and more complex OP1 codes. For example, a request to a DPIM can be a request to send the maximum of the values in all R rings, or a request to send the sum of the values in a subset of the R rings. A DPIM request can also be a request to send a copy of each word that contains a specified subfield to a computed address, which allows certain data types to be searched efficiently.
In the PRAM system shown, the BLANK field is ignored and can have any value. In other embodiments, the BLANK field can be defined to support various operations. In one example, the BLANK field is used for a scoreboard function. A system includes N processors, where the number of processors N is less than BL. All N processors must read the DATA field before the DATA field is allowed to be overwritten. When a new DATA value is placed on storage ring R, the BLANK field is set to all zeros. When processor W of the N processors reads the data, bit W of the BLANK field is set to 1. The DATA portion of ring R can be overwritten only when the appropriate N-bit subfield of BLANK is set to all ones. The BLANK field is then set to all zeros again.
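The scoreboard behavior just described can be sketched in Python as follows. This is a minimal model under stated assumptions, not the hardware design: the class name `ScoreboardRing`, the integer representation of the BLANK field, and the choice to reject (rather than delay) an early write are all hypothetical.

```python
class ScoreboardRing:
    """Toy model of a storage ring whose BLANK field acts as a scoreboard."""

    def __init__(self, n_processors, data=0):
        self.n = n_processors
        self.data = data
        self.blank = 0  # bit W becomes 1 once processor W has read DATA

    def read(self, w):
        """Processor w reads DATA and sets bit w of BLANK."""
        if not 0 <= w < self.n:
            raise ValueError("unknown processor")
        self.blank |= 1 << w
        return self.data

    def can_overwrite(self):
        """True only when all N scoreboard bits are 1."""
        return self.blank == (1 << self.n) - 1

    def write(self, value):
        """Overwrite DATA if allowed; BLANK is then reset to all zeros."""
        if not self.can_overwrite():
            return False  # assumption: an early write is simply refused
        self.data = value
        self.blank = 0
        return True
```

With three processors, a write succeeds only after processors 0, 1, and 2 have each read the current value.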
The scoreboard function is one of many possible uses of the BLANK field. Those skilled in the art can make effective use of the BLANK field in many computing and communication applications.
In some applications, the logic modules in a DPIM must be able to communicate with one another. An example of such an application is the leaky bucket algorithm used in Asynchronous Transfer Mode (ATM) Internet switches. In the parallel access memory 200 shown, a computational logic module 314 sends a signal to a local counter (not shown) when it receives a READ request. No two computational logic modules in a DPIM receive the first bit of a read packet at the same time, so a common DPIM bus (not shown) is advantageously used to step a counter that is connected to all of the logic modules. The counter can respond to all of the computational logic modules, so that when notification is given that the leaky bucket has run over, all of the appropriate logic modules are informed and respond to this information by modifying the AD2 and OP2 fields to generate a suitable reply to the appropriate target.
Referring to Fig. 1, a schematic block diagram shows a computing engine 100 that is built using a network interconnect structure as its basic unit. Various embodiments of the computing engine include the core unit of the general system 100 mentioned in the discussion of Fig. 1. In an embodiment of the computing engine as a computing system, bottom switch 112 sends packets to computing units 126 that include one or more processors and memory. Referring to Fig. 3, the computational logic modules associated with ring R perform part of the overall computing function of the system. The computing units 126 that receive data from bottom switch 112 perform additional logical operations.
The logic modules perform both conventional and novel processor operations, depending on the overall desired functionality of the computing engine.
A first example of system 100 is a scalable, parallel computing system. In one aspect of operation, the system performs a parallel SORT that includes a parallel compare suboperation. A logic module L accepts a first data element from a packet and a second data element from storage ring R 304. The logic module places the larger of the two data elements on storage ring R, places the smaller value in the PAYLOAD field, and sends the smaller value to the address specified by the AD2 field of the packet. If two such logic modules are connected in series, as shown in Fig. 3, the second logic module can perform a second comparison on the data from the first logic module within only a few clock cycles. Compare-and-replace processing is a common unit of work in many sorting algorithms, and those familiar with the prior art can integrate compare-and-replace processing into a larger parallel sorting engine.
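The compare-and-replace step can be illustrated with a short Python sketch. The function names are hypothetical, and the chain model assumes, as a simplification, that each logic module in the series holds its own stored value; the description itself does not specify this detail.

```python
def compare_and_replace(ring_value, packet_value):
    """Keep the larger value on the ring; return the smaller as the payload."""
    larger = max(ring_value, packet_value)
    smaller = min(ring_value, packet_value)
    return larger, smaller


def pass_through_chain(ring_values, packet_value):
    """Send one packet value through several compare-and-replace stages.

    Assumption: each stage holds one stored value of its own.
    Returns the updated stored values and the payload that leaves the chain.
    """
    new_rings = []
    payload = packet_value
    for value in ring_values:
        kept, payload = compare_and_replace(value, payload)
        new_rings.append(kept)
    return new_rings, payload
```

Passing the value 7 through stages that hold 5 and 2 leaves 7 and 5 stored and forwards 2 to the address named in AD2.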
Those skilled in the art can construct many useful logic modules 314 that are well suited to a broad range of system applications. A single logic module can implement a large number of operations, or different types of logic modules can be constructed so that each unit handles a smaller number of tasks. System 100 includes two classes of processing units: units in the DPIMs 114 and units in the computing units (CUs) 126. The DPIMs handle bit-serial data movement and perform the types of computation that move large amounts of data. A CU includes one or more processors, such as general-purpose processors, and conventional RAM. The CU efficiently performs "number crunching" operations on data that are local to the CU, and generates, transmits, and receives packets. An important function of the DPIMs is to supply data to the CUs in parallel, with low latency, and in a form that is convenient for further processing.
In one functional example, a computational problem over a large domain can be decomposed into a set of non-overlapping subdomains. A CU is selected to receive data of a predetermined type from each subdomain that contributes in an important way to the computation performed by the CU. The DPIMs prepare the data and send the results to the appropriate CU. For example, the domain can be the positions of chess pieces within ten possible moves, and each subdomain contains all of the possible positions within eight moves following a given move pair. The DPIMs return to the CU only the promising first move pairs, with the data arranged from most promising to least promising.
In another application, the domain contains a representation of objects in three dimensions, and the subdomains are a partition of the space. In one specific example, the condition of interest is defined as the gravitational force on an object of interest exceeding a threshold. The DPIMs forward data to the CU from the subdomains that contain data consistent with the condition of interest.
The scalable system shown in Fig. 1, and embodiments that use the core unit of the scalable system, can be used for supercomputer applications. In supercomputer applications, the CUs receive data in parallel, in a convenient form and in a timely manner. The CUs process the data in parallel, send the results, and generate requests for subsequent iterations.
The DPIM is very useful as a bookkeeper and task scheduler. One example is a task scheduler that uses a plurality of K computing units (CUs) in a set H. The CUs in set H typically perform various tasks in a parallel computation. When tasks are completed, N of the K CUs are assigned new tasks. A data storage ring R can store at least K bits of data and holds a word W of length K that is set to zero. Each bit position in word W is associated with a particular CU in set H. When a CU finishes its assigned task, the CU sends a packet M to the DPIM that contains ring R. A logic module L1 on data storage ring R modifies word W by inserting a 1 in the bit position associated with the CU that sent packet M. Another logic module L2 on data storage ring R tracks the number of ones in word W. When word W contains N ones, the N idle CUs in H begin new tasks. The new tasks are started by multicasting a packet to the N processors. An efficient method of multicasting to a subset of set H is discussed below.
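The bookkeeping performed by logic modules L1 and L2 can be sketched in Python as follows. The class and method names are hypothetical, and clearing word W after the N idle CUs are dispatched is an assumption added so that the sketch can be reused; the description does not say when W is reset.

```python
class TaskScheduler:
    """Toy model of word W on ring R with logic modules L1 and L2."""

    def __init__(self, k, n):
        self.k = k      # number of CUs in set H
        self.n = n      # number of idle CUs needed before dispatch
        self.word = 0   # word W: K bits, all zero initially

    def report_done(self, cu):
        """Handle packet M from a CU that has finished its task.

        Returns the list of idle CUs to multicast to once N bits are set,
        otherwise an empty list.
        """
        if not 0 <= cu < self.k:
            raise ValueError("CU is not a member of set H")
        self.word |= 1 << cu                 # L1: set the bit for this CU
        ones = bin(self.word).count("1")     # L2: count the ones in W
        if ones < self.n:
            return []
        idle = [i for i in range(self.k) if (self.word >> i) & 1]
        self.word = 0  # assumption: W is cleared after dispatch
        return idle
```

With K=8 and N=3, the third completion report triggers a multicast to the three idle CUs.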
Referring to Fig. 7, a schematic block diagram shows a structure and technique for performing a multicast operation using indirect addressing. Multicasting a packet to a plurality of targets specified by corresponding addresses is a very useful function in computing and communication applications. A single first address points to a set of second addresses. The second addresses are the destinations of the copies of the multicast packet payload.
In some embodiments, the interconnect structure system has a collection C of output port sets with the property that, under certain conditions, the system sends a predetermined packet payload to all output ports in set C_0. Each of the sets C_0, C_1, C_2, ..., C_J-1 is a group of output ports such that, for a particular integer N less than J, all ports in set C_N receive the same particular packet as the result of one multicast request.
Multicast interconnect structure 700 stores the set C_N of output addresses in storage ring R 704. The address capacity of each ring is FMAX addresses. In the example shown in Fig. 7, the address capacity of ring R is FMAX=5 addresses.
Various switch configurations and sizes can be used. In the example shown, the bottom switch has 64 output ports, so an output port address can be stored as a 6-bit binary pattern. Ring R includes five fields 702, labeled F_0, F_1, F_2, F_3, F_4, that hold the output port locations in set C_N. Each field is 7 bits long. The first bit of the 7-bit field is set to 1 if a location in C_N is stored in the following 6 bits of the field, and is set to 0 otherwise.
At least two types of packets arrive at the multicast logic module MLM 714: MULTICAST READ packets and MULTICAST WRITE packets.
The first type of packet, PW, has an OP1 field that specifies a MULTICAST WRITE operation. The WRITE packet arrives at communication ring 302 in the following format:
|PAYLOAD|OP1|BIT|
PAYLOAD is the concatenation of fields F_0, F_1, F_2, F_3, F_4. Packet PW arrives at communication ring 302 at a position that allows MLM 714 to read the first bit of F_0 at the proper time. The MLM writes the first bit of PAYLOAD to ring R in a manner similar to the WRITE operation discussed above with reference to Fig. 6.
Fig. 7 shows a logic module connected to special hardware DPIM 714 that supports the multicast capability. In response to a WRITE request, the system performs an operation that transfers fields F_0, F_1, F_2, F_3, F_4 from rings Z and C to data storage ring R 304. BIT=1 indicates a packet. When BIT=0, the remainder of the packet is always ignored. The opcode field OP1 follows the BIT field. In a MULTICAST WRITE operation, OP1 indicates that the payload is to be transferred from the packet to the storage ring, replacing any data currently in the storage ring. Data are transferred serially from the MLM to storage ring R.
As shown, data are transferred over the rightmost line 334. The data arrive at storage ring 704 in the correct format and at the proper time and position. In a MULTICAST WRITE operation, the control signal on line 722 from the bottom switch to the MLM can be ignored.
The second type of packet, PR, specifies a MULTICAST READ request and can arrive at communication ring 302 in the following format:
|PAYLOAD|OP2|BLANK|OP1|BIT|
The BLANK field in this example is 6 bits long. The BLANK field is replaced by the destination address from one of the fields of C_N. The OP1 field may or may not be used for a particular packet or application. A group of packets enters bottom switch 112 in the following format:
|PAYLOAD|OP2|AD2|BIT|
Address field AD2 originates from a field of ring R. Operation field OP2 and PAYLOAD originate from the MULTICAST READ packet.
In the example shown, storage ring R 704 is located at destination address AD1 and stores three output port addresses, for example 3, 8, and 17. Output address 3 is stored in field F_0. The most significant bit of address 3 appears first, followed by the next most significant bit, and so on. The standard six-bit binary pattern that represents decimal integer 3 is 000011. The header bits are used in order from the most significant bit to the least significant bit. Accordingly, the header bits are stored with the most significant bit first, so that in field F_0 the field representing target output 3 is written as the six-bit pattern 110000. The complete F_0 field, including the flag bit, has the seven-bit pattern 1100001. Similarly, field F_1 stores decimal 8 as the pattern 0001001, and field F_2 stores decimal 17 as the pattern 1000101. Because no additional output ports are addressed, fields F_3 and F_4 are both set to all zeros: 0000000.
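The seven-bit patterns quoted above can be reproduced by reversing the standard six-bit binary form of the port number and appending the flag bit on the right. The Python sketch below follows that reading of the examples; the function names and the use of `None` for an unused field are assumptions made for illustration.

```python
def encode_field(port=None, width=6):
    """Return the written 7-bit pattern for one field F_i of ring R.

    The pattern is the reversed binary form of the port number followed
    by a flag bit of 1. An unused field is written as all zeros.
    """
    if port is None:
        return "0" * (width + 1)
    if not 0 <= port < 2 ** width:
        raise ValueError("port number does not fit in the address field")
    bits = format(port, "0{}b".format(width))  # e.g. 3 -> "000011"
    return bits[::-1] + "1"                    # e.g. "110000" + "1"


def decode_field(field):
    """Inverse of encode_field; returns None for an unused field."""
    if field[-1] != "1":
        return None
    return int(field[:-1][::-1], 2)
```

For the three addresses in the example, `encode_field` yields 1100001, 0001001, and 1000101 for ports 3, 8, and 17 respectively.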
The control signal on line 722 indicates an unblocked condition of the bottom switch, which allows packets on line 718 to enter the switch. If the control signal on line 722 from the bottom switch to logic module 714 indicates a busy condition, no data are sent downward. When a "not busy" control signal arrives at the MLM, the address data fields in ring R are properly positioned to generate responses that are sent down to read unit 708 and bottom switch 112. At the proper time after the "not busy" signal arrives at the logic module, the MLM begins sending a plurality of MULTICAST READ response packets through bottom switch 112 to the addresses in set C_N.
The system can send a MULTICAST READ packet to the DPIM at address AD1, which then multicasts the PAYLOAD field of the packet to the plurality of addresses of set C_N that are stored in ring R 704.
Typically, the multicast system includes hardware that can perform a large number of computing and data storage tasks. In the example shown, the multicast capability is obtained by using a DPIM unit 700 that is specially configured to hold and transmit multicast addresses.
A generalization of the multicast function described above is a specific mode in which a packet M is broadcast to a predetermined subset of the output ports whose addresses are members of set C_N. A bit mask that indicates which members are to be sent to is called the send mask. In one example, addresses 3, 8, and 17 are the three members of set C_N. The send mask 0,0,1,0,1 indicates that the first and third output ports in list C_N are to receive the packet. The response packets are multicast to output ports 3 and 17. In one example, a control signal indicates whether all input ports are ready to receive a packet or whether one or more input ports are blocked.
In another example, a list of the output ports that are not blocked is stored. This list is a mask called the block mask. A value of 1 in position N of the send mask indicates that the Nth member of C_N is to be sent to. A value of 1 in position N of the block mask indicates that the Nth member of C_N is not blocked and can be sent to. When both masks have the value 1 in position N, packet M is sent to the Nth output port in the list.
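The combination of the send mask and the block mask amounts to a bitwise AND over the positions of list C_N. The following Python sketch shows the selection under stated assumptions: masks are held as integers whose least significant bit is position 0 (matching the example in which mask 0,0,1,0,1 selects the first and third ports), and the function name `multicast_targets` is hypothetical.

```python
def multicast_targets(fields, send_mask, block_mask):
    """Return the output ports that receive packet M.

    fields     -- stored addresses F_0..F_4 (None marks an unused field)
    send_mask  -- bit J is 1 if member J of C_N should receive the packet
    block_mask -- bit J is 1 if member J of C_N is NOT blocked
    """
    targets = []
    for j, port in enumerate(fields):
        both_set = (send_mask >> j) & (block_mask >> j) & 1
        if both_set and port is not None:
            targets.append(port)
    return targets
```

With the stored addresses 3, 8, and 17 and send mask 0,0,1,0,1, the packet goes to ports 3 and 17 when nothing is blocked, and only to port 3 if the third member is blocked.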
For a subset specified by a send mask, a packet that is to be multicast to a subset of the output ports listed in C_N has the following format:
|PAYLOAD|OP2|MASK|MULTICAST OP|AD1|BIT|
The packet is inserted into the top switch of the system. Address field AD2 is not used, because the addresses that would occupy the AD2 field are generally contained in the data stored at the address in the AD1 field.
Referring to Fig. 7, the BIT field and the OP1 code are sent from ring C or ring Z into logic module 714. The send mask and the block mask enter the logic module at the same time. PAYLOAD is sent to address F_J if bit J of the send mask and bit J of the block mask are both set to 1. The remainder of the operation proceeds as in the multicast mode without masks.
The output ports in set C_N are denoted P_0, P_1, ..., P_m. The output ports are divided into groups, each of which contains at most the number of members of C_N that can be stored in a data storage ring R. In the case where data ring R has five output addresses and set C_N has nine output ports, the first four output ports are stored in group 0, the next four output ports are stored in group 1, and the last output port is stored in group 2. The output port sequence P_0, P_1, ..., P_8 can also be indexed as q_00, q_01, q_02, q_03, q_10, q_11, q_12, q_13, q_20. In this way, the physical address of a target can be completely described by two integers that indicate the group number and the address field of the target.
For some applications, the payload of the packet carries the following information:
(1) the subscript N of C_N, which indicates the set of output ports used to locate the address,
(2) the group of C_N in which the address is located,
(3) the member of the group to which the address belongs, and
(4) the input port of the top switch into which the packet is to be inserted.
Items (2) and (3) give the two subscripts of member q, from which the index of p can easily be computed. To carry this information, the PAYLOAD field has the format:
|N|first subscript of q|second subscript of q|input port number|
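The relationship between the index of p and the two subscripts of q can be written out explicitly. The Python sketch below assumes a group size of four, taken from the nine-port example in which the groups hold four, four, and one port; the function names and the default group size are illustrative assumptions.

```python
def to_group_member(p_index, group_size=4):
    """Map the index of p to the two subscripts (group, member) of q."""
    if p_index < 0:
        raise ValueError("index must be non-negative")
    return divmod(p_index, group_size)


def to_p_index(group, member, group_size=4):
    """Recover the index of p from the two subscripts of q."""
    if not 0 <= member < group_size:
        raise ValueError("member does not fit in a group")
    return group * group_size + member
```

Under this assumption the nine ports map to (0,0) through (0,3), (1,0) through (1,3), and (2,0), which matches the q indexing listed in the description.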
Fig. 7 also shows a system that uses indirect addressing in multicasting. A simpler operation is indirect addressing of a single output port. In one example of indirect addressing, data storage ring R contains only one field, which represents the indirect address. For example, the data storage ring R of the DPIM at address 17 contains the value 153. A packet sent to address 17 is forwarded to output port 153 of the bottom switch.
In the embodiments described here, all logic modules associated with a given ring R send data to bottom switch 112. In the case where one DPIM sends a burst of traffic while the other DPIM units send a small amount of traffic to the bottom switch, the individual rings R send packets to a group of rings B rather than to the same ring. In another example, the rings R send packets to a concentrator 150 that passes the data to bottom switch 112.
In the system described here, the information in data storage ring R 304 and communication ring 302 circulates in the form of circularly connected FIFOs. In one variation of the system, the information in ring R 704 is static. Data from the level 0 ring of top switch 110 can be connected to enter a static buffer. Data in the static buffer can interact in a manner that is logically equivalent to the circulating mode described above. An advantage of the static model is the possibility of more efficient data storage.
In this description, data X is sent to a ring R that holds data Y. Ring C receives the streams of data X and Y as input signals, performs a mathematical function F on data X and Y, and sends the result of the computation to a target output port. The target can be stored in a field of ring R or in the AD2 field of the packet. Alternatively, the target can be conditional on the result of F(X, Y), or can be generated by another function G(X, Y).
In another application, multiple operations can be performed on data X and Y, and the results of the multiple operations can be sent to a plurality of targets. For example, the result of function F(X, Y) is sent to the target specified by address AD2, and the result of function H(X, Y) can be sent to the target specified by an address AD3 in the packet. An advantage of multiple operations is that system 100 can efficiently perform multiple transformations in parallel.
In addition to performing more complex arithmetic functions of the two variables X and Y, simpler tasks can also be performed, so that function F is a function of X alone or of Y alone. The result of the simple function F(X) or F(Y) is sent to the target specified by address AD2, or to a target generated by another function G(X).
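The routing of function results to targets can be illustrated with a small Python sketch. The function names are hypothetical, the targets are modeled as plain integers, and the particular operators used in the example are chosen only for illustration.

```python
import operator


def apply_routes(x, y, routes):
    """Apply several functions to X and Y and pair each result with a target.

    routes -- list of (function, target address) pairs, for example
              F with the AD2 target and H with the AD3 target.
    """
    return [(target, func(x, y)) for func, target in routes]


def conditional_target(x, y, f, g):
    """Send F(X, Y) to a target that is itself computed as G(X, Y)."""
    return g(x, y), f(x, y)
```

For instance, `apply_routes(6, 3, [(operator.add, 2), (operator.mul, 3)])` pairs the sum with target 2 and the product with target 3.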
Although the invention has been described with reference to various embodiments, it should be understood that these embodiments are illustrative and that the scope of the invention is not limited to them. Many variations, modifications, and improvements of the described embodiments are possible. For example, those skilled in the art can readily implement the steps needed to provide the structures and methods disclosed herein, and will understand that the process parameters, materials, and dimensions are given by way of example only and can be varied to achieve the desired functional features, with such modifications remaining within the scope of the invention. Variations and modifications of the embodiments disclosed herein may be made on the basis of the description without departing from the scope and spirit of the invention as set forth in the claims.
Those skilled in the art can make useful variations and modifications within the scope of the invention. Examples of some such variations and modifications are listed, but they may be extended to other systems.
In the claims, unless otherwise indicated, the article "a" means "one or more than one".

Claims (49)

1. A parallel data processing apparatus, comprising:
an interconnect structure (100) that interconnects a plurality of locations;
one or more storage units (114) coupled to the interconnect structure at the plurality of locations and accessible through the interconnect structure, the storage units including a storage unit W at a location L, the storage unit W having a plurality of storage areas connected to paired, synchronized FIFO storage rings (304); and
a plurality of computing units (126) coupled to the interconnect structure at the plurality of locations, the computing units being able to access data through the interconnect structure from the storage units of one or more of the synchronously circulating paired FIFO storage rings, the computing units including a first computing unit and a second computing unit, the first and second computing units being able to read data simultaneously from different storage areas of the storage unit W, and the data contents of the storage areas of the storage unit W being transferable to different target locations.
2. A parallel data processing apparatus, comprising:
an interconnect structure (100) that interconnects a plurality of locations;
a plurality of storage units (114) connected to paired, synchronized FIFO storage rings (304), coupled to the interconnect structure, and accessible as locations through the interconnect structure, the storage units including a first storage unit and a second storage unit located at a first location and a second location, respectively; and
a plurality of computing units (126) coupled to the interconnect structure at the plurality of locations, the computing units being able to access data through the interconnect structure from one or more storage units that circulate synchronously in the form of the paired FIFO storage rings, the computing units including a first computing unit and a second computing unit, the first computing unit being able to read data from the first and second storage units simultaneously and then operate on the data read, and the second computing unit being able to read and operate on the data of the first and second storage units in a manner that overlaps in time with the reading and operating performed by the first computing unit.
3. A parallel data processing apparatus, comprising:
an interconnect structure (100) that interconnects a plurality of locations;
a plurality of storage units (114) coupled to the interconnect structure at the plurality of locations and accessible through the interconnect structure, the storage units being connected to paired, synchronized FIFO storage rings (304), each of the plurality of storage units including a first circulating shift register that stores a first word divided into a plurality of storage areas; and
a plurality of computing units (126) coupled to the interconnect structure at the plurality of locations, the computing units being able to operate simultaneously on separate storage areas of the first word.
4. The apparatus according to claim 3, wherein the storage units are connected to paired, synchronized FIFO storage rings (304), and each storage unit includes a second circulating shift register (302) that stores a second word divided into a plurality of storage areas; and
the plurality of computing units are able to operate on the second word using information in the first word.
5. A parallel data processing apparatus, comprising:
an interconnect structure (100) for carrying data, including a plurality of hierarchically interconnected nodes (330), the interconnect structure including logic (114) that anticipates data collisions at a node and resolves the data collisions according to a priority determined by the hierarchy;
a first switch (110) coupled to the interconnect structure, which distributes data to the interconnect structure according to communication information contained in the data;
a plurality of computing modules (114) coupled to the interconnect structure through paired, synchronized FIFO storage rings (304), the logic modules being able to perform operations on the data; and
a second switch (112) coupled to the plurality of computing modules, which receives data from the plurality of computing modules.
6. The apparatus according to claim 5, further comprising: a plurality of interconnect modules coupled to the plurality of computing modules and to the first switch, the interconnect modules being able to monitor data traffic in the computing modules and to control the timing of data injection by the first switch so as to avoid data collisions.
7. The apparatus according to claim 5, wherein the first switch has a plurality of output ports, the apparatus further comprising: a plurality of interconnect modules coupled to the plurality of computing modules and to the first switch, the interconnect modules being associated respectively with the plurality of output ports of the first switch.
8. The apparatus according to claim 5, wherein the plurality of computing modules include logic that uses information contained in the data to determine which computing module performs an operation and which operation is to be performed.
9. The apparatus according to claim 5, wherein the plurality of computing modules have a plurality of different logic unit types whose logic functions are selected from data transfer operations, logical operations, and arithmetic operations, wherein the data transfer operations include load, store, read, and write; the logical operations include AND, OR, NOR, NAND, exclusive AND, exclusive OR, and bit test; and the arithmetic operations include add, subtract, multiply, divide, and transcendental functions.
10. The apparatus according to claim 5, further comprising: a plurality of interconnect modules coupled to the plurality of computing modules and to the first switch, the interconnect modules being able to monitor data traffic in the computing modules and including buffers and concentrators for storing and concentrating data, the interconnect modules also controlling the timing of data injection by the first switch so as to avoid data collisions.
11. The apparatus according to claim 5, wherein the first and second switches, the interconnect structure, and the plurality of computing modules constitute an interconnect unit, the apparatus further comprising: one or more computing units (126) coupled to the interconnect structure and positioned to send data outside the interconnect unit and to send data to the first switch.
12. The apparatus according to claim 5, wherein the first and second switches, the interconnect structure, and the plurality of computing modules constitute an interconnect unit, the apparatus further comprising: one or more memory units coupled to the interconnect structure and positioned to send data outside the interconnect unit and to send data to the first switch.
13. The apparatus according to claim 5, wherein the first switch and the second switch handle data of a plurality of different bit lengths.
14. The apparatus according to claim 5, wherein the computing modules are dynamic processor-in-memory computing modules.
15. The apparatus according to claim 5, wherein the apparatus operates on messages that include a plurality of information and data fields, including a payload field able to carry a data payload, a first address specifying a storage unit that holds data to be operated on, a first opcode specifying an operation to be performed on the data held at the first address, a second address specifying an additional device that operates on the data from the storage unit at the first address, and a second opcode specifying the operation that the device at the second address performs on the data from the storage unit at the first address.
16. The apparatus according to claim 5, wherein the apparatus operates on messages that include a plurality of information and data fields, including a field indicating that a data packet is present, a payload field able to carry a data payload, a first address specifying a storage unit that holds data to be operated on, a first opcode specifying an operation to be performed on the data held at the first address, a second address specifying an additional device that operates on the data from the storage unit at the first address, and a second opcode specifying the operation that the device at the second address performs on the data from the storage unit at the first address.
17. The apparatus according to claim 5, further comprising:
one or more computing units (126) coupled to the second switch, the second switch being able to send data packets to the one or more computing units, the apparatus being a computing engine.
18. The apparatus according to claim 5, further comprising:
one or more storage units coupled to the interconnect structure at a plurality of locations and accessible through the interconnect structure, the storage units having a plurality of storage areas connected to paired, synchronized FIFO storage rings (304); and
a plurality of computing units (126) coupled to the interconnect structure at a plurality of locations, the computing units being able to access data through the interconnect structure from one or more storage units that circulate synchronously in the form of the paired FIFO storage rings, the computing units including a first computing unit and a second computing unit, the first and second computing units being able to read data simultaneously from different storage areas of one storage unit, and the data contents of the different storage areas being transferable to different target locations.
19. The apparatus according to claim 5, further comprising:
one or more storage units (116) coupled to the interconnect structure at a plurality of locations and accessible through the interconnect structure; and
a plurality of computing units (126) coupled to the interconnect structure at a plurality of locations, the computing units being able to access data through the interconnect structure from the one or more storage units, the computing units including a first computing unit and a second computing unit, the first computing unit being able to read and operate on the data of two storage units simultaneously, and the second computing unit being able to read and operate on the data of the two storage units in a manner that overlaps in time with the reading and operating performed by the first computing unit.
20. A parallel-access memory, comprising:
A plurality of computing modules (114) connected to a hierarchical interconnect structure through paired, synchronized FIFO storage rings (304), the interconnect structure being able to carry data, to anticipate data collisions at a node, and to resolve the data collisions according to a priority determined at least in part by hierarchical level;
A first switch (110) coupled to the interconnect structure, which distributes data to the plurality of computing modules according to communication information contained in the data; and
A second switch (112) coupled to the plurality of computing modules, which receives data from the plurality of computing modules.
21. The memory according to claim 20, wherein one of the plurality of computing modules comprises a data communication ring (306) and a data storage ring (304), the data communication ring and the data storage ring being synchronously circulating FIFO rings.
22. The memory according to claim 20, wherein one of the plurality of computing modules comprises a data communication ring (302) and a data storage ring (304), the data communication ring and the data storage ring being synchronously circulating FIFO rings, a data element being stored in the FIFO memory, and the computing module being able to modify the data element while it circulates around the data storage ring.
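The behaviour recited in claim 22, a data element that keeps circulating in a FIFO storage ring and is modified as it passes a computing module, can be pictured with a small software model. The following is only an illustrative sketch and not the patented hardware: the class `FifoRing`, the function `circulate_and_modify`, and the choice to place the computing module at position 0 are all assumptions made for the example.

```python
from collections import deque


class FifoRing:
    """Hypothetical model of a circulating FIFO ring.

    On every tick each element advances one position and the element
    at the last position wraps around to position 0.
    """

    def __init__(self, items):
        self.slots = deque(items)

    def tick(self):
        # Bucket-brigade style advance by one position.
        self.slots.rotate(1)


def circulate_and_modify(ring, update, ticks):
    """Assumed compute module fixed at position 0.

    Each tick it rewrites the element currently at position 0 with
    update(element), then lets the ring advance by one position.
    """
    for _ in range(ticks):
        ring.slots[0] = update(ring.slots[0])
        ring.tick()
    return list(ring.slots)


if __name__ == "__main__":
    ring = FifoRing([1, 2, 3])
    # After as many ticks as the ring has positions, every element has
    # passed the module exactly once.
    print(circulate_and_modify(ring, lambda x: x + 10, ticks=3))
```

In this model the number of ticks needed for every element to be visited once equals the ring length, which is why the example runs for three ticks on a three-element ring.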
23. The memory according to claim 20, wherein one of the plurality of computing modules comprises a data communication ring (302) and a data storage ring (304), the data communication ring and the data storage ring being synchronously circulating FIFO rings, a data element being stored in the FIFO memory, and the FIFO memory storing both program instructions and data.
24. The memory according to claim 20, wherein one of the plurality of computing modules comprises a data communication ring and a data storage ring, the data communication ring being a mirror image of a ring on the bottom level of the first switch, to which the data communication ring is coupled.
25. The memory according to claim 20, further comprising:
A data communication ring; and
A plurality of data storage rings, one or more of the plurality of computing modules being associated with the data communication ring and the data storage rings.
26. The memory according to claim 20, further comprising:
A data communication ring; and
A plurality of data storage rings, one or more of the plurality of computing modules being associated with the data communication ring and the data storage rings, the plurality of computing modules having the same logic unit type (LU type).
27. The memory according to claim 20, further comprising:
A data communication ring; and
A plurality of data storage rings, one or more of the plurality of computing modules being associated with the data communication ring and the data storage rings, the plurality of computing modules having a plurality of different logic unit types (LU types).
28. The memory according to claim 20, further comprising:
A data communication ring; and
A plurality of data storage rings, one or more of the plurality of computing modules being associated with the data communication ring and the data storage rings, the plurality of computing modules having a plurality of different logic unit types (LU types) whose logic functions are selected from data transfer operations, logical operations, and arithmetic operations, wherein the data transfer operations include load, store, read, and write; the logical operations include AND, OR, NOR, NAND, exclusive-AND, exclusive-OR, and bit test; and the arithmetic operations include addition, subtraction, multiplication, division, and transcendental functions.
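To make the notion of different logic unit types concrete, the sketch below models a few of the logical and arithmetic operations listed in claim 28 as a lookup table keyed by operation code. It is an illustration only: the names `make_logic_units` and `execute`, the opcode strings, and the fixed word width are assumptions, and only a subset of the listed operations is shown.

```python
def make_logic_units(width=8):
    """Build a hypothetical table of logic-unit operations.

    Results of inverting and arithmetic operations are truncated to
    'width' bits so that they behave like fixed-width hardware words.
    """
    mask = (1 << width) - 1
    return {
        "AND": lambda a, b: a & b,
        "OR": lambda a, b: a | b,
        "NOR": lambda a, b: ~(a | b) & mask,
        "NAND": lambda a, b: ~(a & b) & mask,
        "XOR": lambda a, b: a ^ b,
        # Bit test: returns the value (0 or 1) of bit number b of a.
        "BIT_TEST": lambda a, b: (a >> b) & 1,
        "ADD": lambda a, b: (a + b) & mask,
        "SUB": lambda a, b: (a - b) & mask,
    }


def execute(units, opcode, a, b):
    """Dispatch an operation code to the matching logic-unit function."""
    return units[opcode](a, b)


if __name__ == "__main__":
    units = make_logic_units(width=4)
    print(bin(execute(units, "NAND", 0b1100, 0b1010)))
    print(execute(units, "ADD", 12, 10))
```

A computing module of a single LU type would correspond to one entry of the table, while a set of modules with different LU types would correspond to several entries being available side by side.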
29. The memory according to claim 20, further comprising:
A plurality of interconnect modules coupled to the plurality of computing modules and to the first switch, the interconnect modules being able to monitor data traffic in the computing modules and including buffers and concentrators for storing and concentrating data, the interconnect modules also controlling the timing with which data is injected by the first switch so as to avoid data collisions.
30. The memory according to claim 20, further comprising:
A data communication ring (302); and
A plurality of data storage rings (304) circulating synchronously with the data communication ring, the data storage rings storing data that can be accessed simultaneously from multiple sources and sent simultaneously to multiple targets.
31. The memory according to claim 20, wherein
The computing modules are dynamic processor-in-memory computing modules (114).
32. A multiple-access storage and computing device, comprising:
A plurality of logic devices, the logic devices including memory devices connected to paired, synchronized FIFO storage rings (304); and
An interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising:
A plurality of nodes (330);
A plurality of logic units (114) associated with the plurality of nodes;
A plurality of message interconnect paths, each of which selectively couples nodes of the plurality of nodes so that data is sent from a node acting as a sending node to a node acting as a receiving node;
A plurality of control signal interconnect paths, each of which selectively couples nodes of the plurality of nodes so that a control signal is sent from a sending node to the logic unit associated with a receiving node;
The plurality of nodes including:
Distinct nodes A, B, and X;
Logic L_B associated with node B, which makes routing decisions for node B;
A message interconnect path from node B, as sending node, to node X, as receiving node;
A message interconnect path from node A, as sending node, to node X, as receiving node; and
A control signal interconnect path from node A, as sending node, to logic L_B, the control signal enforcing that sending data from node A to node X has priority over sending data from node B to node X.
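The priority rule at the end of claim 32 can be illustrated by a one-step arbitration model in which node A's control signal tells logic L_B whether node X is about to be occupied. This is a simplified sketch under stated assumptions: the names `StepResult` and `arbitrate` are hypothetical, and what node B does with a deferred message (for example, routing it elsewhere) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    delivered_to_x: Optional[str]  # message that reaches node X this step
    b_deferred: bool               # True if node B had to hold back its message


def arbitrate(a_msg: Optional[str], b_msg: Optional[str]) -> StepResult:
    """One routing step for nodes A and B, which both feed node X.

    Node A asserts a control signal to logic L_B whenever it sends to X.
    Logic L_B sees the signal and keeps node B from sending to X in that
    step, so A's data always has priority over B's.
    """
    control_signal = a_msg is not None
    if control_signal:
        return StepResult(delivered_to_x=a_msg, b_deferred=b_msg is not None)
    return StepResult(delivered_to_x=b_msg, b_deferred=False)


if __name__ == "__main__":
    print(arbitrate("from A", "from B"))  # A wins, B is deferred
    print(arbitrate(None, "from B"))      # X is free, B may send
```

The point of the model is that the conflict is resolved before it happens: node B never sends into a collision, because the decision is made from the control signal rather than after a clash at node X.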
33. A multiple-access storage and computing device, comprising:
A plurality of logic devices (114), the logic devices including memory devices connected to paired, synchronized FIFO storage rings (304); and
An interconnect structure coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising:
A plurality of nodes (330), including distinct nodes A, B, X, and Y;
A plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the interconnect paths including control signal interconnect paths for sending a control signal from a node that sends the control signal to the logic associated with a node that uses the control signal, and data interconnect paths for sending data from a data sending node to a data receiving node;
Node B having data interconnect paths for sending data to node X and to node Y;
Node A having a control interconnect path for sending a control signal to logic L_B associated with node B, logic L_B being operable such that, for a message M arriving at node B, node A sends a control signal C to logic L_B, and logic L_B uses the control signal C to decide whether to send message M to node X or to node Y.
34. The multiple-access storage and computing device according to claim 33, wherein the logic L_B is operable such that a message M' arriving at node B can be routed to a node D that is distinct from nodes X, Y, and B.
35. A multiple-access storage and computing device, comprising:
A plurality of logic devices (114), the logic devices including memory devices connected to paired, synchronized FIFO storage rings (304); and
An interconnect structure (100) coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising:
A plurality of nodes (330), including a node A, a node B, and a node set P, nodes A and B being distinct nodes outside node set P, and node B being able to send data to every node in node set P; and
A plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the nodes being coupled in pairs that each comprise a sending node and a receiving node, the sending node sending data to the receiving node, the plurality of interconnect paths including data interconnect paths and control interconnect paths, the control interconnect paths selectively coupling nodes that act as control signal sending nodes, for sending control signals to the logic associated with nodes that use the control signals;
The control interconnect paths including a control interconnect path from node A to logic L_B associated with node B, logic L_B using the control signal from node A to determine to which node in node set P node B sends data.
36. A multiple-access storage and computing device, comprising:
A plurality of logic devices (114), the logic devices including memory devices connected to paired, synchronized FIFO storage rings (304); and
An interconnect structure (100) coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising:
A plurality of nodes (330), including a node A, a node B, and a node set P, nodes A and B being distinct nodes outside node set P, and node B being able to send data to every node in node set P; and
A plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the nodes being coupled in pairs that each comprise a sending node and a receiving node, the sending node sending data to the receiving node;
Logic L_A, associated with node A, which can determine where data is routed from node A; and
Logic L_B, associated with node B, which can determine where data is routed from node B, logic L_A being distinct from logic L_B, and logic L_B using information determined by logic L_A to determine to which node in node set P node B sends data.
37. The multiple-access storage and computing device according to claim 36, wherein node B can send data to a node outside node set P.
38. A multiple-access storage and computing device, comprising:
A plurality of logic devices (114), the logic devices including memory devices connected to paired, synchronized FIFO storage rings (304); and
An interconnect structure (100) coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising:
A plurality of nodes (330), each node comprising a plurality of data input ports, a plurality of data output ports, and a logic unit that controls the flow of data through the node;
The plurality of nodes including mutually distinct nodes A, B, X, and Y;
A plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the interconnect paths including control interconnect paths for sending a control signal from a node that sends the control signal to the logic associated with a node that uses the control signal, and data interconnect paths for sending data from a data sending node to a data receiving node, the data interconnect paths being selectively coupled to the data input ports and data output ports, and the control interconnect paths connecting nodes and logic units so that a control signal is sent from a control signal sending node to the logic unit associated with a node whose data flow depends on the control signal;
Node B being associated with logic L_B, logic L_B using a control signal from node A to determine the routing of a message M passing through node B, wherein a control signal C received from node A causes message M to be sent from node B to node X, and a control signal C' received from node A causes message M to be sent from node B to node Y.
39. The multiple-access storage and computing device according to claim 38, wherein message M is routed through node B regardless of whether the control signal from node A is control signal C or control signal C'.
40. The multiple-access storage and computing device according to claim 38, wherein the control signal sent to node B is derived from a data output port of node A.
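The routing decision described in claims 38 and 39 amounts to a two-way selection driven by the control signal from node A. A minimal sketch, assuming hypothetical names `Control` and `logic_b` and representing nodes simply by their letters, could look like this:

```python
from enum import Enum


class Control(Enum):
    """The two control signals that node A can send to logic L_B."""
    C = "C"
    C_PRIME = "C'"


def logic_b(message, control):
    """Model of logic L_B: pick the next node for a message at node B.

    Control signal C sends the message on to node X, control signal C'
    sends it on to node Y. In both cases the message passes through
    node B, since this function is node B's routing decision.
    """
    next_node = "X" if control is Control.C else "Y"
    return next_node, message


if __name__ == "__main__":
    print(logic_b("M", Control.C))
    print(logic_b("M", Control.C_PRIME))
```

The message content plays no part in the choice here, which mirrors the claim language: the destination of M is determined by the control signal received from node A.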
41. A multiple-access storage and computing device, comprising:
A plurality of logic devices (114), the logic devices including memory devices connected to paired, synchronized FIFO storage rings (304); and
An interconnect structure (100) coupled to the logic devices for routing data and operation codes to the logic devices, the interconnect structure further comprising:
A plurality of nodes (330), including a node X and a node set P, the node set P comprising a plurality of nodes that can send data to node X; and
A plurality of interconnect paths selectively coupling nodes of the plurality of nodes, the interconnect paths including data interconnect paths for sending data from a sending node to a receiving node, the nodes in node set P having a priority relationship for sending data to node X, wherein the node with the highest priority for sending data to node X is never blocked when sending data to node X.
42. The multiple-access storage and computing device according to claim 41, wherein a node A in node set P, when sending data to node X, cannot be blocked by data sent to node X by a node B whose priority for sending data to node X is lower than that of node A.
43. The multiple-access storage and computing device according to claim 41, wherein the priority relationship among the nodes in node set P for sending data to node X depends on the position of each node in the interconnect structure.
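The priority relationship of claims 41 to 43 can be sketched as a selection among the nodes of set P that want to send to node X in the same step. The function name `select_sender` and the encoding of priority as an integer rank (lower rank meaning higher priority, standing in for a node's position in the structure) are assumptions made for this illustration.

```python
def select_sender(requests, rank):
    """Choose which node in set P may send to node X in this step.

    requests: names of the nodes that currently want to send to X.
    rank:     mapping from node name to priority rank, where a lower
              rank means a higher priority (an assumed encoding of the
              node's position in the interconnect structure).

    Returns (winner, blocked), where blocked is the set of requesting
    nodes that must not send to X in this step.
    """
    requests = set(requests)
    if not requests:
        return None, set()
    winner = min(requests, key=lambda node: rank[node])
    return winner, requests - {winner}


if __name__ == "__main__":
    rank = {"A": 0, "B": 1, "C": 2}
    print(select_sender({"A", "B", "C"}, rank))
    print(select_sender({"B", "C"}, rank))
```

Because the winner is always the requesting node with the best rank, the node holding the highest priority overall can never appear in the blocked set, and no node is ever blocked by a lower-priority node.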
44. A computing device for use in a computing system, comprising:
First and second synchronous FIFO rings (302, 304); and
At least one computing module (114) coupled to the first and second synchronous FIFO rings, the computing module (114) being able to access at least one bit of each FIFO ring simultaneously.
45. The computing device according to claim 44, further comprising:
A connection to a clock serving the entire computing system, and connections to the first and second FIFO rings, each of which comprises a plurality of bits that advance to the next position in bucket-brigade fashion, wherein a cycle period of the clock is defined as the time required for the bits of a FIFO ring to complete exactly one circulation.
46. The computing device according to claim 44, further comprising:
At least one synchronous FIFO ring (306) in addition to the first (302) and second (304) FIFO rings, the at least one computing module being able to access data of the first and second FIFO rings and of the at least one additional synchronous FIFO ring simultaneously.
47. The computing device according to claim 44, wherein:
The computing module (114) is configured to read two bits from each of the first and second FIFO rings in a single clock cycle.
48. The computing device according to claim 44, wherein:
The computing module (114), on receiving a data packet, can perform at least one of the following actions: transferring the data packet to another FIFO ring, using the data in the packet, and immediately sending the data packet out of the device.
49. The computing device according to claim 44, wherein:
The at least one computing module can access multiple bits of the FIFO rings at a time.
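The arrangement in claims 44 and 45, two FIFO rings that advance in lockstep while a computing module looks at one position of each ring in the same tick, can be modelled in a few lines. This is a hedged sketch rather than a description of the actual circuit: the function `run_synchronized`, the `port` parameter, and the requirement that both rings have the same length are assumptions of the example.

```python
from collections import deque


def run_synchronized(ring_a, ring_b, ticks, port=0):
    """Advance two equal-length rings in lockstep.

    On every tick the (assumed) compute module reads the element at
    position 'port' of both rings at once, then both rings advance by
    one position in bucket-brigade fashion.
    """
    if len(ring_a) != len(ring_b):
        raise ValueError("this sketch assumes rings of equal length")
    a, b = deque(ring_a), deque(ring_b)
    seen = []
    for _ in range(ticks):
        seen.append((a[port], b[port]))  # simultaneous access to both rings
        a.rotate(1)
        b.rotate(1)
    return seen


if __name__ == "__main__":
    # With three positions per ring, the pattern repeats every three ticks.
    print(run_synchronized([1, 2, 3], ["x", "y", "z"], ticks=4))
```

In this model one full circulation takes as many ticks as the ring has positions, so the pair observed on tick 0 is observed again one ring length later.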
CNB018208878A 2000-10-19 2001-10-19 Scaleable interconnect structure for parallel computing and parallel memory access Expired - Fee Related CN100341014C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US69360300A 2000-10-19 2000-10-19
US09/693,603 2000-10-19

Publications (2)

Publication Number Publication Date
CN1489732A CN1489732A (en) 2004-04-14
CN100341014C true CN100341014C (en) 2007-10-03

Family

ID=24785344

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB018208878A Expired - Fee Related CN100341014C (en) 2000-10-19 2001-10-19 Scaleable interconnect structure for parallel computing and parallel memory access

Country Status (7)

Country Link
EP (1) EP1360595A2 (en)
JP (1) JP4128447B2 (en)
CN (1) CN100341014C (en)
AU (1) AU2002229127A1 (en)
CA (1) CA2426422C (en)
MX (1) MXPA03003528A (en)
WO (1) WO2002033565A2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8605099B2 (en) 2008-03-31 2013-12-10 Intel Corporation Partition-free multi-socket memory system architecture
CN101833439B (en) * 2010-04-20 2013-04-10 清华大学 Parallel computing hardware structure based on separation and combination thought
CN102542525B (en) * 2010-12-13 2014-02-12 联想(北京)有限公司 Information processing equipment and information processing method
US10168923B2 (en) 2016-04-26 2019-01-01 International Business Machines Corporation Coherency management for volatile and non-volatile memory in a through-silicon via (TSV) module
US10236043B2 (en) * 2016-06-06 2019-03-19 Altera Corporation Emulated multiport memory element circuitry with exclusive-OR based control circuitry
FR3083350B1 (en) * 2018-06-29 2021-01-01 Vsora PROCESSOR MEMORY ACCESS
US10872038B1 (en) * 2019-09-30 2020-12-22 Facebook, Inc. Memory organization for matrix processing
CN117294412B (en) * 2023-11-24 2024-02-13 合肥六角形半导体有限公司 Multi-channel serial-parallel automatic alignment circuit and method based on single bit displacement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4977582A (en) * 1988-03-31 1990-12-11 At&T Bell Laboratories Synchronization of non-continuous digital bit streams
EP0459757A2 (en) * 1990-05-29 1991-12-04 Advanced Micro Devices, Inc. Network adapter
US5923654A (en) * 1996-04-25 1999-07-13 Compaq Computer Corp. Network switch that includes a plurality of shared packet buffers
CN1249874A (en) * 1997-01-24 2000-04-05 英特拉克蒂克控股公司 Scalable low-latency switch for usage in interconnect structure

Also Published As

Publication number Publication date
CA2426422C (en) 2012-04-10
EP1360595A2 (en) 2003-11-12
WO2002033565A2 (en) 2002-04-25
CN1489732A (en) 2004-04-14
WO2002033565A3 (en) 2003-08-21
MXPA03003528A (en) 2005-01-25
AU2002229127A1 (en) 2002-04-29
CA2426422A1 (en) 2002-04-25
JP4128447B2 (en) 2008-07-30
JP2004531783A (en) 2004-10-14

Similar Documents

Publication Publication Date Title
CN1493128A (en) Class network routing
CN1020533C (en) Adaptive routing in parallel computing system
US4621359A (en) Load balancing for packet switching nodes
CN1148687C (en) Full-match search method and device for network processor
Xu et al. Efficient implementation of barrier synchronization in wormhole-routed hypercube multicomputers
Kruatrachue et al. Grain size determination for parallel processing
Panda et al. Multidestination message passing in wormhole k-ary n-cube networks with base routing conformed paths
Li et al. Efficient collective communications in dual-cube
CN1842764A (en) Computer-aided parallelizing of computation graphs
CN1317189A (en) System and method for switching packets in network
CN1493041A (en) Arithmetric functions in torus and tree networks
CN1910571A (en) A single chip protocol converter
CN100341014C (en) Scaleable interconnect structure for parallel computing and parallel memory access
CN110995598B (en) Variable-length message data processing method and scheduling device
CN1271439A (en) Method of storing elements in a database
CN1655534A (en) Double stack compatible router searching device supporting access control listing function on core routers
Talia Message-routing systems for transputer-based multicomputers
Lin et al. Adaptive multicast wormhole routing in 2D mesh multicomputers
Tsai et al. An extended dominating node approach to broadcast and global combine in multiport wormhole-routed mesh networks
Tsai et al. An extended dominating node approach to collective communication in all-port wormhole-routed 2D meshes
Mahapatra et al. Scalable global and local hashing strategies for duplicate pruning in parallel A* graph search
CN100499564C (en) Packet processing engine
Dutt et al. Scalable load balancing strategies for parallel A* algorithms
Chinn et al. Minimal adaptive routing on the mesh with bounded queue size
CN1642146A (en) Bag-preprocessing circuit assembly of interface card for high-speed network diversion equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20071003

Termination date: 20111019