WO2002033565A2 - Scaleable interconnect structure for parallel computing and parallel memory access - Google Patents

Scaleable interconnect structure for parallel computing and parallel memory access

Info

Publication number
WO2002033565A2
WO2002033565A2 PCT/US2001/050543
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
logic
interconnect
nodes
Prior art date
Application number
PCT/US2001/050543
Other languages
English (en)
Other versions
WO2002033565A3 (fr)
Inventor
John Hess
Coke S. Reed
Original Assignee
Interactic Holdings, Llc
Priority date
Filing date
Publication date
Application filed by Interactic Holdings, Llc filed Critical Interactic Holdings, Llc
Priority to AU2002229127A priority Critical patent/AU2002229127A1/en
Priority to JP2002536883A priority patent/JP4128447B2/ja
Priority to MXPA03003528A priority patent/MXPA03003528A/es
Priority to CA2426422A priority patent/CA2426422C/fr
Priority to EP01987920A priority patent/EP1360595A2/fr
Publication of WO2002033565A2 publication Critical patent/WO2002033565A2/fr
Publication of WO2002033565A3 publication Critical patent/WO2002033565A3/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356Indirect interconnection networks
    • G06F15/17368Indirect interconnection networks non hierarchical topologies
    • G06F15/17375One dimensional, e.g. linear array, ring

Definitions

  • U.S. Patent No. 5,996,020 and U.S. Patent No. 6,289,021 describe high bandwidth low latency interconnect structures that significantly improve data flow in a network. What is needed is a system that fully exploits the high bandwidth low latency interconnect structures by supporting parallel memory access and computation in a network.
  • processors are capable of accessing the same data in parallel using several innovative techniques.
  • First, several remote processors can request to read from the same data location and the requests can be fulfilled in overlapping time periods.
  • Second, several processors can access a data item located at the same position, and can read, write, or perform multiple operations on the same data item at overlapping times.
  • Third, one data packet can be multicast to several locations and a plurality of packets can be multicast to a plurality of sets of target locations.
  • packet refers to a unit of data, preferably in serial form.
  • packets include Internet Protocol (IP) packets, Ethernet frames, ATM cells, switch-fabric segments that include a portion of a larger frame or packet, supercomputer inter-processor messages, and other data message types that have an upper limit to message length.
  • the system disclosed herein solves similar problems in communications when multiple packets arriving at a switch access data in the same location.
  • PRAMs: parallel random access memories.
  • FIFO: first-in-first-out.
  • FIGURE 1 is a schematic block diagram showing an example of a generic system constructed from building blocks including a plurality of network interconnect structures.
  • FIGURE 2 is a schematic block diagram illustrating a parallel memory structure such as a parallel random access memory (PRAM) that is constructed using network interconnect structures as fundamental elements.
  • FIGURE 3 is a diagram of the bottom level of the top switch showing connections to a communication ring, a plurality of logic modules, a circulating FIFO data storage ring, and connections to the top level of the bottom switch.
  • FIGURES 4A, 4B and 4C are block diagrams that depict movement of data through the communication ring and the circulating FIFO data storage ring.
  • FIGURE 4A applies to both READ and WRITE requests.
  • FIGURES 4B and 4C apply to a READ request in progress.
  • FIGURE 5 illustrates a portion of the interconnect structure while executing two read operations that read from the same circulating data storage ring during overlapping time intervals and enter a second switch where the read data are directed to different targets.
  • FIGURE 6 illustrates a portion of the interconnect structure while executing a WRITE instruction.
  • FIGURE 7 is a schematic block diagram that illustrates a structure and technique for performing a multicast operation using indirect addressing.
  • Referring to FIGURE 1, a schematic block diagram illustrates an example of a generic system 100 constructed from building blocks including one or more network interconnect structures.
  • the generic system 100 includes a top switch 110 and a bottom switch 112 that are formed from network interconnect structures.
  • the term "network interconnect structure" may refer to other interconnect structures.
  • Other systems may include additional elements that are formed from network interconnect structures.
  • the generic system 100 depicts various components that may be included as core elements of a basic exemplary system. Some embodiments include other elements in addition to the core elements. Other elements may be included such as: 1) shared memory, 2) direct connections 130 between the top switch and the bottom switch, 3) direct connections 140 between the bottom switch and the I/O, and 4) a concentrator connected between the logic units 114 and the bottom switch 112.
  • the generic system 100 has a top switch 110 that functions as an input terminal for receiving input data packets from input lines 136 or buses 130 from external sources and possibly from the bottom switch, and distributing the packets to dynamic processor-in-memory logic modules (DPLMs) 114.
  • the top switch 110 routes packets within the generic system 100 according to communication information contained within the packet headers. The packets are sent from the top switch 110 to the DPLM modules 114. Control signals from the DPLM modules 114 to the top switch 110 control the timing of packet injection to avoid collisions. Collisions that could otherwise occur with data in the DPLMs or with data in the bottom switch are prevented.
  • the system may pass information to additional computational, communication, storage, and other elements (not shown) using output lines and buses 130, 132, 134 and 136.
  • Data packets enter the top switch 110 and proceed to the target DPLMs 114 based on an address field in each packet.
  • Information contained in a packet may be used, possibly in combination with other information, to determine the operation performed by the DPLM logic modules 114 with respect to data contained in the packet and in the DPLM memory.
  • information in the packet may modify data stored in a DPLM memory, cause information contained within the DPLM memory to be sent through the bottom switch 112, or cause other data generated by a DPLM logic module to exit from the bottom switch. Packets from the DPLM are passed to the bottom switch.
  • Another option in the generic system 100 is the inclusion of computation units, memory units, or both.
  • Computational units 126 can be positioned to send data packets through I/O unit 124 outside system 100, or to the top switch 110, or both. In the case of the bottom switch sending a packet to the top switch, the packet can be sent directly, or can be sent through one or more interconnect modules (not shown) that handle timing and control between integrated circuits that are subcomponents of system 100.
  • Data storage in one example of the system has the form of first-in-first-out (FIFO) data storage rings R in DPLM 114, and conventional data storage associated with computation units (CUs) 126.
  • a FIFO ring is a circularly-connected set of single-bit shift registers.
  • a FIFO ring includes two kinds of components. In a first example that is conventional, the FIFO ring includes single-bit shift registers that are connected only to the next single-bit shift register to form a simple FIFO 310. In a second example, other shift registers of the ring are single-bit or multiple-bit registers contained within other elements of the system, such as logic modules 114. Taken together, both kinds of components are serially connected to form a ring.
  • the total length FL of a FIFO ring can be 200 bits with 64 bits stored in a plurality of logic modules L and the remaining 136 bits stored in serially connected registers of the FIFO.
  • a system-wide clock is connected to the FIFO elements and shift registers and causes data bits to advance to the next position in a "bucket-brigade" fashion.
  • a cycle period is defined to be the time in clock periods for data to complete precisely one cycle of a FIFO ring.
  • the integer value of the cycle period is the same as the length in components of the FIFO ring. For example, for a ring of 200 components (length 200), the cycle period is 200 system clock periods.
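  • As an illustrative aside, the circulating behavior described above can be sketched in a few lines of Python (the class name and the 200-bit length are assumptions taken from the example; this is a model, not the hardware):

```python
class FifoRing:
    """A circularly connected set of single-bit shift registers."""
    def __init__(self, length=200):
        self.bits = [0] * length              # one entry per ring component

    def clock(self):
        # On each system clock the bit leaving the last register re-enters the
        # first register, so data advances in "bucket-brigade" fashion.
        self.bits = [self.bits[-1]] + self.bits[:-1]

ring = FifoRing(200)
ring.bits[0] = 1                              # mark a single circulating bit
for _ in range(200):                          # cycle period = ring length
    ring.clock()
assert ring.bits[0] == 1                      # exactly one full cycle later
```
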
  • the system may also include local timing sources or clocks that step at a different rate.
  • all FIFO rings in the system have the same length, or vary at integer multiples of a predetermined minimum length.
  • a ring is a bus structure with a plurality of parallel paths, with the amount of data held in the ring being an integer multiple of the ring length FL.
  • a top switch is capable of handling packets having various lengths up to a system maximum length. In some applications, the packets may all have the same length. More commonly, packets having different lengths may be input to the top switch.
  • the length of a given packet is PL, where PL is not larger than FL.
  • the bottom switch can handle packets of various lengths.
  • Typical embodiments of the generic system 100 generate data having different bit lengths according to the functions and operation of the DPLM logic modules 114 and CUs 126.
  • the DPLMs can function independently or there can be a plurality of systems, not shown, that gather data from the DPLMs and may issue data to the DPLMs or to other elements contained inside or outside of system 100.
  • In FIGURE 2, a schematic block diagram illustrates an example of a parallel random access memory (PRAM) system 200 constructed from fewer building blocks than were included in FIGURE 1.
  • the PRAM system includes a top switch 110, a concentrator 150, and a bottom switch 112, which are formed from network interconnect structures.
  • the system also includes DPLMs 114 that store data.
  • the DPLM units are typically capable of performing READ and WRITE functions, thus the system can be used as a parallel random access memory.
  • a data packet entering the top switch 110 has a form as follows: PAYLOAD | OP2 | AD2 | OP1 | AD1 | BIT
  • the number of bits in the PAYLOAD field is designated PayL.
  • the number of bits in OP2 and OP1 are designated OP2L and OP1L, respectively.
  • the number of bits in AD2 and AD1 are designated AD2L and AD1L, respectively.
  • the BIT field is a single bit in length in preferred embodiments.
  • the BIT field enters the switch first, and is always set to 1 to indicate that a packet is present.
  • the BIT field is also described as a "traffic bit".
  • the AD1 field is used to route the packet through the top switch to the packet's target DPLM.
  • the top switch 110 can be arranged in a plurality of hierarchical levels and columns with packets passing through the levels. Each time the packet enters a new level of the top switch 110, one bit of the AD1 field is removed and the field is thereby shortened. System 200 uses the same technique. When the packet exits the top switch 110, no AD1 field bits remain. Thus, the packet leaves the top switch having the form: PAYLOAD | OP2 | AD2 | OP1 | BIT
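  • A minimal sketch of the address-bit consumption just described, assuming the field layout given above (one AD1 bit examined and removed per level of the top switch):

```python
def route_through_top_switch(packet):
    """packet: dict with an 'AD1' list of bits (most significant bit first)
    plus the remaining fields; returns (target ring index, shortened packet)."""
    target = 0
    while packet["AD1"]:
        bit = packet["AD1"].pop(0)            # one bit consumed per level
        target = (target << 1) | bit
    return target, packet

pkt = {"BIT": 1, "AD1": [1, 0, 1, 1], "OP1": 1,
       "AD2": [0] * 8, "OP2": [0] * 4, "PAYLOAD": [0] * 64}
ring_index, pkt = route_through_top_switch(pkt)
# ring_index == 0b1011 == 11, and pkt now carries no AD1 bits
```
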
  • FIGURE 3 is a schematic block diagram illustrating an example of a DPLM unit 114 and showing data and control connection paths between the DPLM and the top 110 and bottom 112 switches.
  • FIGURE 3 illustrates four data interconnect structures Z, C, R and B.
  • Interconnect structure Z can be a FIFO ring located in the top switch 110.
  • the interconnect structures C and R are FIFO rings located in the DPLM module.
  • the DPLMs send data directly to the bottom switch.
  • interconnect structure B is a FIFO ring.
  • the DPLMs send data to a concentrator that then sends data to the bottom switch.
  • FIGURES 1 and 7 illustrate systems that do not include concentrators.
  • FIGURES 2, 3, 4A and 5 illustrate systems that contain concentrators.
  • the DPLM module includes a packet-receiving ring C 302 referred to as a "data communication ring" and one or more "data storage rings" R 304.
  • FIGURE 3 illustrates a DPLM with a single data storage ring R.
  • Each of the structures Z, C, R and B is a FIFO that includes interconnected single-bit FIFO nodes. Some of the nodes in the structure have a single data input port and a single data output port and are interconnected to form a simple multi-node FIFO.
  • a DPLM can contain multiple logic modules capable of sending data to multiple input ports in interconnect structure or FIFO B. Data from a DPLM can be injected into multiple rows at the top level of interconnect structure B.
  • the number of DPLMs may be the same as the number of memory locations, where each DPLM has a single storage ring R that contains one word of data.
  • a DPLM unit may contain a plurality of storage rings R. A particular storage ring can be identified by a portion of the address AD1 field or by a portion of the operation OP1 field.
  • the timing of packet movement is synchronized in all four rings. As packets circulate in the rings, the packets are aligned with respect to the BIT field. As an advantageous consequence of the alignment, ring C sends control signal 328 to ring Z that either permits or prevents a node in Z from sending a packet to C. Upon receiving permission from a node 330 on ring C, a node 312 on ring Z can send a packet to logic module L such that logic module L is positioned to process the packet immediately in bit-serial manner. Similarly, packets circulating in data storage ring R are synchronized with ring C so that the logic module L can advantageously process respective bits as packets circulate in the respective rings.
  • the data storage rings R function as memory elements that can be used in several novel applications that are described hereinafter.
  • a separate data communication ring (not shown) between nodes of ring Z and logic modules L can be used for inter-chip timing and control where the DPLMs are not on the same chip as the top switch.
  • Data in a storage ring R may be accessed from the top switch 110 by a plurality of packets, aligned and overlapping with portions of the packets in the Z ring 306 of the top switch, and coinciding in cycle period.
  • a plurality of logic modules 314 are associated with the data communication ring C and data storage ring R.
  • a logic module L is capable of reading data from rings C and R, performing operations on the data under some conditions, and writing to rings C and R.
  • the logic module L is further capable of sending a packet to a node 320 on FIFO 308 at the bottom switch 112 or concentrator.
  • a separate data communication ring (not shown) between the logic modules L 314 and the nodes 320 of interconnect structure B may be used for inter-chip timing and control in instances that the DPLMs are not on the same chip as the bottom switch.
  • a separate data communication ring can also be used for timing and control operations when a single device needs to access several bits of the communication ring in a single cycle period.
  • Packets enter communication ring C through the logic modules 314. Packets exit the logic modules L and enter the bottom switch through input channels at different angles.
  • all of the logic modules along rings C and R of a DPLM 114 are the same type and perform a similar logic function.
  • Other examples use a plurality of different logic module types, permitting multiple logical functions to operate upon data stored in ring R of a particular DPLM.
  • the logic modules L 314 can modify the data.
  • a logic module operates on data bits passing serially through the module from ring C and ring R, and from a node on ring Z.
  • Typical logic functions include (1) data transfer operations such as loads, stores, reads, and writes; (2) logic operations such as AND, OR, NOR, NAND, EXCLUSIVE OR, bit tests, and the like; and (3) arithmetic operations such as adds, subtracts, multiplies, divides, transcendental functions, and the like. Many other types of logic operations may be included.
  • Logic module functionality can be hardwired into the logic module or functionality can be based on software that is loaded into the logic modules from packets sent to the logic module.
  • the logic modules associated with a particular data storage ring R act independently.
  • logic module groups are controlled by a separate system (not shown) that can receive data from a group of logic modules.
  • the logic module groups are controlled by a logic module control system.
  • the logic module control systems perform control instructions on data received from the logic modules.
  • each DPLM includes one ring R and one ring C.
  • a particular DPLM 114 includes multiple R rings.
  • a logic module 314 can simultaneously access data from the C ring and all of the R rings. Simultaneous access allows a logic module to modify the data on one or more of the R rings based on the content of R rings and also based on the content of the received packet and associated communication ring C.
  • a typical function performed by the logic modules is execution of an operation designated in the OP1 field that operates on data held in the PAYLOAD field of the packet in combination with data held in the ring R.
  • operation OP1 may specify that data in the PAYLOAD field of the packet be added to data contained in ring R located at address AD1. The resulting sum is sent to the target port of the bottom switch at address AD2.
  • the logic module can perform several operations. For example, the logic module can leave data in ring R 304 unchanged. The logic module can replace data in ring R 304 with contents of the PAYLOAD field. Alternatively, logic module L can replace data held in the PAYLOAD field with the result of a function operating on contents previously within ring R 304 and the PAYLOAD field.
  • a memory FIFO can store program instructions as well as data.
  • a generic system 100 that includes more than one type of logic module 314 associated with a communication ring C and a storage ring R may use one or more bits of the OP1 field to designate a specific logic module that is used in performing an operation.
  • multiple logic modules perform operations on the same data.
  • Efficient movement of data packets through the generic system 100 depends on timing of the data flow.
  • buffers (not shown) associated with the logic module help maintain timing of data transfer. In many embodiments, timing is maintained without buffering data.
  • the interconnected structure of the generic system 100 advantageously has operational timing that results in efficient parallel computation, generation, and access of data.
  • a generic system 100 composed of multiple components including at least one switch, a collection of data storage rings 304, and associated logic modules 314 can be used to construct various computing and communication switches.
  • Examples of computing and communication switches include an IP packet router or switch used in an Internet switching system, a special purpose sorting engine, a general-purpose computer, or many parallel computational systems having general purpose or specific function.
  • In FIGURE 2, a schematic block diagram illustrates a parallel random-access memory (PRAM) that is constructed using network interconnect structures as fundamental elements.
  • the PRAM stores data that can be accessed simultaneously from multiple sources and simultaneously sent to multiple destinations.
  • the PRAM has a top switch 110 and may or may not have communication rings that receive packets from the target ring of the top switch 110. In interconnect structures that have no communication ring, the ring Z passes through the logic modules.
  • the top switch 110 has T output ports 210 from each of the target rings.
  • the number of address locations will be greater than the number of system I/O ports.
  • a PRAM system may have 128 I/O ports that access 64K words of data stored in DPLMs.
  • the AD1 field is 16 bits long to accommodate the 64K DPLM 114 addresses.
  • the AD2 field is 8 bits long to accommodate the 128 output ports 204, where 7 bits hold the address and 1 bit is the BIT2 portion of the address field.
  • the top switch has 128 input ports 202, and 64K Z rings (not shown) each with multiple connections to a DPLM unit via output ports 206.
  • Concentrator 150 has 64K (65,536) input ports 208 and 128 output ports 210.
  • the bottom switch 112 has 128 input ports and 128 output ports 204.
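  • A worked check of the example dimensions quoted above (the numbers are taken from the example; the arithmetic simply confirms the stated field widths):

```python
import math

io_ports   = 128                              # system input/output ports
dplm_words = 64 * 1024                        # 64K words, one word per DPLM

ad1_bits = int(math.log2(dplm_words))         # 16 bits address 64K DPLMs
ad2_addr_bits = int(math.log2(io_ports))      # 7 bits address 128 output ports
ad2_bits = ad2_addr_bits + 1                  # plus the 1-bit BIT2 field

assert ad1_bits == 16 and ad2_bits == 8
```
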
  • the concentrator follows the same control timing and signaling rules for input and output as the top and bottom switches and the logic modules.
  • a top switch may have fewer output Z rings and associated DPLM units.
  • the DPLM units can contain multiple R rings so that the total data size remains unchanged.
  • the illustrated PRAM shown in FIGURE 2 includes DPLM units 114 containing logic modules 314 that connect directly to communication ring C 302 and storage ring R 304.
  • DPLM units 114 connect to packet concentrator 150 that feeds output data into bottom switch 112.
  • nodes 330 on ring C send control signals to nodes 312 on ring Z of the top switch, permitting individual nodes 312 of the ring Z to send a packet to the logic modules L.
  • logic module L may perform one of several actions. First, the logic module L can begin placing the packet on the C ring. Second, the logic module L can begin to use the data in the packet immediately. Third, logic module L can immediately begin to send a generated packet into concentrator 150 without placing the packet on the C ring. A logic module L1 can begin to place a packet P on the C ring.
  • Logic modules can insert data to either the C ring or the R ring, or can send data to the concentrator 150. Control of a packet entering the concentrator is aided by control signals on line 324 from the concentrator.
  • Logic modules 314 associated with a ring R 304 may include additional send and receive interconnections to an auxiliary device (not shown) that can be associated with the ring R.
  • the auxiliary device can have various structures and perform various functions depending on the purpose and functionality of the system.
  • an auxiliary device is a system controller.
  • PRAM 200 has DPLMs containing logic modules 314 that all have the same logic type and perform the same function.
  • a first DPLM S at a particular address may have logic modules of different type and function.
  • a second DPLM T may have logic modules of the same or different types in comparison to the first DPLM S.
  • one data word is stored in a single storage ring R. As data circulates in ring R, the logic modules may modify the data. In the PRAM, the logic modules alter the contents of the storage ring R, which may store program instructions as well as data.
  • the PRAM stores and retrieves data using packets defined to include fields, defined as follows:
  • the BIT field, set to 1 to indicate that a packet is present, enters the generic system 100 first.
  • the AD1 field designates the address of a specific DPLM, which includes a data storage ring R 304 containing the desired data.
  • the top switch routes the packet to the DPLM(AD1) specified by address AD1.
  • the OP1 field is a single bit that designates the operation to be executed. For example, a logic value 1 specifies a READ request and a logic value 0 specifies a WRITE request.
  • For a READ request, the receiving logic module in the DPLM at location AD1 sends data stored on ring R to address AD2 of the bottom switch 112.
  • For a WRITE request, the PAYLOAD field of the packet is placed on ring R at address AD1.
  • AD2 is an address designation that is used to route data through the bottom switch 112 only in a READ request and specifies the location to which the content of the memory is sent.
  • OP2 optionally describes the operation that a device at address AD2 is to perform on the data sent to the AD2 device. If operation OP1 is a READ request, the logic module that executes the READ operation does not use the PAYLOAD field.
  • the PRAM includes only a single type of logic module - a type that executes both READ and WRITE operations.
  • other types of logic modules are used, including types with separate READ elements and WRITE elements.
  • the illustrative PRAM 200 begins an operation by receiving a packet into the top switch 110 at a suitable time.
  • the packet P is routed through the top switch and arrives at a target ring Z located at address AD1.
  • a node S (not shown) and a node T (not shown) are defined to describe message timing.
  • the node S is defined as a node 330 of ring C, and the node T is defined as a node 312 of ring Z, so that the node S is positioned to send a control signal to the node T on control line 328.
  • a node S 330 of ring C determines the occurrence of a timing bit arrival time at node S. If a timing bit with value 1 arrives at node S at a timing bit arrival time, then node S sends a blocking signal on line 328 to node T 312 on ring Z, prohibiting node T from sending a packet down a line 326 to a logic unit L. If node S does not receive a bit with value 1 at a timing bit arrival time, then no message is entering node S on ring C and node S sends a non-blocking control signal to node T.
  • the global timing is such that the control signal arrival time at node T is concurrent with a message arrival time at node T from ring Z or from a node U positioned one level above ring Z in the top switch.
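  • A minimal sketch of the blocking rule described above (function and signal names are illustrative assumptions):

```python
def control_signal_from_node_s(bit_at_s_at_timing_time):
    # A 1 circulating through node S at a timing-bit arrival time means a
    # packet is present on ring C, so node T on ring Z must be blocked.
    return "BLOCK" if bit_at_s_at_timing_time == 1 else "NON-BLOCK"

def node_t_action(packet_waiting, signal):
    if packet_waiting and signal == "NON-BLOCK":
        return "send packet down line 326 to logic module L"
    return "hold the packet on ring Z (or defer to the node one level above)"
```
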
  • the packet exits the top switch 110 from node 312 on path 326 to the logic module.
  • the logic module may place the packet on the communication ring C 302, or may process the packet immediately without placing the packet on ring C.
  • the packet P has the form: PAYLOAD | OP2 | AD2 | OP1 | BIT
  • the packet P travels down line 326 from ring Z to logic module L.
  • a node Nz on ring Z sends a control signal to inform a higher-level node W in the top switch of a non-blocking condition at node Nz.
  • the control signal grants node W the right to route data to a node Nx positioned to receive data from node Nz.
  • Logic module L operates on packets arriving on line 326 and packets arriving on ring C in the same way with respect to timing. Packet P enters logic module L, which parses and executes the command in the OP1 field.
  • communication ring C has the same length as the storage ring R. Bits travel through rings C and R in a bit-serial manner at a rate governed by a common clock.
  • the first bit of the PAYLOAD field of the packet is aligned with the first bit of the DATA field in ring R. Therefore, in the case of a READ request, data in ring R can be copied into the payload section of the packet. In the case of a WRITE request, the data in the payload section of the packet can be transferred from the packet to storage ring R.
  • a packet P has the form: PAYLOAD | OP2 | AD2 | OP1 | BIT
  • a logic module of the DPLM at address AD1 identifies a READ request by examining the operation code OP1 field.
  • the logic module replaces the PAYLOAD field of the packet with the DATA field from ring R.
  • the updated packet is then sent through the concentrator into the bottom switch that directs the packet to a computation unit (CU) 126 or other device at address AD2.
  • the CU or other device can execute the instruction designated by operation code 2 (OP2) in conjunction with data in the PAYLOAD field.
  • the packet P enters a node T 312 on ring Z. Node T, in response to the timing bit of packet P entering node T and to a non-blocking control signal from a node 330 on ring C, begins to send packet P down a data path 326 to a logic module L.
  • a control signal on line 324 also has arrived at logic module L, indicating whether the concentrator 150, or the bottom switch if the structure includes no concentrator, can accept a message. If the control signal indicates that the concentrator cannot accept a message, then logic module L begins transferring packet P to ring C. Packet P moves to the next logic module on ring C.
  • one of the logic modules L on ring C receives a not-busy control signal from below in the hierarchy. At that time logic module L begins transferring the packet P to an input node 320 on interconnect structure B.
  • the logic module strips the OP1 field from the packet and begins sending the packet on path 322 to an input node 320 of the concentrator. First, the logic module sends the BIT field, followed by the AD2 field, followed by the OP2 field. Timing is set so that the last bit of the OP2 field leaves the logic module at the same time that the first bit of the DATA field on storage ring R arrives at the logic module. The logic module leaves the DATA field in storage ring R unchanged, puts a copy of DATA in the PAYLOAD field of the packet sent downward, and continues sending the packet in a bit-serial manner into the concentrator. Data in ring R remains unchanged. The packet enters and leaves the concentrator unchanged, and enters bottom switch 112 having the form: PAYLOAD | OP2 | AD2 | BIT
  • the PAYLOAD field now contains the DATA field from ring R.
  • the AD2 field is removed.
  • the packet exits output port 204 at address AD2 of the bottom switch. Upon exit, the packet has the form: PAYLOAD | OP2 | BIT
  • the OP2 field is a code that can be used in a variety of ways. One use is to indicate the operation that a bottom-switch output device performs with the data contained in the PAYLOAD field.
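  • A minimal sketch of the READ-response transformation described above, assuming the field layout used in this description (OP1 stripped, DATA copied into PAYLOAD, AD2 consumed by the bottom switch):

```python
def execute_read(request, ring_r_data):
    """request: packet after the top switch (AD1 already consumed)."""
    return {
        "BIT": 1,
        "AD2": list(request["AD2"]),          # routes the response downward
        "OP2": request["OP2"],                # operation for the output device
        "PAYLOAD": list(ring_r_data),         # copy of DATA; ring R unchanged
    }

def route_through_bottom_switch(response):
    port = 0
    while response["AD2"]:
        port = (port << 1) | response["AD2"].pop(0)   # AD2 removed bit by bit
    return port, response                      # exits as BIT | OP2 | PAYLOAD
```
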
  • the interconnected structures of the PRAM inherently have a circular timing that results in efficient, parallel generation and access of data.
  • a plurality of external resources at different input ports 202 may request READ operations for the same DATA field at a particular DPLM 114.
  • a plurality of READ requests can enter a particular target ring Z 306 of the top switch at different nodes 312, and subsequently enter different logic modules L of the target DPLM.
  • the READ requests can enter different logic modules on ring C during the same cycle period.
  • Communication ring C 302 and memory ring R 304 are always synchronized with regard to the movement of packets in the target ring Z of the top switch and the input interconnect structure B of the concentrator.
  • a READ request always arrives at a logic module at the correct time for the data from ring R to be appended in the proper PAYLOAD location of the forwarded packet.
  • the advantageous result is that multiple requests for the same data in ring R can be issued at the same time.
  • the same DATA field is accessed by a plurality of requests.
  • the data from ring R is sent to multiple final destinations.
  • the plurality of READ operations execute in parallel and the forwarded packets reach a plurality of output ports 204 at the same time.
  • the multiple READ requests are executed in overlapping manner by simultaneously reading from different locations in ring R by different logic modules.
  • other multiple READ requests are executed in the same cycle period at different addresses of the PRAM memory.
  • FIGURES 4A, 4B, and 4C illustrate timing for a single READ.
  • Storage ring R is the same length as the communication ring C.
  • Ring R contains circulating data 414 of length PayL.
  • Remaining storage elements in ring R contain zeroes, or "blanks," or are ignored and can have any value.
  • the BLANK field 412 is the set of bits that are not contained in the DATA field 414.
  • a logic module contains at least two bits of the set of shift registers constituting ring C, and at least two bits of the shift registers constituting ring R.
  • the DPLM 114 contains a plurality of logic modules 314.
  • a logic module is positioned to read two bits of the communication ring 302 in a single clock period.
  • the logic module examines the BIT field and the OP1 field. In the illustrated embodiment, the logic module reads the entire OP1 field and the BIT field together. In other embodiments, the OP1 and BIT fields may be read in multiple operations.
  • an unblocked logic module 314 sends the packet into the concentrator or bottom switch at the correct time to align the packet with other bits in the input of the concentrator or bottom switch.
  • a blocked logic module places the packet on ring C where the packet will move to the next logic module.
  • the next logic module may be blocked or unblocked. If a subsequent logic module is blocked, the blocked logic module similarly sends the packet on ring C to the next module. If the packet enters the right-most logic module LR and LR is blocked, then logic module LR sends the packet through the FIFO on ring C. Upon exiting the FIFO the packet enters the left-most logic module. The packet circulates until the packet encounters a logic module that is unblocked.
  • the length of ring C is set so that a circulating packet always fits completely on the ring. Alternatively stated, the packet length, PL, is never greater than the ring length, FL.
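  • A minimal sketch of the circulation behavior described above (the busy flags and module list are assumptions; real modules decide from the control signal on line 324):

```python
def place_request(logic_modules, start=0):
    """logic_modules: list of dicts with a 'busy' flag, ordered around ring C.
    Returns the index of the module that forwards the packet downward, or
    None if the packet is still circulating after two full cycles."""
    n = len(logic_modules)
    position = start
    for _ in range(2 * n):
        if not logic_modules[position]["busy"]:
            return position                    # unblocked: send to node 320
        position = (position + 1) % n          # move to the next module on C,
                                               # wrapping through the FIFO
    return None
```
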
  • a packet has the form: PAYLOAD | OP2 | AD2 | OP1 | AD1 | BIT
  • Address field AD1 indicates the target address of the ring R 304 that contains the desired data.
  • Operation field OP1 indicates a READ request.
  • Address field AD2 is the target address of the output port 204 of the bottom switch where the results are sent.
  • Operation code OP2 designates a function to be performed by the output device.
  • the output device is the same as the input device.
  • a single device is connected to an input 202 and output 204 port of the PRAM.
  • For a READ request, the PAYLOAD field is ignored by the logic module and may have any value.
  • For a WRITE request, the PAYLOAD field contains data to be placed on ring R 304 associated with the DPLM at address AD1.
  • the altered packet leaving the logic module has the form:
  • FIGURES 4A, 4B, and 4C illustrate timing coordination between communication ring C, data storage ring R, and the concentrator B.
  • a logic module 314 is capable of reading multiple bits at one time.
  • logic module L receives only one bit per clock period.
  • the concentrator B includes a plurality of input nodes 320 on a FIFO 308 that can accept a packet from a logic module.
  • a logic module is positioned to inject data into the top level of the concentrator through input port 322.
  • BIT field 402 is set to 1 and arrives at the logic module at the same time as the first bit, B0 408, of the BLANK field 412 on the data ring R. Relative timing of circulating data is arranged so that the first bit of DATA in ring R is aligned (as shown by line 410) with the first bit of the payload field of the request packet in ring C.
  • a global packet-arrival-timing signal informs node 316 of a time when packets may enter. If a packet already in the concentrator enters the node 316, then node 316 sends a blocking signal on path 324 to a logic module connected to the node 316. In response to the blocking signal, logic module L forwards a READ request packet into communication ring C, as described hereinbefore. If no blocking signal arrives from below in the hierarchy, then logic module L sends a packet on line 322 to an input node 320 in the concentrator B downstream from the node 316.
  • the logic module has sufficient information to determine that the logic module has received a READ request and that the request is not blocked from below.
  • the logic module examines the BIT and OP1 fields, and responds to three conditions: the BIT field is set to 1, indicating that a packet is present; the OP1 field designates a READ request; and no busy signal is received on line 324 from below.
  • Logic module L sends the BIT field down line 322 to an input port of the concentrator. After the shift, the C-ring registers contain the second and third bits of the packet, the single-bit OP1 field and the first bit of the AD2 field, respectively.
  • the logic module also contains the second and third bits, B1 and B2, of the BLANK field of ring R.
  • the packet from ring Z may have entered a logic module (not shown) to the left of the logic module illustrated.
  • the packet is therefore not wholly contained within ring C.
  • the remainder of the packet is within the top switch 110 or may remain in the process of wormholing from an input port through the top switch and exiting from ring Z, while still entering logic module L 314.
  • FIGUREs 4A, 4B and 4C show the READ request packet entirely contained on ring C for ease of understanding.
  • Logic module L sends output data in two directions. First, the logic module L returns a zeroed packet back to ring C. Second, the logic module L sends the DATA field downward. All bits sent to ring C are set to zero 430 so that subsequent logic modules on ring C do not repeat the READ operation. Alternatively stated, the request packet is cleared from the communications ring C when a logic module L successfully processes the request, advantageously allowing other logic modules on the same ring an opportunity to accept other request packets during the same cycle period. Packets are desirably processed in wormhole fashion by logic modules, and a plurality of different request packets can be processed by a particular DPLM during one cycle period.
  • the first bit of the payload is in a position to be replaced by zero by L and the first data bit D1 on ring R is positioned to be sent to the bottom switch or to a concentrator that transfers data to the bottom switch.
  • the process continues as shown in FIGURE 4C.
  • the logic module sends a second DATA bit D2 to the concentrator while the logic module reads a third DATA bit D3 from the data ring R.
  • the entire packet has been removed from the communication ring C, and the forwarded packet has the form: PAYLOAD | OP2 | AD2 | BIT
  • the packet is sent to the input port 320 of the concentrator or to the bottom switch.
  • DATA is copied from the DATA field of ring R to the concentrator.
  • DATA field 414 in data ring R is left unchanged.
  • logic modules L1 504 and L2 502 execute simultaneous READ requests. Different request packets P1 and P2 are generally sent from different input ports 202 and enter the top switch, resulting in processing of a plurality of READ requests in a wormhole manner in a single DPLM. All requests in the illustrative example are for the same PRAM address, specified in the AD1 field of the respective requesting packets. Packets P1 and P2 reach different logic modules L1 and L2, respectively, in the target DPLM. The respective logic modules process the requests independently of one another. In the illustrative example, the first-arriving READ request P2 is processed by module L2 502. Module L2 has previously read and processed the BIT field, the OP1 field, and five bits of the AD2 field.
  • Module L2 has previously sent the BIT field and 4 bits of the AD2 field into input node 512 of the concentrator.
  • module L1 has previously read and processed two bits of the AD2 field of packet P1, and sent the first AD2 bit into node 514 below.
  • the AD2 fields of the two respective packets are different, consequently the DATA field 414 is sent to two different output ports of the bottom switch. Processing of the two requests occurs in overlapped manner with the second request occurring only a few clock periods behind the first request.
  • the DPLM has T logic modules and can potentially process T READ requests in the same cycle period. As a result of processing a READ request, a logic module always puts zeros 430 on ring C.
  • Wormhole routing of requests and responses through the top and bottom switches, respectively, allows any input port to send request packets at the same time as other input ports.
  • any input port 202 may send a READ request to any DPLM independently of simultaneous requests being sent from other input ports.
  • PRAM 200 supports parallel, overlapped access to a single database from multiple requestors, supporting a plurality of requests to the same data location.
  • the AD1 field of a packet is used to route the packet through the top switch.
  • the packet leaves node 312 of the top switch in position to enter ring C.
  • the OP1 field designates a WRITE request.
  • no data is sent to the concentrator. Therefore the logic module ignores a control signal from the concentrator.
  • the logic module sends '0' to input port 320 of the concentrator to convey information that no packet is being sent.
  • a WRITE request at ring Z is always allowed to enter the first logic module encountered on ring C.
  • FIGURE 6 illustrates a WRITE request at time T = K+5.
  • the WRITE packet on ring C and the data in the ring R rotate together in synchronization through a logic module.
  • the last bit of the OP2 field is discarded by the logic module at the same time the logic module is aligned with the last bit of the BLANK field of storage ring R.
  • logic module L removes the first bit from the ring C and places the first bit in the DATA field of ring R. The process continues until the entire PAYLOAD field is transferred from the communication ring to the DATA field of ring R.
  • Logic module L zeroes the packet, desirably removing the packet from ring C so that other logic modules do not repeat the WRITE operation.
  • FIGURE 6 illustrates the data packet during movement from ring C to ring R.
  • Data typically arrives from the top switch. More specifically, data is disseminated over the top switch.
  • a request to a DPLM can be a request to send the largest value in all of the R rings, or a request to send the sum of the values in a subset of the R rings.
  • a DPLM request can be a request to send each copy of a word containing a specified sub-field to a computed address, therefore allowing an efficient search for certain types of data.
  • the BLANK field is ignored, and can have any value.
  • the BLANK field can be defined to assist various operations.
  • the BLANK field is used for a scoreboard function.
  • a system includes N processors with the number of processors N less than BL, the length of the BLANK field. All N processors must read the DATA field before the DATA field is allowed to be overwritten. When a new DATA value is placed in storage ring R, the BLANK field is set to all zeros. When a processor W of the N processors reads the data, then bit W of BLANK is set to 1. Only when the proper N-bit sub-field of BLANK is set to the all-one condition can the DATA portion of the ring R be overwritten.
  • the BLANK field is reset back to all zeros.
  • the scoreboard function is only one of many types of BLANK field use. Those having ordinary skill in the art will be able to effectively use the BLANK field for many applications in computing and communications.
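  • A minimal sketch of the scoreboard behavior described above (class and method names are assumptions):

```python
class ScoreboardWord:
    """Bit W of BLANK records that processor W has read the current DATA."""
    def __init__(self, n_processors, data):
        self.n = n_processors
        self.data = data
        self.blank = [0] * n_processors        # cleared on every new DATA value

    def read(self, w):
        self.blank[w] = 1                      # processor w has seen the data
        return self.data

    def write(self, new_data):
        if all(self.blank):                    # N-bit sub-field is all ones
            self.data = new_data
            self.blank = [0] * self.n          # reset for the next round
            return True
        return False                           # overwrite refused until then
```
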
  • a computation logic module 314 sends a signal to a local counter (not shown) upon receipt of a READ request entry.
  • No two computation logic modules in a single DPLM receive the first bit of a read packet at the same time, so that a common DPLM bus (not shown) is conveniently used to step a counter connected to all logic modules.
  • the counter can respond to all of the computation logic modules so that when the "leaky bucket runs over" all of the proper logic modules are notified, and respond to the information by modifying the AD2 and OP2 fields to generate a suitable reply to the proper destination.
  • In FIGURE 1, a schematic block diagram illustrates a computational engine 100 that is constructed using network interconnect structures as fundamental elements.
  • Various embodiments of the computational engine include core elements of the generic system 100 described in the discussion of FIGURE 1.
  • a bottom switch 112 sends packets to computational units 126 including one or more processors and memory or storage.
  • computational logic modules associated with ring R execute part of the overall computing function of the system.
  • Computational units 126 that receive data from the bottom switch 112 execute additional logical operations.
  • the logic modules execute both conventional and novel processor operations depending on the overall function desired for the computational engine.
  • a first example of a system 100 is a scaleable, parallel computational system.
  • the system executes a parallel SORT that includes a parallel compare suboperation of the SORT operation.
  • a logic module L accepts a first data element from a packet and a second data element from storage ring R 304.
  • the logic module places the larger of the two data elements on the storage ring R, placing the smaller value in the PAYLOAD field and sending the smaller value to a prescribed address in the AD2 field of the packet. If two such logic modules are connected serially, as shown in FIGURE 3, the second logic module can execute a second compare on the data coming from the first logic module within only a few clock cycles.
  • the compare and replace process is a common unit of work in many sorting algorithms, and one familiar with the prior art can integrate the compare and replace process into a larger, parallel sorting engine.
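  • A minimal sketch of the compare-and-replace step described above (names are assumptions; the values stand in for the bit-serial words):

```python
def compare_and_replace(ring_r_value, packet):
    incoming = packet["PAYLOAD"]
    larger, smaller = max(ring_r_value, incoming), min(ring_r_value, incoming)
    packet["PAYLOAD"] = smaller                # smaller value travels onward
    return larger, packet                      # larger value stays on ring R

# Two such modules in series (as in FIGURE 3) perform two compares on the same
# stream only a few clock periods apart, the basic unit of a sorting network.
```
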
  • the DPLMs handle bit-serial data movement and perform computations of a type that move a large amount of data.
  • a CU includes one or more processors, such as a general-purpose processor and conventional RAM. The CU effectively executes "number crunching" operations on a data set local to the CU, and generates, transmits, and receives packets.
  • One important function of the DPLMs is to supply data to the CUs in a low-latency, parallel manner, and in a form that is convenient for further processing.
  • the region contains a representation of objects in three-dimensional space, and a sub-region is a partition of the space.
  • a condition of interest is defined as a condition of a gravitational force exceeding a threshold on a body of interest.
  • DPLMs forward data from sub-regions containing data consistent with the condition of interest to the CU.
  • DPLMs are useful as bookkeepers and task schedulers.
  • One example is a task scheduler that utilizes a plurality of K computation units (CUs) in a collection H.
  • the collection H CUs typically perform a variety of tasks in parallel computation.
  • N of the K CUs are assigned a new task.
  • a data storage ring R that is capable of storing at least K bits of data, zeroes a K-long word W. Each bit location in the word W is associated with a particular CU in the collection H.
  • the CU sends a packet M to the DPLM containing the ring R.
  • a logic module L1 on data storage ring R modifies the word W by inserting 1 in the bit location associated with the CU that sends the packet M.
  • Another logic module L2 on data storage ring R tracks the number of ones in the word W.
  • when the word W has N bits set to 1, the N idle CUs in H begin new tasks.
  • the new tasks are begun by multicasting a packet to the N processors. An efficient method of multicasting to a subcollection of a collection H is discussed hereinbelow.
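  • A minimal sketch of the bookkeeping described above (the class is an assumption; L1 corresponds to report_idle and L2 to check):

```python
class TaskScheduler:
    def __init__(self, k_cus, n_needed):
        self.word_w = [0] * k_cus              # one bit per CU in collection H
        self.n_needed = n_needed

    def report_idle(self, cu_index):           # role of logic module L1
        self.word_w[cu_index] = 1
        return self.check()

    def check(self):                           # role of logic module L2
        idle = [i for i, bit in enumerate(self.word_w) if bit]
        if len(idle) >= self.n_needed:
            self.word_w = [0] * len(self.word_w)
            return idle[:self.n_needed]        # multicast new task to these CUs
        return None
```
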
  • FIGURE 7 a schematic block diagram illustrates a structure and technique for performing a multicast operation using indirect addressing. Multicasting of a packet to a plurality of destinations designated by a corresponding address is a highly useful function in computing and communication applications. A single first address points to a set of second addresses. The second addresses are destinations for multicast copies of the packet payload.
  • an interconnect structure system has a collection C0 of output ports with the property that, under some conditions, the system sends a predetermined packet payload to all output ports in the collection C0.
  • Each of the collections C0, C1, C2, ..., CJ-1 is a set of output ports so that, for a particular integer N less than J, all ports in a set CN can receive the same particular packet as a result of a single multicast request.
  • a multicasting interconnect structure 700 stores the set of output addresses of the set CN in a storage ring R 704.
  • Each of the rings has a capacity of FMAX addresses.
  • a bottom switch includes 64 output ports.
  • the output port address can be stored in a 6-bit binary pattern.
  • Ring R includes five fields 702 labeled F0, F1, F2, F3 and F4 that hold output port locations in the collection CN.
  • Each of the fields is seven bits in length. The first bit in the seven-bit field is set to 1 if a location of CN is stored in the next six bits of the field. Otherwise, the first bit is set to 0.
  • At least two types of packets can arrive at multicast logic module, MLM 714, including MULTICAST READ and MULTICAST WRITE packets.
  • a first type of packet, PW, has an OP1 field that signifies a MULTICAST WRITE operation.
  • the WRITE packet arrives at communication ring 302 and has the form:
  • FIGURE 7 illustrates a logic module that is connected to a special hardware DPLM 714 supporting a multicast capability.
  • the system performs an operation where fields F0, F1, F2, F3, and F4 are transferred from rings Z and C to a data storage ring R 304.
  • Operation code field OP1 follows the BIT field. In the MULTICAST WRITE operation, OP1 indicates that the payload is to be transferred from the packet to the storage ring, replacing any data that is currently on the storage ring. Data is transferred serially from the MLM to storage ring R.
  • data is transferred through a rightmost line 334.
  • Data arrives in the correct format and at the proper time and location to be placed on the storage ring 704.
  • a control signal on line 722 from the bottom switch to the MLM may be ignored.
  • Another type of packet, PR, signifying a MULTICAST READ request, can arrive at communication ring 302, and has the form:
  • the BLANK section, in the example, is six bits in length.
  • the BLANK field is replaced with a target address from one of the fields of C N .
  • the OP1 field may or may not be used for a particular packet or application.
  • a group of packets enters the bottom switch 112 with the form: PAYLOAD | OP2 | AD2 | BIT
  • Address field AD2 originates from a ring R field.
  • Operation field OP2 and PAYLOAD originate from the MULTICAST READ packet.
  • storage ring R 704 located at a target address AD1 stores three output port addresses, for example, 3, 8, and 17.
  • Output address 3 is stored in field F0.
  • the most significant bit of address 3 appears first, followed by the next most-significant bit, and so on.
  • the standard six-bit binary pattern representing the base-ten integer 3 is 000011.
  • the header bits are used in the order of the most significant bit to the least significant bit. Most suitably, the header bits are stored with the most significant bit stored first, so that in field F0 the target output 3 is represented by the six-bit pattern 110000.
  • the entire field F0, including the timing bit, has the seven-bit pattern 1100001.
  • field F1 stores the decimal number 8 in the pattern 0001001.
  • Field F2 stores the decimal number 17 as 1000101. Since no additional output ports are addressed, fields F3 and F4 are set to all zeros, 0000000.
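  • A worked check of the field patterns quoted above, assuming the patterns are written with the first-transmitted bit rightmost (the timing bit), followed by the six address bits transmitted most-significant bit first:

```python
def encode_field(port=None, width=6):
    if port is None:                           # unused field, e.g. F3 and F4
        return "0" * (width + 1)
    address_bits = format(port, f"0{width}b")  # e.g. 3 -> "000011", MSB first
    transmitted = "1" + address_bits           # timing bit is sent first
    return transmitted[::-1]                   # display first-sent bit rightmost

assert encode_field(3)    == "1100001"         # field F0
assert encode_field(8)    == "0001001"         # field F1
assert encode_field(17)   == "1000101"         # field F2
assert encode_field(None) == "0000000"         # fields F3 and F4
```
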
  • Control signals on line 722 indicate an unblocked condition at the bottom switch, allowing packets to enter the switch on line 718. If a control signal on line 722 from the bottom switch to logic module 714 indicates a busy condition, then no data is sent down.
  • a "not busy" control signal arrives at an MLM, the data field of addresses in ring R is properly positioned to generate and send responses down to reading units 708 and to the bottom switch 112.
  • the MLM begins sending a plurality of MULTICAST READ response packets to the collection C N of addresses through the bottom switch 112.
  • the system has a capability to send a MULTICAST READ packet to the DPLM at address AD1 and then multicast the packet's PAYLOAD field to the multiple addresses of the collection CN stored in ring R 704.
  • the multicasting system contains hardware that is capable of performing a large variety of computing and data storage tasks.
  • a multicast capability is attained through use of a DPLM unit 700 that is specially configured to hold and transmit multicast addresses.
  • a generalization of the multicast function described hereinabove is a specific mode in which a single packet M is broadcast to a predetermined subset of the output ports having addresses designating membership in the collection CN.
  • a bit mask indicating which members are to be sent is called a send mask.
  • addresses 3, 8, and 17 are three members of collection CN.
  • a send mask 0,0,1,0,1 indicates that the first and third output ports in the list CN are to receive packets.
  • Response packets are multicast to output ports 3 and 17.
  • a control signal indicates whether all of the input ports are ready to receive a packet, or whether one or more input ports are blocked.
  • a list of unblocked output ports is stored.
  • the list is a mask called a block mask.
  • the value 1 in the Nth position of the send mask indicates that the packet is to be sent to the Nth member of CN.
  • the value 1 in the Nth position of the block mask indicates that the Nth member of CN is unblocked, and is therefore free to receive the packet. For a 1 value in the Nth position of both masks, the packet M is sent to the Nth output port in the list.
  • the packet to be multicast to a subset of the output ports listed in CN, for the subset indicated by the send mask, has the form:
  • Address field AD2 is not used because the addresses normally carried in the AD2 field are contained in the data stored at address AD1.
  • the BIT field and the OP1 code are sent into the logic module 714 from ring C or ring Z.
  • the send mask and the block mask enter the logic module at the same time.
  • the PAYLOAD is sent to the address in field FJ if the Jth bit of the send mask is set to 1 and the Jth bit of the block mask is set to 1 as well.
  • the rest of the operation proceeds in the manner of the multicast mode without a mask.
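  • A minimal sketch of the masked multicast rule described above (the list ordering and names are assumptions; the example reuses the ports 3, 8, and 17):

```python
def masked_multicast(stored_ports, send_mask, block_mask, payload):
    responses = []
    for j, port in enumerate(stored_ports):
        # a response is generated only where BOTH masks hold a 1
        if send_mask[j] == 1 and block_mask[j] == 1:
            responses.append({"AD2": port, "PAYLOAD": payload})
    return responses

ports = [3, 8, 17]
out = masked_multicast(ports, send_mask=[1, 0, 1], block_mask=[1, 1, 1],
                       payload="M")
# out carries packet M to output ports 3 and 17 only
```
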
  • the set of output ports in the collection CN is denoted p0, p1, ..., pm.
  • the output ports are divided into groups that contain, at most, the number of members of CN that can be stored on a data storage ring R. In a case where a data ring R has five output addresses and the collection CN has nine output ports, the first four output ports are stored in group 0, the next four output ports are stored in group 1, and the last output port is stored in group 2.
  • the output port sequence p0, p1, ..., p8 may otherwise be indexed as q00, q01, q02, q03, q10, q11, q12, q13, q20. In this way the physical address of a target can be completely described by two integers indicating the group number and the address field index.
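  • A minimal sketch of the two-integer indexing described above (nine ports split into groups of at most four; names are assumptions):

```python
def group_index(ports, group_size=4):
    groups = [ports[i:i + group_size] for i in range(0, len(ports), group_size)]
    index = {}
    for g, group in enumerate(groups):
        for j, port in enumerate(group):
            index[(g, j)] = port               # q_gj -> physical output port
    return index

idx = group_index(list(range(9)))              # p0 .. p8
# idx[(2, 0)] is p8, the single member of group 2
```
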
  • the packet's payload carries the following information:
  • the PAYLOAD field has the form:
  • FIGURE 7 also illustrates a system for using indirect addresses in multicasting.
  • a simpler operation is indirect addressing to a single output port.
  • data storage ring R contains a single field that represents the indirect address.
  • the storage ring R of the DPLM at address 17 contains the value 153.
  • a packet sent to address 17 is forwarded to output port 153 of the bottom switch.
  • all logic modules associated with a given ring R send data to the bottom switch 112.
  • in some cases, one DPLM sends a burst of traffic while other DPLMs send little or no traffic.
  • the individual rings R send packets to a group of rings B rather than the same ring.
  • the rings R send packets to a concentrator 150 that delivers the data to the bottom switch 112.
  • information in both the data storage ring R 304 and the communication ring R 302 circulates in the manner of a circularly connected FIFO.
  • One variation is a system in which information in ring R 704 is static. Data from the level zero ring in the top switch 110 can be connected to enter a static buffer. Data in the static buffer can interact in a manner that is logically equivalent to the circulating model described hereinbefore. A possible advantage of the static model is more efficient storage of the data.
  • data X is sent to a ring R that holds data Y.
  • a computational ring C receives both the data X and data Y streams as input signals, executes a mathematical function F on data X and Y, and sends the result of the computation to a target output port.
  • the target may be stored in a field of ring R, or in the AD2 field of the packet. Alternatively, the target may be conditional on the outcome of F(X,Y), or may be generated by another function G(X,Y).
  • multiple operations can be performed on the data X and the data Y, and results of the multiple operations can be transferred to a plurality of destinations.
  • the result of function F(X,Y) is sent to the destination designated by address AD2.
  • the result of function H(X,Y) can be sent to the destination designated by an address AD3 in the packet.
  • The multiple-operation capability advantageously permits system 100 to perform a wide variety of transforms efficiently in parallel (a sketch of this compute-and-route behavior follows the list).
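
The send-mask/block-mask test described in the list above can be illustrated with a short sketch. The following Python fragment is illustrative only; the masks-as-lists representation and the function name are assumptions, not an interface defined by the patent.

    # Illustrative sketch: masks are modeled as lists of 0/1 values indexed
    # in the same order as the output-port list CN.
    def masked_multicast_targets(c_n, send_mask, block_mask):
        """Return the output ports that actually receive the packet M.

        A port receives the packet only when the send mask requests it
        (bit = 1) and the block mask reports it unblocked (bit = 1).
        """
        return [port
                for port, s, b in zip(c_n, send_mask, block_mask)
                if s == 1 and b == 1]

    # Example from the text: CN lists output ports 3, 8 and 17; the send mask
    # requests the first and third members, and no port is blocked.
    c_n = [3, 8, 17]
    send_mask = [1, 0, 1]
    block_mask = [1, 1, 1]
    print(masked_multicast_targets(c_n, send_mask, block_mask))   # -> [3, 17]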
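
The grouping of a long multicast list into ring-sized groups, and the resulting two-integer (group number, field index) addressing, can be sketched as follows. Again this is an illustrative Python fragment; the helper name and the choice of four address fields per ring follow the nine-port example above and are not mandated by the patent.

    # Split an output-port list into groups no larger than the number of
    # address fields on one data storage ring R, then name each port by the
    # pair (group number, field index), i.e. p_k becomes q_(g,i).
    def group_ports(ports, ring_capacity):
        groups = [ports[i:i + ring_capacity]
                  for i in range(0, len(ports), ring_capacity)]
        index = {}
        for g, group in enumerate(groups):
            for i, port in enumerate(group):
                index[port] = (g, i)
        return groups, index

    # Nine output ports p0..p8 with four address fields per ring give two
    # full groups and a third group holding only the last port, q_(2,0).
    ports = list(range(9))
    groups, index = group_ports(ports, ring_capacity=4)
    print(groups)      # [[0, 1, 2, 3], [4, 5, 6, 7], [8]]
    print(index[8])    # (2, 0)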
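
Indirect addressing to a single output port, in which the field stored on ring R supplies the actual destination, amounts to a one-step lookup. The dictionary below merely stands in for the address field held on the data storage ring R; it is an assumption made for illustration.

    # The value stored on the data storage ring R of the DPLM at a given
    # address becomes the forwarding target in the bottom switch 112.
    indirect_table = {17: 153}    # ring R at address 17 holds the value 153

    def forward(packet_address, table):
        """Forward a packet addressed to packet_address to the port named on ring R."""
        return table.get(packet_address, packet_address)

    print(forward(17, indirect_table))   # -> 153: delivered to output port 153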
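
The compute-and-route behavior, in which a computational ring C combines incoming data X with stored data Y and sends each result to its own destination, can be sketched as below. The functions F and H and the packet layout are placeholders chosen for illustration; the patent does not fix particular operations.

    # A computational ring C applies one or more functions to the incoming
    # data X and the stored data Y; each result is routed to its own target,
    # e.g. F(X, Y) to the address in AD2 and H(X, Y) to the address in AD3.
    def compute_and_route(x, y, packet, operations):
        """operations maps an address field name (e.g. 'AD2') to a function."""
        results = []
        for address_field, func in operations.items():
            target = packet[address_field]
            results.append((target, func(x, y)))
        return results

    packet = {"AD2": 42, "AD3": 7}        # hypothetical target addresses
    ops = {"AD2": lambda x, y: x + y,     # stands in for F(X, Y)
           "AD3": lambda x, y: x * y}     # stands in for H(X, Y)
    print(compute_and_route(3, 5, packet, ops))   # -> [(42, 8), (7, 15)]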

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

According to the present invention, multiple processors can access the same data in parallel using several innovative techniques. First, several remote processors can request to read from the same data location, and the requests can be fulfilled in overlapping time periods. Second, several processors can access a data item located at the same position and can read, write, or perform multiple operations on the same data item at overlapping times. Third, one data packet can be multicast to several locations, and a plurality of packets can be multicast to a plurality of sets of target locations.
PCT/US2001/050543 2000-10-19 2001-10-19 Structure d'interconnexion adaptable autorisant un traitement parallele et l'acces a une memoire parallele WO2002033565A2 (fr)

Priority Applications (5)

Application Number Priority Date Filing Date Title
AU2002229127A AU2002229127A1 (en) 2000-10-19 2001-10-19 Scaleable interconnect structure for parallel computing and parallel memory access
JP2002536883A JP4128447B2 (ja) 2000-10-19 2001-10-19 並列演算及び並列メモリーアクセスのためのスケーラブルなインターコネクト構造
MXPA03003528A MXPA03003528A (es) 2000-10-19 2001-10-19 Estructura de interconexion escalable para operaciones de computo paralelas y acceso paralelo a memoria.
CA2426422A CA2426422C (fr) 2000-10-19 2001-10-19 Structure d'interconnexion adaptable autorisant un traitement parallele et l'acces a une memoire parallele
EP01987920A EP1360595A2 (fr) 2000-10-19 2001-10-19 Structure d'interconnexion adaptable autorisant un traitement parallele et l'acces a une memoire parallele

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US69360300A 2000-10-19 2000-10-19
US09/693,603 2000-10-19

Publications (2)

Publication Number Publication Date
WO2002033565A2 true WO2002033565A2 (fr) 2002-04-25
WO2002033565A3 WO2002033565A3 (fr) 2003-08-21

Family

ID=24785344

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/050543 WO2002033565A2 (fr) 2000-10-19 2001-10-19 Structure d'interconnexion adaptable autorisant un traitement parallele et l'acces a une memoire parallele

Country Status (7)

Country Link
EP (1) EP1360595A2 (fr)
JP (1) JP4128447B2 (fr)
CN (1) CN100341014C (fr)
AU (1) AU2002229127A1 (fr)
CA (1) CA2426422C (fr)
MX (1) MXPA03003528A (fr)
WO (1) WO2002033565A2 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833439B (zh) * 2010-04-20 2013-04-10 清华大学 基于分合思想的并行计算硬件结构
CN102542525B (zh) * 2010-12-13 2014-02-12 联想(北京)有限公司 一种信息处理设备以及信息处理方法
US10236043B2 (en) * 2016-06-06 2019-03-19 Altera Corporation Emulated multiport memory element circuitry with exclusive-OR based control circuitry
FR3083350B1 (fr) * 2018-06-29 2021-01-01 Vsora Acces memoire de processeurs
US10872038B1 (en) * 2019-09-30 2020-12-22 Facebook, Inc. Memory organization for matrix processing
CN117294412B (zh) * 2023-11-24 2024-02-13 合肥六角形半导体有限公司 基于单比特位移的多通道串转并自动对齐电路及方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4977582A (en) * 1988-03-31 1990-12-11 At&T Bell Laboratories Synchronization of non-continuous digital bit streams
EP0459757B1 (fr) * 1990-05-29 1999-07-28 Advanced Micro Devices, Inc. Adaptateur de réseau
EP0804005A2 (fr) * 1996-04-25 1997-10-29 Compaq Computer Corporation Commutateur de réseau
WO1998033304A1 (fr) * 1997-01-24 1998-07-30 Interactic Holdings, Llc Commutateur a faible latence a geometrie variable, utilisable dans une structure d'interconnexion

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9292900B2 (en) 2008-03-31 2016-03-22 Intel Corporation Partition-free multi-socket memory system architecture
US10168923B2 (en) 2016-04-26 2019-01-01 International Business Machines Corporation Coherency management for volatile and non-volatile memory in a through-silicon via (TSV) module

Also Published As

Publication number Publication date
EP1360595A2 (fr) 2003-11-12
CA2426422A1 (fr) 2002-04-25
CA2426422C (fr) 2012-04-10
AU2002229127A1 (en) 2002-04-29
JP4128447B2 (ja) 2008-07-30
CN1489732A (zh) 2004-04-14
MXPA03003528A (es) 2005-01-25
JP2004531783A (ja) 2004-10-14
CN100341014C (zh) 2007-10-03
WO2002033565A3 (fr) 2003-08-21

Similar Documents

Publication Publication Date Title
KR900006791B1 (ko) 패킷 스위치식 다중포트 메모리 n×m 스위치 노드 및 처리 방법
US5797035A (en) Networked multiprocessor system with global distributed memory and block transfer engine
Dally et al. Deadlock-free adaptive routing in multicomputer networks using virtual channels
Tamir et al. Dynamically-allocated multi-queue buffers for VLSI communication switches
JP4478390B2 (ja) クラス・ネットワーク経路指定
JP3599197B2 (ja) 待ち時間が可変の、プロセッサをメモリに接続する相互接続ネットワーク
EP1602030B1 (fr) Systeme et procede pour l'ordonnancement dynamique dans un processeur de reseau
Felperin et al. Routing techniques for massively parallel communication
US5630162A (en) Array processor dotted communication network based on H-DOTs
US7434016B2 (en) Memory fence with background lock release
Kim et al. An Evalutation of Planar-Adaptive Routing (PAR).
CA2426422C (fr) Structure d'interconnexion adaptable autorisant un traitement parallele et l'acces a une memoire parallele
US20050036445A1 (en) Processing data packets
WO1986003038A1 (fr) Ordinateur a flux d'instructions
Liu Architecture and performance of processor-memory interconnection networks for MIMD shared memory parallel processing systems
Fan et al. Turn grouping for multicast in wormhole-routed mesh networks supporting the turn model
US20050038902A1 (en) Storing data packets
Coll et al. A strategy for efficient and scalable collective communication in the Quadrics network
Jurczyk et al. Interconnection networks for parallel computers
Krichene et al. AINOC: New Interconnect for Future Deep Neural Network Accelerators
Golota et al. A universal, dynamically adaptable and programmable network router for parallel computers
Lay High-performance communication in parallel computers
SERGIO et al. Routing Techniques for Massively Parallel Communication
Bhatt et al. The Fluent abstract machine
Wang Linked crossbar architecture for multicomputer interconnection

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: PA/a/2003/003528

Country of ref document: MX

Ref document number: 2002536883

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 2426422

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2001987920

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 018208878

Country of ref document: CN

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 2001987920

Country of ref document: EP