US8612725B2 - Multi-processor system with mesh topology routers comprising local cache storing for each data information indicating redundancy in neighbor router cache for cache management


Info

Publication number
US8612725B2
Authority
US (United States)
Prior art keywords
data, routers, processor, transferred, processor elements
Prior art date
Legal status
Expired - Fee Related, expires
Application number
US12/874,495
Other versions
US20110173415A1 (en)
Inventor
Jun Tanabe
Hiroyuki Usui
Current Assignee
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date
Filing date
Publication date
Application filed by Toshiba Corp filed Critical Toshiba Corp
Assigned to KABUSHIKI KAISHA TOSHIBA. Assignment of assignors interest (see document for details). Assignors: TANABE, JUN; USUI, HIROYUKI
Publication of US20110173415A1
Application granted
Publication of US8612725B2

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 — Digital computers in general; Data processing equipment in general
    • G06F 15/16 — Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 — Interprocessor communication
    • G06F 15/173 — Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F 15/17356 — Indirect interconnection networks
    • G06F 15/17368 — Indirect interconnection networks, non-hierarchical topologies
    • G06F 15/17375 — One dimensional, e.g. linear array, ring
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 — Addressing or allocation; Relocation
    • G06F 12/08 — Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 — Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/084 — Multiuser, multiprocessor or multiprocessing cache systems with a shared cache

Definitions

  • In the intra-router cache mechanism C11 of the router R11, the bit corresponding to the router R12 in the routing path bits of the way/entry in which the data A is stored is changed from "0" to "1".
  • At this point, all three routing path bits of the way/entry in which the data A is stored are "1", which indicates that the same data is cached in all of the routers (R10, R01, and R12) that can be transfer destinations of the data (an "all paths transferred" state). This means that it is no longer necessary to keep the data A cached in the intra-router cache mechanism of the router R11.
  • A state in which the processor element PE1 accesses the data B having the read-only attribute stored in the cache memory M0 is shown in FIG. 8. The read request travels from PE1 through R01, R11, and R21 to M0, and the read data travels back in order from M0 through R21, R11, and R01 to PE1. In the routers R21, R11, and R01, the data B is stored in the intra-router cache mechanisms C21, C11, and C01, respectively.
  • The embodiments are examples of implementation of the present invention; the present invention is not limited to these embodiments. In any of them, access concentration on the routers directly connected to the shared cache memory is reduced. The topology of the inter-processor network is not limited to the mesh type of the square lattice shape and can be of other shapes (an arbitrary mesh type other than the square lattice shape, a hypercube type, etc.).

Abstract

According to one embodiment, each of routers includes: a cache mechanism that stores data transferred to the other routers or processor elements; and a unit that reads out, when an access generated from each of the processor elements is transferred thereto, if target data of the access is stored in the cache mechanism, the data from the cache mechanism and transmits the data to the processor element as a request source.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2010-3159, filed on Jan. 8, 2010, the entire contents of which are incorporated herein by reference.
FIELD
Embodiments described herein relate generally to a multi-processor system and a data transfer method.
BACKGROUND
In a conventional multi-processor system including a plurality of processor elements and a shared cache memory, the processor elements and the shared cache memory are connected by a network including a plurality of routers (see Japanese Patent Application Laid-Open No. 2009-54083). The shared cache memory is connected to an external memory via a bridge.
In such a multi-processor system, accesses by the processor elements reach the shared cache memory through several routers. Because all the memory accesses are concentrated on the shared cache memory, the loads on the routers to which the shared cache memory is connected increase, which becomes a bottleneck for the entire network.
Japanese Patent Application Laid-Open No. 2000-20489 discloses a communication control device that relays data transfer between a CPU and an external apparatus and contains a cache memory: transfer control information written by the CPU into a descriptor in a main storage unit is read out and written into the cache memory, which makes data transfer between the CPU and the communication control device more efficient. However, even if this invention is applied to the routers of the multi-processor system, the routers must access the shared cache memory and the external memory to write data into their cache memories. Therefore, the problem of the increased loads on the routers connected to the shared cache memory is not solved.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the configuration of a multi-processor system according to a first embodiment;
FIG. 2 is a diagram of the schematic configuration of an inter-processor network of the multi-processor system according to the first embodiment;
FIG. 3 is a diagram of an example of the structure of data stored in an intra-router cache mechanism;
FIG. 4 is a diagram of an example of the structure of data stored in an intra-router cache mechanism in a multi-processor system according to a second embodiment;
FIG. 5 is a diagram of a state in which a certain processor element accesses data having a read-only attribute stored in a shared cache memory;
FIG. 6 is a diagram of a state in which another processor element accesses the data having the read-only attribute stored in the shared cache memory;
FIG. 7 is a diagram of a state in which still another processor element accesses the data having the read-only attribute stored in the shared cache memory;
FIG. 8 is a diagram of a state in which a certain processor element accesses another data having the read-only attribute stored in the shared cache memory; and
FIG. 9 is a diagram of the schematic configuration of an inter-processor network in a multi-processor system of a reference example learned by the inventor.
DETAILED DESCRIPTION
In general, according to one embodiment, a multi-processor system includes: a plurality of processor elements; and a network that connects the processor elements. The network includes: a plurality of routers that relay an access generated from each of the processor elements and data addressed to the processor element; and an access processing unit that transmits, according to the access from the processor element, target data of the access to the processor element as a request source, and each of the routers includes: a cache mechanism that stores data transferred to the other routers or the processor elements; and a transmitter that reads out, when an access generated from the processor element is transferred thereto, if target data of the access is stored in the cache mechanism, the data from the cache mechanism and transmits the data to the processor element as the request source.
Exemplary embodiments of a multi-processor system and a data transfer method will be explained below in detail with reference to the accompanying drawings. The present invention is not limited to the following embodiments.
FIG. 1 is a block diagram of the configuration of a multi-processor system according to a first embodiment. The multi-processor system 1 has a configuration in which a plurality of processor elements PE0 to PE9 and a shared cache memory 12 are connected via an inter-processor network 11 including a plurality of routers. The shared cache memory 12 includes two cache memories (M0 and M1). The shared cache memory 12 is connected to an external memory 2 via a bridge 13.
The schematic configuration of the inter-processor network 11 of the multi-processor system according to this embodiment is shown in FIG. 2. The inter-processor network 11 has a mesh topology (a lattice shape) in which the routers R00 to R23 are arranged on lattice points. The routers R00 to R23 include cache mechanisms (intra-router cache mechanisms C00 to C23), which cache read-only data accessed by the processor elements PE0 to PE9. Each of the intra-router cache mechanisms C00 to C23 can be configured from static random access memory (SRAM) cells and a memory controller in the same manner as a general cache memory.
An example of the structure of data stored in the intra-router cache mechanisms C00 to C23 is shown in FIG. 3. A two-way cache is shown as an example; however, the number of ways is not limited to a specific number. As shown in FIG. 3, the intra-router cache mechanisms C00 to C23 store data in a structure substantially the same as that of an instruction cache in a normal processor. The intra-router cache mechanisms C00 to C23 have, in common for the two ways, replace bits that specify the replacement conditions of the ways, and have, for each of the ways, a valid bit, a tag address, and data.
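The two-way structure described above can be sketched as follows. This is a minimal model; the class name, field names, and round-robin use of the replace bit are our own assumptions, not details taken from the embodiment:

```python
# Sketch of a two-way set-associative entry as described for the
# intra-router caches: per-way valid bit, tag, and data, plus a
# replace bit shared by the two ways (field names are assumptions).

class TwoWayEntry:
    def __init__(self):
        self.replace = 0                      # which way to evict next
        self.ways = [{"valid": 0, "tag": None, "data": None} for _ in range(2)]

    def lookup(self, tag):
        """Return cached data on a tag match, or None on a miss."""
        for way in self.ways:
            if way["valid"] and way["tag"] == tag:
                return way["data"]
        return None

    def fill(self, tag, data):
        """Install data, evicting the way selected by the replace bit."""
        victim = self.ways[self.replace]
        victim.update(valid=1, tag=tag, data=data)
        self.replace ^= 1                     # simple round-robin replacement


entry = TwoWayEntry()
entry.fill(0x40, b"A")
entry.fill(0x80, b"B")
assert entry.lookup(0x40) == b"A"
assert entry.lookup(0x80) == b"B"
entry.fill(0xC0, b"C")                        # evicts the older of the two ways
assert entry.lookup(0x40) is None
```

The replacement policy behind the replace bits is not specified in the text; the round-robin choice here is only one possibility.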
In this embodiment, it is assumed that routing is fixed and the path for an access to a certain cache memory by a certain processor element is always uniquely determined. Accesses and read data first move through the inter-processor network 11 shown in FIG. 2 in the lateral direction on the paper surface and thereafter move in the longitudinal direction.
As a specific example, when the processor element PE0 or PE4 read-accesses the cache memory M1, the access travels through the path from PE0/PE4 to R00, R01, R02, R12, R22, and M1. Conversely, read data travels from the cache memory M1 to the processor element PE0 or PE4 through the path from M1 to R22, R12, R02, R01, R00, and PE0/PE4.
As another example, when the processor element PE2 read-accesses the cache memory M0, the access travels through a path from PE2 to R02, R01, R11, R21, and M0. Conversely, read data travels from the cache memory M0 to the processor element PE2 through a path from M0 to R21, R11, R01, R02, and PE2.
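The fixed lateral-then-longitudinal routing in the two examples above can be sketched as a small function, assuming router Rrc sits at row r (0 to 2) and column c (0 to 3) of the mesh, with M1 attached below R22 (row 2, column 2) and M0 below R21 (row 2, column 1). These coordinates are our reading of FIG. 2, and the function name is illustrative:

```python
# Sketch of the fixed lateral-then-longitudinal (XY) routing described
# in the text, assuming router Rrc sits at row r, column c of the mesh.

def route(src_row, src_col, dst_row, dst_col):
    """List the routers visited, moving along the row first, then the column."""
    path = []
    row, col = src_row, src_col
    step = 1 if dst_col > col else -1
    while col != dst_col:                 # lateral (X) leg
        path.append(f"R{row}{col}")
        col += step
    step = 1 if dst_row > row else -1
    while row != dst_row:                 # longitudinal (Y) leg
        path.append(f"R{row}{col}")
        row += step
    path.append(f"R{row}{col}")
    return path

# PE0/PE4 enter at R00; M1 hangs off R22 (row 2, column 2):
assert route(0, 0, 2, 2) == ["R00", "R01", "R02", "R12", "R22"]
# PE2 enters at R02; M0 hangs off R21 (row 2, column 1):
assert route(0, 2, 2, 1) == ["R02", "R01", "R11", "R21"]
```

The two assertions reproduce the router sequences given in the text for the PE0/PE4 and PE2 accesses.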
On the other hand, all read accesses to an area not having a read-only attribute of the shared cache memory 12 reach the shared cache memory 12. Read data read out from the shared cache memory 12 (the cache memories M0 and M1) is returned to the processor elements at request sources (transmission sources of the read accesses).
The intra-router cache mechanisms C00 to C23 operate when read accesses by the processor elements PE0 to PE9 are made to an area having the read-only attribute of the shared cache memory 12. Memory Management Unit (MMU) information or the like of the processor elements is sent to the routers R00 to R23 together with read requests, whereby the routers R00 to R23 determine whether read accesses are made to the area having the read-only attribute of the shared cache memory 12.
When any one of the processor elements accesses the area having the read-only attribute of the shared cache memory 12, the access is checked by the routers through which it passes on the way to the shared cache memory 12. When any one of the routers that relay the access holds the target data of the access in its intra-router cache mechanism, that router reads out the target data as read data and transmits it to the processor element at the access source. When the target data of the access is present in none of the intra-router cache mechanisms of the routers through which the access passes, the read request travels on to the shared cache memory 12 (the cache memories M0 and M1), and the read data is transmitted from the shared cache memory 12.
The read data transmitted from the shared cache memory 12 (or read data that hits in the intra-router cache mechanisms C00 to C23 and is transmitted from the routers R00 to R23) is cached in the intra-router cache mechanisms C00 to C23 of the routers R00 to R23 on the path on which the read data passes.
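The relay behavior described above (serve a hit locally, otherwise forward to the shared cache, and install the returning read data in every router it passes) can be sketched as a toy model. The dictionary-based caches and the function name are simplifications of our own, not the embodiment's implementation:

```python
# Toy model of the relay behavior: routers on the request path serve a hit
# from their own cache; on a miss the request reaches the shared cache, and
# the read data is installed in every router on the return path.

shared_cache = {"A": b"data-A"}                     # area with the read-only attribute

def read(path, caches, addr):
    """path: routers from the requesting PE toward the shared cache."""
    for i, router in enumerate(path):
        if addr in caches[router]:                  # hit: serve locally,
            hit_at = i                              # giving a shorter return path
            data = caches[router][addr]
            break
    else:
        hit_at = len(path)                          # miss everywhere:
        data = shared_cache[addr]                   # the shared cache responds
    for router in path[:hit_at]:                    # cache on the way back
        caches[router][addr] = data
    return data, ("hit in " + path[hit_at]) if hit_at < len(path) else "shared cache"

caches = {r: {} for r in ["R10", "R11", "R21", "R01"]}
# A first request path R10 -> R11 -> R21 -> shared cache: misses, fills all three.
data, src = read(["R10", "R11", "R21"], caches, "A")
assert src == "shared cache" and "A" in caches["R11"]
# A later request path R01 -> R11: now hits in R11 and fills R01.
data, src = read(["R01", "R11"], caches, "A")
assert src == "hit in R11" and "A" in caches["R01"]
```

The second read never reaches the shared cache, which is the mechanism that relaxes the load on the routers directly connected to it.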
When the target data of an access hits in the intra-router cache mechanisms C00 to C23 through this operation, the read access does not reach the shared cache memory 12. Therefore, it is possible to relax the access concentration that occurs at the routers R21 and R22 connected to the shared cache memory 12.
For comparison, the schematic configuration of an inter-processor network 11′ in a multi-processor system of a reference example learned by the inventor is shown in FIG. 9. This is a configuration equivalent to that of the inter-processor network 11 shown in FIG. 1. In this network configuration, accesses from processor elements PE0′ to PE9′ to a shared cache memory 12′ (cache memories M0′ and M1′) are concentrated on routers (R21′ and R22′) directly connected to the shared cache memory 12′. Therefore, high loads are applied to the routers R21′ and R22′, which are a bottleneck for an entire network.
In this embodiment, a routing policy can be changed between a read access to the area having the read-only attribute of the shared cache memory 12 and other accesses (a read access and a write access to the area not having the read-only attribute of the shared cache memory 12).
For example, when a routing policy that moves first in the lateral direction and then in the longitudinal direction on the paper surface of FIG. 2 is applied, the router R11 is likely to relay accesses and data concerning eight processor elements (PE0 to PE5, PE7, and PE8), while the router R20 is likely to relay accesses and data concerning only one processor element (PE6). On the other hand, when a routing policy that moves first in the longitudinal direction and then in the lateral direction is applied, the router R11 is likely to relay accesses and data concerning only one processor element (PE1), while the router R20 is likely to relay accesses and data concerning four processor elements (PE0 and PE4 to PE6). Therefore, it is possible to reduce the difference among the loads applied to the routers R00 to R23 by changing the routing policy between read accesses to the area having the read-only attribute and the other accesses.
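The load difference described above can be reproduced in a small model. The processor-element placement below (PE0 and PE4 at R00, PE1 at R01, PE2 at R02, PE3 at R03, PE5 at R10, PE7 at R12, PE8 at R13, PE6 at R20, PE9 at R23) is our reading of FIG. 2 and is an assumption, not stated explicitly in the text:

```python
# Count how many PEs' accesses to M0 (attached below R21, at row 2,
# column 1) pass through a given router under lateral-first (XY) versus
# longitudinal-first (YX) routing. The PE placement is an assumption.

PE_AT = {"PE0": (0, 0), "PE4": (0, 0), "PE1": (0, 1), "PE2": (0, 2),
         "PE3": (0, 3), "PE5": (1, 0), "PE7": (1, 2), "PE8": (1, 3),
         "PE6": (2, 0), "PE9": (2, 3)}
M0 = (2, 1)

def path(src, dst, lateral_first):
    (r, c), (dr, dc) = src, dst
    hops = []
    def walk_cols():
        nonlocal c
        while c != dc:
            hops.append((r, c)); c += 1 if dc > c else -1
    def walk_rows():
        nonlocal r
        while r != dr:
            hops.append((r, c)); r += 1 if dr > r else -1
    walk_cols() if lateral_first else walk_rows()
    walk_rows() if lateral_first else walk_cols()
    hops.append((r, c))
    return hops

def load(router, lateral_first):
    return sum(router in path(pos, M0, lateral_first) for pos in PE_AT.values())

assert load((1, 1), lateral_first=True) == 8    # R11 under XY: eight PEs
assert load((2, 0), lateral_first=True) == 1    # R20 under XY: one PE (PE6)
assert load((1, 1), lateral_first=False) == 1   # R11 under YX: only PE1
assert load((2, 0), lateral_first=False) == 4   # R20 under YX: four PEs
```

Under this assumed placement, the counts match the eight/one and one/four figures in the text, and the router adjacent to M0 carries all ten flows under either policy.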
In the multi-processor system 1, accesses from the processor elements PE0 to PE9 to the external memory 2 and transfer of data from the external memory 2 to the processor elements PE0 to PE9 are performed via the shared cache memory 12 and the bridge 13. Therefore, concerning access to the external memory 2, it is also possible to relax the concentration of accesses on the routers R21 and R22 directly connected to the shared cache memory 12 by caching read data in the intra-router cache mechanisms C00 to C23. The same holds true for a configuration in which the external memory 2 is connected without going through the shared cache memory 12.
As explained above, in the multi-processor system according to this embodiment, because the intra-router cache mechanisms are provided in the routers, read requests of the processor elements do not always reach the shared cache memory. The data cached in an intra-router cache mechanism is data relayed to the other routers and the processor elements; the routers never access the shared cache memory on their own initiative to fetch data for caching. Therefore, because the access concentration on the routers directly connected to the shared cache memory is relaxed, it is possible to eliminate the bottleneck for the entire inter-processor network.
A multi-processor system according to a second embodiment is explained below. The configuration of the entire multi-processor system and the schematic configuration of an inter-processor network are the same as those in the first embodiment. However, in the second embodiment, the structure of the intra-router cache mechanisms C00 to C23 is different from that in the first embodiment.
An example of the structure of data stored in the intra-router cache mechanisms C00 to C23 is shown in FIG. 4. In this embodiment, routing path bits are provided for each entry of each way: one bit for each of the routers (or processor elements) to which the router is likely to transfer read data. Each bit records whether the read data of the cache entry has been transferred along the corresponding path. For example, the router R11 is likely to transfer read data to the routers R01, R10, and R12; therefore, the router R11 has three routing path bits, corresponding to these routers, in the entries of its ways. The router R00 is likely to transfer read data to the processor elements PE0 and PE4; therefore, the router R00 has two routing path bits, corresponding to these processor elements, in the entries of its ways.
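The routing path bits can be sketched as follows, using R11's three possible transfer destinations from the text (R01, R10, and R12); the class and method names are illustrative, and the "all paths transferred" check reflects the state reached at the end of the FIG. 5 to FIG. 7 sequence:

```python
# Sketch of the per-entry routing path bits of the second embodiment:
# one bit per possible transfer destination. A bit is set when the
# entry's data has been forwarded toward that destination.

class PathBitEntry:
    def __init__(self, destinations):
        self.bits = {d: 0 for d in destinations}

    def record_transfer(self, dest):
        self.bits[dest] = 1

    def all_paths_transferred(self):
        """True when every possible destination already caches the data,
        i.e. keeping this copy in the local cache is no longer useful."""
        return all(self.bits.values())


entry = PathBitEntry(["R01", "R10", "R12"])   # entry for data A in C11
entry.record_transfer("R10")                  # FIG. 5: forwarded toward PE5
entry.record_transfer("R01")                  # FIG. 6: forwarded toward PE1
assert not entry.all_paths_transferred()
entry.record_transfer("R12")                  # FIG. 7: forwarded toward PE8
assert entry.all_paths_transferred()          # data A need not stay in C11
```

Once all bits are set, every neighbor that could request the line already holds its own copy, so the entry is a natural candidate for replacement.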
An example of a change in the routing path bits is explained with reference to FIGS. 5 to 8. FIGS. 5 to 8 are diagrams of operations shown in time series. The operations are performed when, after the processor elements PE5, PE1, and PE8 access, in order, data A having the read-only attribute stored in the cache memory M0, the processor element PE1 accesses data B having the read-only attribute stored in the cache memory M0. It is assumed that the data B is data stored in an entry same as an entry of the data A in the intra-router cache mechanisms C00 to C23.
A state in which the processor element PE5 accesses the data A having the read-only attribute stored in the cache memory M0 is shown in FIG. 5. When the processor element PE5 accesses the data A, a read request travels from PE5 to R10, R11, R21, and M0 and read data travels through the routers in order from M0 to R21, R11, R10, and PE5. In the routers R21, R11, and R10, the data A is stored in the intra-router cache mechanisms.
In the intra-router cache mechanism C11 of the router R11, “1” is input to a bit corresponding to the router R10 of the routing path bit of a way/entry in which the data A is stored. “0” is input to bits corresponding to the routers R01 and R12 in other paths.
A state in which the processor element PE1 accesses the data A having the read-only attribute stored in the cache memory M0 is shown in FIG. 6. When the processor element PE1 accesses the data A, a read request travels from PE1 to R01 and R11 and a cache hit occurs in the router R11. Therefore, the data A is read out from the intra-router cache mechanism C11 and read data travels through the routers in order from R11 to R01 and PE1. At this point, in the router R01, the data A is stored in the intra-router cache mechanism C01.
In the intra-router cache mechanism C11 of the router R11, a bit corresponding to the router R01 of the routing path bit of the way/entry in which the data A is stored is changed from “0” to “1”.
A state in which the processor element PE8 accesses the data A having the read-only attribute stored in the cache memory M0 is shown in FIG. 7. When the processor element PE8 accesses the data A, a read request travels from PE8 to R13, R12, and R11 and a cache hit occurs in the router R11. Therefore, the data A is read out from the intra-router cache mechanism C11 and read data travels through the routers in order from R11 to R12 and R13. At this point, in the routers R12 and R13, the data A is stored in the intra-router cache mechanisms C12 and C13, respectively.
In the intra-router cache mechanism C11 of the router R11, a bit corresponding to the router R12 of the routing path bit of the way/entry in which the data A is stored is changed from “0” to “1”. In the intra-router cache mechanism C11, at this timing, all three routing path bits of the way/entry in which the data A is stored change to “1”, which indicates that the same data is cached in all the routers (the routers R10, R01, and R12) that can be a transfer destination of the data (an all path transferred state). This means that it is unnecessary to keep the data A cached in the intra-router cache mechanism of the router R11.
A state in which the processor element PE1 accesses the data B having the read-only attribute stored in the cache memory M0 is shown in FIG. 8. When the processor element PE1 accesses the data B, a read request travels from PE1 to R01, R11, R21, and M0. Read data travels through the routers in order from M0 to R21, R11, R01, and PE1. In the routers R21, R11, and R01, the data B is stored in the intra-router cache mechanisms C21, C11, and C01, respectively.
At this point, concerning the routers R01 and R21, not all the routing path bits of the way/entry in which the data A is stored have changed to “1”. Therefore, when replacement of data is necessary in storing the data B, the data to be replaced is determined, based on a replace bit, by applying a normal replace policy (least recently used (LRU), etc.). “1” is input to the bit corresponding to the router R11 or the processor element PE1 of the routing path bit of the way/entry in which the data B is stored. “0” is input to the bits corresponding to the routers R20 and R22 or the routers R00 and R02 in the other paths.
On the other hand, concerning the router R11, when the data B is stored in the intra-router cache mechanism C11, all three routing path bits corresponding to the data A are “1”, so it is known that the data A is unnecessary. Therefore, as long as the valid bit of the way in which the data A is stored is not “0” (invalid), the data B is always stored in the way in which the data A is stored, irrespective of the normal replace policy (in other words, the data A is overwritten and erased irrespective of the normal replace policy). “1” is input to the bit corresponding to the router R01 of the routing path bit of the way/entry in which the data B is stored. “0” is input to the bits corresponding to the routers R10 and R12 in the other paths.
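The victim selection just described can be sketched as follows. This is an illustrative reading of the policy in the text, not the patent's implementation; the `Way` class and `choose_victim` function are assumed names:

```python
class Way:
    """Minimal stand-in for one way of an intra-router cache entry."""
    def __init__(self, valid, path_bits):
        self.valid = valid
        self.path_bits = path_bits  # list of routing path bits

    def all_paths_transferred(self):
        return all(self.path_bits)


def choose_victim(ways, lru_order):
    """Pick which way to overwrite when new data must be stored.

    ways: the ways of the indexed entry.
    lru_order: way indices from least to most recently used, i.e. the
        normal replace policy used as a fallback.
    """
    # An invalid way is always usable first.
    for i, way in enumerate(ways):
        if not way.valid:
            return i
    # Prefer a way whose data has already been transferred on every
    # possible path: keeping it cached here is redundant.
    for i, way in enumerate(ways):
        if way.all_paths_transferred():
            return i
    # Otherwise fall back to the normal replace policy (e.g. LRU).
    return lru_order[0]
```

In the FIG. 8 scenario, router R11 would pick the way holding data A (all bits set) regardless of LRU order, while routers R01 and R21 would fall through to the LRU fallback.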
As explained above, in this embodiment, it is determined based on the routing path bits whether the same information is cached in the transfer destinations of the read data. Therefore, it is possible to prevent the intra-router cache mechanisms of the routers from redundantly holding the same data and to effectively utilize the intra-router cache mechanisms.
The operation for changing the priority of replacement of data based on the routing path bits is explained above. However, when a predetermined percentage (e.g., the majority) of the routing path bits change to “1” in an arbitrary router, it is also possible to cause the router, at that point, to transfer the data to the routers or processor elements whose routing path bits are “0”. In this case, as in the above explanation, all the routing path bits change to “1” at the point when the transfer of the data is finished. Therefore, the transferred data can be preferentially overwritten and erased. In other words, it is possible to prevent the intra-router cache mechanisms of the routers from redundantly holding the same data and to effectively utilize the intra-router cache mechanisms.
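This proactive-transfer variant can be sketched as follows, again under assumed names (`maybe_push`, the dict of path bits, and the `send` callback are illustrative) and with "majority" taken to mean strictly more than half:

```python
def maybe_push(path_bits, send):
    """Proactively forward data once a majority of path bits are set.

    path_bits: dict mapping each possible transfer destination to its
        routing path bit.
    send: callable used to transfer the data to one destination.
    Returns the list of destinations pushed to (empty if no majority).
    """
    set_count = sum(path_bits.values())
    if set_count * 2 <= len(path_bits):
        return []  # no majority yet; do nothing
    # Push to every destination not yet holding the data.  Afterwards
    # the entry is in the all path transferred state and can be
    # preferentially overwritten and erased.
    pushed = [dest for dest, bit in path_bits.items() if bit == 0]
    for dest in pushed:
        send(dest)
        path_bits[dest] = 1
    return pushed
```

With three possible destinations, the push fires as soon as two bits are set, driving the entry into the all path transferred state early.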
The embodiments are examples of implementation of the present invention. The present invention is not limited to the embodiments.
For example, in the example explained in the embodiments, access concentration on the routers directly connected to the shared cache memory is reduced. However, it is also possible to relax concentration of accesses on routers directly connected to processor elements having high operation ratios compared with the external memory (the bridge) and the other processor elements.
The topology of the inter-processor network is not limited to the mesh type of the square lattice shape and can be other shapes (an arbitrary mesh type of a shape other than the square lattice shape, a hypercube type, etc.).
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.

Claims (10)

What is claimed is:
1. A multi-processor system comprising:
a plurality of processor elements; and
a network that is constructed in a mesh topology and connects the plurality of processor elements, wherein
the network includes:
a plurality of routers that are arranged on lattice points of the network and relay an access generated from each of the processor elements and data addressed to the processor element; and
an access processing unit that transmits, according to the access from the processor element, target data of the access to the processor element as a request source, and
each of the routers includes:
a cache mechanism that stores data transferred to the other routers or the processor elements; and
a transmitter that reads out and transmits the data to the processor element as the request source if target data of the access is stored in the cache mechanism when an access generated from the processor element is transferred to the transmitter, wherein
the cache mechanism stores, concerning each of data being stored, routing path information indicating whether the data is transferred to each of the routers and the processor elements that can be a transfer destination, and
the router changes, based on the routing path information, an operation of the cache mechanism.
2. The multi-processor system according to claim 1, wherein, when the routing path information of any one of the data being stored in the cache mechanism changes to an all path transferred state indicating that the data is already transferred to all the routers and processor elements that can be the transfer destination, the router more preferentially rewrites an entry in which the data is stored than an entry in which data having the routing path information not in the all path transferred state is stored.
3. The multi-processor system according to claim 2, wherein
the cache mechanism has a plurality of ways, and
the router rewrites an entry of a way in which data in the all path transferred state is stored.
4. The multi-processor system according to claim 2, wherein, when the routing path information is not in the all path transferred state in all the data being stored in the cache mechanism, the router determines, according to a replace policy decided in advance, an entry to be rewritten.
5. The multi-processor system according to claim 4, wherein the replace policy is least recently used.
6. The multi-processor system according to claim 1, wherein, when any one of routing path information being stored in the cache mechanism changes to a state indicating that data is already transferred to a majority of the routers and the processor elements that can be the transfer destination, the router transfers the data to the routers and the processor elements to which the data is not transferred yet.
7. The multi-processor system according to claim 6, wherein, after the data is already transferred to the majority of the routers and the processor elements that can be the transfer destination and the data is transferred to the routers and the processor elements to which the data is not transferred yet, the router more preferentially rewrites an entry in which the data is stored than an entry in which data indicated by the routing path information as not being transferred to all the routers and processor elements that can be the transfer destination is stored.
8. A data transfer method in a multi-processor system that includes:
a plurality of processor elements;
a plurality of routers arranged on lattice points of a network constructed in a mesh topology, which connect the processor elements and relay an access generated from each of the processor elements and data addressed to the processor elements; and
a shared memory shared by the processor elements, wherein the shared memory transmits target data of the access to the processor element as a request source according to an access from the processor element,
the data transfer method comprising:
storing data transferred to the other routers or the processor elements in a cache mechanism by each of the routers; and
reading out the data from the cache mechanism and transmitting the data to the processor element as the request source if target data of the access is stored in the cache mechanism when an access generated from the processor element is transferred to each of the routers, wherein each of the routers stores, in the cache mechanism, concerning each of data being stored, routing path information indicating whether the data is transferred to each of the routers and the processor elements that can be a transfer destination, and, when the routing path information of any one of the data being stored in the cache mechanism changes to an all path transferred state indicating that the data is already transferred to all of the routers and the processor elements that can be the transfer destination, preferentially rewrites the data than data having the routing path information not in the all path transferred state.
9. A data transfer method in a multi-processor system that includes:
a plurality of processor elements;
a plurality of routers arranged on lattice points of a network constructed in a mesh topology, which connect the processor elements and relay an access generated from each of the processor elements and data addressed to the processor elements; and
a shared memory shared by the processor elements, wherein the shared memory transmits target data of the access to the processor element as a request source according to an access from the processor element,
the data transfer method comprising:
storing data transferred to the other routers or the processor elements in a cache mechanism by each of the routers; and
reading out the data from the cache mechanism and transmitting the data to the processor element as the request source if target data of the access is stored in the cache mechanism when an access generated from the processor element is transferred to each of the routers,
wherein, when any one of routing path information being stored in the cache mechanism changes to a state indicating that data is already transferred to a majority of the routers and the processor elements that can be the transfer destination, the router transfers the data to the routers and the processor elements to which the data is not transferred yet.
10. The data transfer method according to claim 9, wherein, after the data is already transferred to the majority of the routers and the processor elements that can be the transfer destination and the data is transferred to the routers and the processor elements to which the data is not transferred yet, the router more preferentially rewrites an entry in which the data is stored than an entry in which data indicated by the routing path information as not being transferred to all of the routers and the processor elements that can be the transfer destination is stored.
US12/874,495 2010-01-08 2010-09-02 Multi-processor system with mesh topology routers comprising local cache storing for each data information indicating redundancy in neighbor router cache for cache management Expired - Fee Related US8612725B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010003159A JP5404433B2 (en) 2010-01-08 2010-01-08 Multi-core system
JP2010-003159 2010-01-08

Publications (2)

Publication Number Publication Date
US20110173415A1 US20110173415A1 (en) 2011-07-14
US8612725B2 true US8612725B2 (en) 2013-12-17

Family

ID=44259413

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/874,495 Expired - Fee Related US8612725B2 (en) 2010-01-08 2010-09-02 Multi-processor system with mesh topology routers comprising local cache storing for each data information indicating redundancy in neighbor router cache for cache management

Country Status (2)

Country Link
US (1) US8612725B2 (en)
JP (1) JP5404433B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2899644A4 (en) * 2012-07-17 2017-09-27 Sanechips Technology Co., Ltd. Device and method for inter-core communication in multi-core processor
US9924490B2 (en) 2013-10-09 2018-03-20 International Business Machines Corporation Scaling multi-core neurosynaptic networks across chip boundaries
US9690494B2 (en) * 2015-07-21 2017-06-27 Qualcomm Incorporated Managing concurrent access to multiple storage bank domains by multiple interfaces

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5373927A (en) 1976-11-10 1978-06-30 Fujitsu Ltd Replacing system of intermediate buffer memory
JPH06208547A (en) 1993-01-08 1994-07-26 Sony Corp Communication controller
US5900015A (en) 1996-08-09 1999-05-04 International Business Machines Corporation System and method for maintaining cache coherency using path directories
JP2000020489A (en) 1998-06-29 2000-01-21 Toshiba Corp Data transfer device for computer
US6507854B1 (en) * 1999-11-05 2003-01-14 International Business Machines Corporation Enhanced network caching and mirroring system
US20040111489A1 (en) 2001-12-12 2004-06-10 Yuji Yamaguchi Image processing apparatus and method thereof
US20050120134A1 (en) * 2003-11-14 2005-06-02 Walter Hubis Methods and structures for a caching to router in iSCSI storage systems
WO2006056900A1 (en) 2004-11-24 2006-06-01 Koninklijke Philips Electronics N.V. Coherent caching of local memory data
JP2009054083A (en) 2007-08-29 2009-03-12 Hitachi Ltd Processor, data transfer unit, and multi-core processor system
US7619545B2 (en) 2007-03-12 2009-11-17 Citrix Systems, Inc. Systems and methods of using application and protocol specific parsing for compression
US8392664B2 (en) * 2008-05-09 2013-03-05 International Business Machines Corporation Network on chip

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61166651A (en) * 1985-01-18 1986-07-28 Fujitsu Ltd Replacing system for buffer memory
JPH05120130A (en) * 1991-10-29 1993-05-18 Nec Eng Ltd Memory access processing system
JP3309425B2 (en) * 1992-05-22 2002-07-29 松下電器産業株式会社 Cache control unit
JPH09128296A (en) * 1995-11-02 1997-05-16 Hitachi Ltd Data processor
JP2000010860A (en) * 1998-06-16 2000-01-14 Hitachi Ltd Cache memory control circuit, processor, processor system, and parallel processor system
US6457100B1 (en) * 1999-09-15 2002-09-24 International Business Machines Corporation Scaleable shared-memory multi-processor computer system having repetitive chip structure with efficient busing and coherence controls
JP2003256276A (en) * 2002-02-27 2003-09-10 Nec Corp Switch device with incorporated cache with inter-switch data transfer function, and control method
JP2006079218A (en) * 2004-09-08 2006-03-23 Fujitsu Ltd Memory control device and control method
JP2009252165A (en) * 2008-04-10 2009-10-29 Toshiba Corp Multi-processor system

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5373927A (en) 1976-11-10 1978-06-30 Fujitsu Ltd Replacing system of intermediate buffer memory
JPH06208547A (en) 1993-01-08 1994-07-26 Sony Corp Communication controller
US5900015A (en) 1996-08-09 1999-05-04 International Business Machines Corporation System and method for maintaining cache coherency using path directories
JP2000020489A (en) 1998-06-29 2000-01-21 Toshiba Corp Data transfer device for computer
US6507854B1 (en) * 1999-11-05 2003-01-14 International Business Machines Corporation Enhanced network caching and mirroring system
US20040111489A1 (en) 2001-12-12 2004-06-10 Yuji Yamaguchi Image processing apparatus and method thereof
US7333115B2 (en) 2001-12-12 2008-02-19 Sony Corporation Image processing apparatus and method thereof
US20050120134A1 (en) * 2003-11-14 2005-06-02 Walter Hubis Methods and structures for a caching to router in iSCSI storage systems
WO2006056900A1 (en) 2004-11-24 2006-06-01 Koninklijke Philips Electronics N.V. Coherent caching of local memory data
US7619545B2 (en) 2007-03-12 2009-11-17 Citrix Systems, Inc. Systems and methods of using application and protocol specific parsing for compression
JP2009054083A (en) 2007-08-29 2009-03-12 Hitachi Ltd Processor, data transfer unit, and multi-core processor system
US8392664B2 (en) * 2008-05-09 2013-03-05 International Business Machines Corporation Network on chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JP Office Action issued in corresponding JP Application No. 2010-003159 on Jul. 23, 2013, along with English translation thereof.

Also Published As

Publication number Publication date
JP5404433B2 (en) 2014-01-29
US20110173415A1 (en) 2011-07-14
JP2011141831A (en) 2011-07-21

Similar Documents

Publication Publication Date Title
US9753872B2 (en) Information processing apparatus, input and output control device, and method of controlling information processing apparatus
EP2472412B1 (en) Explicitly regioned memory organization in a network element
JP2012150830A (en) Software caching with bounded-error delayed update
US20090259813A1 (en) Multi-processor system and method of controlling the multi-processor system
US8612725B2 (en) Multi-processor system with mesh topology routers comprising local cache storing for each data information indicating redundancy in neighbor router cache for cache management
US8464004B2 (en) Information processing apparatus, memory control method, and memory control device utilizing local and global snoop control units to maintain cache coherency
US20110185128A1 (en) Memory access method and information processing apparatus
US11556471B2 (en) Cache coherency management for multi-category memories
US20190042428A1 (en) Techniques for requesting data associated with a cache line in symmetric multiprocessor systems
JP2007156821A (en) Cache system and shared secondary cache
US10970208B2 (en) Memory system and operating method thereof
US8015372B2 (en) Apparatus and method for memory migration in a distributed memory multiprocessor system
US9983994B2 (en) Arithmetic processing device and method for controlling arithmetic processing device
US20080104333A1 (en) Tracking of higher-level cache contents in a lower-level cache
JP6213366B2 (en) Arithmetic processing apparatus and control method thereof
JP2018195183A (en) Arithmetic processing unit and method of controlling arithmetic processing unit
US20160378671A1 (en) Cache memory system and processor system
US9436613B2 (en) Central processing unit, method for controlling central processing unit, and information processing apparatus
US9430397B2 (en) Processor and control method thereof
JP5881568B2 (en) Scan transmission gateway device
JP5045334B2 (en) Cash system
JPH06103244A (en) Parallel computer
US7818508B2 (en) System and method for achieving enhanced memory access capabilities
CN101127011A (en) Information processing board, information processing system, and method of updating tag
JP2017058951A (en) Memory system

Legal Events

Date Code Title Description
AS Assignment

Owner name: KABUSHIKI KAISHA TOSHIBA, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TANABE, JUN;USUI, HIROYUKI;REEL/FRAME:024931/0119

Effective date: 20100818

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.)

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20171217