WO2007080718A1

WO2007080718A1 - Bridge, information processor, information processing system, and method of managing global address

Info

Publication number: WO2007080718A1
Application number: PCT/JP2006/323947
Authority: WO
Inventors: Takeshi Yamazaki; Hideyuki Saito; Yuji Takahashi; Hideki Mitsubayashi
Original assignee: Sony Corporation; Sony Computer Entertainment Inc.
Priority date: 2006-01-16
Filing date: 2006-11-30
Publication date: 2007-07-19
Also published as: JP2009110032A

Abstract

To provide resource transparency between nodes on a computer network. In an information processing system composed of a plurality of processor units connected to each other by a switching apparatus, a global address space having a map of the effective addresses of the processor units and shared between the processor units is introduced. A bridge for relaying the input/output bus of each processor unit to the input/output bus of the switching apparatus receives an access request packet having an effective address of a target node from the processor unit, converts the effective address of the target node into a global address by adding to the effective address the target node's node identification number necessary for switching, and outputs the access request packet having the global address to the switching apparatus.

Description

Specification

Bridge, information processing apparatus, information processing system, and global address management method

Technical field

The present invention relates to an access technology between nodes on a computer network.

Background art

Conventionally, on a computer network, nodes that communicate with each other cannot see each other's resources such as memory and IO devices, and have no resource transparency.

[0003] Therefore, for example, in a network such as Ethernet (registered trademark), access between nodes is performed via network hardware such as a NIC (network interface card) provided in each node and via the device drivers for that hardware. This places a heavy load on the system and leads to overhead.

[0004] Further, in a computer network such as a multiprocessor system in which a plurality of processors are connected, there is no resource transparency between the processors. It is therefore conceivable to provide a shared memory that can be accessed by each processor and to transfer data between processors via this shared memory. In this case, for example, when processor A wants to pass data to processor B, direct delivery is not possible: processor A must copy the data to the shared memory, and processor B must then read the data from the shared memory and copy it. This extra processing may likewise cause an overhead problem.

Disclosure of the invention

Problems to be solved by the invention

[0005] The present invention has been made in view of the above circumstances, and an object thereof is to provide resource transparency between nodes on a computer network.

Means for solving the problem

[0006] One embodiment of the present invention is a bridge. This bridge relays the input/output bus of a processor unit to the input/output bus of a switching device by which a plurality of processor units are interconnected, and includes an upstream port, a conversion unit, and a downstream port.

[0007] On the premise of a global address space that is shared among the plurality of processor units and to which the effective addresses of the processor units are mapped, the upstream port receives from the processor unit an access request packet specifying the effective address of a target node, which is one of the plurality of processor units. Here, the effective address is an address indicating a predetermined position in the effective address space. The effective address space is formed by combining portions of memory space carved out of the storage means, including the main memory, that are distributed in each processor unit. One effective memory space corresponds to one processor unit. In other words, there are as many effective memory spaces as there are processor units, and each processor unit uses its own effective memory space as work memory. By optimizing the internal structure of the effective memory space, the corresponding processor unit can operate at maximum performance.

[0008] The conversion unit converts the effective address of the target node into a global address by adding, to the effective address of the target node, the node identification number of the target node that is necessary for switching.

[0009] The downstream port outputs the access request packet in which the global address is specified to the switching device.

[0010] In addition, when the global address space is logically divided by an application identification number that uniquely identifies a distributed application operating on a plurality of processor units, the conversion unit may store the application identification number in the access request packet in which the global address is specified.

[0011] When the downstream port receives, from a source node that is one of the plurality of processor units, an access request packet in which the effective address is specified with the processor unit as the target node, the conversion unit may acquire the application identification number from the access request packet, and the upstream port may pass the application identification number acquired from the access request packet to the processor unit together with the effective address of the processor unit specified in the access request packet.

Another aspect of the present invention is an information processing apparatus. This apparatus includes a processor unit and a bridge that relays the input/output bus of the processor unit to the input/output bus of a switching device by which a plurality of processor units are interconnected. The bridge can be the bridge described above.

[0013] Yet another embodiment of the present invention is an information processing system. The information processing system includes a plurality of processor units, a switching device that interconnects the plurality of processor units, and a bridge that relays the input / output bus of each processor unit to the input / output bus of the switching device. This bridge can also be a bridge of the above-described embodiment.

[0014] It should be noted that configurations in which the constituent elements and expressions of the present invention are mutually replaced among methods, systems, programs, storage media storing programs, and the like are also effective as aspects of the present invention.

Effect of the invention

[0015] The present invention can provide resource transparency between nodes on a computer network.

Brief Description of Drawings

FIG. 1 is a diagram showing an information processing system used for explaining the outline of the present invention.

FIG. 2 is a diagram showing a configuration of a node in the information processing system shown in FIG. 1.

FIG. 3 is a diagram (part 1) illustrating the concept of a global address space.

FIG. 4 is a diagram showing a configuration of a bridge in the node shown in FIG. 2.

FIG. 5 is a diagram showing a global address format corresponding to the global address space shown in FIG. 3.

FIG. 6 is a diagram (part 2) illustrating the concept of the global address space.

FIG. 7 is a diagram showing an example of a global address format corresponding to the global address space shown in FIG. 6.

FIG. 8 is a diagram showing a configuration of an information processing system according to an embodiment of the present invention.

FIG. 9 is a diagram showing a configuration example of a node in the information processing system shown in FIG. 8.

FIG. 10 is a diagram showing a configuration of a bridge in the node shown in FIG. 9.

FIG. 11 is a diagram illustrating an example of a conversion table used when converting an effective address to a physical address and a conversion result.

Explanation of symbols

[0017] 20 bridge, 22 upstream port, 24 conversion unit, 26 downstream port, 30 processor unit, 40 node, 50 switch, 80 switch, 100 node, 110 bridge, 112 first input/output unit, 114 bridge controller, 116 register group, 118 second input/output unit, 120 multi-core processor, 130 SPE, 132 core, 134 local memory, 136 MFC, 138 DMAC, 140 PPE, 142 core, 144 L1 cache, 145 L2 cache, 146 MFC, 148 DMAC, 150 ring bus, 160 IOIF, 164 IO controller, 170 memory controller, 180 main memory.

BEST MODE FOR CARRYING OUT THE INVENTION

[0018] Before describing the details of the embodiment of the present invention, first, an outline of the technique proposed by the present inventor will be described.

Consider the information processing system shown in FIG. 1. As an example, this information processing system has a plurality of nodes 40, which are connected by a switching device (hereinafter simply referred to as a switch) 50.

FIG. 2 shows the configuration of the node 40.

The node 40 includes a processor unit 30 and a bridge 20 that relays an input / output bus (not shown) of the processor unit 30 to an input / output bus (not shown) of the switch 50. The processor unit 30 may be a single processor or may be composed of a plurality of processors. Here, before explaining the details of the bridge 20, the global address space used in this technology will be explained.

[0022] FIG. 3 illustrates the concept of the global address space. As shown in FIG. 3, in the global address space, the effective address of each processor unit 30 is mapped so as to be associated with the node to which that processor unit 30 belongs. This global address space is shared among the processor units 30, and an address in this space is hereinafter referred to as a global address.

[0023] Access between the nodes 40 is performed using this global address space.

When accessing the target node, the processor unit 30 of the source node issues an access request packet specifying the effective address of the target node and outputs it to the bridge 20. This access request packet is issued using, for example, a DMA (direct memory access) architecture.

FIG. 4 shows the configuration of the bridge 20. The bridge 20 includes an upstream port 22 that transmits and receives data to and from the processor unit 30 via the input/output bus of the processor unit 30, a conversion unit 24, and a downstream port 26 that transmits and receives data to and from the switch 50 via the input/output bus of the switch 50.

The upstream port 22 receives the access request packet issued by the processor unit 30, and the conversion unit 24 converts the effective address of the target node into a global address by adding to it the node identification number of the target node (hereinafter referred to as a node ID), which is necessary for switching. Then, the downstream port 26 outputs an access request packet specifying this global address to the switch 50.

[0027] The node ID can indicate the physical location of the node 40 in the network. For example, the number of the connection port to which the node is connected in the switch 50 can be used.

[0028] FIG. 5 shows a format of a global address. As shown in the figure, the global address consists of the node ID and effective address.

[0029] The downstream port 26 of the bridge 20 outputs an access request packet specifying such a global address to the switch 50, and the switch 50 reads the node ID included in the global address of the received access request packet. Then, the access request packet is transferred to the node 40 connected to the connection port indicated by the node ID.
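As a rough illustration of the format of FIG. 5 and of the forwarding step just described, the following sketch packs a node ID and an effective address into one integer and lets a switch choose the destination from the node ID. The field width `EA_BITS`, the function names, and the port map are assumptions made for illustration only; the specification does not fix any of them.

```python
EA_BITS = 36  # assumed width of the effective-address field (not given in the text)
EA_MASK = (1 << EA_BITS) - 1


def to_global(node_id: int, effective_addr: int) -> int:
    """Conversion unit: prepend the target node ID to the effective address."""
    if node_id < 0:
        raise ValueError("node ID must be non-negative")
    if not 0 <= effective_addr <= EA_MASK:
        raise ValueError("effective address out of range")
    return (node_id << EA_BITS) | effective_addr


def split_global(global_addr: int) -> tuple[int, int]:
    """Recover (node_id, effective_addr) from a global address."""
    return global_addr >> EA_BITS, global_addr & EA_MASK


def route(global_addr: int, ports: dict[int, str]) -> str:
    """Switch: forward to the connection port whose number equals the node ID."""
    node_id, _ = split_global(global_addr)
    return ports[node_id]


# Hypothetical port map: connection port number -> attached node.
ports = {0: "node-0", 1: "node-1", 2: "node-2"}
ga = to_global(2, 0x1000)
destination = route(ga, ports)
```

On the target side, `split_global` corresponds to reading the effective address back out of the global address before it is handed to the processor unit.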

The bridge 20 of the node 40 serving as the target node receives the access request packet through the downstream port 26. The upstream port 22 outputs the effective address designated by the access request packet to the processor unit 30. In this case, the conversion unit 24 may read out the effective address of the global address and pass it to the upstream port 22, and when the switch 50 transfers the access packet, only the effective address included in the global address is transmitted. It may be outputted to the bridge 20 so that the intervention of the conversion unit 24 is not required. Alternatively, the access request packet may be directly passed to the processor unit 30 and the processor unit 30 may read the effective address.

[0031] The processor unit 30 converts the effective address into a physical address.

[0032] Thus, by mapping a part of the user space to the global address space, resource transparency can be obtained between nodes on a computer network. This enables access between nodes without operating network hardware or the device drivers for that hardware, and low-overhead memory access is realized.

Next, a case will be described in which this technique is applied to a distributed application system in which a plurality of nodes on the network can process the same application in parallel.

[0034] In this case, an area is allocated for each active application in a node that executes an application, specifically, a memory in a processor unit included in the node. The technique proposed by the present inventor makes it easy to access an area allocated for this application between the same applications operating on different nodes in a distributed application system.

[0035] For this purpose, an application identification number (hereinafter referred to as application ID or APID) is introduced, and the global address space shown in FIG. 3 is logically divided using this application ID.

[0036] The application ID is assigned to each active application and uniquely identifies the application in the entire system. That is, the same application running on different nodes will have the same application ID.

FIG. 6 shows the concept of the global address space in this case. The address space on the right is the effective address space corresponding to each node and the application running on that node. The address space in the center is a set of effective address spaces corresponding to the same application scattered across the information processing system. The address space on the left is a set of address spaces corresponding to each application in the information processing system.

[0038] Access between nodes in this case will be described using the system configuration shown in FIG.

When accessing the target node, the processor unit 30 of the source node issues an access request packet specifying the effective address of the target node and outputs it to the bridge 20.

[0040] The conversion unit 24 of the bridge 20 converts the effective address of the target node into a global address by adding the APID and the node ID of the target node to the effective address of the target node. An access request packet in which this global address is specified, and to which the node ID of the source node that is the access request source has been added, is then output to the switch 50 via the downstream port 26.

FIG. 7 shows an example of a global address format. As shown in the figure, this global address is composed of an application ID (APID), a node ID, and an effective address.
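The FIG. 7 layout and the source node ID that accompanies the packet can be sketched as follows. All field widths, the `AccessRequest` class, and the helper names are hypothetical and chosen only to make the idea concrete; the specification states only that the global address is composed of the APID, the node ID, and the effective address.

```python
from dataclasses import dataclass

EA_BITS = 36    # assumed effective-address width
NODE_BITS = 8   # assumed node-ID width
APID_BITS = 8   # assumed application-ID width


@dataclass(frozen=True)
class AccessRequest:
    global_addr: int     # APID | target node ID | effective address
    source_node_id: int  # node ID of the access request source


def make_request(apid: int, target_node: int, effective_addr: int,
                 source_node: int) -> AccessRequest:
    """Source-side bridge: build the global address and attach the source node ID."""
    for value, bits in ((apid, APID_BITS), (target_node, NODE_BITS),
                        (effective_addr, EA_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field out of range")
    ga = (apid << (NODE_BITS + EA_BITS)) | (target_node << EA_BITS) | effective_addr
    return AccessRequest(ga, source_node)


def unpack(req: AccessRequest) -> tuple[int, int, int]:
    """Target-side bridge: recover (apid, target_node, effective_addr)."""
    ea = req.global_addr & ((1 << EA_BITS) - 1)
    node = (req.global_addr >> EA_BITS) & ((1 << NODE_BITS) - 1)
    apid = req.global_addr >> (NODE_BITS + EA_BITS)
    return apid, node, ea


req = make_request(apid=3, target_node=1, effective_addr=0x2000, source_node=0)
```

Placing the APID in the high-order bits means that all addresses belonging to one application form one contiguous slice of the global address space, which matches the logical division described for FIG. 6.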

Note that the format of the global address is not limited to the example shown in FIG. 7. It suffices that the APID is included in the access request packet in such a way that it can be read by the bridge 20 of the target node. For example, the format shown in FIG. 5 may be used as the format of the global address, with the APID stored separately in the access request packet.

[0043] Upon receiving the access request packet, the bridge 20 of the node 40 serving as the target node reads the APID from the access request packet and passes the effective address specified by the access request packet, together with this APID, to the processor unit 30 of the node 40.

[0044] The processor unit 30 converts the effective address into a physical address of an area provided for the application corresponding to the access request source node ID or APID.

[0045] Thus, by logically dividing the global address space, to which the effective addresses in each node are mapped, using the APID, and by adding the APID to the global address or storing the APID in the access request packet, user applications can easily access the memory space used by applications across nodes. Furthermore, in this case, it is preferable that the processor unit uses the APID passed from the bridge to determine whether or not to permit the access. Alternatively, the processor unit may determine whether or not to permit the access using the access request source node ID.

[0046] For example, the processor unit may use an address conversion table when converting an effective address into a physical address. A permission identification number, indicating whether access to the physical address space that is the access request destination is permitted or prohibited, is added to this address conversion table. The permission identification number is, for example, the node ID of an access request source node that is permitted to access each physical address space, or the APID, which is identification information of the access request source application. When the processor unit converts the effective address into a physical address by referring to this conversion table, it can determine whether access to the effective address is permitted based on whether the access request source node ID or APID passed from the bridge matches the permission identification number associated with the effective address in the conversion table.

[0047] By doing this, even when an application operates across nodes and the internal configuration of each effective memory space is optimized so that the processor unit or the application operates at maximum performance in each node, mutual access to the effective memory spaces between the nodes is facilitated, and the area allocated to a given application can be protected from access that is not permitted.

[0048] Further, by performing the conversion from the effective address to the physical address on the processor unit side, the configuration of the bridge can be simplified. In addition, the conversion table can be adjusted on the processor unit side independently of the bridge. Moreover, even if nodes with different types of IO devices and different memory sizes are mixed, access between nodes can be made possible by bridges of the same configuration.

[0049] Hereinafter, a system embodying the above outline will be described as an embodiment of the present invention.

FIG. 8 shows a configuration of an information processing system according to an embodiment related to the present invention. This information processing system includes a plurality of nodes 100 and a switch 80 that connects these nodes to form a network. The node 100 is connected to the switch 80 by a connection bus (not shown), and this connection bus is, for example, a PCI Express (registered trademark) bus.

FIG. 9 shows a configuration example of the node 100 in the information processing system shown in FIG. 8. The node 100 includes a bridge 110, a multi-core processor 120, and a main memory 180.

[0052] The multi-core processor 120 is formed on one chip and includes a main processing unit PPE (Power Processing Element) 140, a plurality of (eight in the illustrated example) sub-processing units SPE (Synergistic Processing Element) 130, an IO interface (hereinafter referred to as IOIF) 160, and a memory controller 170, which are connected by a ring bus 150.

The main memory 180 is a shared memory of the processing units of the multi-core processor 120, and is connected to the memory controller 170.

The memory controller 170 mediates access to the main memory 180 by the PPE 140 and each SPE 130. In the example shown in FIG. 9, the main memory 180 is provided outside the multi-core processor 120, but it may instead be provided so as to be included in the multi-core processor 120.

The IOIF 160 is connected to the bridge 110 via an IOIF bus (not shown), and allows access between the nodes 100 in cooperation with the bridge 110. The IOIF 160 includes an IO controller 164.

[0056] The SPE 130 includes a core 132, a local memory 134, and a memory flow controller (hereinafter referred to as MFC) 136. The MFC 136 includes a DMAC (Direct Memory Access Controller) 138. It is desirable that the local memory 134 is not a conventional hardware cache memory. That is, there is no hardware cache circuit, cache register, cache memory controller, or the like, whether built into the chip or placed outside the chip, for realizing a hardware cache memory function.

[0057] The PPE 140 includes a core 142, an L1 cache 144, an L2 cache 145, and an MFC 146. The MFC 146 includes a DMAC 148.

[0058] Normally, the operating system (hereinafter also referred to as OS) of the multi-core processor 120 operates on the PPE 140, and the program that operates on each SPE 130 is determined based on the basic processing of the OS. The program that operates on the SPE 130 may be a program that forms part of the OS functions (for example, a device driver or a part of a system program). Note that the PPE 140 and the SPE 130 have different instruction set architectures. Further, instead of operating on the PPE 140, the OS may operate on each SPE 130. That is, an OS that uses a part of the core 132 and the local memory 134 included in each SPE 130 operates, and the OSs of the SPEs 130 work together to form one OS as a whole. In addition, when any application is executed, the OS running on each SPE 130 can by itself, by executing instructions from the PPE 140, communicate data with the other SPEs 130 or access the main memory 180 in order to check for and acquire tasks that are included in the application and are waiting for execution, and can execute those tasks if they are executable. When the OS runs on the PPE 140 and decides the program that runs on each SPE 130, some SPE 130 may be left idle; when the OS runs on each SPE 130, however, all the SPEs 130 can be kept running for application execution.

The information processing system shown in FIG. 8 is a distributed application system in which the same application can operate on a plurality of multi-core processors 120. In this system, the global address space shown in Fig. 6 is used.

FIG. 10 shows the configuration of the bridge 110. The bridge 110 includes a first input / output unit 112, a bridge controller 114, and a second input / output unit 118. The bridge controller 114 includes a register group 116.

[0061] The first input/output unit 112 and the second input/output unit 118 are the same as the upstream port 22 and the downstream port 26, respectively, of the bridge 20 shown in FIG. 4, and a detailed description of them is therefore omitted.

[0062] When the node to which the bridge controller 114 belongs is a source node, the bridge controller 114 converts the effective address designated by the access request packet issued by the multi-core processor 120 into a global address. When the node to which it belongs is accessed as a target node, the bridge controller 114 reads the APID from the global address included in the access request packet received by the second input/output unit 118, and passes the effective address specified in the access request packet to the multi-core processor 120 together with this APID.

[0063] In the bridge controller 114, there is an IO address space for converting between the effective address in each node 100 and the global address in the entire information processing system. The IO address space is divided into one or more segments, and each segment is further divided into one or more pages. The bridge controller 114 includes a register for performing conversion for each page, and the set of these registers is the register group 116. The conversion registers provided in the bridge controller 114 hold, for each page, the node ID of the access request source node that is permitted to access the page or the APID that is identification information of the access request source application, the physical address in the main memory 180 that is mapped in association with the page, and the like.

[0064] Information for each page is written by the system program to the corresponding register in the register group 116 via the IOIF 160.

Here, access between the nodes 100 will be described.

First, the operation of the node 100 that performs access will be described.

[0067] The multi-core processor 120 included in the node 100 accesses another node as a target node. In accessing, the multi-core processor 120 issues an access request packet to the bridge 110. This access request packet includes the effective address (in this embodiment, an offset in the IO address space of the target node) of the access request destination application in the target node. This access request packet is issued from the DMAC of any processing unit included in the multi-core processor 120.

[0068] The bridge controller 114 of the bridge 110 converts the effective address into a global address. Here, the format shown in FIG. 7 is used for the global address. In other words, the global address obtained by the bridge controller 114 is composed of the APID of the application requesting access, the node ID of the target node, and the effective address at the target node. In addition, the node ID of the source node that is the access request source is added to the access request packet.

[0069] The switch 80 reads the node ID included in the access request packet specifying this global address, and forwards the access request packet to the target node indicated by this node ID.

Next, the operation of the accessed node 100, that is, the target node will be described.

When the bridge 110 of the node 100 serving as the target node receives the access request packet, the bridge controller 114 reads out the APID included in the access request packet. Then, the effective address included in the access request packet is converted into an IO address.

[0072] The bridge 110 transmits the IO address thus obtained to the IOIF 160 of the multi-core processor 120 together with the APID.

[0073] The IO controller 164 of the IOIF 160 converts the IO address into a physical address in the main memory 180. FIG. 11 shows an example of the conversion table used for this conversion and the result of the conversion.

[0074] The main memory 180 is divided into segments, and each segment is further divided into pages of a predetermined page size. When the page size is 4 KB, the bits of the 36-bit IO address are defined as follows: IO Address[34:28] = Segment, IO Address[27:12] = Page, IO Address[11:0] = Offset.
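The bit layout given above can be illustrated with a small decoding helper. This is only a sketch: the function names are hypothetical, and the masks follow directly from the stated bit ranges (7 bits of segment, 16 bits of page, 12 bits of offset, the last of which corresponds to 4096-byte pages).

```python
def decode_io_address(io_addr: int) -> tuple[int, int, int]:
    """Split an IO address into (segment, page, offset) per the stated bit ranges."""
    segment = (io_addr >> 28) & 0x7F   # bits [34:28], 7 bits
    page = (io_addr >> 12) & 0xFFFF    # bits [27:12], 16 bits
    offset = io_addr & 0xFFF           # bits [11:0], 12 bits
    return segment, page, offset


def encode_io_address(segment: int, page: int, offset: int) -> int:
    """Inverse of decode_io_address, with range checks on each field."""
    if not (0 <= segment <= 0x7F and 0 <= page <= 0xFFFF and 0 <= offset <= 0xFFF):
        raise ValueError("field out of range")
    return (segment << 28) | (page << 12) | offset


addr = encode_io_address(segment=1, page=2, offset=0)
fields = decode_io_address(addr)
```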

[0075] As shown in FIG. 11, the conversion table maps a physical address (space) to each page included in a segment, and indicates permission or prohibition of access by means of the permission identification number IOID. Here, the IOID indicates the node ID of the access request source node that is permitted to access the corresponding physical address (space), or the APID that is the identification information of the access request source application.

The IO controller 164 looks up the segment number and page number represented by the IO address received from the bridge 110 in the conversion table and determines whether access is permitted. For example, suppose the access request source node ID or APID received together with an IO address indicating "Segment = 1, Page = 2, Offset = 0" is C. In the conversion table, IOID = C is associated with segment = 1, page = 2, which indicates that the access request source node or application identified by C is permitted to access physical address d, so the IO controller 164 permits access to physical address d. On the other hand, suppose the access request source node ID or APID received together with an IO address indicating "Segment = 127, Page = 1, Offset = 0" is C. In the conversion table, IOID = D is associated with segment = 127, page = 1, which indicates that access by the access request source node or application identified by C is not permitted, so the IO controller 164 returns an error signal and refuses the access.
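The permission check described here can be sketched as a table lookup keyed by segment and page. The table contents below are placeholders: only the IOIDs C and D and the two (segment, page) pairs come from the example in the text, while the numeric physical base addresses (standing in for "d" and for the unnamed entry), the exception class, and the function name are assumptions for illustration.

```python
class AccessDenied(Exception):
    """Stands in for the error signal returned by the IO controller."""


# (segment, page) -> (IOID permitted to access, physical base address)
CONVERSION_TABLE = {
    (1, 2): ("C", 0xD000),    # 0xD000 is a placeholder for physical address d
    (127, 1): ("D", 0xE000),  # placeholder physical address
}


def translate(segment: int, page: int, offset: int, requester_id: str) -> int:
    """Return the physical address if the requester matches the page's IOID."""
    entry = CONVERSION_TABLE.get((segment, page))
    if entry is None:
        raise AccessDenied("no mapping for this page")
    ioid, base = entry
    if ioid != requester_id:
        raise AccessDenied("requester does not match the permission IOID")
    return base + offset


allowed = translate(1, 2, 0, "C")  # requester C matches IOID C for this page
```

Calling `translate(127, 1, 0, "C")` raises `AccessDenied`, mirroring the second case in the text, where the page is reserved for IOID = D.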

[0077] As can be seen from the conversion table, a contiguous area in memory (the area indicated by physical addresses a and b) is allocated to the same access request source node or the same application (for example, the access request source node or application corresponding to IOID = A).

[0078] The present invention has been described based on the embodiments. Those skilled in the art will understand that the embodiments are illustrative, that various modifications can be made to the combinations of the respective constituent elements and processes, and that such modifications are also within the scope of the present invention.

As one such modification, although the switch 80 interconnects PCI buses in the embodiment, the network may instead be configured using a switch that interconnects buses of a standard other than PCI.

Industrial applicability

[0079] The present invention can be applied to the field of computer network systems.

Claims

The scope of the claims

[1] A bridge that relays an input/output bus of a processor unit to an input/output bus of a switching device by which a plurality of processor units are interconnected, the bridge comprising:

an upstream port that, on the premise of a global address space that is shared among the plurality of processor units and to which the effective addresses of the respective processor units are mapped, receives from the processor unit an access request packet specifying the effective address of a target node that is one of the plurality of processor units;

a conversion unit that converts the effective address of the target node into a global address by adding a node identification number of the target node, necessary for switching, to the effective address of the target node; and

a downstream port that outputs an access request packet in which the global address is specified to the switching device.

[2] The bridge according to claim 1, wherein, when the global address space is logically divided by an application identification number that uniquely identifies a distributed application operating on the plurality of processor units, the conversion unit stores the application identification number in the access request packet in which the global address is specified.

[3] The bridge according to claim 2, wherein, when the downstream port receives, from a source node that is one of the plurality of processor units, an access request packet specifying the effective address with the processor unit as the target node, the conversion unit acquires the application identification number from the access request packet, and the upstream port passes the application identification number acquired from the access request packet to the processor unit together with the effective address of the processor unit specified in the access request packet.

[4] processor unit;

A bridge that relays the input / output bus of the processor unit to the input / output bus of a switching device to which the plurality of processor units are interconnected;

The bridge includes: an upstream port that, on the premise of a global address space that is shared among the plurality of processor units and to which the effective addresses of the respective processor units are mapped, receives from the processor unit an access request packet specifying the effective address of a target node that is one of the plurality of processor units; and

And a downstream port that outputs an access request packet in which the global address is designated to the switching device.

[5] The information processing apparatus according to claim 4, wherein, when the global address space is logically divided by an application identification number that uniquely identifies a distributed application operating on the plurality of processor units, the conversion unit stores the application identification number in the access request packet in which the global address is specified.

[6] The information processing apparatus according to claim 5, wherein, when the downstream port receives, from a source node that is one of the plurality of processor units, an access request packet that designates the effective address with the processor unit as the target node, the conversion unit acquires the application identification number from the access request packet, and the upstream port passes the acquired application identification number to the processor unit together with the effective address of the processor unit designated in the access request packet.

[7] The information processing apparatus according to claim 6, wherein the processor unit determines whether or not access to the effective address of the processor unit is permitted based on the application identification number passed from the bridge.

[8] The information processing apparatus according to claim 7, wherein the processor unit includes an address conversion table for converting an effective address into a physical address; the address conversion table stores, in association with each effective address, the identification number of a distributed application permitted to access that effective address as an authorization identification number; and, when the processor unit receives an access request to an effective address of the processor unit and refers to the address conversion table to convert the effective address into a physical address, the processor unit determines whether or not access to the effective address is permitted based on whether the authorization identification number associated with the effective address in the address conversion table matches the application identification number passed from the bridge.
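For illustration only, the table lookup recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the page granularity (`PAGE_SIZE`), the class and method names, and the choice to return `None` on a refused access are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PAGE_SIZE = 4096  # assumed page granularity; the claims do not state one


@dataclass(frozen=True)
class TableEntry:
    physical_page: int       # physical page that the effective page maps to
    authorized_app_id: int   # authorization identification number for this page


class AddressConversionTable:
    """Hypothetical effective-to-physical table that also carries permissions."""

    def __init__(self) -> None:
        self._entries: Dict[int, TableEntry] = {}

    def map_page(self, effective_page: int, physical_page: int,
                 authorized_app_id: int) -> None:
        self._entries[effective_page] = TableEntry(physical_page, authorized_app_id)

    def translate(self, effective_address: int, app_id: int) -> Optional[int]:
        """Return the physical address, or None if the access is not permitted."""
        page, offset = divmod(effective_address, PAGE_SIZE)
        entry = self._entries.get(page)
        # Access is permitted only when the authorization identification number
        # stored in the table matches the number passed from the bridge.
        if entry is None or entry.authorized_app_id != app_id:
            return None
        return entry.physical_page * PAGE_SIZE + offset
```

In this sketch the permission check happens during the same lookup that performs the address conversion, which mirrors the wording of the claim: no separate access-control structure is consulted.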

[9] An information processing system comprising:

a plurality of processor units;

a switching device that interconnects the plurality of processor units; and

a bridge that relays the input/output bus of each processor unit to the input/output bus of the switching device,

wherein the bridge includes:

an upstream port that, on the premise of a global address space which is shared among the plurality of processor units and to which the effective address of each processor unit is mapped, receives from the processor unit an access request packet designating the effective address of a target node that is one of the plurality of processor units;

[10] A global address management method in an information processing system in which each of a plurality of processor units is connected to a switching device via a bridge, the method comprising:

a step of receiving, from one processor unit, on the premise of a global address space which is shared among the plurality of processor units and to which the effective addresses of the respective processor units are mapped, an access request packet designating the effective address of a target node that is one of the plurality of processor units;

a step of converting the effective address of the target node into a global address by adding, to the effective address of the target node, the node identification number of the target node required for switching; and

a step of transferring, to the switching device, the access request packet in which the global address is designated.
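As a non-authoritative illustration of the conversion step in claim 10, the sketch below forms a global address by placing the node identification number above the effective address. The bit widths (`EA_BITS`, `NODE_BITS`), the position of the node identification number in the upper bits, and all function names are assumptions made for this example; the claims only state that the node identification number is added to the effective address.

```python
from typing import Tuple

EA_BITS = 48    # assumed width of an effective address
NODE_BITS = 16  # assumed width of a node identification number


def to_global_address(node_id: int, effective_address: int) -> int:
    """Add the target node's identification number to its effective address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node identification number out of range")
    if not 0 <= effective_address < (1 << EA_BITS):
        raise ValueError("effective address out of range")
    # Assumed layout: node identification number in the upper bits.
    return (node_id << EA_BITS) | effective_address


def split_global_address(global_address: int) -> Tuple[int, int]:
    """Recover (node_id, effective_address) from a global address."""
    return global_address >> EA_BITS, global_address & ((1 << EA_BITS) - 1)
```

With this layout a switching device could route on the upper bits alone, and the bridge on the target side could recover the original effective address with `split_global_address`.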

[11] The global address management method according to claim 10, wherein, when the global address space is logically divided by an application identification number that uniquely identifies a distributed application operating on the plurality of processor units, the application identification number is stored in the access request packet in which the global address is designated, and the access request packet is then transferred to the switching device.

[12] The global address management method according to claim 11, wherein, when an access request packet designating an effective address with any one of the plurality of processor units as the target node is received, the application identification number stored in the access request packet is collated, whereby access to the effective address of the one processor unit by a distributed application that is not permitted such access is prohibited.
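The tagging and collation described in claims 11 and 12 can be sketched as below. This is a hypothetical sketch: the `Packet` fields, the function names, and the use of a set of permitted application identification numbers are assumptions for illustration and are not taken from the specification.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set


@dataclass(frozen=True)
class Packet:
    address: int                  # global address on the wire (assumed field)
    app_id: Optional[int] = None  # application identification number, if stored


def tag_with_app_id(packet: Packet, app_id: int) -> Packet:
    """Source side: store the application identification number in the packet."""
    return replace(packet, app_id=app_id)


def collate(packet: Packet, permitted_app_ids: Set[int]) -> bool:
    """Target side: permit access only for a permitted distributed application."""
    return packet.app_id is not None and packet.app_id in permitted_app_ids
```

A packet that carries no application identification number is refused in this sketch; whether such a packet should instead be accepted is a policy choice that the claims do not settle.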