JP3628595B2 - Interconnected processing nodes configurable as at least one NUMA (NON-UNIFORM MEMORY ACCESS) data processing system - Google Patents

Interconnected processing nodes configurable as at least one NUMA (NON-UNIFORM MEMORY ACCESS) data processing system Download PDF

Info

Publication number
JP3628595B2
Authority
JP
Japan
Prior art keywords
plurality
processing
system
data processing
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2000180619A
Other languages
Japanese (ja)
Other versions
JP2001051959A (en)
Inventor
James Lyle Peterson
David Brian Glasco
Bishop Chapman Brock
Ramakrishnan Rajamony
Ronald Lynn Rockhold
Original Assignee
International Business Machines Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US09/335,301 priority Critical patent/US6421775B1/en
Priority to US09/335301 priority
Application filed by International Business Machines Corporation filed Critical International Business Machines Corporation
Publication of JP2001051959A publication Critical patent/JP2001051959A/en
Application granted granted Critical
Publication of JP3628595B2 publication Critical patent/JP3628595B2/en
Application status is Expired - Fee Related legal-status Critical
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/177Initialisation or configuration control

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention generally relates to data processing, and more particularly, to a NUMA (non-uniform memory access) data processing system. More specifically, the present invention relates to a collection of interconnected processing nodes that can be configured as one or more data processing systems including at least one NUMA data processing system.
[0002]
[Prior art]
In computer technology, it is well known that greater computer system performance can be achieved by harnessing the processing power of multiple individual processors in tandem. Multiprocessor (MP) computer systems can be designed with a number of different topologies, of which various ones may be better suited for particular applications depending upon the performance requirements and software environment of each application. One common MP computer topology is the symmetric multiprocessor (SMP) configuration, in which multiple processors share common resources, such as a system memory and an input/output (I/O) subsystem, which are typically coupled to a shared system interconnect. Such computer systems are said to be symmetric because all processors in an SMP computer system ideally have the same access latency with respect to data stored in the shared system memory.
[0003]
Although SMP computer systems permit the use of relatively simple inter-processor communication and data sharing methodologies, SMP computer systems have limited scalability. In other words, while the performance of a typical SMP computer system can generally be expected to improve with scale (i.e., with the addition of more processors), inherent bus, memory, and I/O bandwidth limitations prevent significant advantage from being obtained by scaling an SMP beyond an implementation-dependent size at which the utilization of these shared resources is optimized. Thus, the SMP topology itself suffers to a certain extent from bandwidth limitations, especially at the system memory, as the scale of the system increases. SMP computer systems also do not scale well from the standpoint of manufacturing efficiency. For example, although some components can be optimized for use in both uniprocessor and small-scale SMP computer systems, such components are often inefficient for use in large-scale SMPs. Conversely, components designed for use in a large-scale SMP may be impractical for use in smaller systems from a cost standpoint.
[0004]
As a result, an MP computer system topology known as NUMA (non-uniform memory access) has recently attracted increased interest as a design that addresses many of the limitations of SMP computer systems at the expense of some additional complexity. A typical NUMA computer system includes a number of interconnected nodes that each contain one or more processors and a local "system" memory. Such computer systems are said to have non-uniform memory access because each processor has lower access latency with respect to data stored in the system memory of its local node than with respect to data stored in the system memory of a remote node. NUMA systems can be further classified as either non-coherent or cache coherent, depending upon whether data coherency is maintained between the caches of different nodes. The complexity of cache coherent NUMA (CC-NUMA) systems is attributable in large measure to the additional communication required for hardware to maintain data coherency not only between the various levels of cache memory and system memory within each node, but also between cache and system memories in different nodes. NUMA computer systems do, however, address the scalability limitations of conventional SMP computer systems, because each node within a NUMA computer system can be implemented as a smaller uniprocessor or SMP system. Thus, the shared components within each node can be optimized for use by only one or a few processors, while the overall system benefits from the availability of larger-scale parallelism while maintaining relatively low latency.
[0005]
[Problems to be solved by the invention]
The present invention recognizes that the expense of a large-scale NUMA data processing system is difficult to justify in certain computing environments, for example, those with variable workloads. That is, some computing environments only occasionally require the processing resources of a large-scale NUMA data processing system to run a single application, and more frequently require a number of smaller data processing systems to run multiple operating systems or multiple different applications. Prior to the present invention, the variable workload of such computing environments could be addressed only by physically configuring multiple computer systems of differing scale, or by connecting and disconnecting nodes as needed.
[0006]
[Means for Solving the Problems]
To address the aforementioned shortcomings in the art, the present invention provides a data processing system that includes a plurality of processing nodes, each containing at least one processor and a data storage device. The plurality of processing nodes are coupled together by a system interconnect. The data processing system further includes a configuration utility residing in the data storage of at least one of the plurality of processing nodes. The configuration utility selectively configures the plurality of processing nodes into either a single NUMA (non-uniform memory access) system or a plurality of independent data processing systems through communication over the system interconnect.
[0007]
DETAILED DESCRIPTION OF THE INVENTION
System overview
Referring now to the drawings, and in particular to FIG. 1, there is depicted an embodiment of a data processing system in accordance with the present invention. The depicted embodiment can be realized, for example, as a workstation, server, or mainframe computer. As illustrated, data processing system 6 includes a number (in this case four) of processing nodes 8a-8d interconnected by a node interconnect 22. As discussed further below, inter-node data coherence is maintained by an interconnect coherence unit (ICU) 36.
[0008]
Referring now to FIG. 2, each of processing nodes 8a-8d includes one or more processors 10a-10m, a local interconnect 16, and a system memory 18 that is accessed via a memory controller 17. Processors 10a-10m are preferably (but not necessarily) identical. In addition to the registers, instruction sequencing logic, and execution units utilized to execute program instructions, which are generally designated as processor core 12, each of processors 10a-10m also includes an on-chip cache hierarchy 14 that is used to stage data from system memory 18 to the associated processor core 12. Each cache hierarchy 14 may include, for example, a level one (L1) cache with a storage capacity of 8 to 32 kilobytes (kB) and a level two (L2) cache with a storage capacity of 1 to 16 megabytes (MB).
[0009]
Each of processing nodes 8a-8d further includes a respective node controller 20 coupled between local interconnect 16 and node interconnect 22. Each node controller 20 serves as a local agent for remote processing nodes 8 by performing at least two functions. First, each node controller 20 snoops the associated local interconnect 16 and facilitates the transmission of local communication transactions to remote processing nodes 8. Second, each node controller 20 snoops communication transactions on node interconnect 22 and masters relevant communication transactions (e.g., read requests) on the associated local interconnect 16. Communication on each local interconnect 16 is controlled by an arbiter 24. The arbiter 24 regulates access to local interconnect 16 based on bus request signals generated by processors 10 and compiles coherency responses for snooped communication transactions on local interconnect 16.
[0010]
Local interconnect 16 is coupled, via a mezzanine bus bridge 26, to a mezzanine bus 30, which can be implemented, for example, as a peripheral component interconnect (PCI) local bus. Mezzanine bus bridge 26 provides both a low latency path through which processors 10 may directly access those of I/O devices 32 and storage devices 34 that are mapped to bus memory and/or I/O address spaces, and a high bandwidth path through which I/O devices 32 and storage devices 34 may access system memory 18. I/O devices 32 may include, for example, a display device, a keyboard, a graphical pointer, and serial and parallel ports for connection to external networks or attached devices. Storage devices 34, on the other hand, may include optical or magnetic disks that provide non-volatile storage for the operating system and application software.
[0011]
Local interconnect 16 is further coupled, via a host bridge 38, to a memory bus 40 and a service processor bus 44. Memory bus 40 is coupled to a non-volatile random access memory (NVRAM) 42 that stores configuration data and other critical data for processing node 8. Service processor bus 44 supports a service processor 50, which serves as the boot processor for processing node 8. The boot code for processing node 8, which typically comprises power-on self-test (POST), basic input/output system (BIOS), and operating system loader code, is stored in flash memory 48. Following boot, service processor 50 serves as a system monitor for the software and hardware of processing node 8 by executing system monitoring software from service processor dynamic random access memory (SP DRAM) 46.
[0012]
System configurability
In a preferred embodiment of the present invention, the BIOS boot code stored in flash memory 48 includes configuration software that permits data processing system 6 to be selectively partitioned into one or more independently operable subsystems. As described in detail below, data processing system 6 may advantageously be configured by this configuration software as a single NUMA data processing system, as multiple NUMA data processing subsystems, or as some other combination of single-node and multi-node (i.e., NUMA) data processing subsystems, in response to the anticipated characteristics of the processing workload. For example, if a large amount of processing power is required to run a single application, it is desirable to configure data processing system 6 as a single NUMA computer system and thereby maximize the processing power available to run that application. Alternatively, if multiple separate applications or multiple separate operating systems must be run, it may be desirable to configure data processing system 6 as multiple NUMA data processing subsystems or as multiple single-node subsystems.
[0013]
When data processing system 6 is configured as a plurality of data processing subsystems, those data processing subsystems comprise disjoint, possibly differently sized, sets of processing nodes 8. Each of the multiple data processing subsystems can be independently configured, run, shut down, rebooted, and repartitioned without interfering with the operation of the other data processing subsystems. Importantly, reconfiguration of data processing system 6 does not require that any processing node 8 be connected to or disconnected from node interconnect 22.
[0014]
Memory coherency
Because data stored in a system memory 18 can be requested, accessed, and modified by any processor 10 within a given data processing subsystem, a cache coherence protocol is implemented to maintain coherence both between caches in the same processing node and between caches in different processing nodes within the same data processing subsystem. The cache coherence protocol that is implemented is implementation-dependent; however, in a preferred embodiment, cache hierarchies 14 and arbiters 24 implement the conventional Modified, Exclusive, Shared, Invalid (MESI) protocol or a variant thereof. Inter-node cache coherency is preferably maintained through a directory-based mechanism centralized in the ICU 36 connected to node interconnect 22, but may alternatively be distributed among directories maintained by node controllers 20. This directory-based coherence mechanism preferably recognizes the M, S, and I states and, for correctness, regards the E state as merged into the M state. That is, data held exclusively by a remote cache is assumed to be modified, whether or not the data has actually been modified.
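By way of illustration only, the following C fragment sketches a directory entry that merges the E state into the M state as described above. The directory layout, type names, and handling shown here are assumptions introduced for this example, not details of the embodiment.

#include <stdint.h>
#include <stdbool.h>

/* Directory states tracked by the inter-node coherence mechanism.
 * The E state is deliberately absent: data held exclusively by a
 * remote cache is recorded as MODIFIED whether or not it has actually
 * been modified, which is conservative but always correct. */
typedef enum { DIR_INVALID, DIR_SHARED, DIR_MODIFIED } dir_state_t;

typedef struct {
    dir_state_t state;
    uint8_t     sharers;   /* one bit per node holding a copy            */
    uint8_t     owner;     /* meaningful only when state == DIR_MODIFIED */
} dir_entry_t;

/* Record that a remote node has been granted an exclusive copy.
 * The node's local caches may still use MESI's E state internally,
 * but the directory treats the grant as M. */
static void dir_grant_exclusive(dir_entry_t *e, uint8_t node)
{
    e->state   = DIR_MODIFIED;                 /* E merged into M */
    e->owner   = node;
    e->sharers = (uint8_t)(1u << node);
}

/* A read by another node must first fetch the line from the owner,
 * because the directory must assume the owner's copy is dirty. */
static bool dir_read_requires_intervention(const dir_entry_t *e, uint8_t requester)
{
    return e->state == DIR_MODIFIED && e->owner != requester;
}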
[0015]
Interconnect architecture
Local interconnect 16 and node interconnect 22 can each be implemented with any of a variety of interconnect architectures. However, in a preferred embodiment, at least node interconnect 22 is implemented as a switch-based interconnect governed by the 6xx communication protocol developed by IBM Corporation of Armonk, New York. This point-to-point communication methodology permits node interconnect 22 to route address and data packets from a source processing node 8 only to processing nodes 8 within the same data processing subsystem.
[0016]
Local interconnect 16 and node interconnect 22 permit split transactions, meaning that no fixed timing relationship exists between the address tenure and the data tenure that make up a communication transaction, and that data packets may be ordered differently than the associated address packets. The utilization of local interconnect 16 and node interconnect 22 is also preferably enhanced by pipelining communication transactions, which permits a subsequent communication transaction to be sourced before the master of a previous communication transaction has received coherency responses from each recipient.
[0017]
Configuration utility
Referring now to FIG. 3, there is depicted a high-level logical flowchart of a process for partitioning and configuring a multi-node data processing system, such as data processing system 6, into one or more data processing subsystems in accordance with the present invention. As shown, the process begins at block 80 in response to all of processing nodes 8a-8d being powered on, and then proceeds to block 82, where the service processor 50 of each processing node 8 executes POST code from flash memory 48 to initialize the local hardware to a known, stable state. Following POST, each service processor 50 executes conventional BIOS routines to interface with key peripherals (e.g., the keyboard and display) and to initialize interrupt handling. Thereafter, as shown at block 84 and following blocks, a processor of each processing node 8 (i.e., service processor 50 and/or a processor 10) initiates execution of the BIOS configuration utility mentioned above by obtaining inputs specifying the number of independent data processing subsystems into which data processing system 6 is to be partitioned and the particular processing nodes 8 belonging to each data processing subsystem. The inputs shown at block 84 can be obtained from any of a number of sources, for example, a file residing on a data storage medium or operator input at one or more of processing nodes 8.
[0018]
In a preferred embodiment of the present invention, the inputs shown at block 84 are obtained from an operator at one or more processing nodes 8 in response to a series of menu screens displayed at those processing nodes 8. The inputs are then used at each processing node 8 to create a partition mask indicating which other processing nodes 8 are grouped with that processing node 8 to form a data processing subsystem. For example, if each of the four processing nodes 8 in data processing system 6 is assigned one bit of a 4-bit mask, a NUMA configuration including all of the processing nodes can be represented by 1111, two two-node NUMA subsystems can be represented by 0011 and 1100 (or 1010 and 0101), and a two-node NUMA subsystem together with two single-node subsystems can be represented by 0011, 1000, and 0100 (or other similar node combinations). If the inputs indicating the desired partitioning of data processing system 6 are supplied at fewer than all of processing nodes 8, the appropriate masks are transmitted to the other processing nodes 8 via node interconnect 22. In this manner, each processing node 8 that is grouped with other processing nodes 8 has a record of those processing nodes.
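The bit-per-node masks in the preceding example admit a very compact representation. The following C fragment is a sketch of that encoding only; the function names and the fixed four-node limit are assumptions made for illustration.

#include <stdbool.h>
#include <stdint.h>

#define NUM_NODES 4u          /* data processing system 6 has four nodes here */

/* One bit per processing node 8: bit i set means node i belongs to
 * this data processing subsystem. */
typedef uint8_t partition_mask_t;

/* Build a mask from a list of node numbers supplied, for example,
 * through the menu screens mentioned above. */
static partition_mask_t make_partition_mask(const unsigned *nodes, unsigned count)
{
    partition_mask_t mask = 0;
    for (unsigned i = 0; i < count; i++)
        mask |= (partition_mask_t)(1u << nodes[i]);
    return mask;
}

/* True if the given node is grouped into the partition described by mask. */
static bool node_in_partition(partition_mask_t mask, unsigned node)
{
    return ((mask >> node) & 1u) != 0;
}

/* Encodings from the text: 1111 (0x0F) is a single four-node NUMA system;
 * 0011 and 1100 (0x03, 0x0C) are two two-node NUMA subsystems; 0011, 1000,
 * and 0100 (0x03, 0x08, 0x04) are one two-node subsystem plus two
 * single-node subsystems. */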
[0019]
Following block 84, the process proceeds to block 86, where each data processing subsystem of data processing system 6 independently completes its own configuration, as described in detail below with respect to FIGS. 4 and 5. Thereafter, processing continues at block 88.
[0020]
Referring now to FIGS. 4 and 5, there are illustrated high-level logical flowcharts that together depict the process shown at block 86 of FIG. 3. The illustrated processes, which are preferably implemented as part of the BIOS configuration utility described above, are described together in order to show details of the communication between them.
[0021]
The process shown in FIG. 4 represents the operation of a processing node 8 that is a master, and the process shown in FIG. 5 represents the operation of a processing node 8 (if any) that is a client. Following block 84, blocks 100 and 140 are entered in parallel. As shown at blocks 102 and 142, respectively, each processing node 8 in a data processing subsystem determines whether it is the master processing node 8 responsible for completing the configuration of that data processing subsystem. Although the master processing node 8 of a data processing subsystem can be determined by any of a number of well-known mechanisms, including voting and competition, in a preferred embodiment the master processing node 8 defaults to the processing node 8 in the data processing subsystem corresponding to the least significant bit that is set in the partition mask. The master processor of the processing node 8 determined to be the master (i.e., either service processor 50 or a designated processor 10) then manages the configuration of its data processing subsystem, as described in detail at blocks 104-130 of FIG. 4.
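The default rule above (the master corresponds to the least significant set bit of the partition mask) reduces to a short scan. This sketch reuses the partition_mask_t type from the previous fragment and assumes, as in the 4-bit examples, that node i corresponds to bit i.

/* Return the node number acting as master for a non-empty partition:
 * the node whose bit is the least significant bit set in the mask. */
static unsigned partition_master(partition_mask_t mask)
{
    unsigned node = 0;
    while (((mask >> node) & 1u) == 0)
        node++;                       /* mask is assumed non-zero */
    return node;
}

/* Each node decides its own role (blocks 102 and 142). */
static bool i_am_master(partition_mask_t mask, unsigned my_node)
{
    return partition_master(mask) == my_node;
}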
[0022]
Referring to block 104, if there are any client processing nodes 8 belonging to the data processing subsystem, the master processor issues, on its local interconnect 16, a message targeting one such processing node 8. This message, represented by arrow A, asserts that its processing node 8 is the master. The message is snooped by the local node controller 20 and forwarded via node interconnect 22 to the indicated client processing node 8. As shown at blocks 144 and 146, respectively, the client processing node 8 waits until this message is received from the master and, in response to receipt of the message, transmits an acknowledgment message, indicated by arrow B, to the master processing node 8. As shown at blocks 106 and 108 of FIG. 4, the master waits until the acknowledgment message is received from the client processing node 8 and, after receiving the acknowledgment, returns to block 104 if the partition mask indicates that additional client processing nodes 8 have not yet been contacted. This master assertion-acknowledgment protocol (which may alternatively be executed in parallel for multiple client processing nodes 8) not only ensures that all processing nodes 8 in the data processing subsystem agree on which node is the master, but also advantageously serves to synchronize the various processing nodes 8 in the subsystem, which may be powered on at different times and boot at different speeds.
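One possible shape of the master assertion-acknowledgment exchange of blocks 104-108 and 144-146 is sketched below, continuing the earlier C fragments. The message names and the send/wait primitives are hypothetical; the embodiment leaves the transport to the node controllers 20 and node interconnect 22, which this sketch merely stubs out.

#include <stdio.h>

enum cfg_msg { MSG_MASTER_ASSERT, MSG_MASTER_ACK };

/* Stub transport for illustration only: these functions just print.
 * In the embodiment, such messages would be snooped by the local node
 * controller 20 and forwarded over node interconnect 22. */
static void send_msg(unsigned dest_node, enum cfg_msg m)
{
    printf("send message %d to node %u\n", (int)m, dest_node);
}
static void wait_msg(unsigned src_node, enum cfg_msg m)
{
    printf("wait for message %d from node %u\n", (int)m, src_node);
}

/* Master side: assert mastership to each client in the partition and
 * wait for its acknowledgment (arrows A and B).  Because every client
 * blocks until the assertion arrives, the exchange also synchronizes
 * nodes that powered on or booted at different speeds. */
static void master_assert_to_clients(partition_mask_t mask, unsigned my_node)
{
    for (unsigned node = 0; node < NUM_NODES; node++) {
        if (node == my_node || !node_in_partition(mask, node))
            continue;
        send_msg(node, MSG_MASTER_ASSERT);   /* block 104, arrow A      */
        wait_msg(node, MSG_MASTER_ACK);      /* blocks 106-108, arrow B */
    }
}

/* Client side (blocks 144-146). */
static void client_await_master(unsigned master_node)
{
    wait_msg(master_node, MSG_MASTER_ASSERT);
    send_msg(master_node, MSG_MASTER_ACK);
}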
[0023]
After the master processing node 8 has received acknowledgment of its mastership from all client processing nodes 8 (if any) in the data processing subsystem, as indicated by the process passing from block 108 to block 110 of FIG. 4, the master processing node 8 requests configuration information (e.g., a resource list) from the client processing nodes 8 (if any). This request for configuration information, which may comprise one or more messages to a client, is represented by arrow C. As indicated at blocks 148 and 150 of FIG. 5, a client processing node 8 waits for this resource list request and, in response to receiving it, responds by transmitting to the master processing node 8 one or more messages specifying its I/O resources, the amount of system memory 18 present, the number of processors 10 contained therein, and other configuration information. This configuration information response is represented by arrow D. Blocks 112 and 114 of FIG. 4 depict the master processing node 8 waiting for the response from the client processing node 8 and, after receiving the response, adding the specified resources to a subsystem resource list. As indicated at block 116, the master processing node 8 performs blocks 110-114 for each of the client processing nodes 8 specified in the partition mask.
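The subsystem resource list assembled at blocks 110-116 can be pictured as a simple aggregation structure. Continuing the sketch above, the fields below mirror the kinds of configuration information named in the text (processor count, amount of system memory 18, I/O resources); the exact contents and message format are not specified by the embodiment.

#include <stdint.h>

/* Per-node configuration information returned in response to the
 * master's resource-list request (arrows C and D).  Fields are
 * illustrative only. */
typedef struct {
    unsigned node_id;
    unsigned num_processors;
    uint64_t system_memory_bytes;
    unsigned num_io_devices;
} node_resources_t;

/* Aggregate resource list for one data processing subsystem. */
typedef struct {
    node_resources_t nodes[NUM_NODES];
    unsigned         count;
} subsystem_resources_t;

/* Master side of blocks 112-114: add one client's reported resources
 * to the subsystem resource list (no bounds check in this sketch). */
static void add_client_resources(subsystem_resources_t *subsys,
                                 const node_resources_t *client)
{
    subsys->nodes[subsys->count++] = *client;
}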
[0024]
After the master has obtained a resource list from each client (if any), as indicated by the process passing from block 116 to block 118 of FIG. 4, the master processor of the master processing node 8 determines the overall subsystem configuration and computes how the resources of each client processing node 8 are to be remapped. Next, at block 120, the master processor of the master processing node 8 transmits to a client processing node 8 (if any) one or more messages (represented by arrow E) specifying the manner in which that client processing node 8 is to remap its resources. For example, the master processor may specify to the memory controller 17 of the client processing node 8 the range of physical addresses to be associated with the storage locations of the system memory 18 being added. Furthermore, the master processor may specify memory-mapped addresses for the I/O devices 32 in the client processing node 8. In some embodiments, the master processor may also specify the processor ID of each processor 10 in the client processing node 8.
[0025]
In a preferred embodiment, all of the processors 10 in each data processing subsystem share a single physical memory space, meaning that each physical address is associated with one, and only one, storage location in the system memories 18. Thus, the overall contents of the data processing subsystem's system memory, which can generally be accessed by any processor 10 in that data processing subsystem, can be viewed as partitioned among the system memories 18 of the processing nodes 8 that make up the data processing subsystem. For example, in an illustrative embodiment in which each processing node 8 includes 1 GB of system memory 18 and data processing system 6 is configured as two NUMA data processing subsystems, each NUMA data processing subsystem has a 2 gigabyte (GB) physical address space.
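The 2 GB example corresponds to stacking each node's system memory 18 into one contiguous physical address space. Continuing the sketch above, the arithmetic is shown below; real firmware would also have to account for memory-mapped I/O ranges and other reservations, which this illustration ignores.

/* Assign the physical base address of each node's system memory when
 * the nodes of one subsystem share a single flat physical address
 * space: node k begins where the memory of nodes 0..k-1 ends. */
static void assign_memory_bases(const subsystem_resources_t *subsys,
                                uint64_t bases[])
{
    uint64_t next_base = 0;
    for (unsigned i = 0; i < subsys->count; i++) {
        bases[i]   = next_base;
        next_base += subsys->nodes[i].system_memory_bytes;
    }
}

/* With two nodes of 1 GB each, the bases come out to 0x00000000 and
 * 0x40000000, giving the subsystem the 2 GB physical address space
 * of the example above. */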
[0026]
As shown at blocks 152 and 154 of FIG. 5, a client processing node 8 waits for the remapping request from the master processing node 8 and, in response to receiving the remapping request, responds with an acknowledgment of the remapping request, represented by arrow F. As shown at blocks 122 and 124, the master processing node 8 waits for this remapping acknowledgment and, in response to receiving it, repeats blocks 120 and 122 for each of the other client processing nodes 8 indicated by the partition mask.
[0027]
Following block 124 of FIG. 4 and block 154 of FIG. 5, the master processing node 8 and each client processing node 8 remap their respective local resources in accordance with the remapping determined by the master processing node 8, as shown at blocks 126 and 156, respectively. As shown at block 158 of FIG. 5, each client processing node 8 then halts its processors 10 until the operating system (OS) of the data processing subsystem schedules work on them. By contrast, as shown at block 128 of FIG. 4, the master processing node 8 boots an operating system for its data processing subsystem, for example, from one of the storage devices 34. As noted above, when multiple data processing subsystems are formed from the processing nodes 8 of data processing system 6, the multiple data processing subsystems can run different operating systems, such as Windows NT and SCO (Santa Cruz Operation) UNIX. Thereafter, processing by the master processing node 8 continues at block 130.
[0028]
As explained above, the present invention provides a method for configuring a collection of interconnected processing nodes as either a single NUMA data processing system or a selected number of independently operable data processing subsystems. In accordance with the present invention, the partitioning of the processing nodes into multiple data processing subsystems is accomplished without connecting or disconnecting any of the processing nodes.
[0029]
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. For example, although aspects of the present invention have been described with respect to a computer system executing software that directs the method of the present invention, it should be understood that the present invention may alternatively be implemented as a computer program product for use with a computer system. Programs defining the functions of the present invention can be delivered to a computer system via a variety of signal-bearing media, which include, without limitation, non-rewritable storage media (e.g., CD-ROM), rewritable storage media (e.g., a floppy diskette or hard disk drive), and communication media such as computer networks and telephone networks. It should therefore be understood that such signal-bearing media, when carrying or encoding computer-readable instructions that direct the method functions of the present invention, represent alternative embodiments of the present invention.
[0030]
In summary, the following matters are disclosed regarding the configuration of the present invention.
[0031]
(1) A data processing system comprising:
a system interconnect;
a plurality of processing nodes coupled to the system interconnect, each of the plurality of processing nodes including at least one processor and a data storage device; and
a configuration utility residing in at least one system memory of the plurality of processing nodes, wherein the configuration utility selectively configures, through communication over the system interconnect, the plurality of processing nodes into one of a single NUMA (non-uniform memory access) system and a plurality of independent data processing systems.
(2) The data processing system according to (1) above, wherein at least one of the plurality of independent data processing systems is a NUMA (non-uniform memory access) system including at least two of the plurality of processing nodes.
(3) The data processing system according to (1) above, wherein the plurality of independent data processing systems comprise disjoint subsets of the plurality of processing nodes.
(4) The data processing system according to (1) above, wherein the data processing system includes boot code stored in at least one data storage device of the plurality of processing nodes, and the configuration utility forms a part of the boot code.
(5) The data processing system according to (1) above, wherein the communication includes a request for configuration information transmitted from a master processing node of the plurality of processing nodes to at least one other processing node of the plurality of processing nodes.
(6) The data processing system according to (5) above, wherein the communication further includes a response message transmitted from the at least one other processing node of the plurality of processing nodes to the master processing node, the response message including the requested configuration information.
(7) A method of configuring a plurality of interconnected processing nodes into one or more data processing systems, comprising the steps of:
coupling each of a plurality of processing nodes to a system interconnect, each of the plurality of processing nodes including at least one processor and a data storage device;
transmitting at least one configuration message over the system interconnect; and
utilizing the at least one configuration message to configure the plurality of processing nodes coupled to the system interconnect into one of a single NUMA (non-uniform memory access) system and a plurality of independent data processing systems.
(8) The method according to (7) above, wherein the step of configuring the plurality of processing nodes into a plurality of independent data processing systems includes the step of configuring at least some of the plurality of processing nodes into at least one NUMA (non-uniform memory access) subsystem including at least two of the plurality of processing nodes.
(9) The method according to (7) above, wherein the step of configuring the plurality of processing nodes into a plurality of independent data processing systems includes the step of configuring the plurality of processing nodes into a plurality of independent data processing systems comprising disjoint subsets of the plurality of processing nodes.
(10) The method according to (7) above, further comprising the steps of:
storing a configuration utility that forms a part of boot code in at least one data storage device of the plurality of processing nodes; and
executing the configuration utility to configure the plurality of processing nodes.
(11) The method according to (7) above, wherein transmitting at least one configuration message includes transmitting a request for configuration information from a master processing node of the plurality of processing nodes to at least one other processing node of the plurality of processing nodes.
(12) The method according to (11) above, wherein transmitting at least one configuration message further includes transmitting a response message from the at least one other processing node of the plurality of processing nodes to the master processing node, the response message including the requested configuration information.
(13) A program product for configuring a data processing system including a system interconnect coupled to a plurality of processing nodes, each of the plurality of processing nodes including at least one processor and a data storage device, the program product comprising:
a data processing system usable medium; and
a configuration utility encoded within the data processing system usable medium, wherein the configuration utility selectively configures, through communication over the system interconnect, the plurality of processing nodes into one of a single NUMA (non-uniform memory access) system and a plurality of independent data processing systems.
(14) The program product according to (13) above, wherein at least one of the plurality of independent data processing systems is a NUMA (non-uniform memory access) system including at least two of the plurality of processing nodes.
(15) The program product according to (13) above, wherein the plurality of independent data processing systems comprise disjoint subsets of the plurality of processing nodes.
(16) The program product according to (13) above, wherein the configuration utility forms a part of boot code.
(17) The program product according to (13) above, wherein the communication includes a request for configuration information transmitted from a master processing node of the plurality of processing nodes to at least one other processing node of the plurality of processing nodes.
(18) The program product according to (17) above, wherein the communication further includes a response message transmitted from the at least one other processing node of the plurality of processing nodes to the master processing node, the response message including the requested configuration information.
[Brief description of the drawings]
FIG. 1 illustrates an embodiment of a multiple node data processing system in which the present invention can be advantageously used.
FIG. 2 is a more detailed block diagram of a processing node in the data processing system shown in FIG.
FIG. 3 is a high level logic flow diagram illustrating a method for selectively partitioning the data processing system of FIG. 1 into one or more data processing subsystems.
FIG. 4 is a high level logic flow diagram of a method by which a master processing node configures a data processing subsystem according to an embodiment of the present invention.
FIG. 5 is a high level logic flow diagram of a method for configuring a client processing node according to an embodiment of the present invention.
[Explanation of symbols]
6 Data processing system
8 processing nodes
8a processing node
8b Processing node
8c processing node
8d processing node
10a processor
10b processor
10c processor
10d processor
10e processor
10f processor
10g processor
10h processor
10i processor
10j processor
10k processor
10l processor
10m processor
12 processor cores
14 Cache hierarchy
16 Local interconnect
17 Memory controller
18 System memory
20 node controller
22 node interconnection
24 Arbiter
26 Mezzanine Bus Bridge
30 Mezzanine bus
32 I / O devices
34 Storage device
36 Interconnect Coherence Unit (ICU)
38 Host Bridge
40 Memory bus
42 Nonvolatile Random Access Memory (NVRAM)
44 Service Processor Bus
46 Service Processor Dynamic Random Access Memory (SP DRAM)
48 Flash memory
50 Service processor

Claims (12)

  1. A data processing system comprising:
    a system interconnect;
    a plurality of processing nodes coupled to the system interconnect, each of the plurality of processing nodes including at least one processor and a data storage device; and
    a configuration utility residing in at least one system memory of the plurality of processing nodes, the configuration utility selectively configuring, through communication over the system interconnect and based on information regarding a partitioning of the data processing system, the plurality of processing nodes into one of a single NUMA (non-uniform memory access) system and a plurality of independent data processing systems,
    wherein the communication includes a request, transmitted from a master processing node of the plurality of processing nodes to other processing nodes belonging to the same partition as the master processing node, for a list of the resources possessed by the other processing nodes, and wherein the master processing node determines an overall configuration of the partition based on the obtained resource lists and computes a manner in which the listed resources are to be remapped.
  2. The data processing system according to claim 1, wherein at least one of the plurality of independent data processing systems is a non-uniform memory access (NUMA) system including at least two of the plurality of processing nodes.
  3. The data processing system of claim 1, wherein the plurality of independent data processing systems includes a disjoint subset of the plurality of processing nodes.
  4. The data processing system according to claim 1, wherein the data processing system includes boot code stored in at least one data storage device of the plurality of processing nodes, and the configuration utility forms a part of the boot code.
  5. A method of configuring a plurality of processing nodes coupled to a system interconnect into one or more data processing systems, each of the plurality of processing nodes including at least one processor and a data storage device, the method comprising the steps of:
    transmitting at least one configuration message over the system interconnect;
    utilizing the at least one configuration message, based on information regarding a partitioning of the data processing system, to configure the plurality of processing nodes into one of a single NUMA (non-uniform memory access) system and a plurality of independent data processing systems, wherein transmitting the at least one configuration message includes transmitting, from a master processing node of the plurality of processing nodes to other processing nodes belonging to the same partition as the master processing node, a request for a list of the resources possessed by the other processing nodes; and
    the master processing node determining an overall configuration of the partition based on the obtained resource lists and computing a manner in which the listed resources are to be remapped.
  6. The method according to claim 5, wherein the step of configuring includes the step of configuring at least some of the plurality of processing nodes into at least one NUMA (non-uniform memory access) subsystem comprising at least two of the plurality of processing nodes.
  7. The method according to claim 5, wherein configuring the plurality of processing nodes into a plurality of independent data processing systems includes configuring the plurality of processing nodes into a plurality of independent data processing systems comprising disjoint subsets of the plurality of processing nodes.
  8. The method according to claim 5, further comprising the steps of:
    storing a configuration utility that forms a part of boot code in at least one data storage device of the plurality of processing nodes; and
    executing the configuration utility to configure the plurality of processing nodes.
  9. A computer-readable recording medium recording a program for configuring a data processing system that includes a system interconnect coupled to a plurality of processing nodes, each of the plurality of processing nodes including at least one processor and a data storage device, the program causing a computer to perform the steps of:
    transmitting over the system interconnect, based on information regarding a partitioning of the data processing system, at least one configuration message constituting a request, from a master processing node of the plurality of processing nodes to other processing nodes belonging to the same partition as the master processing node, for a list of the resources of the other processing nodes; and
    configuring the plurality of processing nodes into one of a single NUMA (non-uniform memory access) system and a plurality of independent data processing systems, wherein the master processing node determines an overall configuration of the partition based on the obtained resource lists and computes a manner in which the listed resources are to be remapped.
  10. The computer-readable recording medium according to claim 9, wherein at least one of the plurality of independent data processing systems is a NUMA (non-uniform memory access) system including at least two of the plurality of processing nodes.
  11. The computer-readable recording medium according to claim 9, wherein the plurality of independent data processing systems comprise disjoint subsets of the plurality of processing nodes.
  12. The computer-readable recording medium according to claim 9, wherein the configuration utility forms a part of boot code.
JP2000180619A 1999-06-17 2000-06-15 Interconnected processing nodes configurable as at least one NUMA (NON-UNIFORMMOMERYACCESS) data processing system Expired - Fee Related JP3628595B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US09/335,301 US6421775B1 (en) 1999-06-17 1999-06-17 Interconnected processing nodes configurable as at least one non-uniform memory access (NUMA) data processing system
US09/335301 1999-06-17

Publications (2)

Publication Number Publication Date
JP2001051959A JP2001051959A (en) 2001-02-23
JP3628595B2 true JP3628595B2 (en) 2005-03-16

Family

ID=23311187

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2000180619A Expired - Fee Related JP3628595B2 (en) 1999-06-17 2000-06-15 Interconnected processing nodes configurable as at least one NUMA (NON-UNIFORMMOMERYACCESS) data processing system

Country Status (4)

Country Link
US (1) US6421775B1 (en)
JP (1) JP3628595B2 (en)
SG (1) SG91873A1 (en)
TW (1) TW457437B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6714994B1 (en) * 1998-12-23 2004-03-30 Advanced Micro Devices, Inc. Host bridge translating non-coherent packets from non-coherent link to coherent packets on conherent link and vice versa
US6519649B1 (en) * 1999-11-09 2003-02-11 International Business Machines Corporation Multi-node data processing system and communication protocol having a partial combined response
US6591307B1 (en) 1999-11-09 2003-07-08 International Business Machines Corporation Multi-node data processing system and method of queue management in which a queued operation is speculatively cancelled in response to a partial combined response
US6519665B1 (en) 1999-11-09 2003-02-11 International Business Machines Corporation Multi-node data processing system and communication protocol in which a stomp signal is propagated to cancel a prior request
US6848003B1 (en) 1999-11-09 2005-01-25 International Business Machines Corporation Multi-node data processing system and communication protocol that route write data utilizing a destination ID obtained from a combined response
US6671712B1 (en) 1999-11-09 2003-12-30 International Business Machines Corporation Multi-node data processing system having a non-hierarchical interconnect architecture
US6865695B2 (en) * 2001-07-26 2005-03-08 International Business Machines Corpoation Robust system bus recovery
JP2003173325A (en) * 2001-12-06 2003-06-20 Hitachi Ltd Initialization method and power supply cutting method for computer system
US6973544B2 (en) * 2002-01-09 2005-12-06 International Business Machines Corporation Method and apparatus of using global snooping to provide cache coherence to distributed computer nodes in a single coherent system
US6807586B2 (en) * 2002-01-09 2004-10-19 International Business Machines Corporation Increased computer peripheral throughput by using data available withholding
US7171568B2 (en) * 2003-06-13 2007-01-30 International Business Machines Corporation Remote power control in a multi-node, partitioned data processing system
US7194660B2 (en) * 2003-06-23 2007-03-20 Newisys, Inc. Multi-processing in a BIOS environment
US7007128B2 (en) * 2004-01-07 2006-02-28 International Business Machines Corporation Multiprocessor data processing system having a data routing mechanism regulated through control communication
US7308558B2 (en) * 2004-01-07 2007-12-11 International Business Machines Corporation Multiprocessor data processing system having scalable data interconnect and data routing mechanism
US7484122B2 (en) * 2004-06-17 2009-01-27 International Business Machines Corporation Controlling timing of execution of test instruction by target computing device
JP4945949B2 (en) * 2005-08-03 2012-06-06 日本電気株式会社 Information processing device, CPU, information processing device activation method, and program
US7640426B2 (en) * 2006-03-31 2009-12-29 Intel Corporation Methods and apparatus to manage hardware resources for a partitioned platform
US7702893B1 (en) * 2006-09-22 2010-04-20 Altera Corporation Integrated circuits with configurable initialization data memory addresses
US7818508B2 (en) * 2007-04-27 2010-10-19 Hewlett-Packard Development Company, L.P. System and method for achieving enhanced memory access capabilities
US20080270708A1 (en) * 2007-04-30 2008-10-30 Craig Warner System and Method for Achieving Cache Coherency Within Multiprocessor Computer System
US7904676B2 (en) * 2007-04-30 2011-03-08 Hewlett-Packard Development Company, L.P. Method and system for achieving varying manners of memory access
KR101249831B1 (en) * 2007-08-06 2013-04-05 삼성전자주식회사 Computer system and method for booting the same
ITMI20071829A1 * 2007-09-21 2009-03-22 Screenlogix S R L Machine architecture composed of a software level and a hardware level that interact independently of the starting configuration of said machine, and process for realizing said machine architecture
US8782779B2 (en) * 2007-09-26 2014-07-15 Hewlett-Packard Development Company, L.P. System and method for achieving protected region within computer system
US8612973B2 (en) * 2007-09-26 2013-12-17 Hewlett-Packard Development Company, L.P. Method and system for handling interrupts within computer system during hardware resource migration
US9207990B2 (en) * 2007-09-28 2015-12-08 Hewlett-Packard Development Company, L.P. Method and system for migrating critical resources within computer systems

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4925311A (en) * 1986-02-10 1990-05-15 Teradata Corporation Dynamically partitionable parallel processors
US5561768A (en) * 1992-03-17 1996-10-01 Thinking Machines Corporation System and method for partitioning a massively parallel computer system
US5642506A (en) * 1994-12-14 1997-06-24 International Business Machines Corporation Method and apparatus for initializing a multiprocessor system
US5887146A (en) 1995-08-14 1999-03-23 Data General Corporation Symmetric multiprocessing computer with non-uniform memory access architecture
US5710907A (en) 1995-12-22 1998-01-20 Sun Microsystems, Inc. Hybrid NUMA COMA caching system and methods for selecting between the caching modes
US5887138A (en) 1996-07-01 1999-03-23 Sun Microsystems, Inc. Multiprocessing computer system employing local and global address spaces and COMA and NUMA access modes
US5938765A (en) * 1997-08-29 1999-08-17 Sequent Computer Systems, Inc. System and method for initializing a multinode multiprocessor computer system
EP0908825B1 (en) * 1997-10-10 2002-09-04 Bull S.A. A data-processing system with cc-NUMA (cache coherent, non-uniform memory access) architecture and remote access cache incorporated in local memory
JP3614650B2 (en) * 1998-03-20 2005-01-26 富士通株式会社 Multiprocessor control system and boot device and boot control device used therefor
US6247109B1 (en) * 1998-06-10 2001-06-12 Compaq Computer Corp. Dynamically assigning CPUs to different partitions each having an operation system instance in a shared memory space
US6275907B1 (en) * 1998-11-02 2001-08-14 International Business Machines Corporation Reservation management in a non-uniform memory access (NUMA) data processing system
US6108764A (en) * 1998-12-17 2000-08-22 International Business Machines Corporation Non-uniform memory access (NUMA) data processing system with multiple caches concurrently holding data in a recent state from which data can be sourced by shared intervention
US6148361A (en) * 1998-12-17 2000-11-14 International Business Machines Corporation Interrupt architecture for a non-uniform memory access (NUMA) data processing system

Also Published As

Publication number Publication date
TW457437B (en) 2001-10-01
SG91873A1 (en) 2002-10-15
US6421775B1 (en) 2002-07-16
JP2001051959A (en) 2001-02-23


Legal Events

Date Code Title Description
A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20040127

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20040413

A602 Written permission of extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A602

Effective date: 20040416

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20040709

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20041130

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20041208

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071217

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081217

Year of fee payment: 4

LAPS Cancellation because of no payment of annual fees