WO2016122534A1 - Multiple computers on a reconfigurable circuit board - Google Patents

Multiple computers on a reconfigurable circuit board

Info

Publication number
WO2016122534A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing node
computer system
processing units
southbridge
operate
Prior art date
Application number
PCT/US2015/013545
Other languages
French (fr)
Inventor
Rachid M. Kadri
Vlad CATOI
James Cole VAGTBORG
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Priority to PCT/US2015/013545 priority Critical patent/WO2016122534A1/en
Publication of WO2016122534A1 publication Critical patent/WO2016122534A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/161Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning

Definitions

  • the computer system 100 can also include a BIOS ROM 122, which defines the software interface between the operating system and other platform firmware for each computer subsystem.
  • the management controller 114 controls access to the BIOS ROM 122, which is executed when a particular computer subsystem is powered on.
  • the BIOS ROM 122 includes the programming code to perform system discovery and to set up system resources during bootup.
  • the BIOS ROM 122 can include Extensible Firmware Interface (EFI) firmware, Unified Extensible Firmware Interface (UEFI) firmware, or Basic Input/Output System (BIOS) firmware.
  • the computer system 100 can be configured to operate as a single homogeneous system or as a plurality of separate subsystems that operate independently from one another.
  • the term "computing node" refers to a group of processing units that operate together to execute a same instance of an operating system. If the computer system 100 is configured as a single homogeneous system, the computer system 100 is configured as a single computing node. If the computer system 100 is configured as separate subsystems, each subsystem is a different computing node.
  • the computer system 100 can be configured to operate as a single computing node or as two separate computing nodes.
  • the first computing node includes processor A, processor B, and the first Southbridge 116, and the second computing node includes processor C, processor D, and the second Southbridge 118.
  • the first computing node runs a first instance of the operating system and the second computing node runs a second instance of the operating system.
  • in some examples, the processor interconnect links 112 between processor A and processor C and between processor B and processor D can be reconfigured to provide redundancy between the two computing nodes.
  • the reconfigured processor interconnect links 112 between processor A and processor C may enable the two computing nodes to regularly check the redundancy of the two systems.
  • in other examples, the processor interconnect links 112 between processor A and processor C and between processor B and processor D are disabled and inoperative.
  • the management controller 114 manages the computing nodes as two separate computers that are able to operate independently from one another. For example, one computing node can be shut down or re-booted without interrupting operation of the other computing node. However, although the two computing nodes operate separately, they are disposed on the same motherboard and are able to share resources. For example, both computing nodes use the same management controller 114 and the same BIOS ROM 122. The two computing nodes can also share I/O devices. Communication between the two computing nodes takes place through a high-speed processor interconnect link 112 that resides on the motherboard and takes the place of a switch, which would normally be used to accomplish communication between two different computer nodes.
  • the computer system 100 can also be configured as a single computing node, which runs a single instance of an operating system.
  • the processing units 102 can also cooperate to execute instructions of a single software application, such as a database application.
  • all four of the processor interconnect links 112 are operable, and all of the processing units 102 can share system memory 104 and processing tasks via the processor interconnect links 112.
  • the optional second Southbridge 118 may be omitted or may be present but inoperative.
  • the term "operating mode" refers to the particular grouping of processing units 102 into computing nodes. In the example of Fig. 1, there are two operating modes. In the first operating mode, the computer system 100 is configured as a single computing node with four processing units 102. In the second operating mode, the computer system 100 is configured as two computing nodes, each with two processing units 102.
  • the operating mode may be determined during manufacture or later by the user. Additionally, in some examples, the operating mode may be changed by a system administrator. In some examples, the operating mode may be determined based on whether the optional second Southbridge 118 is present.
  • the management controller 114 can attempt to establish communication with both Southbridges 116 and 118. If only one Southbridge is detected, the management controller 114 configures the computer system as a single four-processor computer. If both Southbridges are detected, the management controller 114 configures the computer system 100 as two separate two-processor computing nodes.
  • the computer system 100 includes a configuration control component 124 that determines the configuration of the computer system 100.
  • the configuration control component 124 can be a memory device, such as a Non-Volatile RAM (NVRAM), a Write-Once Non-Volatile memory (WO NVM), Electrically Erasable Programmable Read-Only Memory (EEPROM), among others.
  • the configuration control component 124 can be used to store configuration information about the computer system 100, including whether the computer system 100 is to be configured as a single four-processor computing node or two two-processor computing nodes.
  • each of the two-processor computing nodes may be configured in a clustered configuration. In the clustered configuration, each computing node may be designated as a backup node for the other computing node in the event that one of the computing nodes fails.
  • FIG. 2 shows an additional example technique for enabling communications between the two separate computing nodes of the computer system 100 when configured as two separate two-processor computing nodes.
  • Fig. 2 is a block diagram of another example of a computer system that is configurable to operate as two distinct computing nodes.
  • the example computer system 100 shown in Fig. 2 includes the same components shown in Fig. 1, including the four processing units 102 and the four processor interconnect links 112.
  • the computer system 100 is configured as two two-processor computing nodes.
  • Processors A and B form one computing node and processors C and D form a separate computing node.
  • the computing node of processors A and B also includes the first Southbridge 116, while the second computing node includes the second Southbridge 118.
  • the first Southbridge 116 and second Southbridge 118 operate independently and are both controlled by the management controller 114.
  • Each computing node may be running its own separate instance of an operating system.
  • the processor interconnect links 112 between processor A and processor B are operable to enable communication between one another and to facilitate task and resource sharing, as are the processor interconnect links 112 between processor C and processor D. However, the processor interconnect links 112 between processor A and processor C are disabled and the processor interconnect links 112 between processor B and processor D are disabled.
  • a direct peer-to-peer link 200 is implemented between the I/O busses of processor A and processor B.
  • the direct peer-to-peer link 200 may be a PCIe protocol link disposed on the computer system's motherboard.
  • the direct peer-to-peer link 200 enables communication between the operating systems of the two computing nodes.
  • Figs. 1 and 2 are not intended to indicate that the computer system 100 is to include all of the components shown in Fig. 1. Rather, the computer system 100 can include fewer or additional components not illustrated in Figs. 1 and 2, such as more or fewer processors, additional Southbridges, etc.
  • the computer system 100 can include eight processors, which may be configured as a single computing node or partitioned into two four-processor computing nodes or four two-processor computing nodes.
  • in other examples, the computer system 100 can include other numbers of processors, which may be configured as a single computing node or divided into any suitable number of separate computing nodes.
  • Fig. 3 is a process flow diagram of a method of starting up a computer system that is configurable to operate as two distinct computing nodes.
  • the method 300 can be performed by one or more components of the computer system 100 shown in Figs. 1 and 2, such as the management controller 114.
  • the management controller detects the application of power to the computer system.
  • the management controller can be configured to automatically begin the start-up process upon the detection of electrical power without waiting for instructions to power up.
  • the management controller can retrieve the management controller firmware (also referred to as iLO ROM) and load it into the management controller's working memory. The remaining operations may be performed in accordance with the programming included in the management controller firmware.
  • the determination of the operating mode may be made by reading configuration information from a memory device or by determining the number of enabled Southbridges that reside on the motherboard of the computer system, for example. The presence of one Southbridge could be used to indicate that the computer system is configured to operate as a single computing node, while the presence of multiple Southbridges could be used to indicate that the computer system is configured to operate as a number of independent computing nodes equal to the number of Southbridges.
  • the management controller initiates a boot process for each of the independent computing nodes residing on the motherboard.
  • the management controller waits for a signal from the user before initiating the boot process.
  • the boot process may be initiated automatically in response to the computer powering on.
  • Each Southbridge may be configured to perform the boot process for its respective computing node. During the boot process, each Southbridge may perform a power-on self-test, load code from BIOS ROM, locate and initialize peripheral devices, and find, load, and start an operating system, among other processes.
  • the first Southbridge 116 configures the entire computer system 100 and loads a single instance of the operating system to execute on all four processing units 102.
  • the first Southbridge 116 will perform a boot process for the computing node of processor A and processor B, and the second Southbridge 118 will perform a boot process for the computing node of processor C and processor D.
  • the two boot processes execute independently and can run in parallel.
  • the first Southbridge 116 will load a first instance of an operating system for processors A and B.
  • the second Southbridge 118 will load a second instance of an operating system for processors C and D.
  • the two independent operating systems may be loaded from a common storage space or different storage spaces.
  • each computing node may load and execute different operating system types or different versions of an operating system. It is also possible for only one of the computing nodes to become operable, while the other computing node remains powered down.
  • the process flow diagram of Fig. 3 is not intended to indicate that the operations of the method 300 are to be executed in any particular order, or that all of the operations of the method 300 are to be included in every case. Additionally, the method 300 can include any suitable number of additional operations.
  • FIG. 4 is a block diagram showing a simplified example of a computer system in accordance with the techniques described herein.
  • the computer system is generally referred to by the reference number 400.
  • the computer system 400 includes a number of processing units 102. In the example shown in Fig. 4, the computer system 400 includes four processing units 102. However, the computer system 400 can have a different number of processing units 102, including two, six, or eight processing units, or more. Each of the processing units may reside on a same motherboard. Some of the processors are communicatively coupled by processor interconnect links 112, which may be Quick Path Interconnect (QPI) connections, for example.
  • the computer system 400 also includes a management controller 114, which enables a system administrator to remotely monitor and control the computer system 400.
  • the computer system 400 can also include at least one Southbridge, referred to herein as first Southbridge 116.
  • the computer system 400 can also include an optional Southbridge, referred to herein as second Southbridge 118. Each Southbridge 116 and 118 is coupled to the management controller 114.
  • the computer system 400 can be configured to operate in one of several possible operating modes. For example, in a first mode, the processing units 102 can be configured to operate as a single computing node, and in a second mode, the processing units 102 are configured to operate as two independent computing nodes. Other operating modes are possible. For example, in a third mode, the processing units 102 may be configured to operate as four computing nodes.
  • the computer system 400 is configured to operate as two computing nodes, wherein the first Southbridge 116 is configured to initialize the first computing node, and the second Southbridge 118 is configured to initialize the second computing node.
  • Each computing node includes one or more of the processing units 102.
  • the processors may be divided evenly between the computing nodes, or different computing nodes can have different numbers of processors.
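The start-up flow summarized above (and in Fig. 3) can be pictured with a short sketch. This is a hypothetical illustration, not firmware from the disclosure: the function names and return values are invented, and real management controller firmware would interact with hardware rather than Python objects.

```python
# Hypothetical sketch of the Fig. 3 start-up flow: on detecting power, the
# management controller loads its firmware, determines the operating mode
# from the number of enabled Southbridges, and initiates an independent
# boot process for each computing node.

def determine_num_nodes(enabled_southbridges):
    """One Southbridge -> one computing node; N Southbridges -> N nodes."""
    if enabled_southbridges < 1:
        raise ValueError("at least one enabled Southbridge is required")
    return enabled_southbridges

def boot_node(southbridge_id):
    """Stand-in for a Southbridge boot process: POST, BIOS load, OS start."""
    return f"southbridge {southbridge_id}: OS instance started"

def start_up(enabled_southbridges):
    num_nodes = determine_num_nodes(enabled_southbridges)
    # The per-node boot processes execute independently and can run in parallel.
    return [boot_node(i) for i in range(num_nodes)]

print(start_up(2))
```

With two enabled Southbridges the sketch reports two independently booted operating-system instances, mirroring the two-computing-node mode; with one Southbridge it reports a single node.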

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

Disclosed is a computer system with a plurality of processing units that can be configured as different computing nodes. The computer system includes a management controller to enable out-of-band management of the computer system. The computer system also includes two Southbridges communicatively coupled to the management controller. The first Southbridge is configured to initialize a first computing node, and the second Southbridge is configured to initialize a second computing node comprising a different one or more of the processing units.

Description

MULTIPLE COMPUTERS ON A RECONFIGURABLE CIRCUIT BOARD
BACKGROUND
[0001] Multiprocessing refers to computing techniques wherein multiple central processing units are included within a single computer. The processing units can be separate dies coupled to a same motherboard. The processing units can share resources, such as main memory and peripheral Input/Output (I/O) devices, and can simultaneously process software programs by sharing main memory and processing tasks.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Certain examples are described in the following detailed description and in reference to the drawings, in which:
[0003] Fig. 1 is a block diagram of an example computer system that is configurable to operate as two distinct computer sub-systems;
[0004] Fig. 2 is a block diagram of another example of a computer system that is configurable to operate as two distinct computer sub-systems;
[0005] Fig. 3 is a process flow diagram summarizing a method of operating a computer system that is configurable to operate as two distinct computer subsystems; and
[0006] Fig. 4 is a block diagram showing a simplified example of a computer system in accordance with the techniques described herein.
DETAILED DESCRIPTION OF SPECIFIC EXAMPLES
[0007] The techniques disclosed herein describe a computer system with multiple processing units coupled to a same motherboard and configured to operate as two distinct computer subsystems. Each distinct computer subsystem is separately booted, runs its own separate operating system, and executes software programs independently from the other subsystem. However, the two subsystems, being coupled to the same motherboard, are able to share resources, which reduces the use of redundant components and results in an overall lower cost.
[0008] In some examples, the computer system is configurable to operate in one of two selectable modes. In the first mode, the computer system is configured as a single multiprocessor computer. In the second mode, the computer system is configured to operate as two separate computer subsystems coupled to a same motherboard. In this way, a single computer system can be manufactured and used to satisfy different product offerings with only minimal changes to the system's components. Furthermore, the computer system can be easily repurposed by the user, such as a system administrator.
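As a rough model of the two selectable modes described above, the sketch below maps a mode selection to processor groupings; the mode constants and the four-processor default are assumptions for illustration, not taken from the disclosure.

```python
# Hypothetical encoding of the two selectable operating modes:
# mode 1 -> a single multiprocessor computer (one computing node);
# mode 2 -> two separate computer subsystems on the same motherboard.

SINGLE_NODE = 1
TWO_NODES = 2

def nodes_for_mode(mode, processors=("A", "B", "C", "D")):
    """Return the grouping of processing units for the selected mode."""
    if mode == SINGLE_NODE:
        return [list(processors)]
    if mode == TWO_NODES:
        half = len(processors) // 2
        return [list(processors[:half]), list(processors[half:])]
    raise ValueError(f"unknown operating mode: {mode}")

print(nodes_for_mode(TWO_NODES))  # [['A', 'B'], ['C', 'D']]
```

The same four processing units thus serve either product offering, which is the repurposing flexibility the paragraph above describes.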
[0009] Fig. 1 is a block diagram of an example computer system that is configurable to operate as two distinct computer sub-systems. The computer system is generally referred to by the reference number 100. The functional blocks and devices shown in Fig. 1 can include hardware elements such as circuitry or a combination of both hardware and software elements including processors and computer code stored on a non-transitory, computer-readable medium. Additionally, the functional blocks and devices of the computer system 100 are only one example of functional blocks and devices that may be implemented in examples of the present techniques.
[0010] In some examples, the computer system 100 is a general-purpose computing device, for example, a desktop computer, laptop computer, business server, a blade server, and the like. The computer system 100 includes a number of processing units 102. As used herein, the term "processing unit" refers to a general purpose processor that encompasses a complete semiconductor die. A single processing unit 102 may include a plurality of processor cores. In the example shown in Fig. 1, the computer system 100 includes four processing units 102. However, the computer system 100 can have a different number of processing units 102, including two, six, or eight processing units, or more.
[0011] The computer system 100 can also have one or more types of tangible, non-transitory, computer-readable media, such as a system memory 104 that is used as a working memory. As used herein, the term working memory refers to the memory used by a processor during the execution of programming instructions. The system memory 104 can include Random Access Memory (RAM), including volatile memory such as Static Random-Access Memory (SRAM) and Dynamic Random-Access Memory (DRAM), non-volatile memory such as Resistive Random-Access Memory (RRAM), and any other suitable memory types or combination thereof. Each processing unit 102 may have an integrated memory controller (not shown) for controlling the system memory 104. However, other configurations are also possible. For example, the processing units 102 may be coupled to the system memory 104 through an external memory controller.
[0012] Each processing unit 102 may also include an integrated I/O bus 106 that enables the processing unit 102 to communicate with one or more I/O devices 108. The I/O busses 106 and I/O devices 108 can communicate using any suitable expansion bus communication standard, including PCI Express® and Ethernet among others. The I/O devices 108 may include storage controllers, network interface controllers (NICs), Infiniband controllers, and the like.
[0013] At least one of the I/O devices may be a storage controller (not shown) that enables the processing unit 102 to access a storage device 110, such as a disk drive, solid state drive, array of disk drives, array of solid state drives, or a network attached storage appliance, among others. The storage device 110 can also include other tangible, non-transitory, computer-readable storage media for the long-term storage of operating programs and data, such as user files.
[0014] The computer system 100 can also include one or more processor interconnect links 112, which enable communication between the processing units 102 and sharing of system memory 104 and other platform resources. When operating together as part of the same computer system or subsystem, the processor interconnect links 112 may be used to share computing tasks during the execution of a software program. The processor interconnect links 112 may be Quick Path Interconnect (QPI) connections, for example.
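The role of these interconnect links in the two operating modes can be sketched as follows; the link set and processor naming follow Figs. 1 and 2 of the disclosure, but the representation itself is a hypothetical model for this description.

```python
# Hypothetical model of the Fig. 1 interconnect topology. In single-node
# mode all four QPI-style links are operable; in two-node mode the
# cross-node links (A-C and B-D) are disabled so that each computing node
# operates independently.

ALL_LINKS = {("A", "B"), ("C", "D"), ("A", "C"), ("B", "D")}
CROSS_NODE_LINKS = {("A", "C"), ("B", "D")}

def operable_links(two_node_mode):
    """Return the set of processor interconnect links that remain enabled."""
    if two_node_mode:
        return ALL_LINKS - CROSS_NODE_LINKS
    return set(ALL_LINKS)

print(sorted(operable_links(two_node_mode=True)))  # [('A', 'B'), ('C', 'D')]
```

Only the intra-node links A-B and C-D survive in two-node mode, which is why each node can then boot and run its own operating-system instance without interference from the other.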
[0015] The computer system 100 also includes a management controller 114, which enables a system administrator to remotely monitor and control the computer system 100. The management controller 114 is an out-of-band management controller that can provide management capabilities regardless of whether the computer system 100 is powered on or has an installed or functional operating system. The management controller 114 is a stand-alone function that remains powered when the rest of the system 100 is turned off. Out-of-band management controllers are sometimes referred to as baseboard management controllers (BMCs) or Integrated Lights-Out (iLO) management controllers. The system administrator can access the management controller 114 remotely through a dedicated, out-of-band management channel that does not rely on other components of the computer system 100 to operate.
[0016] The management controller 114 operates outside of the host operating system and can include its own separate processing resources and memory, and can also include a separate back-up power supply. The management controller 114 can receive management commands through a communication channel (not shown) that bypasses the computing device's main processing units 102. For example, out-of-band management enables an administrator to remotely turn on a computer, update a computer's Basic Input/Output System (BIOS), and monitor computer resources even while a computer's main processing units are powered down.
[0017] The computer system 100 also includes at least one Southbridge, referred to herein as first Southbridge 116. The computer system 100 can also include an optional Southbridge, referred to herein as second Southbridge 118. Each Southbridge 116 and 118 is coupled to the management controller 114 and serves various functions of the computer system 100, such as power management, system clocking, Direct Memory Access (DMA) to the system memory 104, Basic Input/Output System (BIOS), system initialization (booting), and support for coupling peripheral devices. Each Southbridge 116 and 118 is also coupled to at least one of the processing units 102 through a communications link such as a Direct Media Interface (DMI) link. Each Southbridge may also be used to support additional I/O devices 120, which may include input devices, such as a mouse, touch screen, or keyboard, and output devices, such as display monitors.
[0018] The management controller 114 is coupled to each Southbridge by any suitable communication interface, such as PCIe, USB, and others. The management controller 114 can provide management, diagnostic, and configuration services for the computer system 100, such as system initialization, health monitoring, remote administrative control of the computer system 100, and the like. The management controller 114 also receives system health and status information from system resources. For example, the management controller 114 can monitor temperature data, power usage, and system errors. The management controller 114 can log system health parameters. The management controller 114 also includes an interface, such as an Ethernet interface, for communicating over a network out-of-band, enabling a system administrator to communicate with the management controller 114 to receive system information and initiate management operations, such as firmware updates, system power-up, system re-boot, and others.
[0019] The computer system 100 can also include a BIOS ROM 122, which defines the software interface between the operating system and other platform firmware for each computer subsystem. The management controller 114 controls access to the BIOS ROM 122, which is executed when a particular computer subsystem is powered on. The BIOS ROM 122 includes the programming code to perform system discovery and to set up system resources during bootup. The BIOS ROM 122 can include Extensible Firmware Interface (EFI) firmware, Unified Extensible Firmware Interface (UEFI) firmware, or other Basic Input/Output System (BIOS) firmware, for example.
[0020] The computer system 100 can be configured to operate as a single homogeneous system or as a plurality of separate subsystems that operate independently from one another. As used herein, the term "computing node" refers to a group of processing units that operate together to execute a same instance of an operating system. If the computer system 100 is configured as a single homogeneous system, then the computer system 100 is configured as a single computing node. If the computer system 100 is configured as separate subsystems, each subsystem is a different computing node.
[0021] In the example of Fig. 1, the computer system 100 can be configured to operate as a single computing node or as two separate computing nodes. The first computing node includes processor A, processor B, and the first Southbridge 116, and the second computing node includes processor C, processor D, and the second Southbridge 118. When configured as two computing nodes, the first computing node runs a first instance of the operating system and the second computing node runs a second instance of the operating system. Furthermore, there is no sharing of system memory 104 between the two computing nodes. In some examples, the processor interconnect links 112 between processor A and processor C and between processor B and processor D can be reconfigured to provide communications between the operating systems running on the two computing nodes. The reconfigured processor interconnect links 112 between processor A and processor C may enable the two computing nodes to regularly check the redundancy of the two systems. In some examples, the processor interconnect links 112 between processor A and processor C and between processor B and processor D are disabled and inoperative.
[0022] When configured as two separate computing nodes, the management controller 114 manages the computing nodes as two separate computers that are able to operate independently from one another. For example, one computing node can be shut down or re-booted without interrupting operation of the other computing node. However, although the two computing nodes operate separately, they are disposed on the same motherboard and are able to share resources. For example, both computing nodes use the same management controller 114 and the same BIOS ROM 122. The two computing nodes can also share I/O devices. Communication between the two computing nodes takes place through a high-speed processor interconnect link 112 that resides on the motherboard and takes the place of a switch that would normally be used to accomplish communication between two different computer nodes.
[0023] The computer system 100 can also be configured as a single computing node, which runs a single instance of an operating system. The processing units 102 can also cooperate to execute instructions of a single software application, such as a database application. In this configuration, all four of the processor interconnect links 112 are operable, and all of the processing units 102 can share system memory 104 and processing tasks via the processor interconnect links 112. If the computer system 100 is configured to operate as a single computer, the optional second Southbridge 118 may be omitted or may be present but inoperative.
[0024] Various techniques may be used to configure the operating mode of the computer system. The term "operating mode" refers to the particular grouping of processing units 102 into computing nodes. In the example of Fig. 1, there are two operating modes. In the first operating mode, the computer system 100 is configured as a single computing node with four processing units 102. In the second operating mode, the computer system 100 is configured as two computing nodes, each with two processing units 102. The operating mode may be determined during manufacture or later by the user. Additionally, in some examples, the operating mode may be changed by a system administrator. In some examples, the operating mode may be determined based on whether the optional second Southbridge 118 is present. At boot-up, the management controller 114 can attempt to establish communication with both Southbridges 116 and 118. If only one Southbridge is detected, the management controller 114 configures the computer system as a single four-processor computer. If both Southbridges are detected, the management controller 114 configures the computer system 100 as two separate two-processor computing nodes.
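The Southbridge-detection policy described above can be sketched as follows. This is a minimal illustration, not the disclosed firmware: the function name `select_operating_mode` and the boolean probe list standing in for the management controller's communication attempts are assumptions made for the example.

```python
def select_operating_mode(southbridge_present):
    """Map detected Southbridges to an operating mode.

    `southbridge_present` holds one boolean per Southbridge socket
    probed at boot (an illustrative stand-in for the management
    controller attempting communication with each Southbridge).
    """
    detected = sum(1 for present in southbridge_present if present)
    if detected <= 1:
        # One Southbridge detected: a single computing node
        # spanning all processing units.
        return {"mode": "single", "nodes": 1}
    # Otherwise, one independent computing node per detected Southbridge.
    return {"mode": "partitioned", "nodes": detected}


# Only the first Southbridge responds: one four-processor computer.
assert select_operating_mode([True, False]) == {"mode": "single", "nodes": 1}
# Both Southbridges respond: two separate two-processor computing nodes.
assert select_operating_mode([True, True]) == {"mode": "partitioned", "nodes": 2}
```

The same policy generalizes to the larger systems mentioned later, since the node count simply follows the number of detected Southbridges.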
[0025] In some examples, the computer system 100 includes a configuration control component 124 that determines the configuration of the computer system 100. The configuration control component 124 can be a memory device, such as a Non-Volatile RAM (NVRAM), a Write-Once Non-Volatile Memory (WO NVM), or an Electrically Erasable Programmable Read-Only Memory (EEPROM), among others. The configuration control component 124 can be used to store configuration information about the computer system 100, including whether the computer system 100 is to be configured as a single four-processor computing node or two two-processor computing nodes.
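A minimal sketch of such a configuration control component, assuming the stored record reduces to a node count. The class name and the dictionary-backed store are hypothetical stand-ins for the NVRAM or EEPROM device; the disclosed storage format is not specified.

```python
class ConfigurationControl:
    """Illustrative stand-in for configuration control component 124.

    A real component would read and write NVRAM, WO NVM, or EEPROM;
    a plain dictionary models the persistent record here.
    """

    def __init__(self, store=None):
        self._store = store if store is not None else {}

    def write_node_count(self, node_count):
        # Record whether the system boots as one node or several.
        self._store["node_count"] = node_count

    def read_node_count(self):
        # Default to a single computing node if nothing has been stored.
        return self._store.get("node_count", 1)


cfg = ConfigurationControl()
assert cfg.read_node_count() == 1   # default: single four-processor node
cfg.write_node_count(2)             # administrator selects two nodes
assert cfg.read_node_count() == 2
```

At boot, the management controller could consult this record before, or instead of, probing for Southbridges.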
[0026] In applications that use a large pool of memory, such as in virtualization and database applications, it may be beneficial to configure the computer system 100 as a single four-processor computing node. In other applications, it may be beneficial to configure the computer system 100 as two two-processor computing nodes. In some examples, each of the two-processor computing nodes may be configured in a clustered configuration. In the clustered configuration, each computing node may be designated as a backup node for the other computing node in the event that one of the computing nodes fails.
[0027] As mentioned above, when configured as two two-processor computing nodes, communication between the two computing nodes can be accomplished through the processor interconnect links 112 between processors A and C and between processors B and D. However, other configurations are also possible. Fig. 2 shows an additional example technique for enabling communications between the two separate computing nodes of the computer system 100 when configured as two separate two-processor computing nodes.
[0028] Fig. 2 is a block diagram of another example of a computer system that is configurable to operate as two distinct computing nodes. The example computer system 100 shown in Fig. 2 includes the same components shown in Fig. 1, including the four processing units 102 and the four processor interconnect links 112. In the example of Fig. 2, the computer system 100 is configured as two two-processor computing nodes. Processors A and B form one computing node and processors C and D form a separate computing node. The computing node of processors A and B also includes the first Southbridge 116, while the second computing node includes the second Southbridge 118. The first Southbridge 116 and second Southbridge 118 operate independently and are both controlled by the management controller 114. Each computing node may be running its own separate instance of an operating system.
[0029] The processor interconnect links 112 between processor A and processor B are operable to enable communication between one another and to facilitate task and resource sharing, as are the processor interconnect links 112 between processor C and processor D. However, the processor interconnect links 112 between processor A and processor C are disabled, and the processor interconnect links 112 between processor B and processor D are disabled. To enable communication between the two different computing nodes, a direct peer-to-peer link 200 is implemented between the I/O busses of the two computing nodes, for example between the I/O busses of processor A and processor C. The direct peer-to-peer link 200 may be a PCIe protocol link disposed on the computer system's motherboard. The direct peer-to-peer link 200 enables communication between the operating systems of the two computing nodes.
[0030] It is to be understood that the block diagrams of Figs. 1 and 2 are not intended to indicate that the computer system 100 is to include all of the components shown in Fig. 1. Rather, the computer system 100 can include fewer or additional components not illustrated in Fig. 1, such as more or fewer processors, additional Southbridges, etc. For example, the computer system 100 can include eight processors, which may be configured as a single computing node or partitioned into two four-processor computing nodes or four two-processor computing nodes.
Indeed, the present techniques can be applied to any suitable number of processors, which may be configured as a single computing node or divided into any suitable number of separate computing nodes.
[0031] Fig. 3 is a process flow diagram of a method of starting up a computer system that is configurable to operate as two distinct computing nodes. The method 300 can be performed by one or more components of the computer system 100 shown in Figs. 1 and 2, such as the management controller 114.
[0032] At block 302, the management controller detects the application of power to the computer system. The management controller can be configured to automatically begin the start-up process upon the detection of electrical power without waiting for instructions to power up. When the power is determined to be sufficient, the management controller can retrieve the management controller firmware (also referred to as the iLO ROM) and load it into the management controller's working memory. The remaining operations may be performed in accordance with the programming included in the management controller firmware.
[0033] At block 304, a determination is made regarding the operating mode of the computer system, for example, whether the computer system is configured as a single computing node or a plurality of separate computing nodes. This determination may be made by reading configuration information from a memory device or determining the number of enabled Southbridges that reside on the motherboard of the computer system, for example. The presence of one Southbridge could be used to indicate that the computer system is configured to operate as a single computing node, while the presence of multiple Southbridges could be used to indicate that the computer system is configured to operate as a number of independent computing nodes equal to the number of Southbridges.
[0034] At block 306, the management controller initiates a boot process for each of the independent computing nodes residing on the motherboard. In some examples, the management controller waits for a signal from the user before initiating the boot process. In some examples, the boot process may be initiated automatically in response to the computer powering on. Each Southbridge may be configured to perform the boot process for its respective computing node. During the boot process, each Southbridge may perform a power-on self-test, load code from the BIOS ROM, locate and initialize peripheral devices, and find, load, and start an operating system, among other processes.
[0035] With reference to the example computer system of Figs. 1 and 2, if the computer system 100 is configured as a single four-processor computer, then at block 306 only the first Southbridge 116 will perform a boot process. During the boot, the first Southbridge 116 configures the entire computer system 100 and loads a single instance of the operating system to execute on all four processing units 102.
[0036] If the computer system 100 is configured as two two-processor computers, the first Southbridge 116 will perform a boot process for the computing node of processor A and processor B, and the second Southbridge 118 will perform a boot process for the computing node of processor C and processor D. The two boot processes execute independently and can run in parallel. Additionally, the first Southbridge 116 will load a first instance of an operating system for processors A and B, and the second Southbridge 118 will load a second instance of an operating system for processors C and D. The two independent operating systems may be loaded from a common storage space or different storage spaces. Furthermore, each computing node may load and execute different operating system types or different versions of an operating system. It is also possible for only one of the computing nodes to become operable, while the other computing node remains powered down.
[0037] The process flow diagram of Fig. 3 is not intended to indicate that the operations of the method 300 are to be executed in any particular order, or that all of the operations of the method 300 are to be included in every case. Additionally, the method 300 can include any suitable number of additional operations.
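The three blocks of method 300 can be summarized in a short sketch. The `boot_node` callback is an assumed placeholder for the per-Southbridge boot work (self-test, BIOS load, operating system start); it is not an interface disclosed here.

```python
def start_up(power_ok, node_count, boot_node):
    """Sketch of method 300.

    Block 302: proceed only once power is detected and sufficient.
    Block 304: `node_count` stands in for the determined operating mode
               (1 = single computing node, 2 = two independent nodes).
    Block 306: initiate an independent boot process per computing node.
    """
    if not power_ok:
        return []  # no power detected; nothing boots
    # Each computing node is booted by its own Southbridge; the boot
    # processes are independent and could run in parallel.
    return [boot_node(node) for node in range(node_count)]


# Two-node mode: each Southbridge loads its own OS instance.
log = start_up(True, 2, lambda n: f"node {n}: OS loaded")
assert log == ["node 0: OS loaded", "node 1: OS loaded"]
```

In single-node mode the same flow runs with `node_count = 1`, so only the first Southbridge performs a boot.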
[0038] Fig. 4 is a block diagram showing a simplified example of a computer system in accordance with the techniques described herein. The computer system is generally referred to by the reference number 400.
[0039] The computer system 400 includes a number of processing units 102. In the example shown in Fig. 4, the computer system 400 includes four processing units 102. However, the computer system 400 can have a different number of processing units 102, including two, six, or eight processing units, or more. Each of the processing units may reside on a same motherboard. Some of the processors are communicatively coupled by processor interconnect links 112, which may be Quick Path Interconnect (QPI) connections, for example. The computer system 400 also includes a management controller 114, which enables a system administrator to remotely monitor and control the computer system 400. The computer system 400 can also include at least one Southbridge, referred to herein as first Southbridge 116. The computer system 400 can also include an optional Southbridge, referred to herein as second Southbridge 118. Each Southbridge 116 and 118 is coupled to the management controller 114.
[0040] The computer system 400 can be configured to operate in one of several possible operating modes. For example, in a first mode, the processing units 102 can be configured to operate as a single computing node, and in a second mode, the processing units 102 are configured to operate as two independent computing nodes. Other operating modes are possible. For example, in a third mode, the processing units 102 may be configured to operate as four computing nodes.
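The grouping of processing units into computing nodes under each operating mode can be illustrated with a small helper. The even-division constraint is a simplification for this sketch; as noted later, the description also permits uneven divisions.

```python
def partition_processors(processors, node_count):
    """Divide processors evenly among computing nodes (illustrative)."""
    if node_count < 1 or len(processors) % node_count != 0:
        raise ValueError("this sketch only handles even divisions")
    per_node = len(processors) // node_count
    # Adjacent processors are grouped, mirroring Fig. 2, where A and B
    # form one node and C and D form the other.
    return [processors[i:i + per_node]
            for i in range(0, len(processors), per_node)]


# First mode: a single four-processor computing node.
assert partition_processors(["A", "B", "C", "D"], 1) == [["A", "B", "C", "D"]]
# Second mode: two two-processor computing nodes, as in Fig. 2.
assert partition_processors(["A", "B", "C", "D"], 2) == [["A", "B"], ["C", "D"]]
```

The same helper extends to the third mode (four single-processor nodes) or to the eight-processor variants described above.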
[0041] In one example, the computer system 400 is configured to operate as two computing nodes, wherein the first Southbridge 116 is configured to initialize the first computing node, and the second Southbridge 118 is configured to initialize the second computing node. Each computing node includes one or more of the processing units 102. The processors may be divided evenly between the computing nodes, or different computing nodes can have different numbers of processors.
[0042] The present examples may be susceptible to various modifications and alternative forms and have been shown only for illustrative purposes. Furthermore, it is to be understood that the present techniques are not intended to be limited to the particular examples disclosed herein. Indeed, the scope of the appended claims is deemed to include all alternatives, modifications, and equivalents that are apparent to persons skilled in the art to which the disclosed subject matter pertains.

Claims

What is claimed is:
1. A computer system, comprising:
a plurality of processing units;
a management controller to enable out-of-band management of the computer system;
a first Southbridge communicatively coupled to the management controller and configured to initialize a first computing node comprising one or more of the processing units; and
a second Southbridge communicatively coupled to the management controller and configured to initialize a second computing node comprising a different one or more of the processing units.
2. The computer system of claim 1, wherein the plurality of processing units comprises four processing units, the first computing node comprises two of the plurality of processing units, and the second computing node comprises a different two of the plurality of processing units.
3. The computer system of claim 1, wherein the plurality of processing units, the management controller, the first Southbridge, and the second Southbridge are all mounted to a same motherboard.
4. The computer system of claim 1, comprising a processor interconnect link that couples one of the processors of the first computing node and one of the processors of the second computing node, wherein the processor interconnect link enables communication between an operating system of the first computing node and an operating system of the second computing node.
5. The computer system of claim 1, comprising a direct peer-to-peer link that couples Input/Output (I/O) busses of the first computing node and the second computing node and enables communication between an operating system of the first computing node and an operating system of the second computing node.
6. The computer system of claim 1, comprising a single Basic Input/Output System (BIOS) Read-Only Memory (ROM) that stores a BIOS for use in both the first computing node and the second computing node.
7. A computer system, comprising:
a plurality of processing units residing on a same motherboard and coupled by processor interconnect links, the plurality of processing units to be configured to operate in a first mode or a second mode, wherein:
in the first mode, the plurality of processing units are configured to operate as a single computing node; and
in the second mode, the plurality of processing units are configured to operate as two independent computing nodes.
8. The computer system of claim 7, comprising a first Southbridge to initialize a first computing node of the two independent computing nodes and a second Southbridge to initialize a second computing node of the two independent computing nodes.
9. The computer system of claim 8, comprising a management controller communicatively coupled to the first Southbridge for management of the first computing node and communicatively coupled to the second Southbridge for management of the second computing node.
10. The computer system of claim 8, comprising a direct peer-to-peer link that couples Input/Output (I/O) busses of the first computing node and the second computing node and enables communication between an operating system of the first computing node and a separate operating system of the second computing node.
11. The computer system of claim 8, wherein communication between an operating system of the first computing node and a separate operating system of the second computing node is performed over a processor interconnect link.
12. A method of operating a computer system comprising a plurality of processing units coupled to a same motherboard, comprising:
detecting, by a management controller, whether the computer system is to operate in a first mode or a second mode;
if the computer system is to operate in the first mode, configuring the plurality of processing units to operate as a single computing node; and
if the computer system is to operate in the second mode, configuring the plurality of processing units to operate as two independent computing nodes.
13. The method of claim 12, wherein configuring the plurality of processing units to operate as two independent computing nodes comprises booting, by a first Southbridge, a first computing node and booting, by a second Southbridge, a second computing node.
14. The method of claim 13, wherein configuring the plurality of processing units to operate as two independent computing nodes comprises disabling a processor interconnect link that couples a first processor of the first computing node and a second processor of the second computing node.
15. The method of claim 12, wherein configuring the plurality of processing units to operate as two independent computing nodes comprises initiating a communications link between the first computing node and the second computing node, the communications link disposed on the motherboard.

Legal Events

121 — EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 15880430; country of ref document: EP; kind code of ref document: A1)
NENP — Non-entry into the national phase (ref country code: DE)
122 — EP: PCT application non-entry in European phase (ref document number: 15880430; country of ref document: EP; kind code of ref document: A1)