CN104506362A

CN104506362A - Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server

Info

Publication number: CN104506362A
Application number: CN201410831246.3A
Authority: CN
Inventors: 贡维; 宗艳艳
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2014-12-29
Filing date: 2014-12-29
Publication date: 2015-04-08

Abstract

The invention discloses a method for system state switching and monitoring on a CC-NUMA (Cache Coherent Non-Uniform Memory Access) multi-node server, and belongs to the field of server management. In the method, the master node server sets and sends signals to notify its internal BMC of the operation to be performed. The BMCs in the slave nodes are then notified through an internal management network interface, and each performs the corresponding system state operation after receiving the command. The current system state is determined from the different level combinations and pulses of the signals received within the node, and a client host is notified for monitoring. The method achieves synchronized power-on, power-off, hot restart and cold restart of the CC-NUMA multi-node server, solves the lack of unified sequencing control in prior multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.

Description

A method for system state switching and monitoring on a CC-NUMA multi-node server

Technical field

The present invention discloses a method for system state switching and monitoring and belongs to the field of server management. Specifically, it is a method for system state switching and monitoring on a CC-NUMA multi-node server.

Background technology

High-end servers are typically used in scenarios with very demanding OLTP transaction-processing requirements, such as banking and scientific computing. Because the volume of data computed and stored at one time is very large, high-end servers generally adopt a CC-NUMA architecture. CC-NUMA (Cache Coherent Non-Uniform Memory Access) links multiple processors through dedicated interconnect devices to form a distributed shared memory space that runs a single operating system. Each processor can access its own memory as well as the memory of other processors or shared memory. The processors are generally connected through a backplane or optical fiber, so physically a partition is formed by interconnecting multiple server nodes, and that partition runs one operating system.

In a traditional CC-NUMA multi-node server system, each node generally controls its own power-on, power-off and restart operations independently. Because each server node is controlled independently at run time, there is no unified sequencing control or system monitoring across the nodes. Poorly timed control can easily prevent the system from starting normally and reduces the operating efficiency of the server.

The invention provides a method for system state switching and monitoring on a CC-NUMA multi-node server in which all system state switching operations are initiated by the master node server and followed by the slave node servers, and the whole flow can be monitored by the BMC management units. This achieves synchronized power-on, power-off, hot restart and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.

Summary of the invention

The present invention addresses the following problem: in a traditional CC-NUMA multi-node server system, each node generally controls its own power-on, power-off and restart operations independently, so there is no unified sequencing control or system monitoring across the nodes, and poorly timed control can easily prevent the system from starting normally and reduce the operating efficiency of the server. The invention provides a method for system state switching and monitoring on a CC-NUMA multi-node server that achieves synchronized power-on, power-off, hot restart and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.

The specific scheme proposed is as follows:

A system for system state switching and monitoring on a CC-NUMA multi-node server comprises a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;

Master node server: the master node is responsible for allocating the address space of the whole computer. It runs the BIOS and the OS, and it is also the initiator of system state switching;

Several slave node servers: the slave node servers provide expanded computing capacity, and they are the followers that execute system state switching;

BMC monitoring and management units in the master node and slave node servers: the BMCs handle communication between the node servers over an "internal management network";

Client host: the BMC in the master node server is also connected to the client host through an external management network interface, so that system state switching operations can be monitored from the client host.
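The component list above can be modelled with a few plain data classes. This is only an illustrative sketch: the class names, the `build_partition` helper, the `client_host` label and all addresses are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BMC:
    internal_addr: str                   # hypothetical address on the internal management network
    external_addr: Optional[str] = None  # only the master node's BMC has an external interface


@dataclass
class Node:
    name: str
    role: str  # "master" (initiator) or "slave" (follower)
    bmc: BMC


@dataclass
class Partition:
    master: Node
    slaves: List[Node] = field(default_factory=list)
    client_host: str = "client-host"     # hypothetical label for the monitoring host

    def slave_bmcs(self) -> List[BMC]:
        """BMCs that the master BMC must notify on every state switch."""
        return [node.bmc for node in self.slaves]


def build_partition(num_slaves: int) -> Partition:
    """Build one master and num_slaves slaves with made-up example addresses."""
    master = Node("master", "master", BMC("10.0.0.1", external_addr="192.0.2.10"))
    slaves = [
        Node(f"slave{i}", "slave", BMC(f"10.0.0.{i + 2}"))
        for i in range(num_slaves)
    ]
    return Partition(master, slaves)
```

Keeping the external address only on the master BMC mirrors the description: the client host reaches the partition through the master node's BMC, while the slave BMCs are reachable only over the internal management network.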

A method for system state switching and monitoring on a CC-NUMA multi-node server uses the system described above. The master node server sets and sends three signals, Power_Enable, Power_OK and System_Reset, to notify its internal BMC of the operation to be performed. The BMCs in the slave nodes are notified through the internal management network interface, and after receiving the command each slave-node BMC performs the corresponding system state operation. In addition, the BMC in the master node server determines the current system state from the different level combinations and pulses of the Power_Enable, Power_OK and System_Reset signals it receives within the node, and notifies the client host for monitoring.

System state switching refers to power-on, power-off, cold restart and hot restart.

When the system state switch is a power-on, the specific steps are:

1. The master node server sends a valid high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;

2. The BMC in the master node server passes the power-on command to the BMCs of the slave node servers through the internal management network interface;

3. The BMCs of the slave node servers send a high-level "Power_Enable" to all slave node servers, notifying them to perform the power-on operation;

4. After the master node and slave node servers have performed the power-on operation, each feeds back a high-level "Power_OK" signal to its own BMC, indicating that the node has powered on;

5. After "Power_OK", the master node server sends a high-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;

6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the slave nodes through the internal management network;

7. After a slave node server's BMC receives "System_Reset", it resets the processors, memory and chipset in that node;

8. After all server nodes have completed the reset, the master node server begins loading the BIOS and the OS.
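The ordering of the eight power-on steps can be illustrated with a small simulation. This is a sketch under stated assumptions, not the patented implementation: the `Node` class, the `power_on` function and the log message format are hypothetical, and real BMC messaging over the internal management network is replaced by log entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List

HIGH, LOW = 1, 0


@dataclass
class Node:
    """One server node together with the three signals its BMC sees."""
    name: str
    signals: Dict[str, int] = field(default_factory=lambda: {
        "Power_Enable": LOW, "Power_OK": LOW, "System_Reset": LOW})


def power_on(master: Node, slaves: List[Node]) -> List[str]:
    """Run the power-on steps in order and return an event log."""
    log: List[str] = []

    def drive_high(node: Node, signal: str) -> None:
        node.signals[signal] = HIGH
        log.append(f"{node.name}: {signal} high")

    drive_high(master, "Power_Enable")            # step 1
    for s in slaves:                              # step 2: relay over internal network
        log.append(f"{master.name} BMC -> {s.name} BMC: power-on")
    for s in slaves:                              # step 3
        drive_high(s, "Power_Enable")
    for n in [master, *slaves]:                   # step 4: every node reports powered on
        drive_high(n, "Power_OK")
    drive_high(master, "System_Reset")            # step 5
    for s in slaves:                              # step 6: relay over internal network
        log.append(f"{master.name} BMC -> {s.name} BMC: reset")
    for s in slaves:                              # step 7
        drive_high(s, "System_Reset")
    # step 8: the master boots only once every signal on every node is high
    if all(v == HIGH for n in [master, *slaves] for v in n.signals.values()):
        log.append(f"{master.name}: load BIOS and OS")
    return log
```

Calling `power_on(Node("master"), [Node("slave0"), Node("slave1")])` returns the events in step order, which makes it easy to check that the master never starts loading the BIOS before the last slave has finished its reset.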

When the system state switch is a power-off, the specific steps are:

1. The master node server sends a valid low-level "Power_Enable" signal to its BMC, notifying the BMC that a power-off operation is required;

3. The BMCs of the slave node servers send a low-level "Power_Enable" to all slave node servers, notifying them to perform the power-off operation;

4. After the master node and slave node servers have performed the power-off operation, each feeds back a low-level "Power_OK" signal to its own BMC, indicating that the node has powered off;

5. After "Power_OK", the master node server sends a low-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;

8. After all server nodes have completed the reset, the power-off of the master node server is complete.

At power-on, the levels of Power_Enable, Power_OK and System_Reset are high, high and high;

At power-off, the levels of Power_Enable, Power_OK and System_Reset are low, low and low;

During a cold restart, a low pulse appears on Power_Enable, Power_OK and System_Reset;

During a hot restart, Power_Enable and Power_OK are high and a low pulse appears on System_Reset.

That is, if a low pulse appears on "System_Reset" while "Power_Enable" and "Power_OK" both stay high, the system has performed a hot restart; if a low pulse appears on all of "Power_Enable", "Power_OK" and "System_Reset", the system has performed a cold restart.
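The distinction between the two restart types can be sketched from sampled signal traces. The sampling model is an assumption made for illustration: each signal is a short list of 0/1 samples taken by the BMC, and a "low pulse" is taken to mean a trace that starts high, dips low and returns high. The function names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def has_low_pulse(samples: Sequence[int]) -> bool:
    """True if the trace starts high, goes low in between, and ends high."""
    return (
        len(samples) >= 3
        and samples[0] == 1
        and samples[-1] == 1
        and 0 in samples[1:-1]
    )


def is_steady_high(samples: Sequence[int]) -> bool:
    """True if every sample in a non-empty trace is high."""
    return len(samples) > 0 and all(s == 1 for s in samples)


def restart_kind(traces: Dict[str, Sequence[int]]) -> Optional[str]:
    """Return 'cold restart', 'hot restart', or None if neither pattern matches."""
    enable = traces["Power_Enable"]
    ok = traces["Power_OK"]
    reset = traces["System_Reset"]
    if has_low_pulse(enable) and has_low_pulse(ok) and has_low_pulse(reset):
        return "cold restart"
    if is_steady_high(enable) and is_steady_high(ok) and has_low_pulse(reset):
        return "hot restart"
    return None
```

For example, a window in which only `System_Reset` reads `[1, 0, 0, 1]` while the other two signals stay at `[1, 1, 1, 1]` matches the hot restart pattern, whereas a low pulse on all three signals matches the cold restart pattern.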

The beneficial effects of the present invention are as follows: all system state switching operations are initiated by the master node server and followed by the slave node servers, and the whole flow can be monitored by the BMC management units. This achieves synchronized power-on, power-off, hot restart and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.

Description of the drawings

Fig. 1 is a block diagram of system state switching in a CC-NUMA multi-node server system.

Embodiment

The present invention is further described below with reference to the accompanying drawing.

First, a system for system state switching and monitoring on a CC-NUMA multi-node server is set up, comprising a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;

Using this system, a method for system state switching and monitoring on a CC-NUMA multi-node server is carried out as follows. The master node server sets and sends three signals, Power_Enable, Power_OK and System_Reset, to notify its internal BMC of the operation to be performed. The BMCs in the slave nodes are notified through the internal management network interface, and after receiving the command each slave-node BMC performs the corresponding system state operation. In addition, the BMC in the master node server determines the current system state from the different level combinations and pulses of the Power_Enable, Power_OK and System_Reset signals it receives within the node, and notifies the client host for monitoring.

Power-on and power-off are taken as examples of system state switching:

The system state switch is a power-on:

1. The master node server sends a valid high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;

2. The BMC in the master node server passes the power-on command through the internal management network interface to the BMCs in slave node 0, slave node 1, and so on up to slave node N;

The system state switch is a power-off:

8. After all server nodes have completed the reset, the power-off of the master node server is complete.

The system state monitoring process is as follows:

Each system state corresponds to a combination of signal levels and pulses. The BMC in the master node server determines the current system state from the different level combinations of "Power_Enable", "Power_OK" and "System_Reset" within its node.

Powered-on state: when "Power_Enable", "Power_OK" and "System_Reset" are all high, the system is powered on;

Powered-off state: when "Power_Enable", "Power_OK" and "System_Reset" are all low, the system is powered off;

Hot restart: when a low pulse appears on "System_Reset" while "Power_Enable" and "Power_OK" are both high, the system has performed a hot restart;

Cold restart: when a low pulse appears on "Power_Enable", "Power_OK" and "System_Reset", the system has performed a cold restart;

All other combinations are abnormal states.
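The five outcomes listed above, including the abnormal case, can be written as one classification function plus a small monitor that reports changes to the client host. This is a minimal sketch under assumptions: the observation format (a tuple of three levels plus the set of signals that showed a low pulse), the `StateMonitor` class and the `notify` callback are all hypothetical stand-ins for whatever the BMC firmware actually uses.

```python
from typing import Callable, FrozenSet, Optional, Tuple

SIGNALS = ("Power_Enable", "Power_OK", "System_Reset")
Levels = Tuple[int, int, int]  # levels in the order given by SIGNALS


def classify(levels: Levels, low_pulses: FrozenSet[str] = frozenset()) -> str:
    """Map one observation to a system state name."""
    if low_pulses == frozenset(SIGNALS):
        return "cold restart"
    if low_pulses == frozenset({"System_Reset"}) and levels[:2] == (1, 1):
        return "hot restart"
    if low_pulses:
        return "abnormal"  # any other pulse pattern
    if levels == (1, 1, 1):
        return "powered on"
    if levels == (0, 0, 0):
        return "powered off"
    return "abnormal"      # any other level combination


class StateMonitor:
    """Classifies observations and notifies the client only when the state changes."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify
        self._last: Optional[str] = None

    def observe(self, levels: Levels, low_pulses: FrozenSet[str] = frozenset()) -> str:
        state = classify(levels, low_pulses)
        if state != self._last:
            self._notify(state)
            self._last = state
        return state
```

A usage example: with `received = []` and `monitor = StateMonitor(received.append)`, feeding the monitor a powered-off observation, a powered-on observation, and then a `System_Reset` low pulse appends one entry per change to `received`. Reporting only changes keeps the record sent to the client host short, at the cost of merging identical consecutive observations.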

Claims

1. A system for system state switching and monitoring on a CC-NUMA multi-node server, characterized in that it comprises a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;

2. A method for system state switching and monitoring on a CC-NUMA multi-node server, using the system for system state switching and monitoring on a CC-NUMA multi-node server according to claim 1, characterized in that the master node server sets and sends three signals, Power_Enable, Power_OK and System_Reset, to notify its internal BMC of the operation to be performed; the BMCs in the slave nodes are notified through the internal management network interface, and after receiving the command each slave-node BMC performs the corresponding system state operation; in addition, the BMC in the master node server determines the current system state from the different level combinations and pulses of the Power_Enable, Power_OK and System_Reset signals received within the node, and notifies the client host for monitoring.

3. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 2, characterized in that the system state switching refers to power-on, power-off, cold restart and hot restart.

4. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that the system state switch is a power-on, and the specific steps are:

5. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that the system state switch is a power-off, and the specific steps are:

8. After all server nodes have completed the reset, the power-off of the master node server is complete.

6. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that at power-on the levels of Power_Enable, Power_OK and System_Reset are high, high and high;