CN104506362A - Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server - Google Patents
- Publication number
- CN104506362A (application number CN201410831246.3A)
- Authority
- CN
- China
- Prior art keywords
- node server
- bmc
- power
- node
- reset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Power Sources (AREA)
- Hardware Redundancy (AREA)
Abstract
The invention discloses a method for system state switching and monitoring on a CC-NUMA (cache-coherent non-uniform memory access) multi-node server, and belongs to the field of server management. In the method, the master node server adjusts and sends signals to tell its internal BMC which operation to perform, and that BMC notifies the BMCs in the slave nodes through an internal management network interface. Each slave-node BMC performs the corresponding system state operation after receiving the command. The current system state is determined from the level combinations and pulses of the signals received within the node, and a client host is notified so that the state can be monitored. The method achieves synchronized power-on, shutdown, hot restart and cold restart of the CC-NUMA multi-node server, solves the prior-art lack of unified sequencing control across the nodes of a multi-node server, and allows the whole state change process to be monitored and recorded by the BMC.
Description
Technical field
The present invention relates to a method for system state switching and monitoring and belongs to the field of server management. Specifically, it is a method for system state switching and monitoring on a CC-NUMA multi-node server.
Background technology
High-end servers are typically used where online transaction processing (OLTP) capability requirements are very high, such as banking and scientific computing. Because the volume of data computed and stored at one time is very large, high-end servers generally adopt a CC-NUMA architecture. CC-NUMA (Cache Coherent Non-Uniform Memory Access) links multiple processors through dedicated interconnect devices to form a distributed shared memory space that runs a single operating system. Each processor can access its own memory as well as the memory of other processors or shared memory. The processors are generally connected through a backplane or optical fiber, so physically the system consists of multiple server nodes interconnected to form one partition running one operating system.
In a multi-node server system with a traditional CC-NUMA architecture, each node generally controls its own power-on, shutdown and restart operations independently. There is no unified sequencing control or system monitoring across the nodes, so poorly timed control can easily prevent the system from starting normally and reduces the operating efficiency of the server.
The invention provides a method for system state switching and monitoring on a CC-NUMA multi-node server in which all system state switching operations are initiated by the master node server and followed by the slave node servers, and the whole flow can be monitored by the BMC management units. The method achieves synchronized power-on, shutdown, hot restart and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.
Summary of the invention
In a multi-node server system with a traditional CC-NUMA architecture, each node generally controls its own power-on, shutdown and restart operations independently; there is no unified sequencing control or system monitoring across the nodes, so poorly timed control can easily prevent the system from starting normally and reduces the operating efficiency of the server. To address this problem, the present invention provides a method for system state switching and monitoring on a CC-NUMA multi-node server that achieves synchronized power-on, shutdown, hot restart and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.
The specific scheme proposed is as follows:
A system for system state switching and monitoring on a CC-NUMA multi-node server comprises a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;
Master node server: the master node server is responsible for allocating the address space of the whole computer; it runs the BIOS and OS and is also the initiator of system state switching;
Several slave node servers: the slave node servers provide expanded computing capacity and are the followers that execute system state switching;
BMC monitoring and management units in the master node and slave node servers: the BMCs handle communication between the node servers over an "internal management network";
Client host: the BMC in the master node server is also connected to the client host through an external management network interface, so that system state switching operations can be monitored on the client host.
A method for system state switching and monitoring on a CC-NUMA multi-node server uses the system described above. The master node server adjusts and sends three signals, Power_Enable, Power_OK and System_Reset, to tell its internal BMC which operation to perform. That BMC notifies the BMCs in the slave nodes through the internal management network interface, and each slave-node BMC performs the corresponding system state operation after receiving the command. In addition, the BMC in the master node server determines the current system state from the level combinations and pulses of the Power_Enable, Power_OK and System_Reset signals received within the node, and notifies the client host so that the state can be monitored.
The system state switching refers to power-on, shutdown, cold restart and hot restart.
When the system state switching is power-on, the specific steps are:
1. The master node server sends a high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;
2. The BMC in the master node server passes the power-on command to the BMCs of the slave node servers through the internal management network interface;
3. The BMC of each slave node server sends a high-level "Power_Enable" to its slave node server, notifying it to perform the power-on operation;
4. After the master node and the slave node servers have performed the power-on operation, each feeds back a high-level "Power_OK" signal to its own BMC, indicating that the node has powered on;
5. After "Power_OK", the master node server sends a high-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the other slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory and chipset in that node;
8. After all server nodes have completed their reset, the master node server starts loading the BIOS and OS.
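The eight power-on steps can be pictured with a small simulation. This is only a sketch under stated assumptions: the `Node` class, its fields and `power_on_sequence` are hypothetical names introduced for illustration, signals are modelled as booleans (True = high level), and the internal management network is reduced to a plain loop over the slave nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one node and the three signals its BMC sees."""
    name: str
    power_enable: bool = False
    power_ok: bool = False
    system_reset: bool = False
    log: list = field(default_factory=list)

    def power_on(self):
        # Steps 1/3: Power_Enable goes high; step 4: the node reports Power_OK high.
        self.power_enable = True
        self.log.append("Power_Enable high")
        self.power_ok = True
        self.log.append("Power_OK high")

    def release_reset(self):
        # Steps 5/7: System_Reset goes high once processor, memory and chipset are reset.
        if not self.power_ok:
            raise RuntimeError(f"{self.name}: reset requested before Power_OK")
        self.system_reset = True
        self.log.append("System_Reset high")


def power_on_sequence(master, slaves):
    """Walk through steps 1-8; return True when the master may load BIOS and OS."""
    nodes = [master, *slaves]
    master.power_on()                 # steps 1 and 4 on the master node
    for node in slaves:               # steps 2-4: command relayed to every slave BMC
        node.power_on()
    if not all(n.power_ok for n in nodes):
        return False
    master.release_reset()            # step 5
    for node in slaves:               # steps 6-7
        node.release_reset()
    return all(n.system_reset for n in nodes)   # step 8


master = Node("master")
slaves = [Node(f"slave{i}") for i in range(3)]
ready = power_on_sequence(master, slaves)
print(ready)
```

The guard in `release_reset` reflects the ordering in the steps: a node is only reset after it has reported Power_OK.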
When the system state switching is shutdown, the specific steps are:
1. The master node server sends a low-level "Power_Enable" signal to its BMC, notifying the BMC that a power-off operation is required;
2. The BMC in the master node server passes the power-off command to the BMCs of the slave node servers through the internal management network interface;
3. The BMC of each slave node server sends a low-level "Power_Enable" to its slave node server, notifying it to perform the power-off operation;
4. After the master node and the slave node servers have performed the power-off operation, each feeds back a low-level "Power_OK" signal to its own BMC, indicating that the node has shut down;
5. After "Power_OK", the master node server sends a low-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the other slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory and chipset in that node;
8. After all server nodes have completed their reset, the master node server has shut down.
In the powered-on state, Power_Enable, Power_OK and System_Reset are all at high level;
In the shutdown state, Power_Enable, Power_OK and System_Reset are all at low level;
During a cold restart, a low pulse appears on Power_Enable, Power_OK and System_Reset;
During a hot restart, Power_Enable and Power_OK are at high level and a low pulse appears on System_Reset.
That is, if "Power_Enable" and "Power_OK" are both at high level and a low pulse appears on "System_Reset", a hot restart has occurred; if a low pulse appears on "Power_Enable", "Power_OK" and "System_Reset", a cold restart has occurred.
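The level and pulse combinations above amount to a small lookup table. The following sketch encodes it; the enum, the string constants and the `classify` function are hypothetical names, and treating every unlisted combination as abnormal follows the statement in the embodiment rather than anything specified at this point in the text.

```python
from enum import Enum


class SystemState(Enum):
    POWERED_ON = "powered on"
    SHUT_DOWN = "shut down"
    HOT_RESTART = "hot restart"
    COLD_RESTART = "cold restart"
    ABNORMAL = "abnormal"


# Illustrative labels for what the BMC observed on each signal line.
HIGH, LOW, LOW_PULSE = "high", "low", "low_pulse"

# Keys are (Power_Enable, Power_OK, System_Reset).
STATE_TABLE = {
    (HIGH, HIGH, HIGH): SystemState.POWERED_ON,
    (LOW, LOW, LOW): SystemState.SHUT_DOWN,
    (HIGH, HIGH, LOW_PULSE): SystemState.HOT_RESTART,
    (LOW_PULSE, LOW_PULSE, LOW_PULSE): SystemState.COLD_RESTART,
}


def classify(power_enable, power_ok, system_reset):
    """Map one observation of the three signals to a system state."""
    key = (power_enable, power_ok, system_reset)
    return STATE_TABLE.get(key, SystemState.ABNORMAL)


print(classify(HIGH, HIGH, LOW_PULSE).value)
```

A table keeps the four documented cases visible in one place and makes the fall-through to the abnormal state explicit.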
The beneficial effects of the present invention are as follows: all system state switching operations are initiated by the master node server and followed by the slave node servers, and the whole flow can be monitored by the BMC management units. The method achieves synchronized power-on, shutdown, hot restart and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.
Description of the drawings
Fig. 1 is a block diagram of system state switching in a CC-NUMA multi-node server system.
Embodiment
The present invention is further described below with reference to the accompanying drawing.
First, a system for system state switching and monitoring on a CC-NUMA multi-node server is set up, comprising a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;
Master node server: the master node server is responsible for allocating the address space of the whole computer; it runs the BIOS and OS and is also the initiator of system state switching;
Several slave node servers: the slave node servers provide expanded computing capacity and are the followers that execute system state switching;
BMC monitoring and management units in the master node and slave node servers: the BMCs handle communication between the node servers over an "internal management network";
Client host: the BMC in the master node server is also connected to the client host through an external management network interface, so that system state switching operations can be monitored on the client host.
Using the above system, a method for system state switching and monitoring on a CC-NUMA multi-node server is implemented as follows. The master node server adjusts and sends three signals, Power_Enable, Power_OK and System_Reset, to tell its internal BMC which operation to perform. That BMC notifies the BMCs in the slave nodes through the internal management network interface, and each slave-node BMC performs the corresponding system state operation after receiving the command. In addition, the BMC in the master node server determines the current system state from the level combinations and pulses of the Power_Enable, Power_OK and System_Reset signals received within the node, and notifies the client host so that the state can be monitored.
Taking power-on and shutdown as examples of system state switching:
System state switching to power-on:
1. The master node server sends a high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;
2. The BMC in the master node server passes the power-on command through the internal management network interface to the BMCs in slave node 0, slave node 1, and so on up to slave node N;
3. The BMC of each slave node server sends a high-level "Power_Enable" to its slave node server, notifying it to perform the power-on operation;
4. After the master node and the slave node servers have performed the power-on operation, each feeds back a high-level "Power_OK" signal to its own BMC, indicating that the node has powered on;
5. After "Power_OK", the master node server sends a high-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the other slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory and chipset in that node;
8. After all server nodes have completed their reset, the master node server starts loading the BIOS and OS.
System state switching to shutdown:
1. The master node server sends a low-level "Power_Enable" signal to its BMC, notifying the BMC that a power-off operation is required;
2. The BMC in the master node server passes the power-off command through the internal management network interface to the BMCs in slave node 0, slave node 1, and so on up to slave node N;
3. The BMC of each slave node server sends a low-level "Power_Enable" to its slave node server, notifying it to perform the power-off operation;
4. After the master node and the slave node servers have performed the power-off operation, each feeds back a low-level "Power_OK" signal to its own BMC, indicating that the node has shut down;
5. After "Power_OK", the master node server sends a low-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the other slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory and chipset in that node;
8. After all server nodes have completed their reset, the master node server has shut down.
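The shutdown flow can be sketched as an ordered series of signal changes recorded by each BMC. Everything named here (`BMC`, `drive`, `power_off_sequence`) is hypothetical and for illustration only; levels are modelled as 1 (high) and 0 (low), and the relay over the internal management network is reduced to a loop over the slave BMCs.

```python
class BMC:
    """Hypothetical BMC that tracks three signal levels and logs every change."""

    def __init__(self, name):
        self.name = name
        # The node is assumed to start in the powered-on state (all signals high).
        self.signals = {"Power_Enable": 1, "Power_OK": 1, "System_Reset": 1}
        self.events = []

    def drive(self, signal, level):
        self.signals[signal] = level
        self.events.append((signal, level))


def power_off_sequence(master, slaves):
    """Apply shutdown steps 1-8; return True when every signal on every node is low."""
    nodes = [master, *slaves]
    master.drive("Power_Enable", 0)      # step 1
    for bmc in slaves:                   # steps 2-3: power-off command relayed
        bmc.drive("Power_Enable", 0)
    for bmc in nodes:                    # step 4: each node feeds back Power_OK low
        bmc.drive("Power_OK", 0)
    master.drive("System_Reset", 0)      # step 5
    for bmc in slaves:                   # steps 6-7
        bmc.drive("System_Reset", 0)
    return all(not any(b.signals.values()) for b in nodes)   # step 8


master_bmc = BMC("master")
slave_bmcs = [BMC(f"slave{i}") for i in range(2)]
shut_down = power_off_sequence(master_bmc, slave_bmcs)
print(shut_down, master_bmc.events)
```

Keeping an event list per BMC mirrors the idea that the whole state change process can be recorded.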
The system state monitoring process is as follows:
Each system state corresponds to a particular combination of signal levels and pulses. The BMC in the master node server determines the current system state from the combination of "Power_Enable", "Power_OK" and "System_Reset" in its node.
Powered-on state: when "Power_Enable", "Power_OK" and "System_Reset" are all at high level, the system is powered on;
Shutdown state: when "Power_Enable", "Power_OK" and "System_Reset" are all at low level, the system is shut down;
Hot restart: when "Power_Enable" and "Power_OK" are both at high level and a low pulse appears on "System_Reset", a hot restart has occurred;
Cold restart: when a low pulse appears on "Power_Enable", "Power_OK" and "System_Reset", a cold restart has occurred;
All other combinations indicate an abnormal state.
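To make the monitoring rule concrete, the sketch below works on sampled waveforms instead of pre-labelled levels. It assumes each signal arrives as a list of 0/1 samples over one observation window and that a low pulse means high, then low, then high again; the sampling model and the function names `summarize` and `monitor` are assumptions made for illustration and are not specified by the patent.

```python
from itertools import groupby


def summarize(samples):
    """Collapse a list of 0/1 samples into 'high', 'low', 'low_pulse' or 'other'."""
    runs = [level for level, _ in groupby(samples)]   # consecutive duplicates merged
    if runs == [1]:
        return "high"
    if runs == [0]:
        return "low"
    if runs == [1, 0, 1]:
        return "low_pulse"
    return "other"


def monitor(power_enable, power_ok, system_reset):
    """Return the state implied by the three sampled signals in one window."""
    shape = tuple(summarize(s) for s in (power_enable, power_ok, system_reset))
    states = {
        ("high", "high", "high"): "powered on",
        ("low", "low", "low"): "shut down",
        ("high", "high", "low_pulse"): "hot restart",
        ("low_pulse", "low_pulse", "low_pulse"): "cold restart",
    }
    return states.get(shape, "abnormal")


steady = [1, 1, 1, 1, 1]
pulse = [1, 1, 0, 0, 1]
print(monitor(steady, steady, pulse))
print(monitor(pulse, pulse, pulse))
```

In a real BMC the window length and the debouncing of the inputs would matter; here they are deliberately left out.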
Claims (6)
1. A system for system state switching and monitoring on a CC-NUMA multi-node server, characterized by comprising a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;
Master node server: the master node server is responsible for allocating the address space of the whole computer; it runs the BIOS and OS and is also the initiator of system state switching;
Several slave node servers: the slave node servers provide expanded computing capacity and are the followers that execute system state switching;
BMC monitoring and management units in the master node and slave node servers: the BMCs handle communication between the node servers over an "internal management network";
Client host: the BMC in the master node server is also connected to the client host through an external management network interface, so that system state switching operations can be monitored on the client host.
2. A method for system state switching and monitoring on a CC-NUMA multi-node server, using the system for system state switching and monitoring on a CC-NUMA multi-node server according to claim 1, characterized in that the master node server adjusts and sends three signals, Power_Enable, Power_OK and System_Reset, to tell its internal BMC which operation to perform; that BMC notifies the BMCs in the slave nodes through the internal management network interface, and each slave-node BMC performs the corresponding system state operation after receiving the command; in addition, the BMC in the master node server determines the current system state from the level combinations and pulses of the Power_Enable, Power_OK and System_Reset signals received within the node, and notifies the client host so that the state can be monitored.
3. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 2, characterized in that the system state switching refers to power-on, shutdown, cold restart and hot restart.
4. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that when the system state switching is power-on, the specific steps are:
1. The master node server sends a high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;
2. The BMC in the master node server passes the power-on command to the BMCs of the slave node servers through the internal management network interface;
3. The BMC of each slave node server sends a high-level "Power_Enable" to its slave node server, notifying it to perform the power-on operation;
4. After the master node and the slave node servers have performed the power-on operation, each feeds back a high-level "Power_OK" signal to its own BMC, indicating that the node has powered on;
5. After "Power_OK", the master node server sends a high-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the other slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory and chipset in that node;
8. After all server nodes have completed their reset, the master node server starts loading the BIOS and OS.
5. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that when the system state switching is shutdown, the specific steps are:
1. The master node server sends a low-level "Power_Enable" signal to its BMC, notifying the BMC that a power-off operation is required;
2. The BMC in the master node server passes the power-off command to the BMCs of the slave node servers through the internal management network interface;
3. The BMC of each slave node server sends a low-level "Power_Enable" to its slave node server, notifying it to perform the power-off operation;
4. After the master node and the slave node servers have performed the power-off operation, each feeds back a low-level "Power_OK" signal to its own BMC, indicating that the node has shut down;
5. After "Power_OK", the master node server sends a low-level "System_Reset" to its BMC, indicating that the processors, memory and chipset in the master node have completed their reset, and waits for the slave nodes to complete theirs;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the other slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory and chipset in that node;
8. After all server nodes have completed their reset, the master node server has shut down.
6. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that in the powered-on state, Power_Enable, Power_OK and System_Reset are all at high level;
In the shutdown state, Power_Enable, Power_OK and System_Reset are all at low level;
During a cold restart, a low pulse appears on Power_Enable, Power_OK and System_Reset;
During a hot restart, Power_Enable and Power_OK are at high level and a low pulse appears on System_Reset.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410831246.3A CN104506362A (en) | 2014-12-29 | 2014-12-29 | Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410831246.3A CN104506362A (en) | 2014-12-29 | 2014-12-29 | Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104506362A true CN104506362A (en) | 2015-04-08 |
Family
ID=52948077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410831246.3A Pending CN104506362A (en) | 2014-12-29 | 2014-12-29 | Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104506362A (en) |
2014-12-29: CN201410831246.3A (published as CN104506362A), status: Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090164201A1 (en) * | 2006-04-20 | 2009-06-25 | Internationalbusiness Machines Corporation | Method, System and Computer Program For The Centralized System Management On EndPoints Of A Distributed Data Processing System |
CN102571452A (en) * | 2012-02-20 | 2012-07-11 | 华为技术有限公司 | Multi-node management method and system |
CN102708190A (en) * | 2012-05-15 | 2012-10-03 | 浪潮电子信息产业股份有限公司 | Directory cache method for node control chip in cache coherent non-uniform memory access (CC-NUMA) system |
CN103475494A (en) * | 2013-09-12 | 2013-12-25 | 华为技术有限公司 | CC-NUMA system and starting method thereof |
CN103593306A (en) * | 2013-11-15 | 2014-02-19 | 浪潮电子信息产业股份有限公司 | Design method for Cache control unit of protocol processor |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105607915A (en) * | 2016-02-02 | 2016-05-25 | 浪潮(北京)电子信息产业有限公司 | Double-zone power-on control method for server |
CN105607915B (en) * | 2016-02-02 | 2018-10-02 | 浪潮(北京)电子信息产业有限公司 | A kind of double subregion start-up control methods of server |
CN105912498A (en) * | 2016-04-01 | 2016-08-31 | 浪潮电子信息产业股份有限公司 | Partitioning method and device for multi-path server and multi-path server |
CN106383791B (en) * | 2016-09-23 | 2019-07-12 | 深圳职业技术学院 | A kind of memory block combined method and device based on nonuniform memory access framework |
CN106383791A (en) * | 2016-09-23 | 2017-02-08 | 深圳职业技术学院 | Memory block combination method and apparatus based on non-uniform memory access architecture |
CN107247683B (en) * | 2017-06-14 | 2020-10-23 | 苏州浪潮智能科技有限公司 | Positioning management system and method for rack server |
CN107247683A (en) * | 2017-06-14 | 2017-10-13 | 郑州云海信息技术有限公司 | A kind of orientation management system and its method for rack server |
CN109144824A (en) * | 2018-07-19 | 2019-01-04 | 曙光信息产业(北京)有限公司 | The operating status display device of two-way server node |
CN109144824B (en) * | 2018-07-19 | 2022-07-08 | 中科曙光信息产业成都有限公司 | Running state display device of double-path server node |
CN109408266A (en) * | 2018-10-08 | 2019-03-01 | 郑州云海信息技术有限公司 | A kind of determination method and apparatus of Restart Type |
CN109408266B (en) * | 2018-10-08 | 2022-02-18 | 郑州云海信息技术有限公司 | Method and device for determining restart type |
CN110532160A (en) * | 2019-09-03 | 2019-12-03 | 深圳市智微智能科技开发有限公司 | A kind of method of BMC record server system hot restart event |
WO2022078519A1 (en) * | 2020-10-16 | 2022-04-21 | 华为技术有限公司 | Computer device and management method |
CN116126649A (en) * | 2023-04-19 | 2023-05-16 | 苏州浪潮智能科技有限公司 | Method, device, server, equipment and medium for managing and controlling sub-nodes |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104506362A (en) | Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server | |
US11789619B2 (en) | Node interconnection apparatus, resource control node, and server system | |
JP6530774B2 (en) | Hardware failure recovery system | |
CN107302465B (en) | PCIe Switch server complete machine management method | |
US9600370B2 (en) | Server system | |
CN105700969B (en) | server system | |
US8468383B2 (en) | Reduced power failover system | |
JP2005500622A (en) | Computer system partitioning using data transfer routing mechanism | |
CN104503783A (en) | Method and server for presenting initialization degree of server hardware | |
CN105242980A (en) | Complementary watchdog system and complementary watchdog monitoring method | |
WO2017136986A1 (en) | Method and system for power management | |
CN103532753A (en) | Double-computer hot standby method based on memory page replacement synchronization | |
US11662803B2 (en) | Control method, apparatus, and electronic device | |
US11093332B2 (en) | Application checkpoint and recovery system | |
CN102891762B (en) | The system and method for network data continuously | |
JP2013130961A (en) | Control system and repeater | |
CN105068763A (en) | Virtual machine fault-tolerant system and method for storage faults | |
CN103178977A (en) | Computer system and starting-up management method of same | |
JP7063315B2 (en) | Information processing equipment, management programs, management methods, and information processing systems | |
US10719310B1 (en) | Systems and methods for reducing keyboard, video, and mouse (KVM) downtime during firmware update or failover events in a chassis with redundant enclosure controllers (ECs) | |
TWI525449B (en) | Server control method and chassis controller | |
CN112631872B (en) | Exception handling method and device for multi-core system | |
CN104572561A (en) | Implementing method and system of overall hot plugging of clumps | |
JP5464886B2 (en) | Computer system | |
TWI774464B (en) | Expanded availability computing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150408 |