CN104506362A - Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server - Google Patents

Info

Publication number
CN104506362A
Authority
CN
China
Prior art keywords
node server
bmc
power
node
reset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410831246.3A
Other languages
Chinese (zh)
Inventor
贡维
宗艳艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-12-29
Filing date
2014-12-29
Publication date
2015-04-08
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201410831246.3A
Publication of CN104506362A
Legal status: Pending

Abstract

The invention discloses a method for system state switching and monitoring on a CC-NUMA (cache coherent non-uniform memory access) multi-node server, and belongs to the field of server management. In the method, a master node server drives signals to notify its internal BMC of the operation to be performed; the BMC in each slave node is notified through an internal management network interface and performs the corresponding system state operation after receiving the command; and the BMC in the master node judges the current system state from the level combinations and pulses of the signals received within the node and notifies a client host for monitoring. The method achieves synchronized power-on, power-off, hot restart, and cold restart of the CC-NUMA multi-node server, solves the prior-art lack of unified sequencing control across the nodes of a multi-node server, and allows the whole state change process to be monitored and recorded by the BMC.

Description

Method for system state switching and monitoring on a CC-NUMA multi-node server
Technical field
The present invention discloses a method for system state switching and monitoring and belongs to the field of server management; specifically, it is a method for system state switching and monitoring on a CC-NUMA multi-node server.
Background
High-end servers are typically used in scenarios with very high online transaction processing (OLTP) requirements, such as banking and scientific computing. Because the volume of data computed and stored at any one time is very large, high-end servers generally adopt a CC-NUMA architecture. CC-NUMA (Cache Coherent Non-Uniform Memory Access) links multiple processors through dedicated interconnect devices to form a distributed shared memory space that runs a single operating system. Each processor can access its own memory as well as the memory of other processors or shared memory. The processors are generally connected through a backplane or optical fiber, so physically the system consists of multiple server nodes interconnected to form one partition running one operating system. In a traditional CC-NUMA multi-node server system, each node generally controls its own power-on, power-off, and restart operations independently. The nodes lack unified sequencing control and system monitoring, so mistimed control can easily prevent the system from starting normally and reduce the operating efficiency of the server. The invention provides a method for system state switching and monitoring on a CC-NUMA multi-node server in which every system state switching operation is initiated by the master node server and followed by the slave node servers, and the whole process can be monitored by the BMC management units. The method achieves synchronized power-on, power-off, hot restart, and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.
Summary of the invention
The present invention addresses the following problem: in a traditional CC-NUMA multi-node server system, each node generally controls its own power-on, power-off, and restart operations independently, the nodes lack unified sequencing control and system monitoring, and mistimed control can easily prevent the system from starting normally and reduce the operating efficiency of the server. The invention provides a method for system state switching and monitoring on a CC-NUMA multi-node server that achieves synchronized power-on, power-off, hot restart, and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.
The specific scheme proposed is:
A system for system state switching and monitoring on a CC-NUMA multi-node server comprises a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;
Master node server: the master node server is responsible for allocating the address space of the whole computer; it runs the BIOS and OS and is also the initiator of system state switching;
Several slave node servers: the slave node servers provide computing expansion and are also the followers that execute system state switching;
BMC monitoring and management units in the master node and slave node servers: the BMCs handle communication between the node servers over the "internal management network";
Client host: the BMC in the master node server is also connected to the client host through an external management network interface, so that system state switching operations can be monitored on the client host.
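As a rough illustration of this topology, the following Python sketch models the nodes and their BMCs. The class names (`BMC`, `Node`, `Partition`), the field `has_external_port`, and the helper `build_partition` are hypothetical and do not come from the patent. The sketch only assumes what the text states: every node carries a BMC, and only the BMC of the initiating node has an external port to the client host.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BMC:
    """Management controller fitted in every node (illustrative model)."""
    node_name: str
    has_external_port: bool = False  # assumption: only the initiating node's BMC faces the client host


@dataclass
class Node:
    name: str
    is_master: bool
    bmc: BMC = field(init=False)

    def __post_init__(self) -> None:
        # Each node gets its own BMC; the external port exists only on the initiator.
        self.bmc = BMC(self.name, has_external_port=self.is_master)


@dataclass
class Partition:
    """One CC-NUMA partition: one initiating node plus its followers."""
    master: Node
    slaves: List[Node]

    def internal_network(self) -> List[BMC]:
        """All BMCs that talk to each other over the internal management network."""
        return [self.master.bmc] + [s.bmc for s in self.slaves]


def build_partition(slave_count: int) -> Partition:
    master = Node("master", is_master=True)
    slaves = [Node(f"slave{i}", is_master=False) for i in range(slave_count)]
    return Partition(master, slaves)
```

Keeping the external port as a property of the BMC rather than of the node mirrors the description, in which the client host only ever talks to a BMC.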
A method for system state switching and monitoring on a CC-NUMA multi-node server uses the system for system state switching and monitoring described above. The master node server drives three signals, Power_Enable, Power_OK, and System_Reset, to notify its internal BMC of the operation to be performed. The BMC in the master node notifies the BMCs in the slave nodes through the internal management network interface, and each slave node BMC performs the corresponding system state operation after receiving the command. In addition, the BMC in the master node server judges the current system state from the level combinations and pulses of the Power_Enable, Power_OK, and System_Reset signals received within the node, and notifies the client host for monitoring.
The system state switching refers to power-on, power-off, cold restart, and hot restart.
When the system state is switched to power-on, the specific steps are:
1. The master node server sends an active high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;
2. The BMC in the master node server passes the power-on command to the BMCs of the slave node servers through the internal management network interface;
3. The BMCs of the slave node servers send a high-level "Power_Enable" to all slave node servers, notifying them to perform the power-on operation;
4. After the master node and slave node servers have performed the power-on operation, each feeds back a high-level "Power_OK" signal to its own BMC, indicating that the node has powered on;
5. After "Power_OK", the master node server sends a high-level "System_Reset" to its BMC, indicating that the reset of the processors, memory, and chipset in the master node is complete and that it is waiting for the slave nodes to complete their reset;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory, and chipset in that node;
8. After all server nodes have completed their reset, the master node server starts loading the BIOS and OS.
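The eight steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: signals are modelled as integers (1 for high, 0 for low), the internal management network is reduced to a loop over the following nodes, and the names `Node`, `power_on`, and the log strings are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import List

HIGH, LOW = 1, 0


@dataclass
class Node:
    name: str
    power_enable: int = LOW
    power_ok: int = LOW
    system_reset: int = LOW


def power_on(master: Node, slaves: List[Node]) -> List[str]:
    """Walk through the power-on sequence and return an event log."""
    log: List[str] = []

    # Step 1: the initiating node raises Power_Enable toward its own BMC.
    master.power_enable = HIGH
    log.append("master: Power_Enable high")

    # Steps 2-3: the command is forwarded and each following node sees Power_Enable high.
    for s in slaves:
        s.power_enable = HIGH
        log.append(f"{s.name}: Power_Enable high")

    # Step 4: every node reports Power_OK to its own BMC.
    for n in [master, *slaves]:
        n.power_ok = HIGH
        log.append(f"{n.name}: Power_OK high")

    # Step 5: the initiating node signals System_Reset high.
    master.system_reset = HIGH
    log.append("master: System_Reset high")

    # Steps 6-7: the following nodes are notified and complete their own reset.
    for s in slaves:
        s.system_reset = HIGH
        log.append(f"{s.name}: System_Reset high")

    # Step 8: BIOS and OS load only once every node has finished its reset.
    if all(n.system_reset == HIGH for n in [master, *slaves]):
        log.append("master: load BIOS and OS")
    return log
```

The final check acts as a barrier: loading the BIOS and OS is gated on all nodes, which is the unified sequencing the text describes.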
When the system state is switched to power-off, the specific steps are:
1. The master node server sends an active low-level "Power_Enable" signal to its BMC, notifying the BMC that a power-off operation is required;
2. The BMC in the master node server passes the power-off command to the BMCs of the slave node servers through the internal management network interface;
3. The BMCs of the slave node servers send a low-level "Power_Enable" to all slave node servers, notifying them to perform the power-off operation;
4. After the master node and slave node servers have performed the power-off operation, each feeds back a low-level "Power_OK" signal to its own BMC, indicating that the node has powered off;
5. After "Power_OK", the master node server sends a low-level "System_Reset" to its BMC, indicating that the reset of the processors, memory, and chipset in the master node is complete and that it is waiting for the slave nodes to complete their reset;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory, and chipset in that node;
8. After all server nodes have completed their reset, the master node server has powered off.
In the power-on state, the levels of Power_Enable, Power_OK, and System_Reset are high, high, and high;
In the power-off state, the levels of Power_Enable, Power_OK, and System_Reset are low, low, and low;
During a cold restart, a low pulse appears on each of Power_Enable, Power_OK, and System_Reset;
During a hot restart, Power_Enable and Power_OK stay high and a low pulse appears on System_Reset.
That is, if a low pulse appears on "System_Reset" while "Power_Enable" and "Power_OK" are both high, the system has performed a hot restart; if a low pulse appears on all of "Power_Enable", "Power_OK", and "System_Reset", the system has performed a cold restart.
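The level and pulse combinations above amount to a lookup table. The following is a minimal sketch, assuming each signal has already been reduced to one of three observations. The `Obs` enum, the `STATES` table, and `classify` are illustrative names; treating every unlisted combination as "abnormal" follows the monitoring rule stated in the embodiment.

```python
from enum import Enum


class Obs(Enum):
    """What the monitoring BMC observed on one signal (illustrative)."""
    HIGH = "high"
    LOW = "low"
    LOW_PULSE = "low_pulse"


# Keys are (Power_Enable, Power_OK, System_Reset) observations.
STATES = {
    (Obs.HIGH, Obs.HIGH, Obs.HIGH): "power-on",
    (Obs.LOW, Obs.LOW, Obs.LOW): "power-off",
    (Obs.HIGH, Obs.HIGH, Obs.LOW_PULSE): "hot restart",
    (Obs.LOW_PULSE, Obs.LOW_PULSE, Obs.LOW_PULSE): "cold restart",
}


def classify(power_enable: Obs, power_ok: Obs, system_reset: Obs) -> str:
    """Map one observation triple to a system state; anything else is abnormal."""
    return STATES.get((power_enable, power_ok, system_reset), "abnormal")
```

For example, `classify(Obs.HIGH, Obs.HIGH, Obs.LOW_PULSE)` yields `"hot restart"`, while a triple such as high, low, high falls through to `"abnormal"`.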
The benefits of the present invention are as follows: every system state switching operation is initiated by the master node server and followed by the slave node servers, and the whole process can be monitored by the BMC management units. The method achieves synchronized power-on, power-off, hot restart, and cold restart, solves the earlier lack of unified sequencing control in multi-node servers, and allows the whole state change process to be monitored and recorded by the BMC.
Brief description of the drawings
Fig. 1 is a block diagram of system state switching in the CC-NUMA multi-node server system.
Detailed description
The present invention is further described below with reference to the accompanying drawing.
First, a system for system state switching and monitoring on a CC-NUMA multi-node server is set up, comprising a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;
Master node server: the master node server is responsible for allocating the address space of the whole computer; it runs the BIOS and OS and is also the initiator of system state switching;
Several slave node servers: the slave node servers provide computing expansion and are also the followers that execute system state switching;
BMC monitoring and management units in the master node and slave node servers: the BMCs handle communication between the node servers over the "internal management network";
Client host: the BMC in the master node server is also connected to the client host through an external management network interface, so that system state switching operations can be monitored on the client host.
Using the above system, a method for system state switching and monitoring on a CC-NUMA multi-node server is implemented as follows. The master node server drives three signals, Power_Enable, Power_OK, and System_Reset, to notify its internal BMC of the operation to be performed. The BMC in the master node notifies the BMCs in the slave nodes through the internal management network interface, and each slave node BMC performs the corresponding system state operation after receiving the command. In addition, the BMC in the master node server judges the current system state from the level combinations and pulses of the Power_Enable, Power_OK, and System_Reset signals received within the node, and notifies the client host for monitoring.
Power-on and power-off are taken as examples of system state switching:
Switching the system state to power-on:
1. The master node server sends an active high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;
2. The BMC in the master node server passes the power-on command through the internal management network interface to the BMCs in slave node 0, slave node 1, and so on up to slave node N;
3. The BMCs of the slave node servers send a high-level "Power_Enable" to all slave node servers, notifying them to perform the power-on operation;
4. After the master node and slave node servers have performed the power-on operation, each feeds back a high-level "Power_OK" signal to its own BMC, indicating that the node has powered on;
5. After "Power_OK", the master node server sends a high-level "System_Reset" to its BMC, indicating that the reset of the processors, memory, and chipset in the master node is complete and that it is waiting for the slave nodes to complete their reset;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory, and chipset in that node;
8. After all server nodes have completed their reset, the master node server starts loading the BIOS and OS.
Switching the system state to power-off:
1. The master node server sends an active low-level "Power_Enable" signal to its BMC, notifying the BMC that a power-off operation is required;
2. The BMC in the master node server passes the power-off command through the internal management network interface to the BMCs in slave node 0, slave node 1, and so on up to slave node N;
3. The BMCs of the slave node servers send a low-level "Power_Enable" to all slave node servers, notifying them to perform the power-off operation;
4. After the master node and slave node servers have performed the power-off operation, each feeds back a low-level "Power_OK" signal to its own BMC, indicating that the node has powered off;
5. After "Power_OK", the master node server sends a low-level "System_Reset" to its BMC, indicating that the reset of the processors, memory, and chipset in the master node is complete and that it is waiting for the slave nodes to complete their reset;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory, and chipset in that node;
8. After all server nodes have completed their reset, the master node server has powered off.
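Step 2 of both sequences is a fan-out from the BMC of the initiating node to the BMCs of node 0 through node N. The sketch below shows one way this forwarding, and the reporting toward the client host, might look. The classes, the `Command` enum, and the acknowledgement strings are assumptions made for illustration; the patent does not specify a message format or transport for the internal management network.

```python
from enum import Enum
from typing import Callable, List


class Command(Enum):
    POWER_ON = "power_on"
    POWER_OFF = "power_off"
    RESET = "reset"


class SlaveBMC:
    """BMC of a following node; records the commands it was asked to execute."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.received: List[Command] = []

    def handle(self, command: Command) -> str:
        self.received.append(command)
        return f"slave{self.index}: ack {command.value}"


class MasterBMC:
    """BMC of the initiating node; forwards commands and reports to the client host."""

    def __init__(self, slaves: List[SlaveBMC], notify_client: Callable[[str], None]) -> None:
        self.slaves = slaves
        self.notify_client = notify_client  # stands in for the external management port

    def broadcast(self, command: Command) -> int:
        """Forward one command to node 0..N in order; return the number of acknowledgements."""
        acks = 0
        for slave in self.slaves:
            self.notify_client(slave.handle(command))
            acks += 1
        return acks
```

Passing `notify_client` as a callable keeps the sketch independent of how the client host is actually reached; in a test it can simply be the `append` method of a list.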
The system state monitoring process is as follows:
Each system state corresponds to a combination of signal levels and pulses. The BMC in the master node server judges the current system state from the level combinations of "Power_Enable", "Power_OK", and "System_Reset" in its node.
Power-on state: when "Power_Enable", "Power_OK", and "System_Reset" are all high, the system is in the power-on state;
Power-off state: when "Power_Enable", "Power_OK", and "System_Reset" are all low, the system is in the power-off state;
Hot restart: when a low pulse appears on "System_Reset" while "Power_Enable" and "Power_OK" are both high, the system has performed a hot restart;
Cold restart: when a low pulse appears on all of "Power_Enable", "Power_OK", and "System_Reset", the system has performed a cold restart;
All other combinations are abnormal states.
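To make the monitoring rules concrete, the following sketch reduces sampled signal traces to levels or pulses and then applies the rules above. It assumes each signal is sampled into a list of 0/1 values over the observation window, and that a low pulse means a trace that starts high, dips low exactly once, and ends high. The patent does not define sampling or pulse width, and the function names are hypothetical.

```python
from typing import Sequence


def observe(samples: Sequence[int]) -> str:
    """Reduce a trace (1 = high, 0 = low) to 'high', 'low', 'low_pulse' or 'other'."""
    if not samples:
        raise ValueError("at least one sample is required")
    if all(s == 1 for s in samples):
        return "high"
    if all(s == 0 for s in samples):
        return "low"
    # Count high-to-low transitions; a single dip that recovers is a low pulse.
    falling = sum(1 for a, b in zip(samples, samples[1:]) if a == 1 and b == 0)
    if samples[0] == 1 and samples[-1] == 1 and falling == 1:
        return "low_pulse"
    return "other"


def system_state(power_enable: Sequence[int],
                 power_ok: Sequence[int],
                 system_reset: Sequence[int]) -> str:
    """Apply the monitoring rules to three sampled traces."""
    combo = (observe(power_enable), observe(power_ok), observe(system_reset))
    if combo == ("high", "high", "high"):
        return "power-on"
    if combo == ("low", "low", "low"):
        return "power-off"
    if combo == ("high", "high", "low_pulse"):
        return "hot restart"
    if combo == ("low_pulse", "low_pulse", "low_pulse"):
        return "cold restart"
    return "abnormal"  # every other combination
```

Separating `observe` from `system_state` keeps the pulse definition, which is the assumed part, isolated from the rule table, which comes directly from the text.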

Claims (6)

1. A system for system state switching and monitoring on a CC-NUMA multi-node server, characterized by comprising a master node server, slave node servers, a BMC monitoring and management unit in the master node server and in each slave node server, and a client host;
Master node server: the master node server is responsible for allocating the address space of the whole computer; it runs the BIOS and OS and is also the initiator of system state switching;
Several slave node servers: the slave node servers provide computing expansion and are also the followers that execute system state switching;
BMC monitoring and management units in the master node and slave node servers: the BMCs handle communication between the node servers over the "internal management network";
Client host: the BMC in the master node server is also connected to the client host through an external management network interface, so that system state switching operations can be monitored on the client host.
2. A method for system state switching and monitoring on a CC-NUMA multi-node server, using the system for system state switching and monitoring on a CC-NUMA multi-node server according to claim 1, characterized in that the master node server drives three signals, Power_Enable, Power_OK, and System_Reset, to notify its internal BMC of the operation to be performed; the BMC in the master node notifies the BMCs in the slave nodes through the internal management network interface, and each slave node BMC performs the corresponding system state operation after receiving the command; in addition, the BMC in the master node server judges the current system state from the level combinations and pulses of the Power_Enable, Power_OK, and System_Reset signals received within the node, and notifies the client host for monitoring.
3. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 2, characterized in that the system state switching refers to power-on, power-off, cold restart, and hot restart.
4. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that the system state is switched to power-on, with the following specific steps:
1. The master node server sends an active high-level "Power_Enable" signal to its BMC, notifying the BMC that a power-on operation is required;
2. The BMC in the master node server passes the power-on command to the BMCs of the slave node servers through the internal management network interface;
3. The BMCs of the slave node servers send a high-level "Power_Enable" to all slave node servers, notifying them to perform the power-on operation;
4. After the master node and slave node servers have performed the power-on operation, each feeds back a high-level "Power_OK" signal to its own BMC, indicating that the node has powered on;
5. After "Power_OK", the master node server sends a high-level "System_Reset" to its BMC, indicating that the reset of the processors, memory, and chipset in the master node is complete and that it is waiting for the slave nodes to complete their reset;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory, and chipset in that node;
8. After all server nodes have completed their reset, the master node server starts loading the BIOS and OS.
5. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that the system state is switched to power-off, with the following specific steps:
1. The master node server sends an active low-level "Power_Enable" signal to its BMC, notifying the BMC that a power-off operation is required;
2. The BMC in the master node server passes the power-off command to the BMCs of the slave node servers through the internal management network interface;
3. The BMCs of the slave node servers send a low-level "Power_Enable" to all slave node servers, notifying them to perform the power-off operation;
4. After the master node and slave node servers have performed the power-off operation, each feeds back a low-level "Power_OK" signal to its own BMC, indicating that the node has powered off;
5. After "Power_OK", the master node server sends a low-level "System_Reset" to its BMC, indicating that the reset of the processors, memory, and chipset in the master node is complete and that it is waiting for the slave nodes to complete their reset;
6. After the BMC in the master node server receives "System_Reset", it notifies the BMCs of the slave nodes through the internal management network;
7. After the BMC of a slave node server receives "System_Reset", it resets the processors, memory, and chipset in that node;
8. After all server nodes have completed their reset, the master node server has powered off.
6. The method for system state switching and monitoring on a CC-NUMA multi-node server according to claim 3, characterized in that in the power-on state the levels of Power_Enable, Power_OK, and System_Reset are high, high, and high;
In the power-off state, the levels of Power_Enable, Power_OK, and System_Reset are low, low, and low;
During a cold restart, a low pulse appears on each of Power_Enable, Power_OK, and System_Reset;
During a hot restart, Power_Enable and Power_OK stay high and a low pulse appears on System_Reset.
CN201410831246.3A 2014-12-29 2014-12-29 Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server Pending CN104506362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410831246.3A CN104506362A (en) 2014-12-29 2014-12-29 Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410831246.3A CN104506362A (en) 2014-12-29 2014-12-29 Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server

Publications (1)

Publication Number Publication Date
CN104506362A true CN104506362A (en) 2015-04-08

Family

ID=52948077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410831246.3A Pending CN104506362A (en) 2014-12-29 2014-12-29 Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server

Country Status (1)

Country Link
CN (1) CN104506362A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150408