CN105573868A - Method for supporting non-stop online replacement of calculation node, suitable for high-end host - Google Patents


Info

Publication number
CN105573868A
Authority
CN
China
Prior art keywords
hardware
bios
bmc
need
computing node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510905791.7A
Other languages
Chinese (zh)
Inventor
王超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-10
Filing date
2015-12-10
Publication date
2016-05-11
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201510905791.7A
Publication of CN105573868A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a method, suitable for high-end hosts, that supports online replacement of a compute node without shutting the system down, and belongs to the field of high-end hosts. The technical problem it addresses is how to replace a compute node online while services continue to run normally, thereby reducing downtime. The technical solution is as follows: the high-end host comprises a mechanical structure, hardware, a BIOS, an OS and a BMC, which cooperate with and support one another so that a compute node can be replaced online while services run normally. The structure must provide quick release so that each compute node can be removed and inserted independently; the hardware must provide a dedicated presence signal on the compute board and the backplane connector so that removal and insertion can be recognized; and the hardware must add buffer circuitry to all power supply lines so that nodes can be inserted and extracted while powered.

Description

Method for supporting non-stop online replacement of compute nodes, suitable for high-end hosts
Technical field
The present invention relates to high-end hosts, and specifically to a method, suitable for high-end hosts, for replacing compute nodes online without shutting the system down.
Background art
For high-end hosts, maintainability is as important as high stability: better maintainability can greatly shorten a high-end host's annual downtime while also improving reliability. The most critical aspect of maintainability design is the online replacement of key modules. High-end hosts are often designed in a modular way, with the CPUs and memory packaged as an independent compute module, or compute node. Enabling online replacement of compute nodes while services run normally can reduce annual service downtime by 80%. How to replace compute nodes online while services are running, and so reduce downtime, is the technical problem that currently remains.
Summary of the invention
The technical task of the present invention is to provide a method, suitable for high-end hosts, for replacing compute nodes online without shutdown, solving the problem of how to replace compute nodes online while services run normally and thereby reduce downtime.
The technical solution adopted by the present invention to solve this problem is a method, suitable for high-end hosts, for replacing compute nodes online without shutdown. It involves the structure, hardware, BIOS, OS and BMC, which cooperate with and support one another so that compute nodes can be replaced online while services run normally, wherein:
(1) Structure: the mechanical structure must provide quick release, so that each compute node can be removed and inserted independently;
(2) Hardware: the hardware must provide a dedicated presence signal on the compute board and the backplane connector to support recognition of removal and insertion, and must add buffer circuitry to all power supply lines to support insertion and extraction while powered;
(3) BIOS: the BIOS must communicate with both the hardware and the OS, and controls resource scheduling during removal and insertion;
(4) OS: the OS must perform resource migration. When a removal instruction is triggered, it migrates all resources on the hardware to be removed onto other hardware and sends the BIOS a signal that the resources were removed successfully. Conversely, when the BIOS reports that a new hardware device is being added, the OS brings that hardware online and allocates resources to it;
(5) BMC: the BMC design must ensure IPMI communication with the OS.
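Item (2) does not say how the presence signal is sampled or filtered. The sketch below shows one hypothetical way firmware (for example in the CPLD or BMC) could turn raw samples of such a pin into clean insert and remove events. It assumes an active-low presence pin and a debounce window of three consecutive samples; `presence_events`, `Event` and both assumptions are illustrative and do not come from the patent.

```python
from enum import Enum
from typing import Iterable, List


class Event(Enum):
    INSERTED = "inserted"
    REMOVED = "removed"


def presence_events(samples: Iterable[int], debounce: int = 3,
                    initially_present: bool = False) -> List[Event]:
    """Turn raw samples of a (hypothetical) active-low presence pin into debounced events.

    0 means the pin is pulled low (node seated); 1 means it floats high
    (slot empty). A change is reported only after `debounce` consecutive
    samples disagree with the current stable state.
    """
    if debounce < 1:
        raise ValueError("debounce must be at least 1")
    present = initially_present
    run = 0  # consecutive samples disagreeing with the stable state
    events: List[Event] = []
    for sample in samples:
        seen_present = sample == 0  # active-low pin
        if seen_present != present:
            run += 1
            if run >= debounce:
                present = seen_present
                run = 0
                events.append(Event.INSERTED if present else Event.REMOVED)
        else:
            run = 0  # a bounce back to the stable level resets the count
    return events
```

For the sample stream `[1, 0, 1, 0, 0, 0, 0, 1, 1, 1]`, the isolated `0` at the start is ignored as bounce, and the function reports one `INSERTED` followed by one `REMOVED`. Requiring several agreeing samples keeps connector bounce from reaching the BIOS or OS as spurious events, at the cost of a detection delay of `debounce` sampling periods.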
Compared with the prior art, the method of the present invention has the following beneficial effects: through the cooperation of five major parts, namely the structure, hardware, BIOS, OS and BMC, a compute node can be replaced without shutdown while the system is running. This reduces the annual downtime of high-end hosts, improves their reliability, maintainability and stability, and extends their service life; enabling online replacement of compute nodes while services run normally can reduce annual service downtime by 80%.
The invention is reasonably designed, simple in structure, easy to manufacture, compact, convenient to use and multi-purpose, and therefore has good application value.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of removing a compute node online;
Fig. 2 is a flowchart of inserting a compute node online.
Detailed description of embodiments
The invention is further described below with reference to the drawings and specific embodiments.
The method of the present invention, suitable for high-end hosts, for replacing compute nodes online without shutdown involves the structure, hardware, BIOS, OS and BMC, which cooperate with and support one another so that compute nodes can be replaced online while services run normally, wherein:
(1) Structure: the mechanical structure must provide quick release, so that each compute node can be removed and inserted independently;
(2) Hardware: the hardware must provide a dedicated presence signal on the compute board and the backplane connector to support recognition of removal and insertion, and must add buffer circuitry to all power supply lines to support insertion and extraction while powered;
(3) BIOS: the BIOS must communicate with both the hardware and the OS, and controls resource scheduling during removal and insertion;
(4) OS: the OS must perform resource migration. When a removal instruction is triggered, it migrates all resources on the hardware to be removed onto other hardware and sends the BIOS a signal that the resources were removed successfully. Conversely, when the BIOS reports that a new hardware device is being added, the OS brings that hardware online and allocates resources to it;
(5) BMC: the BMC design must ensure IPMI communication with the OS.
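Item (5) only requires that the BMC and OS exchange IPMI messages; the patent does not define the commands. Because requests on the IPMI system interface (e.g. KCS) are issued by host software, one hypothetical arrangement is for an OS agent to poll the BMC with an OEM command asking whether a node removal is pending. In that interface's message layout, the first byte packs the 6-bit network function (NetFn) and the 2-bit LUN, followed by the command byte; a response uses NetFn + 1 and carries a completion code before its data. The NetFn 0x30 is picked from the OEM range 30h to 3Fh; the command code, the payload layout and all function names are assumptions.

```python
from typing import NamedTuple, Optional, Sequence

OEM_NETFN = 0x30               # hypothetical pick from the IPMI OEM NetFn range 30h-3Fh
CMD_GET_PENDING_ACTION = 0x01  # hypothetical OEM command: "is a node removal pending?"
ACTION_NONE, ACTION_REMOVE = 0x00, 0x01  # hypothetical payload values


class Response(NamedTuple):
    netfn: int
    lun: int
    cmd: int
    completion_code: int
    data: bytes


def encode_request(netfn: int, cmd: int, data: Sequence[int] = (), lun: int = 0) -> bytes:
    """System-interface request layout: (NetFn << 2 | LUN), Cmd, Data..."""
    if netfn % 2 != 0 or not 0 <= netfn <= 0x3F:
        raise ValueError("request NetFn must be even and fit in 6 bits")
    if not 0 <= lun <= 3 or not 0 <= cmd <= 0xFF:
        raise ValueError("LUN or command out of range")
    return bytes([(netfn << 2) | lun, cmd, *data])


def decode_response(frame: bytes) -> Response:
    """System-interface response layout: (NetFn << 2 | LUN), Cmd, Completion code, Data..."""
    if len(frame) < 3:
        raise ValueError("response too short")
    return Response(frame[0] >> 2, frame[0] & 0x03, frame[1], frame[2], bytes(frame[3:]))


def pending_removal(resp: Response) -> Optional[int]:
    """Return the index of the node the BMC wants removed, or None (hypothetical payload)."""
    if resp.netfn != OEM_NETFN + 1 or resp.cmd != CMD_GET_PENDING_ACTION:
        raise ValueError("unexpected response")
    if resp.completion_code != 0x00 or len(resp.data) < 2:
        return None  # non-zero completion code or no payload: nothing to act on
    return resp.data[1] if resp.data[0] == ACTION_REMOVE else None
```

With these assumptions the poll request is the two bytes `C0 01`, and a reply of `C4 01 00 01 02` (NetFn 31h, completion code 00h, action "remove", node 2) makes `pending_removal` return `2`.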
Specific working process: as shown in Fig. 1, a compute node is removed online as follows. First, the BMC or a hardware button triggers removal of the compute node, the removal command is sent to the OS over IPMI, and the OS removes the compute node's resources. Next, once the resources have been removed, the OS notifies the BIOS, which removes the related resources and redistributes them. The BIOS then notifies the CPLD, which powers the node down. At the same time, the removal-complete status is fed back to the BMC, which displays it. Finally, an operator physically removes the compute node.
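The Fig. 1 sequence can be sketched as a small simulation. Everything in it is hypothetical: `Host`, `remove_node`, the model of resources as named workloads per node, and the rule that each workload migrates to the currently least-loaded remaining node (the patent does not say how the OS chooses destinations). The log lines only mark which component acts at each step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    """Hypothetical model: node id -> names of workloads resident on that node."""
    nodes: Dict[str, List[str]]
    powered: Dict[str, bool] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every node known at start-up is assumed to be powered.
        for node in self.nodes:
            self.powered.setdefault(node, True)


def remove_node(host: Host, target: str) -> None:
    """Hypothetical online removal following the order of Fig. 1."""
    if target not in host.nodes:
        raise KeyError(target)
    survivors = [n for n in host.nodes if n != target]
    if not survivors:
        raise RuntimeError("cannot remove the last compute node online")

    # 1. BMC (or a hardware button) triggers removal; the command reaches the OS over IPMI.
    host.log.append(f"BMC: removal of {target} requested via IPMI")

    # 2. OS migrates every workload off the target, each to the least-loaded survivor.
    for work in host.nodes[target]:
        dest = min(survivors, key=lambda n: len(host.nodes[n]))
        host.nodes[dest].append(work)
    host.nodes[target] = []
    host.log.append(f"OS: resources of {target} migrated")

    # 3. BIOS drops the node from its resource map and redistributes what it held.
    del host.nodes[target]
    host.log.append(f"BIOS: {target} released")

    # 4. CPLD powers the node down.
    host.powered[target] = False
    host.log.append(f"CPLD: {target} powered off")

    # 5. BMC displays that removal is complete; an operator can now pull the node.
    host.log.append(f"BMC: {target} removal complete")
```

For example, with `Host({"node0": ["a", "b"], "node1": ["c"], "node2": []})`, removing `node0` moves `a` to `node2` and then `b` to `node1` (ties go to the first node listed) and leaves `node0` powered off. The ordering matters: power is cut only after both the OS and the BIOS have released the node.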
As shown in Fig. 2, a compute node is inserted online as follows. First, the compute node is inserted while the system is running. The CPLD then powers the compute node on and, once it is powered, notifies the BIOS, which allocates resources to the new compute node. The BIOS then reports to the OS, which scans the hardware devices and allocates resources to the new device. Finally, completion is fed back to the BMC, which displays that the node was added successfully.
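Fig. 2 can be sketched the same way. Again `Host` and `insert_node` are hypothetical, and the rebalancing rule (move workloads from the busiest other node until none holds more than one workload more than the new node) is an assumed stand-in for "the OS allocates resources to the new device".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    """Hypothetical model: node id -> names of workloads resident on that node."""
    nodes: Dict[str, List[str]]
    powered: Dict[str, bool] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def insert_node(host: Host, new: str) -> None:
    """Hypothetical online insertion following the order of Fig. 2."""
    if new in host.nodes:
        raise ValueError(f"{new} is already online")

    # 1. The node is seated while the system runs (detected via its presence signal).
    host.log.append(f"HW: {new} inserted")

    # 2. CPLD powers the node on, then notifies the BIOS.
    host.powered[new] = True
    host.log.append(f"CPLD: {new} powered on")

    # 3. BIOS allocates platform resources to the node (modelled as an empty entry).
    host.nodes[new] = []
    host.log.append(f"BIOS: resources allocated to {new}")

    # 4. OS scans the new device and moves workloads onto it from the busiest
    #    other node until no node holds more than one workload more than it.
    while True:
        others = [n for n in host.nodes if n != new]
        busiest = max(others, key=lambda n: len(host.nodes[n]), default=None)
        if busiest is None or len(host.nodes[busiest]) - len(host.nodes[new]) <= 1:
            break
        host.nodes[new].append(host.nodes[busiest].pop())
    host.log.append(f"OS: {new} online with {len(host.nodes[new])} workload(s)")

    # 5. BMC displays that the node was added successfully.
    host.log.append(f"BMC: {new} added successfully")
```

Starting from `Host({"node1": ["c", "b", "x", "y"], "node2": ["a"]})`, inserting `node3` moves `y` and then `x` from `node1` onto `node3`, after which the loads are 2, 1 and 2 and the loop stops.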
With the embodiments above, those skilled in the art can readily implement the present invention. It should be understood, however, that the invention is not limited to these embodiments. On the basis of the disclosed embodiments, those skilled in the art may combine different technical features arbitrarily to realize different technical solutions.

Claims (1)

1. A method, suitable for a high-end host, for replacing compute nodes online without shutdown, characterized in that it involves the structure, hardware, BIOS, OS and BMC, which cooperate with and support one another so that compute nodes can be replaced online while services run normally, wherein:
(1) Structure: the mechanical structure must provide quick release, so that each compute node can be removed and inserted independently;
(2) Hardware: the hardware must provide a dedicated presence signal on the compute board and the backplane connector to support recognition of removal and insertion, and must add buffer circuitry to all power supply lines to support insertion and extraction while powered;
(3) BIOS: the BIOS must communicate with both the hardware and the OS, and controls resource scheduling during removal and insertion;
(4) OS: the OS must perform resource migration. When a removal instruction is triggered, it migrates all resources on the hardware to be removed onto other hardware and sends the BIOS a signal that the resources were removed successfully. Conversely, when the BIOS reports that a new hardware device is being added, the OS brings that hardware online and allocates resources to it;
(5) BMC: the BMC design must ensure IPMI communication with the OS.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510905791.7A CN105573868A (en) 2015-12-10 2015-12-10 Method for supporting non-stop online replacement of calculation node, suitable for high-end host

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510905791.7A CN105573868A (en) 2015-12-10 2015-12-10 Method for supporting non-stop online replacement of calculation node, suitable for high-end host

Publications (1)

Publication Number Publication Date
CN105573868A (en) 2016-05-11

Family

ID=55884034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510905791.7A Pending CN105573868A (en) 2015-12-10 2015-12-10 Method for supporting non-stop online replacement of calculation node, suitable for high-end host

Country Status (1)

Country Link
CN (1) CN105573868A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551770A (en) * 2009-05-07 2009-10-07 福建星网锐捷网络有限公司 Hot plug testing device and method
US20110016297A1 (en) * 2008-09-29 2011-01-20 Mark Merizan Managed data region for server management
CN103631736A (en) * 2013-11-27 2014-03-12 华为技术有限公司 Method and device for controlling equipment resources
CN104572561A (en) * 2015-01-30 2015-04-29 浪潮电子信息产业股份有限公司 Implementing method and system of overall hot plugging of clumps
CN104615500A (en) * 2015-02-25 2015-05-13 浪潮电子信息产业股份有限公司 Dynamic server computing resource allocation method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508312A (en) * 2018-11-14 2019-03-22 郑州云海信息技术有限公司 A kind of sending method and relevant apparatus of PCIE add-on card heat addition information
CN109508312B (en) * 2018-11-14 2021-10-29 郑州云海信息技术有限公司 Method and related device for sending hot addition information of PCIE (peripheral component interface express) external card
CN113312657A (en) * 2021-07-30 2021-08-27 杭州乒乓智能技术有限公司 Application server non-stop issuing method and application server

Similar Documents

Publication Publication Date Title
CN102724093A (en) Advanced telecommunications computing architecture (ATCA) machine frame and intelligent platform management bus (IPMB) connection method thereof
CN103076869A (en) Design method for power-on maintenance of RACK equipment cabinet system
CN206757471U (en) A kind of IO Riser boards applied on multipath server
CN209821735U (en) Extensible computing server with 4U8 nodes
CN105573868A (en) Method for supporting non-stop online replacement of calculation node, suitable for high-end host
CN203301532U (en) Cloud desktop system
CN203490581U (en) Management mainboard of blade server based on ATCT structure
CN105426334A (en) Parallel type large-scale USB extension device, working method and system
CN205091663U (en) Modularization server backplate based on CPCI framework
CN206312032U (en) A kind of multi-functional main frame
CN110868330A (en) Evaluation method, device and evaluation system for CPU resources which can be divided by cloud platform
CN210428443U (en) Cluster server mainboard
CN210129159U (en) Back plate
CN214278888U (en) Distributed communication bus system reset circuit
CN107239300A (en) A kind of intelligent cabinet RMC and MP batch refreshing methods
CN103984391A (en) High density and high bandwidth server main board
CN104914970A (en) Powering-on and powering-off device and method for PCIE slots and main board
CN216286652U (en) Equipment management card mainboard circuit
CN110865701A (en) Server system and computer-implemented method for assembling cable-less server system
CN209543274U (en) A kind of four road server power panels
CN211207318U (en) Network security isolation device
CN217388749U (en) High-reliability micro-grid control device based on cluster service
CN209895248U (en) Cable-free server system
Leigh et al. General-purpose blade infrastructure for configurable system architectures
CN208538050U (en) A kind of L-type 1U storage server for supporting 10 disk positions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160511