WO2023178975A1 - Chassis management system and chassis management method - Google Patents

Chassis management system and chassis management method

Info

Publication number
WO2023178975A1
Authority
WO
WIPO (PCT)
Prior art keywords
bmc
chassis
main
information
management
Prior art date
Application number
PCT/CN2022/121847
Other languages
English (en)
French (fr)
Inventor
黄玉龙
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023178975A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication

Definitions

  • the present application relates to the field of storage technology, and in particular to a chassis management system and a corresponding chassis management method.
  • a BMC (Baseboard Management Controller) is a remote management controller for servers that can be used to implement chassis management of multi-controller storage products.
  • through the BMC, operations such as firmware upgrades of the server device and device queries can be performed.
  • unified high-end storage has better storage performance and higher reliability.
  • Unified high-end storage uses two controllers in one frame for device management. Each controller corresponds to a BMC.
  • the master-slave synchronization method is used to achieve data synchronization between BMCs; that is, each BMC must synchronize its status data to the other three BMCs. This method of data synchronization requires a large amount of information to be exchanged, has poor timeliness, and results in low chassis management efficiency.
  • This application provides a chassis management system, which effectively improves the chassis management efficiency of multi-controller storage products.
  • chassis management system including a hardware layer, a firmware layer, an operating system layer, an application layer and a cluster management center;
  • the hardware layer includes shared devices accessed by the main BMC and single-control devices managed by each BMC; the shared devices are used to collect chassis information and to network interconnect each control node of the cluster with each BMC;
  • the firmware layer includes the chassis management control module, multiple BMCs and their corresponding processors; each processor is used to manage the single-control belonging device of the corresponding control node and to select the main BMC from among the BMCs; the chassis management control module is used to implement network communication;
  • the operating system layer is used to communicate with and access each BMC;
  • the application layer is used to access the operating system layer by calling the BMC interface, and obtain the hardware data information cached by each BMC through the chassis management control module;
  • the cluster management center is used to manage the hardware data information of all chassis obtained by each control node by accessing all BMCs.
  • the shared device includes chassis hardware, network management board and chassis power supply installed on the chassis; the shared device is connected to each BMC through I2C;
  • Chassis hardware is used to collect chassis information and indicate chassis information
  • the network management board is used to provide network interconnection functions to interconnect each control node of the cluster with each BMC.
  • chassis hardware includes any one or any combination of the following:
  • backplane VPD, chassis LED, chassis temperature sensor
  • the backplane VPD is used to obtain chassis electronic label information; the chassis LED is used to indicate chassis fault information and chassis alarm information; and the chassis temperature sensor is used to measure the chassis ambient temperature.
  • the chassis management control module includes a first chassis management controller and a second chassis management controller;
  • the first chassis management controller and the second chassis management controller are both connected to each BMC for network communication;
  • the first chassis management controller and the second chassis management controller implement network redundancy, and the network binding mode is a master-standby mode.
  • single-control belonging devices include any one or any combination of the following:
  • CAN VPD, CAN LED, CAN sensor, fan, IO expansion card
  • CAN VPD is used to obtain the controller electronic label information of the corresponding control node;
  • CAN LED is used to indicate node fault information or node alarm information or node positioning information of the corresponding control node;
  • CAN sensor is used to collect node temperature information of the corresponding control node and node voltage information;
  • IO expansion cards are used for link expansion of storage front-end or storage back-end.
  • the operating system layer includes multiple intelligent platform management tools corresponding to each control node;
  • Each intelligent platform management tool communicates with all BMCs, so that each BMC can be accessed through the intelligent platform management tool.
  • the operating system layer is also used to perform firmware upgrade operations on each BMC through the first chassis management controller or the second chassis management controller.
  • the application layer includes multiple high-definition monitors corresponding to each control node;
  • Each high-definition monitor is used to obtain the hardware data information cached by all BMCs by calling the corresponding BMC interface, and implement hardware management by polling each BMC and the main BMC;
  • Each high-definition monitor is connected to the cluster management center to synchronize the hardware data information obtained by the corresponding control node by accessing all BMCs to the cluster management center.
  • the main BMC has a virtual IP, and each control node in the cluster accesses the main BMC through the virtual IP;
  • the processor is also used to drift the virtual IP to the current primary BMC when a primary BMC switch is detected.
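  • the processor is further used to:
  • preset for each BMC a physical number that determines the main BMC switching order; obtain the heartbeat status information of each BMC; if the main BMC is detected to be not in place or abnormal, determine whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is in place and normal;
  • if so, the candidate BMC is used as the current primary BMC.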
  • each BMC is associated with each controller node of the storage product, so that each control node can access the data collected by multiple BMCs simultaneously and in real time. This provides redundancy of links and control nodes, which helps improve the reliability of storage products, and allows the hardware status of the entire chassis to be monitored through a single control node, which improves the efficiency of chassis management. In addition, each control node can uniformly send the collected data to the cluster management center to keep the data consistent. No time is spent on data synchronization between BMCs, so the approach is timely and further improves the chassis management efficiency of storage products. Moreover, because each control node obtains the full set of information, data consistency is further improved.
  • FIG. 1 is a structural diagram of a specific implementation of the chassis management system provided by some embodiments of the present application.
  • FIG. 2 is a schematic flow chart of the steps of a chassis management method provided by some embodiments of the present application.
  • FIG. 3 is a structural diagram of another specific implementation of the chassis management system provided by some embodiments of the present application.
  • Figure 1 is a structural diagram of a specific implementation of a chassis management system provided by some embodiments of the present application. Some embodiments of the present application may include the following:
  • the chassis management system may include hardware layer 1, firmware layer 2, operating system layer 3, application layer 4 and cluster management center 5.
  • hardware layer 1 includes hardware devices of multi-controller storage products.
  • This layer includes two types of hardware.
  • One type of hardware is accessed and managed independently by the BMC of each control node, that is, it only belongs to each control node.
  • this type of hardware is called a single-control owned device.
  • Each control node of a multi-controller storage product corresponds to a group of single-controlled owned devices.
  • the number of groups of single-control owned devices equals the total number of control nodes of the multi-controller storage product, and each control node contains the same types and numbers of single-control devices.
  • Each single-control belonging device can be connected to its BMC through any kind of bus.
  • the other type, shared devices, can only be accessed and managed by the main BMC, and there is only one set of shared devices. Because the main BMC is selected from all BMCs and changes if the original main BMC fails or cannot carry services, the shared devices are connected to the BMC of each control node, and any bus can be used for the connection.
  • the shared device in this embodiment can be used to collect information about the entire chassis and to network interconnect each control node of the cluster with each BMC.
  • firmware layer 2 comprises all programs written into EPROM (erasable programmable read-only memory) or EEPROM (electrically erasable programmable read-only memory), including the drivers stored inside each hardware device of hardware layer 1.
  • the firmware layer 2 also includes a processor used to manage the single-control device belonging to the corresponding control node and select the main BMC for the current storage product.
  • Each BMC is connected to a processor; that is, the number of processors is the same as the total number of controllers of the multi-controller storage product.
  • the processor may include one or more processing cores, such as a 4-core processor or an 8-core processor.
  • the processor may also be a controller, a microcontroller, a microprocessor or other data processing chips.
  • the processor can be implemented in hardware as a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), a PLA (Programmable Logic Array), or a CPLD (Complex Programmable Logic Device).
  • the processor can also include a main processor and a co-processor.
  • the main processor, also called the CPU (Central Processing Unit), processes data in the awake state; the co-processor is a low-power processor that processes data in standby mode.
  • the processor can even be integrated with a GPU (Graphics Processing Unit).
  • the GPU is responsible for rendering and drawing the content to be displayed on the display screen, such as data information stored in the storage product.
  • the processor may also include an AI (Artificial Intelligence, artificial intelligence) processor, which is used to process computing operations related to machine learning.
  • the processor in this embodiment may be a CPLD.
  • firmware layer 2 may include a chassis management control module, which is used to implement network communication.
  • the operating system layer 3 is used to communicate with and access each BMC; the operating system layer 3 can provide tools to implement communication with the BMC.
  • Each tool corresponds to a control node; one end of each tool is connected to the corresponding BMC interface of application layer 4, and the other end is connected to each BMC through any bus.
  • the operating system layer 3 provides a connection channel between the application layer 4 and the BMCs of the firmware layer 2. The application layer 4 accesses the operating system layer 3 by calling the BMC interface and, through the channel provided by the operating system layer 3 and the network interconnection function provided by the chassis management control module, accesses the BMCs and obtains the hardware data information cached by each BMC.
  • the BMC interfaces in the application layer 4 of this embodiment include multiple BMC interfaces.
  • One BMC corresponds to one BMC interface.
  • the user can obtain all the data collected by the BMC through the human-computer interaction page provided by the application layer 4 and through any BMC interface. That is to say, each control node can obtain the full amount of data collected by each BMC of the multi-control storage product. In other words, each control node can obtain exactly the same full amount of data that can reflect the operating status information of each chassis. After the control node obtains the full amount of data, all the obtained data can be sent to the cluster management center 5 in a unified manner.
  • the cluster management center 5 is used to manage the hardware data information of all chassis obtained by each control node by accessing all BMCs, thereby realizing chassis management and control of multi-controller storage products.
  • each BMC is associated with each controller node of the storage product through network interconnection technology, so that each control node can access data collected by multiple BMCs simultaneously and in real time. This not only provides redundancy of links and control nodes, which helps improve the reliability of storage products, but also allows the hardware status of the entire chassis to be monitored through a single control node, improving the efficiency of chassis management.
  • each control node can uniformly send the collected data to the cluster management center to keep the data consistent. No time is spent on data synchronization between BMCs, so the approach is timely and further improves the chassis management efficiency of storage products. Moreover, because each control node obtains the full set of information, data consistency is further improved.
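  • As an illustration only, the following sketch models how each control node could read the cached data of every BMC, and why identical full views make synchronization between BMCs unnecessary. The function names `collect_full_view` and `views_are_consistent`, the node and BMC identifiers, and the data fields are hypothetical and do not come from the application.

```python
from typing import Dict, Mapping

# Hypothetical stand-in for the data one BMC caches; field names are illustrative.
BmcCache = Mapping[str, object]


def collect_full_view(bmc_caches: Mapping[str, BmcCache]) -> Dict[str, dict]:
    """Return one control node's view: a copy of every BMC's cached data."""
    return {bmc_id: dict(cache) for bmc_id, cache in bmc_caches.items()}


def views_are_consistent(views: Mapping[str, Dict[str, dict]]) -> bool:
    """True when every control node holds the same full data set."""
    snapshots = list(views.values())
    return all(view == snapshots[0] for view in snapshots[1:])


caches = {
    "BMC1": {"node_temp_c": 41, "chassis_temp_c": 27},  # main BMC also caches chassis data
    "BMC2": {"node_temp_c": 43},
}
# Each control node reads all BMCs directly instead of relying on BMC-to-BMC sync.
views = {node: collect_full_view(caches) for node in ("node1", "node2")}
```

  • Because every node reads the same sources, the views agree without any copy step between BMCs; each node could then forward its view to the cluster management center.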
  • the structure of the hardware layer 1 may include the following:
  • the single-control belonging device in this embodiment includes any one or any combination of the following: CAN (Controller Area Network) VPD (Vital Product Data), CAN LED (light-emitting diode), CAN sensor, fantry (fan), and IO (Input/Output) expansion card.
  • CAN VPD is used to obtain the controller electronic label information of the corresponding control node;
  • VPD is a collection of configuration and informational data related to a specific set of hardware or software. It stores important information about the device, such as the part number, serial number, required persistent information and some device-specific data.
  • CAN LED is used as a node indicator to indicate node fault information, node alarm information or node positioning information of the corresponding control node, that is, for positioning or warning indication.
  • CAN sensors are used to collect node temperature information and node voltage information of corresponding control nodes.
  • CAN sensors can include node temperature sensors, node voltage sensors, etc.
  • Fantry is used for heat dissipation.
  • IO expansion cards are used to expand the storage front-end or storage back-end link.
  • Shared devices are a set of hardware mounted on the chassis that must be accessible to all control nodes. They can include chassis hardware, a network management board, and chassis power supplies. The shared devices can be connected to each BMC through I2C (Inter-Integrated Circuit, a two-wire serial bus). Because I2C does not support simultaneous access, and simultaneous access causes the bus to hang, the shared devices are accessed only by the selected main BMC. Each control node accesses the shared devices by accessing the BMC.
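  • The access rule for shared devices can be sketched as follows, purely as a model: only the current main BMC may read the shared I2C devices, and control nodes reach them by going through the main BMC. The class `SharedDeviceBus`, the helper `read_shared_device`, and the device name are hypothetical; no real I2C library is used.

```python
from typing import Dict


class SharedDeviceBus:
    """Models the shared I2C devices; only the current main BMC may read them."""

    def __init__(self, devices: Dict[str, object], main_bmc: str) -> None:
        self._devices = dict(devices)
        self.main_bmc = main_bmc

    def read(self, requester_bmc: str, device: str) -> object:
        # Reject any second master: I2C does not support simultaneous access.
        if requester_bmc != self.main_bmc:
            raise PermissionError(f"{requester_bmc} is not the main BMC")
        return self._devices[device]


def read_shared_device(bus: SharedDeviceBus, device: str) -> object:
    """A control node reaches a shared device by going through the main BMC."""
    return bus.read(bus.main_bmc, device)


bus = SharedDeviceBus({"chassis_temp_c": 27}, main_bmc="BMC1")
```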
  • the chassis hardware is used to collect chassis information and indicate chassis information; the chassis hardware can include any one or any combination of the following: backplane VPD, chassis LED, and chassis temperature sensor.
  • the backplane VPD is used to obtain chassis electronic label information; the chassis LED is used as a chassis indicator light to indicate chassis fault information and chassis alarm information; the chassis temperature sensor is used to measure the chassis ambient temperature.
  • the network management board is used to provide network interconnection functions to interconnect each control node of the cluster with each BMC.
  • in this embodiment, the chassis management control module may include a first chassis management controller CMC1 and a second chassis management controller CMC2. Both are connected to each BMC to implement network communication. CMC1 and CMC2 provide network redundancy, and the network bonding mode is active-standby mode.
  • the chassis management control module can adopt network card bonding mode, in which software virtualizes multiple physical network cards into one virtual network card. After configuration, all physical network cards share the same IP and MAC address.
  • this embodiment can adopt the active-backup mode of Bond1. In this mode, gratuitous ARP (Address Resolution Protocol) messages are sent by the bonding master interface and by all VLAN (Virtual Local Area Network) interfaces configured on it, provided that at least one IP address is configured on these interfaces. Gratuitous ARPs sent on VLAN interfaces carry the appropriate VLAN id.
  • This mode provides fault tolerance. In this embodiment, communication is performed through CMC1 by default. If CMC1 fails or is not in place, network communication is switched to CMC2.
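  • The default-to-CMC1 failover described here can be sketched as a simple selection rule. This is a model of the decision only, not of Linux bonding itself; the `Cmc` class and `active_cmc` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cmc:
    name: str
    in_place: bool = True   # physically present
    healthy: bool = True    # not failed


def active_cmc(cmc1: Cmc, cmc2: Cmc) -> Optional[Cmc]:
    """Prefer CMC1; fall back to CMC2 if CMC1 has failed or is not in place."""
    for cmc in (cmc1, cmc2):
        if cmc.in_place and cmc.healthy:
            return cmc
    return None  # no usable management path
```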
  • the network management board of the shared device in hardware layer 1 may be a CMC management board.
  • each hardware device in this embodiment can be connected to each BMC through I2C, that is, the hardware device can be accessed through I2C.
  • the processor is also used to assign an I2C address to each I2C device and a corresponding address to each GPIO (general-purpose input/output).
  • the operating system layer 3 may include multiple intelligent platform management tools Ipmitool corresponding to each control node. That is, the total number of intelligent platform management tools is the same as the number of control nodes of multi-controller storage products.
  • Ipmi stands for Intelligent Platform Management Interface.
  • Each intelligent platform management tool communicates with all BMCs of the storage product to access each BMC through the intelligent platform management tool. In this way, each control node can access each BMC to obtain all hardware data through the Ipmitool tool, which is simple and efficient.
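  • As a hedged illustration of one control node querying every BMC with the ipmitool command-line tool, the sketch below only builds the command lines and does not execute them. The options `-I lanplus`, `-H`, `-U` and `-E` and the `sdr list` subcommand are standard ipmitool usage; the BMC addresses, the user name, and the helper function names are assumptions made for this example.

```python
from typing import Dict, List


def ipmitool_command(host: str, user: str, subcommand: List[str]) -> List[str]:
    """Build (but do not run) an ipmitool command line for one BMC."""
    # -I lanplus: IPMI v2.0 over LAN; -E: read the password from IPMI_PASSWORD.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *subcommand]


def commands_for_all_bmcs(bmc_hosts: Dict[str, str], user: str,
                          subcommand: List[str]) -> Dict[str, List[str]]:
    """One command per BMC, so a single control node can query every BMC."""
    return {bmc_id: ipmitool_command(host, user, subcommand)
            for bmc_id, host in bmc_hosts.items()}


# Hypothetical BMC addresses (documentation range) and user name.
BMC_HOSTS = {"BMC1": "192.0.2.11", "BMC2": "192.0.2.12"}
commands = commands_for_all_bmcs(BMC_HOSTS, "admin", ["sdr", "list"])
```

  • Each resulting list could be passed to `subprocess.run` on a system where ipmitool is installed and the BMCs are reachable.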
  • the operating system layer 3 can also be used to perform firmware upgrade operations on each BMC through the first chassis management controller or the second chassis management controller.
  • the firmware can be upgraded through Yafu-upgrade.sh-firmare.sh.
  • the application layer 4 of this embodiment may include multiple high-definition monitors corresponding to each control node; that is, the total number of high-definition monitors is the same as the number of control nodes of the multi-controller storage product.
  • Each high-definition monitor is used to obtain the hardware data information cached by all BMCs by calling the corresponding BMC interface, and to implement hardware management by polling each BMC and the main BMC; each high-definition monitor is connected to the cluster management center so that the hardware data information obtained by the corresponding control node by accessing all BMCs is synchronized to the cluster management center.
  • the application layer 4 can also call the interface through other monitors or other methods to connect to the operating system layer 3 and perform BMC data acquisition operations, which does not affect the implementation of this application.
  • this application also provides a feasible selection method of the main BMC, which may include the following:
  • the processor can further be used to: preset, for each BMC, a physical number that determines the main BMC switching order; obtain the heartbeat status information of each BMC; if the main BMC is detected to be not in place or abnormal, determine whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is in place and normal; and if so, use the candidate BMC as the current main BMC.
  • the control nodes communicate with each other. For example, the physical positions of the control nodes of a multi-controller storage product are numbered 1 to n from left to right. Each BMC can send a heartbeat to the other BMCs once every 5 seconds to report its own heartbeat status. If the BMC of control node 1 (BMC1) is in place and normal, BMC1 is the main BMC. If BMC1 is not in place or is abnormal and the BMC of control node 2 (BMC2) is in place and normal, BMC2 is selected as the main BMC, and so on in the order 1 to n.
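  • The selection order described above can be sketched as follows: the BMC with the lowest physical number that is both in place and normal becomes the main BMC. The `BmcStatus` record and `select_main_bmc` function are hypothetical names; only the 5-second interval and the 1 to n ordering come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

HEARTBEAT_INTERVAL_S = 5  # from the text: one heartbeat every 5 seconds


@dataclass(frozen=True)
class BmcStatus:
    physical_number: int  # position 1..n, left to right
    in_place: bool        # heartbeat says the BMC is present
    normal: bool          # heartbeat says the BMC is healthy


def select_main_bmc(statuses: Iterable[BmcStatus]) -> Optional[int]:
    """Return the physical number of the first in-place, normal BMC in 1..n order."""
    for status in sorted(statuses, key=lambda s: s.physical_number):
        if status.in_place and status.normal:
            return status.physical_number
    return None  # no BMC can act as main
```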
  • this embodiment also sets a virtual IP for the main BMC. The main BMC has a virtual IP, and each control node of the cluster accesses the main BMC through the virtual IP; the processor is also used to drift the virtual IP to the current main BMC when a main BMC switch is detected. Control nodes and upper-layer services do not need to query the IP of the main BMC; they can always use this virtual IP to access the main BMC, and the virtual IP drifts dynamically to the current main BMC, which decouples the devices from one another.
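  • A minimal model of virtual IP drift, under the assumption that it can be represented as a lookup table: clients always use the same virtual IP, and only the table's owner entry changes when the main BMC switches. The class `VirtualIpTable` and all addresses (taken from the documentation range) are hypothetical.

```python
from typing import Dict


class VirtualIpTable:
    """Tracks which BMC currently answers for the virtual IP."""

    def __init__(self, virtual_ip: str, bmc_ips: Dict[str, str], main_bmc: str) -> None:
        self.virtual_ip = virtual_ip
        self._bmc_ips = dict(bmc_ips)
        self._owner = main_bmc

    def drift_to(self, new_main_bmc: str) -> None:
        """Move the virtual IP to the new main BMC after a switch."""
        if new_main_bmc not in self._bmc_ips:
            raise KeyError(new_main_bmc)
        self._owner = new_main_bmc

    def resolve(self, ip: str) -> str:
        """Return the BMC identifier that answers requests sent to `ip`."""
        if ip == self.virtual_ip:
            return self._owner
        for bmc_id, bmc_ip in self._bmc_ips.items():
            if bmc_ip == ip:
                return bmc_id
        raise LookupError(ip)


table = VirtualIpTable("192.0.2.100", {"BMC1": "192.0.2.11", "BMC2": "192.0.2.12"}, "BMC1")
```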
  • the above-mentioned chassis management system may also include a display screen, an input and output interface, a communication interface or network interface, a power supply, and a communication bus.
  • the display screen and input and output interfaces such as keyboard (Keyboard) belong to the user interface, and optional user interfaces may also include standard wired interfaces, wireless interfaces, etc.
  • the display may be an LED display, a liquid crystal display, a touch-controlled liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, etc.
  • the display, which may also be called a display screen or display unit, is used to display the information processed by the chassis management system when performing chassis management and to display a visual user interface.
  • the communication interface may optionally include a wired interface and/or a wireless interface, such as a WI-FI interface, a Bluetooth interface, etc., which are usually used to establish communication connections between the chassis management system and other electronic devices.
  • the communication bus can be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc.
  • if the chassis management methods involved in the chassis management system of the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a non-volatile computer-readable storage medium and executes all or part of the steps of the methods of the various embodiments of the present application.
  • non-volatile readable storage media include: USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), electrically erasable programmable ROM, register, hard disk, multimedia card, card-type memory (such as SD or DX memory), magnetic memory, removable disk, CD-ROM, magnetic disk, optical disk, and other media that can store program code.
  • FIG 2 is a schematic flowchart of the steps of a chassis management method provided by some embodiments of the present application.
  • the chassis management method is mainly applied to the chassis management system shown in Figure 1, which includes a hardware layer, a firmware layer, an operating system layer, an application layer and a cluster management center.
  • the hardware layer includes shared devices accessed by the main BMC and single-control devices managed by each BMC.
  • the firmware layer includes the chassis management control module, multiple BMCs and their corresponding processors.
  • Step S201 collect the chassis information of the chassis management system through the shared device, and interconnect each control node of the cluster in the chassis management system with each BMC respectively;
  • shared devices include chassis hardware, network management boards, and chassis power supplies installed on the chassis.
  • the chassis information of the chassis management system can be collected and indicated through the chassis hardware, and each control node of the cluster can then be interconnected with each BMC through the network interconnection function provided by the network management board.
  • the chassis hardware includes any one or any combination of the following: backplane VPD, chassis LED, chassis temperature sensor.
  • the chassis electronic label information is usually obtained through the backplane VPD, and/or the chassis fault information and chassis alarm information are indicated through the chassis LED, and/or the chassis ambient temperature is measured through the chassis temperature sensor.
  • network communication with each BMC can be implemented through the chassis management control module in the firmware layer.
  • the network binding mode is set to active-backup mode, which creates the conditions for each processor in the firmware layer to subsequently manage the single-control belonging device of the corresponding control node and select the main BMC from among the BMCs.
  • the chassis management control module includes a first chassis management controller and a second chassis management controller. At this time, both the first chassis management controller and the second chassis management controller can be connected to each BMC for network communication. Under this condition, network redundancy is performed through the first chassis management controller and the second chassis management controller, and the network bonding mode is set to the primary and backup modes.
  • Step S202 Manage the single-control belonging device of the corresponding control node through each processor in the firmware layer, and select the main BMC from each BMC;
  • single-control belonging devices include any one or any combination of the following: CAN VPD, CAN LED, CAN sensor, fan, IO expansion card.
  • each processor can be used to manage the CAN VPD used to obtain the controller electronic tag information of the corresponding control node, and/or the CAN LED used to indicate the node fault information, node alarm information or node positioning information of the corresponding control node, and/or the CAN sensor used to collect node temperature information and node voltage information of the corresponding control node, and/or the IO expansion card used for link expansion of the storage front end or storage back end, and the main BMC is selected from among the BMCs.
  • when the processor selects the main BMC from among the BMCs, it can also switch the main BMC.
  • the processor may be used to drift the virtual IP to the current primary BMC when a primary BMC switch is detected.
  • the processor can preset, for each BMC, a physical number that determines the main BMC switching order, and obtain the heartbeat status information of each BMC. If the main BMC is detected to be not in place or abnormal, the processor determines whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is in place and normal; if so, the candidate BMC is used as the current main BMC.
  • Step S203 Communicate, through the operating system layer, with the selected main BMC and each BMC other than the main BMC in the firmware layer, and access the main BMC and each BMC other than the main BMC based on the network interconnection function provided by the chassis management control module.
  • the operating system layer includes multiple intelligent platform management tools corresponding to each control node.
  • the accessed BMCs may include the main BMC selected by the processors and each BMC other than the main BMC, that is, the slave BMCs.
  • the operating system layer can also perform firmware upgrade operations on each accessible BMC.
  • the firmware upgrade operation for each BMC can be performed based on the first chassis management controller or the second chassis management controller through the operating system layer.
  • Step S204: The application layer calls the BMC interface to access the operating system layer and obtains, through the chassis management control module, the hardware data information cached by the main BMC and by each BMC other than the main BMC.
  • The shared devices can be accessed and managed only by the main BMC. The hardware data information cached by the main BMC can include the chassis information collected by the shared devices as well as other hardware data information.
  • The hardware data information cached by each BMC other than the main BMC, that is, each slave BMC, contains only its own hardware data and no chassis information. Because the main BMC's cache includes the chassis information collected by the shared devices, the control nodes corresponding to the main and slave BMCs can later send all hardware data information in a unified manner.
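The split between the main BMC's cache and the slave BMCs' caches can be pictured with a small merge routine. This is a hedged sketch: the cache layout (`"node"` and `"chassis"` keys), the function name `merge_bmc_caches`, and the sample readings are all hypothetical and are not taken from the patent.

```python
from typing import Any, Dict


def merge_bmc_caches(caches: Dict[str, Dict[str, Any]],
                     main_bmc: str) -> Dict[str, Any]:
    """Combine per-BMC cached data into one full-chassis view.

    Every BMC contributes its own node data; chassis information is
    taken only from the main BMC, the only one that reads shared devices.
    """
    view: Dict[str, Any] = {"nodes": {}, "chassis": None}
    for name, cache in caches.items():
        view["nodes"][name] = cache.get("node", {})
        if name == main_bmc:
            view["chassis"] = cache.get("chassis")
    return view


# Hypothetical sample data: BMC1 is the main BMC and also caches chassis info.
caches = {
    "BMC1": {"node": {"temp_c": 41}, "chassis": {"ambient_c": 25}},
    "BMC2": {"node": {"temp_c": 39}},
}
full_view = merge_bmc_caches(caches, main_bmc="BMC1")
```

Any control node that reads all caches this way ends up with the same full view, which is the property the text relies on for consistent data at the cluster management center.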
  • Specifically, the application layer includes multiple high-definition monitors (HD monitors), one for each control node.
  • When the application layer calls the BMC interface to access the operating system layer, each high-definition monitor can call the corresponding BMC interface to obtain the hardware data information cached by all BMCs, and manages the hardware by polling each BMC and the main BMC.
  • Each high-definition monitor is connected to the cluster management center.
  • The hardware data information that a control node obtains by accessing all BMCs can therefore also be synchronized to the cluster management center through its high-definition monitor.
  • Step S205: Through the control nodes corresponding to the main BMC and to each BMC other than the main BMC, the cluster management center accesses the hardware data information of all chassis obtained from the main BMC and from each other BMC.
  • The main BMC has a virtual IP. Each control node of the cluster can access the main BMC through the virtual IP and access each BMC other than the main BMC over any bus that is connected to the BMC of every control node.
  • In some embodiments of this application, every BMC can be associated with every controller node of the storage product. The chassis information collected by the shared devices is accessed through the selected main BMC, the associated controller nodes give unified access to the other control nodes, and each control node sends the collected data to the cluster management center in a unified manner, which keeps the data consistent and the data on different control nodes synchronized.
  • To make the technical solution of this application clearer to those skilled in the art, the chassis management architecture of unified high-end storage is also illustrated with a four-controller unified high-end storage chassis management system, as shown in Figure 3, which may include the following:
  • From bottom to top, the unified high-end storage chassis management architecture consists of the hardware layer, the firmware layer, the operating system layer, the APP layer, and the cluster management center.
  • The hardware layer includes the single-control devices of each of the four controllers and the shared devices that all four controllers need to access.
  • The firmware layer includes the BMC and CPLD of each controller, as well as CMC1 and CMC2.
  • The operating system layer includes the Ipmitool of each of the four controllers and performs the firmware upgrade of the corresponding BMC and the upgrade of the corresponding PSU through CMC1 or CMC2.
  • The APP layer includes the HD monitor and BMC interface of each of the four controllers. The HD monitor of each controller is connected to the cluster management center.
  • One end of each Ipmitool is connected to the corresponding BMC interface, and the other end is connected to the BMCs of all four controllers.
  • CMC1 and CMC2 are each connected to every Ipmitool and every BMC.
  • The CPLDs are interconnected, and each BMC is connected to its single-control devices and to the shared devices through I2C.
  • The single-control devices of each controller can include a CAN VPD, a CAN LED, sensors, a fan tray, and an IO expansion card.
  • The shared devices can include the backplane VPD, the chassis LED, the CMC network management board, and the chassis temperature sensor. Of the four BMCs, only the main BMC can access the shared devices.
  • The BMC collects information from and monitors hardware such as the single-control devices, the shared devices, the CPLD, CMC1, and CMC2; this includes VPD reads and writes, LED status access, temperature readings, voltage readings, and CMC network status readings.
  • The CPLD allocates I2C and GPIO addresses and directly controls and manages the above hardware, for example recording persistent information in the VPD, setting LEDs for location, alarm, and status indication, and controlling temperature, voltage, and fan speed. It is also responsible for selecting the main BMC from among the four BMCs.
  • The CMC management board is responsible for network communication.
  • CMC1 and CMC2 provide network redundancy and use bond mode 1 (active-backup).
  • Each Ipmitool at the operating system layer is responsible for communicating with all BMCs.
  • A controller can access the devices of all four BMCs through Ipmitool.
  • The HD monitor calls the BMC interface to collect the hardware information cached by the BMCs. In other words, the HD monitor reads all hardware information cached on all BMCs through Ipmitool and caches it in the HD monitor.
  • Hardware can also be managed by polling the four BMCs and the main BMC, for example setting LEDs for location and status indication and controlling temperature, voltage, and fan speed.
  • The cluster management center is responsible for unified management. Each controller can collect the complete information of the entire chassis through the four BMCs and then synchronize it to the cluster management center, which provides data link redundancy and data consistency.
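As a rough illustration of how a monitor service might read every BMC through Ipmitool, the sketch below builds `ipmitool` command lines and polls each BMC in turn. The options `-I lanplus`, `-H`, `-U`, `-P` and the `sdr list` subcommand are standard ipmitool usage; the host addresses, credentials, function names, and the injectable `runner` argument are assumptions for the example, and the patent does not specify which ipmitool commands its HD monitor issues.

```python
import subprocess
from typing import Callable, Dict, List


def ipmitool_cmd(host: str, user: str, password: str, *args: str) -> List[str]:
    """Build an ipmitool command line for a remote BMC (IPMI over LAN)."""
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *args]


def run_ipmitool(cmd: List[str]) -> str:
    """Run the command and return stdout; requires ipmitool to be installed."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def poll_all_bmcs(bmc_hosts: Dict[str, str], user: str, password: str,
                  runner: Callable[[List[str]], str] = run_ipmitool) -> Dict[str, str]:
    """Read the sensor data records of every BMC and cache the raw output."""
    cache: Dict[str, str] = {}
    for name, host in bmc_hosts.items():
        try:
            cache[name] = runner(ipmitool_cmd(host, user, password, "sdr", "list"))
        except (subprocess.CalledProcessError, OSError) as exc:
            # Record the failure instead of aborting the whole polling round.
            cache[name] = f"unreachable: {exc}"
    return cache
```

Passing the runner as a parameter keeps the polling loop testable without real hardware; a production monitor would presumably also parse the output and forward it to the cluster management center.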
  • Based on the above chassis management architecture, after the system powers on, the CPLDs of the four controllers communicate with one another to select the main BMC. BMC1, BMC2, BMC3, and BMC4 routinely test and manage the status of all single-node hardware, that is, the single-control devices.
  • The main BMC, for example BMC1, is responsible for routinely testing and managing the shared devices. If BMC1 fails to access the shared devices, the CPLDs switch the main BMC from BMC1 to BMC2.
  • Once the operating system services are running, the HD monitor services of the controllers (HD monitor1, HD monitor2, HD monitor3, and HD monitor4) access Ipmitool through the BMC interface layer and then access the four BMCs (BMC1, BMC2, BMC3, and BMC4) through CMC1, which ultimately provides hardware access and management.
  • During chassis management, if CMC1 fails, the operating system layer switches the network to CMC2, and the HD monitors of the four controllers obtain the hardware status and upload it to the cluster management center.
  • Based on the above chassis management architecture, this embodiment provides link, network, and node redundancy. For example, if BMC1 of controller 1 fails to access the PSU (power supply unit) because of an I2C link failure, BMC1 notifies the CPLD of controller 1; the CPLDs communicate with one another and select BMC2 as the main BMC, so the main BMC switches to BMC2 of controller 2. BMC2 then accesses the PSU, and every controller obtains the PSU data from BMC2 over the CMC1 network, which provides hardware link redundancy.
  • By default, all controllers access BMC1, BMC2, BMC3, and BMC4 through CMC1 and obtain all hardware status. If CMC1 fails or is unplugged, the network automatically switches to CMC2, and all controllers access BMC1, BMC2, BMC3, and BMC4 through CMC2 and still obtain all hardware status.
  • By default, each controller can manage the four chassis through BMC1, BMC2, BMC3, and BMC4. If the operating system layer fails on three controllers, for example OS1, OS2, and OS3, the whole chassis can still monitor the hardware status of BMC1, BMC2, BMC3, and BMC4 and issue commands through OS4.
  • As the above shows, some embodiments of this application let a single controller monitor the hardware status of the entire chassis; provide network redundancy, which improves link reliability; provide link redundancy, which improves the reliability of the storage product; and provide node redundancy, which improves overall reliability. After a single controller obtains the full set of information, it sends it to the cluster in a unified manner, so each node collects the full data and data consistency improves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Hardware Redundancy (AREA)

Abstract

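A chassis management system and a chassis management method. The chassis management system includes a hardware layer, a firmware layer, an operating system layer, an application layer, and a cluster management center. The hardware layer includes shared devices, which are accessed by the main BMC and are used to collect chassis information and to network each control node of the cluster with every BMC, and multiple single-control devices, each managed by the BMC of its control node. The firmware layer includes a chassis management control module, multiple BMCs, and their corresponding processors; the processors select the main BMC from among the BMCs, and the chassis management control module provides network communication. The operating system layer communicates with every BMC; the application layer accesses the operating system layer by calling the BMC interface and obtains the hardware data information cached by every BMC through the chassis management control module. The cluster management center manages the hardware data information of all chassis that each control node obtains by accessing all BMCs, which can effectively improve the chassis management efficiency of multi-controller storage.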

Description

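Chassis management system and chassis management method
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202210279484.2, entitled "Chassis Management System", filed with the China Patent Office on March 22, 2022, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of storage technology, and in particular to a chassis management system and a corresponding chassis management method.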
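Background
A BMC (Baseboard Management Controller) is a remote management controller for servers and can be used to manage the chassis of multi-controller storage products. It can perform operations such as upgrading the firmware of server devices and querying device information even when the server is powered off. Compared with dual-controller unified storage, unified high-end storage offers better storage performance and higher reliability.
Unified high-end storage uses four controllers per enclosure for device management, and each controller corresponds to one BMC. During chassis management, data is synchronized among the BMCs in a master-slave manner, that is, each BMC has to synchronize all hardware status data to the other three BMCs. This approach synchronizes a large volume of data, has poor timeliness, and makes chassis management inefficient.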
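Summary
This application provides a chassis management system that effectively improves the chassis management efficiency of multi-controller storage products.
To solve the above technical problem, some embodiments of this application provide the following technical solutions:
Some embodiments of this application provide a chassis management system that includes a hardware layer, a firmware layer, an operating system layer, an application layer, and a cluster management center;
the hardware layer includes shared devices accessed by the main BMC and single-control devices managed by each BMC; the shared devices are used to collect chassis information and to network each control node of the cluster with every BMC;
the firmware layer includes a chassis management control module, multiple BMCs, and their corresponding processors; each processor manages the single-control devices of its control node and selects the main BMC from among the BMCs; the chassis management control module provides network communication;
the operating system layer communicates with and accesses every BMC;
the application layer accesses the operating system layer by calling the BMC interface and obtains the hardware data information cached by every BMC through the chassis management control module;
the cluster management center manages the hardware data information of all chassis that each control node obtains by accessing all BMCs.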
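Optionally, the shared devices include chassis hardware, a network management board, and a chassis power supply, all mounted on the enclosure; the shared devices are connected to every BMC through I2C;
the chassis hardware collects and indicates chassis information;
the network management board provides the network interconnection function that connects each control node of the cluster with every BMC.
Optionally, the chassis hardware includes any one or any combination of the following:
a backplane VPD, a chassis LED, and a chassis temperature sensor;
where the backplane VPD is used to obtain the chassis electronic label information, the chassis LED indicates chassis fault and chassis alarm information, and the chassis temperature sensor measures the ambient temperature of the chassis.
Optionally, the chassis management control module includes a first chassis management controller and a second chassis management controller;
both chassis management controllers are connected to every BMC and provide network communication;
the two chassis management controllers provide network redundancy, with the network bonding mode set to active-backup.
Optionally, the single-control devices include any one or any combination of the following:
a CAN VPD, a CAN LED, a CAN sensor, a fan, and an IO expansion card;
where the CAN VPD is used to obtain the controller electronic label information of the corresponding control node; the CAN LED indicates node fault, node alarm, or node location information of the corresponding control node; the CAN sensor collects the node temperature and node voltage information of the corresponding control node; and the IO expansion card extends the links of the storage front end or back end.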
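Optionally, the operating system layer includes multiple intelligent platform management tools, one for each control node;
each intelligent platform management tool communicates with all BMCs, so that every BMC can be accessed through the tool.
Optionally, the operating system layer is also used to upgrade the firmware of each BMC through the first or the second chassis management controller.
Optionally, the application layer includes multiple high-definition monitors, one for each control node;
each high-definition monitor obtains the hardware data information cached by all BMCs by calling the corresponding BMC interface, and manages the hardware by polling each BMC and the main BMC;
each high-definition monitor is connected to the cluster management center, so that the hardware data information its control node obtains by accessing all BMCs is synchronized to the cluster management center.
Optionally, the main BMC has a virtual IP, and each control node of the cluster accesses the main BMC through the virtual IP;
the processor is also used to drift the virtual IP to the current main BMC when a main BMC switch is detected.
Optionally, the processor is further used to:
preset, for each BMC, a physical number that defines the main BMC switching order;
obtain the heartbeat status information of each BMC;
if the main BMC is detected to be absent or abnormal, determine whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is present and normal;
if that candidate BMC is present and normal, use it as the current main BMC.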
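The advantage of the technical solution provided by this application is that network interconnection associates every BMC with every controller node of the storage product, so that each control node can access the data collected by multiple BMCs simultaneously and in real time. This not only provides link and control-node redundancy, which improves the reliability of the storage product, but also lets a single control node monitor the hardware status of the entire chassis, which improves chassis management efficiency. In addition, each control node can send the collected data to the cluster management center in a unified manner, which keeps the data consistent; the BMCs do not need to spend time synchronizing data with one another, so timeliness is high and chassis management efficiency improves further. Because every control node obtains the full set of information, data consistency improves further as well.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit this disclosure.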
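Brief description of the drawings
To explain the technical solutions of some embodiments of this application or of the related art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a structural diagram of one specific implementation of the chassis management system provided by some embodiments of this application;
Figure 2 is a flowchart of the steps of the chassis management method provided by some embodiments of this application;
Figure 3 is a structural diagram of another specific implementation of the chassis management system provided by some embodiments of this application.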
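Detailed description
To help those skilled in the art better understand the solution of this application, the application is described in further detail below with reference to the drawings and specific implementations. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
The terms "first", "second", "third", "fourth", and so on in the specification, the claims, and the above drawings are used to distinguish different objects, not to describe a particular order. The terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units and may include steps or units that are not listed.
Having introduced the technical solutions of some embodiments of this application, various non-limiting implementations are described in detail below.
Referring first to Figure 1, which is a structural diagram of one specific implementation of a chassis management system provided by some embodiments of this application, some embodiments may include the following:
The chassis management system may include a hardware layer 1, a firmware layer 2, an operating system layer 3, an application layer 4, and a cluster management center 5.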
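The hardware layer 1 contains the hardware devices of the multi-controller storage product and includes two classes of hardware. The first class is hardware that the BMC of each control node accesses and manages on its own, that is, hardware that belongs to a single control node; for ease of description, this class is called single-control devices. Every control node of the multi-controller storage product has one group of single-control devices, so the number of groups equals the total number of controllers (control nodes) of the product, and every control node has the same types and numbers of single-control devices. Each single-control device can be connected to its BMC over any kind of bus. The second class is called shared devices for ease of description. Shared devices can be accessed and managed only by the main BMC, and there is only one group of them. Because the main BMC is selected from among the BMCs and changes when the original main BMC fails or can no longer carry the service, the shared devices can be connected to the BMC of every control node over any kind of bus. The shared devices in this embodiment are used to collect information about the entire chassis and to network each control node of the cluster with every BMC.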
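In this embodiment, the firmware layer 2 consists of all programs written into EPROM (erasable programmable read-only memory) or EEPROM (electrically erasable programmable read-only memory), including the drivers stored inside the hardware devices of the hardware layer 1. Besides the BMC of each controller, the firmware layer 2 includes processors that manage the single-control devices of the corresponding control node and select the main BMC for the storage product. Each BMC is connected to one processor, so the number of processors equals the total number of controllers of the multi-controller storage product. A processor may include one or more processing cores, such as a 4-core or 8-core processor, and may also be a controller, microcontroller, microprocessor, or other data processing chip. It may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), PLA (Programmable Logic Array), or CPLD (Complex Programmable Logic Device). A processor may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor may integrate a GPU (Graphics Processing Unit) that renders and draws the content to be shown on the display, such as the data stored in the storage product. In some embodiments, the processor may also include an AI (Artificial Intelligence) processor for machine-learning computations. Considering overall system cost, the processor in this embodiment may be a CPLD. Because the shared devices of the hardware layer 1 include the hardware that networks each control node of the cluster with every BMC, the firmware layer 2 correspondingly includes a chassis management control module, which provides network communication.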
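In this embodiment, the operating system layer 3 communicates with and accesses every BMC. It can provide tools for communicating with the BMCs, one tool per control node; one end of each tool is connected to the corresponding BMC interface of the application layer 4, and the other end is connected to every BMC over any kind of bus. The operating system layer 3 thus provides the channel between the application layer 4 and the BMCs of the firmware layer 2: the application layer 4 calls the BMC interface to access the operating system layer 3 and, through this channel and the network interconnection function provided by the chassis management control module, accesses the BMCs and obtains the hardware data information they cache. The application layer 4 in this embodiment includes multiple BMC interfaces, one per BMC. Through the human-computer interaction page provided by the application layer 4, a user can obtain the data collected by all BMCs through any one BMC interface. In other words, every control node can obtain the full data collected by every BMC of the multi-controller storage product, so all control nodes obtain identical full data that reflects the running status of each chassis. After a control node obtains the full data, it can send all of it to the cluster management center 5 in a unified manner. The cluster management center 5 manages the hardware data information of all chassis that each control node obtains by accessing all BMCs, and thereby manages and controls the chassis of the multi-controller storage product.
In the technical solution provided by some embodiments of this application, network interconnection associates every BMC with every controller node of the storage product, so that each control node can access the data collected by multiple BMCs simultaneously and in real time. This not only provides link and control-node redundancy, which improves the reliability of the storage product, but also lets a single control node monitor the hardware status of the entire chassis, which improves chassis management efficiency. In addition, each control node can send the collected data to the cluster management center in a unified manner, which keeps the data consistent; the BMCs do not need to spend time synchronizing data with one another, so timeliness is high and chassis management efficiency improves further. Because every control node obtains the full set of information, data consistency improves further as well.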
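The above embodiment does not limit the hardware contained in the hardware layer 1. Based on the above embodiment, as an optional implementation, the hardware layer 1 may be structured as follows:
The single-control devices in this embodiment include any one or any combination of the following: a CAN (Controller Area Network) VPD (Vital Product Data), a CAN LED (light-emitting diode), a CAN sensor, a fan tray (fantry), and an IO (Input/Output) expansion card. The controller of each control node is, of course, also a single-control device.
The CAN VPD is used to obtain the controller electronic label information of the corresponding control node. VPD is a collection of configuration and information data related to a particular set of hardware or software; it stores important information about the device, such as the part number, the serial number, information that must be persisted, and some device-specific data. The CAN LED serves as the node indicator and indicates node fault, node alarm, or node location information of the corresponding control node, that is, it is used for location or alarm indication. The CAN sensor collects the node temperature and node voltage information of the corresponding control node; accordingly, the CAN sensor may include a node temperature sensor, a node voltage sensor, and so on. The fan tray provides cooling. The IO expansion card extends the links of the storage front end or back end.
The shared devices are a group of hardware mounted on the enclosure that all control nodes need to access, and may include chassis hardware, a network management board, and a chassis power supply. The shared devices can be connected to every BMC through I2C (Inter-Integrated Circuit, a two-wire serial bus). Because simultaneous access over I2C is not supported here and would hang the bus, the shared devices are accessed by the selected main BMC, and each control node accesses the shared devices by accessing that BMC. The chassis hardware collects and indicates chassis information and may include any one or any combination of the following: a backplane VPD, a chassis LED, and a chassis temperature sensor. The backplane VPD is used to obtain the chassis electronic label information; the chassis LED serves as the chassis indicator and indicates chassis fault and chassis alarm information; the chassis temperature sensor measures the ambient temperature of the chassis. The network management board provides the network interconnection function that connects each control node of the cluster with every BMC.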
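As an optional implementation, to improve the reliability of the whole storage product, network redundancy can be achieved with dual links, which improves link reliability and in turn the reliability of the storage product. On this basis, this embodiment may further include the following:
The chassis management control module may include a first chassis management controller CMC1 and a second chassis management controller CMC2. Both are connected to every BMC and provide network communication; CMC1 and CMC2 provide network redundancy, with the network bonding mode set to active-backup.
In this embodiment, to improve network speed, the chassis management control module can use NIC bonding, which combines multiple physical NICs into one virtual NIC in software; once configured, all physical NICs share the same IP and MAC address. Bonding offers seven modes. 1. Mode=0 (balance-rr): round-robin load sharing; works with static (non-negotiated) link aggregation on the switch. 2. Mode=1 (active-backup): only one NIC is active and the other is on standby; if the switch is configured for aggregation it sends packets to both NICs and half of them are dropped, so this mode does not work properly with an aggregated switch configuration. 3. Mode=2 (balance-xor): XOR-hash load sharing; works with static aggregation on the switch (requires xmit_hash_policy). 4. Mode=3 (broadcast): every packet is sent on all interfaces; this provides redundancy only, without balancing, and works with static aggregation on the switch. 5. Mode=4 (802.3ad): supports the 802.3ad protocol and works with LACP (Link Aggregation Control Protocol) aggregation on the switch (requires xmit_hash_policy). 6. Mode=5 (balance-tlb, adaptive transmit load balancing): chooses the slave for transmission according to the load on each slave and receives on the current slave. 7. Mode=6 (balance-alb, adaptive load balancing): adds rlb (receive load balancing) to the tlb of Mode=5. For NIC redundancy, this embodiment can use bond mode 1 (active-backup): only one slave is active, and another slave interface is activated only when the interface of the active slave goes down. When a failover occurs in active-backup mode, one or more gratuitous ARP (Address Resolution Protocol) requests are sent on the newly activated slave interface. Gratuitous ARP is sent on the main slave interface and on every VLAN (Virtual Local Area Network) interface configured on it, provided that at least one IP address is configured on those interfaces; gratuitous ARP sent on a VLAN interface carries the appropriate VLAN id. This mode provides fault tolerance. In this embodiment, communication goes through CMC1 by default; if CMC1 fails or is absent, network communication switches to CMC2. Accordingly, the network management board among the shared devices of the hardware layer 1 can be a CMC management board.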
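The active-backup behavior between CMC1 and CMC2 can be modeled with a toy state machine. The sketch below is an assumption-laden illustration, not Linux bonding driver code: the class name `ActiveBackupBond` is hypothetical, and the choice not to fail back automatically when the preferred link recovers is a simplification made for the example.

```python
class ActiveBackupBond:
    """Toy model of bond mode 1: exactly one healthy link carries traffic."""

    def __init__(self, slaves):
        self.slaves = list(slaves)                      # ordered, first is preferred
        self.link_up = {name: True for name in self.slaves}
        self.active = self.slaves[0]                    # CMC1 by default

    def set_link(self, name, up):
        """Record a link state change and fail over if the active link is down."""
        self.link_up[name] = up
        if self.active is None or not self.link_up[self.active]:
            # Pick the first link that is still up, or None if all are down.
            self.active = next((s for s in self.slaves if self.link_up[s]), None)


bond = ActiveBackupBond(["CMC1", "CMC2"])
bond.set_link("CMC1", False)   # CMC1 fails or is unplugged
active_after_failure = bond.active
```

In this model the traffic stays on CMC2 after CMC1 comes back, and it moves again only when the active link itself goes down.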
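The above embodiment does not limit the bus used to connect to the BMCs in the chassis management architecture. I2C is a simple, bidirectional, two-wire synchronous serial bus that needs only two wires to transfer information between the devices on the bus. As an optional implementation, the hardware devices in this embodiment can therefore be connected to the BMCs through I2C, that is, the hardware devices are accessed over I2C. Accordingly, the processor is also used to assign an I2C address to each I2C device and a corresponding address to each GPIO (general-purpose input/output).
The above embodiment does not limit how the operating system layer 3 connects to the BMCs. As an optional implementation, the operating system layer 3 may include multiple intelligent platform management tools (Ipmitool), one for each control node, so the number of tools equals the number of control nodes of the multi-controller storage product. Ipmitool, a tool for IPMI (Intelligent Platform Management Interface), is a command-line utility available on Linux that supports both local and remote operation; IPMI works independently of the server's CPU, memory, storage, and power supply. Ipmitool can be used to read sensor information, display the system log, power the server on and off remotely over the network, and so on. Each intelligent platform management tool communicates with all BMCs of the storage product so that every BMC can be accessed through it. In this way each control node can access every BMC through Ipmitool and obtain all hardware data simply and efficiently.
Furthermore, to make the chassis management system more practical and convenient, the operating system layer 3 can also upgrade the firmware of each BMC through the first or the second chassis management controller. As shown in Figure 2, the firmware can be upgraded through Yafu-upgrade.sh-firmare.sh.
The above embodiment does not limit the software structure of the application layer 4. Based on the above embodiment, the application layer 4 in this embodiment may include multiple high-definition monitors, one for each control node, so the number of high-definition monitors equals the number of control nodes of the multi-controller storage product. Each high-definition monitor obtains the hardware data information cached by all BMCs by calling the corresponding BMC interface, and manages the hardware by polling each BMC and the main BMC. Each high-definition monitor is connected to the cluster management center so that the hardware data information its control node obtains by accessing all BMCs is synchronized to the cluster management center. The application layer 4 may of course also use other monitors or other means to call the interface, connect to the operating system layer 3, and obtain the BMC data; none of this affects the implementation of this application.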
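The above embodiment does not limit how the main BMC is selected. Based on the above embodiment, this application also provides one feasible way to select the main BMC, which may include the following:
The processor may further be used to: preset, for each BMC, a physical number that defines the main BMC switching order; obtain the heartbeat status information of each BMC; if the main BMC is detected to be absent or abnormal, determine whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is present and normal; and, if so, use that candidate BMC as the current main BMC.
In this embodiment, the processors of the control nodes communicate with one another. The control nodes of the multi-controller storage product are numbered 1 to n by physical position from left to right, and each BMC can send a heartbeat to the other BMCs every 5 s to report its own status. If the BMC of control node 1, BMC1, is present and normal, BMC1 is the main BMC; if BMC1 is absent or abnormal and the BMC of control node 2, BMC2, is present and normal, BMC2 is selected as the main BMC; and so on in the order 1 to n.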
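One way to turn the 5 s heartbeat into the "present and normal" judgment is a small table of last-seen timestamps. The sketch below is hypothetical: the 5 s interval comes from the text, but the class name `HeartbeatTable`, the tolerance of three missed heartbeats, and the explicit `healthy` flag are assumptions of this example.

```python
class HeartbeatTable:
    """Track the latest heartbeat of each BMC and decide whether it is alive."""

    def __init__(self, interval_s=5.0, missed_allowed=3):
        # A BMC is treated as absent after `missed_allowed` silent intervals
        # (the tolerance is an assumption; the text only gives the 5 s period).
        self.timeout_s = interval_s * missed_allowed
        self.last_seen = {}

    def record(self, bmc_id, now, healthy=True):
        """Store the time and self-reported health of a received heartbeat."""
        self.last_seen[bmc_id] = (now, healthy)

    def is_alive(self, bmc_id, now):
        """Return True only if the BMC reported healthy recently enough."""
        entry = self.last_seen.get(bmc_id)
        if entry is None:
            return False            # never heard from: treat as absent
        seen_at, healthy = entry
        return healthy and (now - seen_at) <= self.timeout_s


table = HeartbeatTable()
table.record(1, now=0.0)
table.record(2, now=0.0, healthy=False)
```

Walking the physical numbers 1 to n and taking the first BMC for which `is_alive` returns True reproduces the selection order described above.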
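To further improve chassis management efficiency and reduce coupling among the components, based on the above embodiment, each BMC in this embodiment has its own fixed IP and, in addition, a virtual IP is set for the main BMC. Each control node of the cluster accesses the main BMC through the virtual IP. The processor is also used to drift the virtual IP to the current main BMC when a main BMC switch is detected. Control nodes and upper-layer services therefore never need to look up the main BMC's IP and can always reach the main BMC through the virtual IP; dynamically drifting the virtual IP to the current main BMC decouples the components.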
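The decoupling effect of the virtual IP can be shown with a minimal bookkeeping model. This is a sketch under stated assumptions: the class name `VirtualIpManager` and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and the model only tracks which BMC owns the virtual IP; it does not configure any real network interface.

```python
class VirtualIpManager:
    """Track which BMC currently answers at the shared virtual IP."""

    def __init__(self, vip, fixed_ips):
        self.vip = vip
        self.fixed_ips = dict(fixed_ips)   # BMC name -> its own fixed IP
        self.owner = None

    def drift_to(self, bmc):
        """Move the virtual IP to the newly selected main BMC."""
        if bmc not in self.fixed_ips:
            raise KeyError(f"unknown BMC: {bmc}")
        self.owner = bmc

    def resolve(self, address):
        """Return the name of the BMC reachable at `address`, if any."""
        if address == self.vip:
            return self.owner
        for name, ip in self.fixed_ips.items():
            if ip == address:
                return name
        return None


vips = VirtualIpManager("192.0.2.100", {"BMC1": "192.0.2.11", "BMC2": "192.0.2.12"})
vips.drift_to("BMC1")
before = vips.resolve("192.0.2.100")
vips.drift_to("BMC2")          # main BMC switched from BMC1 to BMC2
after = vips.resolve("192.0.2.100")
```

Callers keep using the same virtual address before and after the switch, while the fixed addresses still reach each individual BMC.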
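In some embodiments, the chassis management system may also include a display, an input/output interface, a communication interface (also called a network interface), a power supply, and a communication bus. The display and input/output interfaces such as a keyboard are user interfaces; optional user interfaces may also include standard wired and wireless interfaces. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, shows the information processed by the chassis management system during chassis management and a visual user interface. The communication interface may optionally include a wired interface and/or a wireless interface, such as a Wi-Fi or Bluetooth interface, and is typically used to establish a communication connection between the chassis management system and other electronic devices. The communication bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on.
It can be understood that if some of the chassis management methods involved in the chassis management system of the above embodiments are implemented as software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, or in whole or in part, can be embodied as a software product that is stored in a non-volatile readable storage medium and performs all or part of the steps of the methods of the embodiments of this application. Such storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), electrically erasable programmable ROM, registers, hard disks, multimedia cards, card-type memory (such as SD or DX memory), magnetic memory, removable disks, CD-ROMs, magnetic disks, optical discs, and other media that can store program code.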
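Specifically, refer to Figure 2, a flowchart of the steps of the chassis management method provided by some embodiments of this application. The method is mainly applied to the chassis management system shown in Figure 1, which includes a hardware layer, a firmware layer, an operating system layer, an application layer, and a cluster management center; the hardware layer includes shared devices accessed by the main BMC and single-control devices managed by each BMC, and the firmware layer includes a chassis management control module, multiple BMCs, and their corresponding processors.
The method includes the following steps:
Step S201: collect the chassis information of the chassis management system through the shared devices, and network each control node of the cluster in the chassis management system with every BMC;
The shared devices include chassis hardware, a network management board, and a chassis power supply mounted on the enclosure.
Specifically, the chassis hardware collects and indicates the chassis information of the chassis management system, and the network interconnection function provided by the network management board then connects each control node of the cluster with every BMC.
The chassis hardware includes any one or any combination of the following: a backplane VPD, a chassis LED, and a chassis temperature sensor.
When chassis information is collected and indicated through the chassis hardware, the chassis electronic label information is typically obtained through the backplane VPD, and/or chassis fault and chassis alarm information is indicated through the chassis LED, and/or the ambient temperature of the chassis is measured through the chassis temperature sensor.
In some embodiments of this application, after the shared devices of the hardware layer have networked each control node of the cluster with every BMC, the chassis management control module of the firmware layer is connected to every BMC to provide network communication, and the network bonding mode is set to active-backup. This prepares the ground for the processors of the firmware layer to manage the single-control devices of their control nodes and to select the main BMC from among the BMCs.
Specifically, the chassis management control module includes a first chassis management controller and a second chassis management controller. Both are connected to every BMC for network communication, they provide network redundancy, and the network bonding mode is set to active-backup.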
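Step S202: manage the single-control devices of the corresponding control node through each processor of the firmware layer, and select the main BMC from among the BMCs;
The single-control devices include any one or any combination of the following: a CAN VPD, a CAN LED, a CAN sensor, a fan, and an IO expansion card.
Specifically, each processor can manage the CAN VPD used to obtain the controller electronic label information of the corresponding control node, and/or the CAN LED used to indicate node fault, node alarm, or node location information of the corresponding control node, and/or the CAN sensor used to collect the node temperature and node voltage information of the corresponding control node, and/or the IO expansion card used to extend the links of the storage front end or back end; the main BMC is then selected from among the BMCs based on the access performed individually by the BMC of each control node.
In addition, after selecting the main BMC from among the BMCs, the processor can also switch the main BMC.
In some embodiments of this application, the processor can drift the virtual IP to the current main BMC when it detects a main BMC switch. Specifically, a physical number that defines the main BMC switching order can be preset for each BMC, and the processor obtains the heartbeat status information of each BMC. If the main BMC is detected to be absent or abnormal, the processor determines whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is present and normal; if so, that candidate BMC is used as the current main BMC.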
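Step S203: communicate, through the operating system layer, with the selected main BMC and with each BMC in the firmware layer other than the main BMC, and access them based on the network interconnection function provided by the chassis management control module;
The operating system layer includes multiple intelligent platform management tools, one for each control node. When the BMCs are accessed, each intelligent platform management tool communicates with all BMCs and is used to access every BMC; the accessed BMCs include the main BMC selected by the processors and every other BMC, that is, the slave BMCs.
Besides communicating with the BMCs, the operating system layer can also upgrade the firmware of each accessible BMC.
Specifically, the operating system layer can upgrade the firmware of each BMC through the first or the second chassis management controller.
Step S204: call the BMC interface through the application layer to access the operating system layer, and obtain, through the chassis management control module, the hardware data information cached by the main BMC and by each BMC other than the main BMC;
The shared devices can be accessed and managed only by the main BMC. The hardware data information cached by the main BMC can include the chassis information collected by the shared devices as well as other hardware data information, whereas the hardware data information cached by each other BMC, that is, each slave BMC, contains only its own hardware data and no chassis information. Because the main BMC's cache includes the chassis information, the control nodes corresponding to the main and slave BMCs can later send all hardware data information in a unified manner.
Specifically, the application layer includes multiple high-definition monitors, one for each control node. When the application layer calls the BMC interface to access the operating system layer, each high-definition monitor can call the corresponding BMC interface to obtain the hardware data information cached by all BMCs and manage the hardware by polling each BMC and the main BMC. Each high-definition monitor is connected to the cluster management center and can also synchronize to it the hardware data information that its control node obtains by accessing all BMCs.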
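Step S205: through the cluster management center, and based on the control nodes corresponding to the main BMC and to each BMC other than the main BMC, access the hardware data information of all chassis obtained from the main BMC and from each other BMC.
The main BMC has a virtual IP. Each control node of the cluster can access the main BMC through the virtual IP and access each BMC other than the main BMC over any bus that is connected to the BMC of every control node, and thereby obtain the hardware data information of all chassis collected by the main BMC and by the other BMCs.
In some embodiments of this application, every BMC can be associated with every controller node of the storage product. The chassis information collected by the shared devices is accessed through the selected main BMC, the associated controller nodes give unified access to the other control nodes, and each control node sends the collected data to the cluster management center in a unified manner, which keeps the data consistent and the data on different control nodes synchronized.
It should be noted that the method embodiments are described as a series of actions for simplicity, but those skilled in the art should understand that some embodiments of this application are not limited by the described order of actions, because some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments and that the actions involved are not necessarily required by some embodiments of this application.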
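To make the technical solution of this application clearer to those skilled in the art, the chassis management architecture of unified high-end storage is also illustrated with a four-controller unified high-end storage chassis management system, as shown in Figure 3, which may include the following:
From bottom to top, the unified high-end storage chassis management architecture consists of the hardware layer, the firmware layer, the operating system layer, the APP layer, and the cluster management center. The hardware layer includes the single-control devices of each of the four controllers and the shared devices that all four controllers need to access. The firmware layer includes the BMC and CPLD of each controller, as well as CMC1 and CMC2. The operating system layer includes the Ipmitool of each of the four controllers and performs the firmware upgrade of the corresponding BMC and the upgrade of the corresponding PSU through CMC1 or CMC2. The APP layer includes the HD monitor and BMC interface of each of the four controllers; the HD monitor of each controller is connected to the cluster management center. One end of each Ipmitool is connected to the corresponding BMC interface, and the other end is connected to the BMCs of all four controllers. CMC1 and CMC2 are each connected to every Ipmitool and every BMC. The CPLDs are interconnected, and each BMC is connected to its single-control devices and to the shared devices through I2C.
The single-control devices of each controller can include a CAN VPD, a CAN LED, sensors, a fan tray, and an IO expansion card. The shared devices can include the backplane VPD, the chassis LED, the CMC network management board, and the chassis temperature sensor; of the four BMCs, only the main BMC can access the shared devices. The BMC collects information from and monitors hardware such as the single-control devices, the shared devices, the CPLD, CMC1, and CMC2; this includes VPD reads and writes, LED status access, temperature readings, voltage readings, and CMC network status readings. The CPLD allocates I2C and GPIO addresses and directly controls and manages the above hardware, for example recording persistent information in the VPD, setting LEDs for location, alarm, and status indication, and controlling temperature, voltage, and fan speed; it is also responsible for selecting the main BMC from among the four BMCs. The CMC management board is responsible for network communication; CMC1 and CMC2 provide network redundancy and use bond mode 1 (active-backup). Each Ipmitool at the operating system layer is responsible for communicating with all BMCs, and a controller can access the devices of all four BMCs through Ipmitool. The HD monitor calls the BMC interface to collect the hardware information cached by the BMCs; in other words, it reads all hardware information cached on all BMCs through Ipmitool and caches it in the HD monitor. Hardware can also be managed by polling the four BMCs and the main BMC, for example setting LEDs for location and status indication and controlling temperature, voltage, and fan speed. The cluster management center is responsible for unified management: each controller can collect the complete information of the entire chassis through the four BMCs and then synchronize it to the cluster management center, which provides data link redundancy and data consistency.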
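Based on the above chassis management architecture, after the system powers on, the CPLDs of the four controllers communicate with one another to select the main BMC. BMC1, BMC2, BMC3, and BMC4 routinely test and manage the status of all single-node hardware, that is, the single-control devices, and the main BMC, for example BMC1, is responsible for routinely testing and managing the shared devices. If BMC1 fails to access the shared devices, the CPLDs switch the main BMC from BMC1 to BMC2. Once the operating system services are running, the HD monitor services of the controllers (HD monitor1, HD monitor2, HD monitor3, and HD monitor4) access Ipmitool through the BMC interface layer and then access the four BMCs (BMC1, BMC2, BMC3, and BMC4) through CMC1, which ultimately provides hardware access and management. During chassis management, if CMC1 fails, the operating system layer switches the network to CMC2, and the HD monitors of the four controllers obtain the hardware status and upload it to the cluster management center.
Based on the above chassis management architecture, this embodiment provides link, network, and node redundancy. For example, if BMC1 of controller 1 fails to access the PSU (power supply unit) because of an I2C link failure, BMC1 notifies the CPLD of controller 1; the CPLDs communicate with one another and select BMC2 as the main BMC, so the main BMC switches to BMC2 of controller 2. BMC2 then accesses the PSU, and every controller obtains the PSU data from BMC2 over the CMC1 network, which provides hardware link redundancy. By default, all controllers access BMC1, BMC2, BMC3, and BMC4 through CMC1 and obtain all hardware status; if CMC1 fails or is unplugged, the network automatically switches to CMC2, and all controllers access BMC1, BMC2, BMC3, and BMC4 through CMC2 and still obtain all hardware status. By default, each controller can manage the four chassis through BMC1, BMC2, BMC3, and BMC4; if the operating system layer fails on three controllers, for example OS1, OS2, and OS3, the whole chassis can still monitor the hardware status of BMC1, BMC2, BMC3, and BMC4 and issue commands through OS4.
As the above shows, some embodiments of this application let a single controller monitor the hardware status of the entire chassis; provide network redundancy, which improves link reliability; provide link redundancy, which improves the reliability of the storage product; and provide node redundancy, which improves overall reliability. After a single controller obtains the full set of information, it sends it to the cluster in a unified manner, so each node collects the full data and data consistency improves.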
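The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts can be cross-referenced. Professionals will further appreciate that the units and algorithm steps of the examples described with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of this application.
The chassis management system and the corresponding chassis management method provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. It should be noted that a person of ordinary skill in the art can make improvements and modifications to this application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of this application.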

Claims (21)

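  1. A chassis management system, characterized by comprising a hardware layer, a firmware layer, an operating system layer, an application layer, and a cluster management center;
    the hardware layer comprises shared devices accessed by a main BMC and single-control devices managed by each BMC; the shared devices are used to collect chassis information and to network each control node of a cluster with every BMC;
    the firmware layer comprises a chassis management control module, multiple BMCs, and their corresponding processors; each processor is used to manage the single-control devices of the corresponding control node and to select the main BMC from among the BMCs; the chassis management control module is used to provide network communication;
    the operating system layer is used to communicate with and access every BMC;
    the application layer is used to access the operating system layer by calling a BMC interface and to obtain, through the chassis management control module, the hardware data information cached by every BMC;
    the cluster management center is used to manage the hardware data information of all chassis that each control node obtains by accessing all BMCs.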
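  2. The chassis management system according to claim 1, characterized in that the shared devices comprise chassis hardware, a network management board, and a chassis power supply mounted on the enclosure; the shared devices are connected to every BMC through I2C;
    the chassis hardware is used to collect and indicate chassis information;
    the network management board is used to provide a network interconnection function that connects each control node of the cluster with every BMC.
  3. The chassis management system according to claim 2, characterized in that the chassis hardware comprises any one or any combination of the following:
    a backplane VPD, a chassis LED, and a chassis temperature sensor;
    wherein the backplane VPD is used to obtain chassis electronic label information; the chassis LED is used to indicate chassis fault information and chassis alarm information; and the chassis temperature sensor is used to measure the ambient temperature of the chassis.
  4. The chassis management system according to claim 2, characterized in that the chassis management control module comprises a first chassis management controller and a second chassis management controller;
    the first chassis management controller and the second chassis management controller are both connected to every BMC and are used to provide network communication;
    the first chassis management controller and the second chassis management controller provide network redundancy, and the network bonding mode is active-backup.
  5. The chassis management system according to claim 4, characterized in that the operating system layer is further used to upgrade the firmware of each BMC through the first chassis management controller or the second chassis management controller.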
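  6. The chassis management system according to claim 1, characterized in that the single-control devices comprise any one or any combination of the following:
    a CAN VPD, a CAN LED, a CAN sensor, a fan, and an IO expansion card;
    wherein the CAN VPD is used to obtain the controller electronic label information of the corresponding control node; the CAN LED is used to indicate node fault information, node alarm information, or node location information of the corresponding control node; the CAN sensor is used to collect the node temperature information and node voltage information of the corresponding control node; and the IO expansion card is used to extend the links of the storage front end or back end.
  7. The chassis management system according to claim 1, characterized in that the operating system layer comprises multiple intelligent platform management tools corresponding to the control nodes;
    each intelligent platform management tool communicates with all BMCs, so that every BMC is accessed through the intelligent platform management tool.
  8. The chassis management system according to claim 1, characterized in that the application layer comprises multiple high-definition monitors corresponding to the control nodes;
    each high-definition monitor is used to obtain the hardware data information cached by all BMCs by calling the corresponding BMC interface, and to manage the hardware by polling each BMC and the main BMC;
    each high-definition monitor is connected to the cluster management center, so as to synchronize to the cluster management center the hardware data information that the corresponding control node obtains by accessing all BMCs.
  9. The chassis management system according to any one of claims 1 to 8, characterized in that the main BMC has a virtual IP, and each control node of the cluster accesses the main BMC through the virtual IP;
    the processor is further used to drift the virtual IP to the current main BMC when a switch of the main BMC is detected.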
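  10. The chassis management system according to claim 9, characterized in that the processor is further used to:
    preset, for each BMC, a physical number that defines the main BMC switching order;
    obtain the heartbeat status information of each BMC;
    if the main BMC is detected to be absent or abnormal, determine whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is present and normal;
    if the next candidate BMC, whose physical number is adjacent to that of the main BMC, is present and normal, use the candidate BMC as the current main BMC.
  11. The chassis management system according to any one of claims 1 to 10, characterized in that the single-control devices are hardware accessed and managed individually by the BMC of each control node.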
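  12. A chassis management method, characterized in that it is applied to the chassis management system according to any one of claims 1 to 11, the chassis management system comprising a hardware layer, a firmware layer, an operating system layer, an application layer, and a cluster management center, wherein the hardware layer comprises shared devices accessed by a main BMC and single-control devices managed by each BMC, and the firmware layer comprises a chassis management control module, multiple BMCs, and their corresponding processors, the method comprising:
    collecting the chassis information of the chassis management system through the shared devices, and networking each control node of the cluster in the chassis management system with every BMC;
    managing the single-control devices of the corresponding control node through each processor of the firmware layer, and selecting the main BMC from among the BMCs;
    communicating, through the operating system layer, with the selected main BMC and with each BMC in the firmware layer other than the main BMC, and accessing the main BMC and each BMC other than the main BMC based on the network interconnection function provided by the chassis management control module;
    calling a BMC interface through the application layer to access the operating system layer, and obtaining, through the chassis management control module, the hardware data information cached by the main BMC and the hardware data information cached by each BMC other than the main BMC; wherein the hardware data information cached by the main BMC includes the chassis information collected by the shared devices;
    accessing, through the cluster management center and based on the control nodes corresponding to the main BMC and to each BMC other than the main BMC, the hardware data information of all chassis obtained from the main BMC and from each BMC other than the main BMC.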
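  13. The method according to claim 12, characterized in that the shared devices comprise chassis hardware, a network management board, and a chassis power supply mounted on the enclosure, and that collecting the chassis information of the chassis management system through the shared devices and networking each control node of the cluster in the chassis management system with every BMC comprises:
    collecting and indicating the chassis information of the chassis management system through the chassis hardware;
    connecting each control node of the cluster with every BMC through the network interconnection function provided by the network management board.
  14. The method according to claim 13, characterized in that the chassis hardware comprises any one or any combination of the following:
    a backplane VPD, a chassis LED, and a chassis temperature sensor;
    and that collecting and indicating chassis information through the chassis hardware comprises:
    obtaining chassis electronic label information through the backplane VPD, and/or indicating chassis fault information and chassis alarm information through the chassis LED, and/or measuring the ambient temperature of the chassis through the chassis temperature sensor.
  15. The method according to claim 12 or 13, characterized in that the chassis management control module of the firmware layer comprises a first chassis management controller and a second chassis management controller, the method further comprising:
    connecting both the first chassis management controller and the second chassis management controller to every BMC for network communication;
    and providing network redundancy through the first chassis management controller and the second chassis management controller, with the network bonding mode set to active-backup.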
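  16. The method according to claim 15, characterized in that the single-control devices comprise any one or any combination of the following:
    a CAN VPD, a CAN LED, a CAN sensor, a fan, and an IO expansion card;
    and that managing the single-control devices of the corresponding control node through each processor of the firmware layer and selecting the main BMC from among the BMCs comprises:
    managing, through each processor, the CAN VPD responsible for obtaining the controller electronic label information of the corresponding control node, and/or the CAN LED responsible for indicating node fault information, node alarm information, or node location information of the corresponding control node, and/or the CAN sensor responsible for collecting the node temperature information and node voltage information of the corresponding control node, and/or the IO expansion card responsible for extending the links of the storage front end or back end;
    selecting the main BMC from among the BMCs based on the access performed individually by the BMC of each control node.
  17. The method according to claim 16, characterized by further comprising:
    drifting, through the processor, the virtual IP to the current main BMC when a switch of the main BMC is detected;
    wherein drifting, through the processor, the virtual IP to the current main BMC when a switch of the main BMC is detected comprises:
    presetting, through the processor, for each BMC a physical number that defines the main BMC switching order, and obtaining the heartbeat status information of each BMC;
    if the main BMC is detected to be absent or abnormal, determining whether the next candidate BMC, whose physical number is adjacent to that of the main BMC, is present and normal;
    if the next candidate BMC, whose physical number is adjacent to that of the main BMC, is present and normal, using the candidate BMC as the current main BMC.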
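  18. The method according to claim 12, characterized in that the operating system layer comprises multiple intelligent platform management tools corresponding to the control nodes, and that communicating, through the operating system layer, with the selected main BMC and with each BMC in the firmware layer other than the main BMC, and accessing the main BMC and each BMC other than the main BMC based on the network interconnection function provided by the chassis management control module, comprises:
    communicating with all BMCs through each intelligent platform management tool, and accessing every BMC through the intelligent platform management tool.
  19. The method according to claim 18, characterized in that the chassis management control module of the firmware layer comprises a first chassis management controller and a second chassis management controller, the method further comprising:
    upgrading, through the operating system layer, the firmware of each BMC based on the first chassis management controller or the second chassis management controller.
  20. The method according to claim 12, characterized in that the application layer comprises multiple high-definition monitors corresponding to the control nodes;
    and that calling a BMC interface through the application layer to access the operating system layer, and obtaining, through the chassis management control module, the hardware data information cached by the main BMC and the hardware data information cached by each BMC other than the main BMC, comprises:
    calling the corresponding BMC interface through each high-definition monitor to obtain the hardware data information cached by all BMCs, and managing the hardware by polling each BMC and the main BMC; wherein each high-definition monitor is connected to the cluster management center;
    the method further comprising:
    synchronizing to the cluster management center, through each high-definition monitor, the hardware data information that the corresponding control node obtains by accessing all BMCs.
  21. The method according to any one of claims 12 to 20, characterized in that the main BMC has a virtual IP, and that accessing, through the cluster management center and based on the control nodes corresponding to the main BMC and to each BMC other than the main BMC, the hardware data information of all chassis obtained from the main BMC and from each BMC other than the main BMC comprises:
    accessing the main BMC through each control node of the cluster based on the virtual IP, and accessing each BMC other than the main BMC over any bus that is connected to the BMC of every control node, so as to obtain, through the control nodes corresponding to the main BMC and to each BMC other than the main BMC, the hardware data information of all chassis obtained from the main BMC and from each BMC other than the main BMC.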
PCT/CN2022/121847 2022-03-22 2022-09-27 Chassis management system and chassis management method WO2023178975A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210279484.2A CN114356725B (zh) 2022-03-22 2022-03-22 Chassis management system
CN202210279484.2 2022-03-22

Publications (1)

Publication Number Publication Date
WO2023178975A1 true WO2023178975A1 (zh) 2023-09-28

Family

ID=81094476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/121847 WO2023178975A1 (zh) 2022-03-22 2022-09-27 Chassis management system and chassis management method

Country Status (2)

Country Link
CN (1) CN114356725B (zh)
WO (1) WO2023178975A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356725B (zh) * 2022-03-22 2022-06-07 苏州浪潮智能科技有限公司 Chassis management system
CN115905055A (zh) * 2022-10-21 2023-04-04 超聚变数字技术有限公司 Computing device and data acquisition method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108173959A (zh) * 2018-01-09 2018-06-15 郑州云海信息技术有限公司 Cluster storage system
CN109901862A (zh) * 2019-02-28 2019-06-18 苏州浪潮智能科技有限公司 BMC configuration parameter storage method
US20200028902A1 (en) * 2018-07-19 2020-01-23 Cisco Technology, Inc. Multi-node discovery and master election process for chassis management
CN112162887A (zh) * 2020-09-24 2021-01-01 北京浪潮数据技术有限公司 Storage device and method, apparatus, and storage medium for accessing its enclosure shared components
CN114356725A (zh) * 2022-03-22 2022-04-15 苏州浪潮智能科技有限公司 Chassis management system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7716515B2 (en) * 2006-12-21 2010-05-11 Inventec Corporation Method for updating the timing of a baseboard management controller
US9619243B2 (en) * 2013-12-19 2017-04-11 American Megatrends, Inc. Synchronous BMC configuration and operation within cluster of BMC
CN109981635B (zh) * 2019-03-20 2021-09-24 浪潮商用机器有限公司 一种数据处理方法及系统

Also Published As

Publication number Publication date
CN114356725A (zh) 2022-04-15
CN114356725B (zh) 2022-06-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22933027

Country of ref document: EP

Kind code of ref document: A1