CN115567400A - Whole cabinet management method, device, equipment and medium - Google Patents

Publication number
CN115567400A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211198798.6A
Other languages
Chinese (zh)
Inventor
郭平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211198798.6A priority Critical patent/CN115567400A/en
Publication of CN115567400A publication Critical patent/CN115567400A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12Discovery or management of network topologies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08Configuration management of networks or network elements
    • H04L41/0803Configuration setting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08Configuration management of networks or network elements
    • H04L41/085Retrieval of network configuration; Tracking network configuration history
    • H04L41/0853Retrieval of network configuration; Tracking network configuration history by actively collecting configuration information or by backing up configuration information

Abstract

The application provides a whole cabinet management method, device, equipment and medium, applied to a first management controller that is deployed on a resource switching node. The method comprises: sending configuration parameters to N child nodes respectively, so that each of the N child nodes configures itself according to the received configuration parameters and reports its own identity information to the resource switching node after the configuration is completed; and establishing a topological relation of the whole cabinet according to the identity information of the N child nodes. By combining automatic configuration of the child nodes with identity reporting, the method achieves topology discovery between the resource switching node and the child nodes, so that the topological relation between them is identified quickly and accurately.

Description

Whole cabinet management method, device, equipment and medium
Technical Field
The present application relates to the field of server application technologies, and in particular, to a method, an apparatus, a device, and a medium for managing a whole rack.
Background
Server resource pooling enables flexible, elastic resource deployment, improves resource utilization, and improves the fault-repair capability and operating efficiency of servers. It is generally deployed with a whole cabinet as the unit: important resources such as a CPU pool, a memory pool, a storage pool and a heterogeneous acceleration pool are pooled within the whole cabinet, and the resource pools are connected together by a resource switching node (Switch node), so that the resources can be integrated and flexibly configured.
At present, resource pooling technology built around a resource switching node is still at the research stage, and how to correctly identify and manage the topological relation between each port of the resource switching node and the child nodes of a resource-pooled whole cabinet is a problem that urgently needs to be solved.
Disclosure of Invention
In view of the foregoing, embodiments of the present application provide a method, an apparatus, a device, and a medium for managing a complete cabinet, so as to overcome the foregoing problems or at least partially solve the foregoing problems.
In a first aspect of the embodiments of the present application, a method for managing a whole rack is disclosed, which is applied to a first management controller, where the first management controller is deployed on a resource switching node, and the method includes:
respectively sending configuration parameters to N sub-nodes so that the N sub-nodes are respectively configured according to the received configuration parameters, and reporting the identity identification information of the sub-nodes to the resource switching node after the configuration is finished;
and establishing a topological relation of the whole cabinet according to the identity identification information of the N sub-nodes.
Optionally, the first management controller establishes a communication connection with the storage device deployed on each of the N child nodes, where each communication connection is implemented through a communication link in the cable that connects one of the N ports of the resource switching node to the corresponding child node; sending the configuration parameters to the N child nodes respectively comprises:
acquiring control authority of respective storage devices of the N child nodes;
and respectively writing the configuration parameters of the N child nodes into respective storage devices of the N child nodes, so that the N child nodes read the configuration parameters of the child nodes from the respective deployed storage devices.
Optionally, before establishing the topology relationship of the entire cabinet according to the identity information of each of the N child nodes, the method further includes:
the identity identification information reported by N sub-nodes is obtained from an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is a local area network established by respectively connecting the first management controller, the N second management controllers and a TOR network switch in the whole cabinet through network links, and each second management controller is deployed on one sub-node.
Optionally, after establishing the topological relation of the entire cabinet, the method further includes:
and reporting the topological relation to a management client through an external management network so that the management client can manage the N child nodes.
Optionally, the first management controller establishes communication connections with storage devices deployed on N child nodes, where the communication connections are implemented by communication links in connection cables of the N child nodes and N ports of the resource switching node; the method further comprises the following steps:
when the identity identification information reported by the child node is not received within a preset time length, reading fault information in the storage equipment of the child node;
when receiving fault broadcast of a local area network in the whole cabinet, reading fault information in storage equipment of a sub-node with a fault;
and reporting the acquired fault information to a management client through an external management network so that the management client processes the fault.
Optionally, a Mux is deployed on each of the N child nodes, the first management controller establishes communication connections with the storage devices deployed on the N child nodes through the Muxes, and each second management controller establishes a communication connection with its corresponding storage device through the Mux; the method further comprises the following steps:
and the first management controller and the second management controller deployed on each child node perform control right switching through the Mux, so that the first management controller and the second management controller can perform read-write operation on the storage device respectively.
In a second aspect of the embodiments of the present application, a method for managing a complete rack is disclosed, which is applied to a second management controller, where the second management controller is deployed on a child node, and the method includes:
receiving configuration parameters sent by a first management controller;
configuring according to the received configuration parameters;
and reporting its own identity information to the resource switching node after the configuration is completed, so that the first management controller establishes the topological relation of the whole cabinet according to the identity information.
Optionally, the second management controller establishes a communication connection with a storage device deployed on the child node through a communication link; the receiving the configuration parameters sent by the first management controller includes:
acquiring the control authority of the storage device;
and reading the configuration parameters which are written into the storage device by the first management controller in advance from the storage device.
Optionally, reporting its own identity information to the resource switching node after the configuration is completed includes:
and issuing the self identity identification information in an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is the local area network established by respectively connecting the first management controller, the N second management controllers and the TOR network switch in the whole cabinet through network links.
Optionally, the second management controller establishes a communication connection with a storage device deployed on the child node through a communication link, and the method further includes:
when a child node fails in a configuration process or an identity identification information reporting process, writing failure information into storage equipment of the child node so that the first management controller reads the failure information from the storage equipment;
and sending fault broadcast to the local area network inside the whole cabinet, so that the first management controller reads fault information in the storage equipment of the failed child node after receiving the fault broadcast.
In a third aspect of the embodiments of the present application, a method for managing a whole rack is disclosed, which is applied to a management client, where the management client is connected to a TOR network switch in the whole rack through an external management network, and the method includes:
accessing a first management controller through the external management network to obtain a topological relation of the entire cabinet, where the topological relation is generated according to the method of the first aspect;
and calling communication interfaces on the N sub-nodes in the topological relation by using a local area network in the whole cabinet to acquire the equipment information of the N sub-nodes and manage the N sub-nodes.
Optionally, the method further comprises:
and acquiring the fault information reported by the first management controller through the external management network, and processing the fault information.
In a fourth aspect of the embodiments of the present application, a complete equipment cabinet management apparatus is disclosed, which is applied to a first management controller, where the first management controller is deployed on a resource switching node, and the apparatus includes:
a sending module, configured to send configuration parameters to the N child nodes, so that the N child nodes perform configuration according to the received configuration parameters, respectively, and report their own identity information to the resource switching node after the configuration is completed;
and the identification module is used for establishing the topological relation of the whole cabinet according to the respective identity identification information of the N sub-nodes.
In a fifth aspect of the embodiments of the present application, a complete cabinet management apparatus is disclosed, which is applied to a second management controller, where the second management controller is deployed on a child node, and the apparatus includes:
the receiving module is used for receiving the configuration parameters sent by the first management controller;
the configuration module is used for configuring according to the received configuration parameters;
and the reporting module is used for reporting the child node's own identity information to the resource switching node after the configuration is completed, so that the first management controller establishes the topological relation of the whole cabinet according to the identity information.
In a sixth aspect of the embodiment of the present application, a complete cabinet management device is disclosed, which is applied to a management client, where the management client is connected to a TOR network switch in a complete cabinet through an external management network, and the device includes:
an access module, configured to access a first management controller through the external management network to obtain a topological relation of a complete cabinet, where the topological relation is generated according to the method of the first aspect;
and the management module is used for calling the communication interfaces on the N sub-nodes in the topological relation by using the local area network in the whole cabinet so as to acquire the equipment information of the N sub-nodes and manage the N sub-nodes.
In a seventh aspect of the embodiments of the present application, an electronic device is disclosed, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the method for managing a complete rack according to the first aspect, the method for managing a complete rack according to the second aspect, or the method for managing a complete rack according to the third aspect is implemented.
In an eighth aspect of the embodiments of the present application, a computer-readable storage medium is disclosed, on which a computer program/instruction is stored, where the computer program/instruction, when executed by a processor, implements the overall cabinet management method according to the first aspect, the overall cabinet management method according to the second aspect, or the overall cabinet management method according to the third aspect.
The embodiment of the application has the following advantages:
In the embodiments of the application, the first management controller on the resource switching node in the whole cabinet sends configuration parameters to each child node; each child node configures itself automatically according to the configuration parameters and then reports its own identity information to the resource switching node; the first management controller then establishes the topological relation of the whole cabinet according to the received identity information. The embodiments therefore provide a practical management method for a resource-pooled whole cabinet, which uses automatic configuration of the child nodes and identity reporting to achieve topology discovery between the resource switching node and the child nodes, so that the topological relation between them is identified quickly and accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a flowchart of steps of a complete cabinet management method applied to a first management controller according to an embodiment of the present application;
fig. 2 is a flowchart of steps of a complete cabinet management method applied to a second management controller according to an embodiment of the present application;
fig. 3 is a flowchart of steps of a complete cabinet management method applied to a management client according to an embodiment of the present application;
fig. 4 is a schematic diagram of a hardware topology structure of a complete cabinet management system according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a complete cabinet management device applied to a first management controller according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a complete cabinet management device applied to a second management controller according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a complete cabinet management apparatus applied to a management client according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the embodiments of the present application are described clearly and completely below. It is to be understood that the described embodiments are a part of the embodiments of the present application, but not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.
Fig. 1 shows a whole cabinet management method provided in an embodiment of the present application, which is applied to a first management controller, where the first management controller is deployed on a resource switching node, and the method includes:
step S101: and respectively sending configuration parameters to N sub-nodes so that the N sub-nodes are respectively configured according to the received configuration parameters, and reporting the identity identification information of the sub-nodes to the resource switching node after the configuration is finished.
In this embodiment, the whole rack is equivalent to a large server, and the inside of the whole rack includes a resource switching node (i.e., a Switch node) and a plurality of (N) child nodes, where the resource switching node is a core switching device for pooling resources of the whole rack and may be used for device expansion and resource and information interaction, and a first management controller is deployed on the resource switching node and is used to manage operation and maintenance of the resource switching node; each child node is equivalent to a pooled resource (for example, a CPU pool, a memory pool, a storage pool, a heterogeneous acceleration pool, and the like) inside the whole cabinet, and a second management controller is deployed on each child node and is used for managing the operation and maintenance of the child node.
Because the configuration parameters of each child node are different, the first management controller on the resource switching node, once started, sends the configuration parameters to the child nodes one by one. The configuration parameters comprise the management parameters of the internal local area network of the whole cabinet and the identifier of the corresponding connection port on the resource switching node. On receiving its configuration parameters, each child node configures itself automatically and, after the configuration is completed, reports its own identity information, which comprises information such as the child node's main capability, management address, device identifier and interface identifier.
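The two kinds of record described above can be illustrated with a minimal Python sketch. The field names, the address plan and the helper `make_configs` are hypothetical and are not taken from the application; the sketch only shows that each child node receives a distinct parameter set tied to one port of the resource switching node.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConfigParams:
    """Per-child parameters sent by the first management controller (hypothetical fields)."""
    lan_address: str      # management parameter of the cabinet-internal LAN
    lan_netmask: str      # management parameter of the cabinet-internal LAN
    switch_port_id: int   # port on the resource switching node that this child is cabled to


@dataclass(frozen=True)
class IdentityInfo:
    """What a child node reports once it has configured itself (hypothetical fields)."""
    main_capability: str      # e.g. "cpu-pool" or "memory-pool"
    management_address: str
    device_id: str
    interface_id: int


def make_configs(n: int, prefix: str = "192.168.100.") -> List[ConfigParams]:
    """Build one distinct parameter set per child node, one per switch port."""
    if not 1 <= n <= 200:
        raise ValueError("n must fit in the illustrative /24 address plan")
    return [ConfigParams(f"{prefix}{10 + port}", "255.255.255.0", port)
            for port in range(1, n + 1)]


configs = make_configs(3)
# An identity record a child on port 1 might report after configuring itself.
example_identity = IdentityInfo("memory-pool", configs[0].lan_address,
                                "node-A", configs[0].switch_port_id)
```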
In an optional embodiment, the first management controller establishes a communication connection with the storage device deployed on each of the N child nodes, where each communication connection is implemented through a communication link in the cable that connects one of the N ports of the resource switching node to the corresponding child node; sending the configuration parameters to the N child nodes respectively comprises:
acquiring control authority of respective storage devices of the N child nodes;
respectively writing the configuration parameters of the N sub-nodes into respective storage devices of the N sub-nodes, so that the N sub-nodes read the configuration parameters of the N sub-nodes from the respective deployed storage devices.
In this embodiment, a storage device is deployed on each child node. The storage device is a device or chip that stores data and supports read and write operations, for example an EEPROM (Electrically Erasable Programmable Read-Only Memory), a memory chip that retains its data after power is removed. The communication link is a link over which devices can send and receive data, for example an SMBus (System Management Bus) link, which is a two-wire serial bus. The first management controller on the resource switching node connects to the EEPROM of each child node through the SMBus link in the cable that connects a port of the resource switching node to that child node. This enables communication between the first management controller and the storage device (EEPROM), that is, the first management controller can write data to and read data from the storage device through the communication connection.
After the first management controller on the resource switching node is started, it actively acquires the control authority of the storage device of each child node, writes the configuration parameters into the corresponding area of that child node's storage device, and releases the control authority of the storage device after the write operation is completed. In addition, after the first management controller completes the write operations, it may send a network broadcast to notify the child nodes that the configuration parameters in their storage devices can be read.
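The acquire, write, release and notify order described above can be sketched as follows. This is only an illustration under stated assumptions: the function `provision_children` and its four callbacks are hypothetical, and the in-memory stubs stand in for real SMBus and network operations, which the application does not specify at this level.

```python
from typing import Callable, Dict, List, Sequence

events = []   # records the order of operations for inspection
store = {}    # in-memory stand-in for the child nodes' storage devices


def provision_children(configs: Dict[int, bytes],
                       acquire: Callable[[int], bool],
                       write: Callable[[int, bytes], None],
                       release: Callable[[int], None],
                       broadcast: Callable[[Sequence[int]], None]) -> List[int]:
    """Write each child's parameters under control authority, then notify once."""
    written = []
    for port, blob in sorted(configs.items()):
        if not acquire(port):
            continue  # storage is busy; a caller could retry this port later
        try:
            write(port, blob)
            written.append(port)
        finally:
            release(port)  # always hand the storage device back
    if written:
        broadcast(written)  # tell the children their parameters are readable
    return written


def _acquire(port: int) -> bool:
    events.append(("acquire", port))
    return True


def _write(port: int, blob: bytes) -> None:
    events.append(("write", port))
    store[port] = blob


def _release(port: int) -> None:
    events.append(("release", port))


def _broadcast(ports: Sequence[int]) -> None:
    events.append(("broadcast", tuple(ports)))


done = provision_children({2: b"cfg-2", 1: b"cfg-1"},
                          _acquire, _write, _release, _broadcast)
```

Releasing in a `finally` clause is a design choice of this sketch: it keeps a failed write from leaving the child node locked out of its own storage device.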
In this embodiment, the communication links and the storage devices establish the hardware connection used to manage the device topology of the resource-pooled whole cabinet. The first management controller writes the configuration parameters into the storage device and the child node obtains them from the storage device, so that data interaction between the resource switching node and the child nodes is achieved at low cost.
Step S102: and establishing a topological relation of the whole cabinet according to the identity identification information of the N sub-nodes.
In this embodiment, after receiving the identity information reported by each child node, the first management controller on the resource switching node parses the identity information and establishes the topological relation of the whole cabinet accordingly, where the topological relation refers to the connection relation between the resource switching node and each child node. Topology discovery between the resource switching node and the child nodes is thereby achieved, and the topological relation between them is identified quickly and accurately.
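One way to picture this step is a map from each port of the resource switching node to the child node that reported on it. The sketch below assumes, as an illustration only, that the interface identifier in a report is the port identifier that was sent in the configuration parameters; `build_topology` and the dictionary keys are hypothetical names.

```python
from typing import Dict, Iterable, Optional


def build_topology(expected_ports: Iterable[int],
                   reports: Iterable[dict]) -> Dict[int, Optional[dict]]:
    """Map each switch port to the child node that reported on it (None if silent)."""
    topology: Dict[int, Optional[dict]] = {port: None for port in expected_ports}
    for report in reports:
        port = report["interface_id"]  # assumed to echo the configured port identifier
        if port not in topology:
            raise ValueError(f"report names unknown switch port {port}")
        if topology[port] is not None:
            raise ValueError(f"two child nodes claim switch port {port}")
        topology[port] = {
            "device_id": report["device_id"],
            "main_capability": report["main_capability"],
            "management_address": report["management_address"],
        }
    return topology


reports = [
    {"interface_id": 2, "device_id": "node-B",
     "main_capability": "memory-pool", "management_address": "192.168.100.12"},
    {"interface_id": 1, "device_id": "node-A",
     "main_capability": "cpu-pool", "management_address": "192.168.100.11"},
]
topology = build_topology([1, 2, 3], reports)
```

Ports that stay mapped to `None` are the ones from which no report arrived.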
In an optional embodiment, before establishing the topology relationship of the entire cabinet according to the identity information of each of the N child nodes, the method further includes:
the identity identification information reported by N sub-nodes is obtained from an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is a local area network established by respectively connecting the first management controller, the N second management controllers and a TOR network switch in the whole cabinet through network links, and each second management controller is deployed on one sub-node.
In this embodiment, the network link may be a Local Area Network (LAN) link. The first management controller on the resource switching node and the second management controllers on the child nodes are each connected through a LAN link to the TOR network switch located at the top of the cabinet, which establishes the internal local area network of the whole cabinet. Once the local area network is established, the first management controller and the second management controllers can exchange network data: when a child node completes configuration according to its configuration parameters, its second management controller sends the child node's identity information onto the internal local area network of the whole cabinet, and the first management controller on the resource switching node obtains the identity information sent by each child node from that network.
The identity information is sent in the form of an LLDPDU packet. That is, after configuration is completed, the second management controller encapsulates its own identity information (for example, its main capability, management address, device identifier and interface identifier) in an LLDPDU (Link Layer Discovery Protocol Data Unit) and publishes it on the internal local area network of the whole cabinet. The first management controller receives the LLDPDU packets sent by the child nodes from the internal local area network and establishes the topological relation of the whole cabinet according to the identity information carried in them.
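For readers unfamiliar with LLDP, an LLDPDU is a sequence of TLVs, each with a 2-byte header holding a 7-bit type and a 9-bit length. The following Python sketch packs and parses a simplified LLDPDU payload. It is an illustration, not the application's implementation: it covers only the Chassis ID, Port ID, Time To Live and End TLVs, uses the "locally assigned" subtype for both identifiers, and leaves out the capability and management address TLVs as well as the Ethernet framing. The function names and the way device and interface identifiers are mapped to TLVs are assumptions.

```python
import struct
from typing import List, Tuple

TLV_END, TLV_CHASSIS_ID, TLV_PORT_ID, TLV_TTL = 0, 1, 2, 3
SUBTYPE_LOCALLY_ASSIGNED = 7  # subtype octet used here for both ID TLVs


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack one TLV: 7-bit type and 9-bit length in a 2-byte header, then the value."""
    if not 0 <= tlv_type <= 0x7F:
        raise ValueError("TLV type must fit in 7 bits")
    if len(value) > 0x1FF:
        raise ValueError("TLV value length must fit in 9 bits")
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


def build_identity_pdu(device_id: str, interface_id: str, ttl_s: int = 120) -> bytes:
    """Assemble the mandatory TLVs carrying a child node's identifiers."""
    subtype = bytes([SUBTYPE_LOCALLY_ASSIGNED])
    return b"".join([
        encode_tlv(TLV_CHASSIS_ID, subtype + device_id.encode("ascii")),
        encode_tlv(TLV_PORT_ID, subtype + interface_id.encode("ascii")),
        encode_tlv(TLV_TTL, struct.pack("!H", ttl_s)),
        encode_tlv(TLV_END, b""),
    ])


def decode_tlvs(pdu: bytes) -> List[Tuple[int, bytes]]:
    """Split a payload back into (type, value) pairs, stopping at the End TLV."""
    tlvs = []
    pos = 0
    while pos + 2 <= len(pdu):
        (header,) = struct.unpack_from("!H", pdu, pos)
        tlv_type, length = header >> 9, header & 0x1FF
        pos += 2
        if tlv_type == TLV_END:
            break
        if pos + length > len(pdu):
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, pdu[pos:pos + length]))
        pos += length
    return tlvs


pdu = build_identity_pdu("node-A", "port-1")
decoded = decode_tlvs(pdu)
```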
In an optional embodiment, after the topological relation of the entire cabinet is established, the method further includes:
and reporting the topological relation to a management client through an external management network so that the management client can manage the N child nodes.
In this embodiment, the TOR network switch located at the top of the entire cabinet is connected to the external management network and establishes a connection with the management client, so that the management client can manage the entire cabinet. The first management controller reports the established topological relation of the whole cabinet to the management client, and the management client manages each node in the topological relation.
In an optional embodiment, the first management controller establishes communication connections with storage devices deployed on N child nodes respectively, where the communication connections are implemented by communication links in connection cables of the N child nodes through N ports of the resource switching node; the method further comprises the following steps:
when the identity identification information reported by the child node is not received within a preset time length, reading fault information in the storage equipment of the child node;
when receiving fault broadcast of a local area network in the whole cabinet, reading fault information in storage equipment of a sub-node with a fault;
and reporting the acquired fault information to a management client through an external management network so that the management client processes the fault.
In this embodiment, a child node is considered to have failed in either of two cases: the first management controller does not receive the identity information reported by the child node within the preset time length, which indicates that the child node could not report its identity information on time because of a fault; or the first management controller receives a fault broadcast on the internal local area network of the whole cabinet.
After the failure occurs, the second management controller on the child node writes the fault information into the storage device of the child node, so that the first management controller on the resource switching node can read the fault information from the storage device of the failed child node. In this way, child-node fault information is reported through the storage device in each child node and the first management controller, and the management client processes the fault information, so that the whole cabinet management method provides a fault diagnosis service and is more reliable.
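The two triggers for reading fault information can be sketched as below. The function names, the dictionary that stands in for the child nodes' storage devices, and the fault strings are hypothetical; the sketch only shows how a timeout and a fault broadcast both lead to the same read of the stored fault information.

```python
from typing import Dict, Iterable, List


def nodes_to_inspect(expected_ports: Iterable[int],
                     reported_ports: Iterable[int],
                     fault_broadcast_ports: Iterable[int],
                     elapsed_s: float,
                     timeout_s: float) -> List[int]:
    """Return the ports whose storage devices should be read for fault information."""
    suspects = set(fault_broadcast_ports)        # trigger 1: fault broadcast received
    if elapsed_s >= timeout_s:                   # trigger 2: no identity report in time
        suspects |= set(expected_ports) - set(reported_ports)
    return sorted(suspects)


def collect_fault_info(suspects: Iterable[int],
                       storage: Dict[int, dict]) -> Dict[int, str]:
    """Read the fault record that each suspect child node left in its storage device."""
    return {port: storage.get(port, {}).get("fault", "no fault record found")
            for port in suspects}


# In-memory stand-in for the storage devices of three child nodes.
storage = {
    1: {},
    2: {"fault": "LAN configuration failed"},
    3: {"fault": "identity report failed"},
}
suspects = nodes_to_inspect([1, 2, 3], reported_ports=[1],
                            fault_broadcast_ports=[3],
                            elapsed_s=30.0, timeout_s=20.0)
fault_report = collect_fault_info(suspects, storage)
```

The resulting `fault_report` is what the first management controller would forward to the management client over the external management network.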
In an optional embodiment, a Mux is deployed on the N child nodes, the first management controller establishes communication connections with storage devices deployed on the N child nodes through the Mux, and the second management controller establishes communication connections with corresponding storage devices through the Mux; the method further comprises the following steps:
and the first management controller and the second management controller deployed on each child node perform control right switching through the Mux, so that the first management controller and the second management controller can perform read-write operation on the storage device respectively.
Before the first management controller or the second management controller performs read and write operations on the storage device of a child node, it must first acquire the control authority of the storage device. For this purpose, a Mux (multiplexer, also called a data selector) is placed in front of the storage device of each child node; the Mux switches the signal path as needed, so that the storage device is connected to whichever controller currently holds the control authority.
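The handover of control authority through the Mux can be modelled in software as a two-input selector. The class `StorageMux` below is a hypothetical model written for illustration; the application describes a hardware Mux and does not define such an interface.

```python
class StorageMux:
    """Model of a selector that routes exactly one controller to the storage device."""

    def __init__(self) -> None:
        self.selected = None   # controller currently routed to the storage device
        self._cells = {}       # stand-in for the storage device contents

    def acquire(self, controller: str) -> bool:
        """Select the controller if the storage device is free or already its own."""
        if self.selected is None or self.selected == controller:
            self.selected = controller
            return True
        return False

    def release(self, controller: str) -> None:
        """Give up control; only the current holder can release."""
        if self.selected == controller:
            self.selected = None

    def write(self, controller: str, key: str, value) -> None:
        self._require(controller)
        self._cells[key] = value

    def read(self, controller: str, key: str):
        self._require(controller)
        return self._cells.get(key)

    def _require(self, controller: str) -> None:
        if self.selected != controller:
            raise PermissionError(f"{controller} is not selected by the Mux")


mux = StorageMux()
granted = mux.acquire("first-controller")
mux.write("first-controller", "config", {"switch_port_id": 4})
blocked = not mux.acquire("second-controller")   # still held by the first controller
mux.release("first-controller")
mux.acquire("second-controller")
config_seen_by_child = mux.read("second-controller", "config")
mux.release("second-controller")
```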
In this embodiment, the first management controller on the resource switching node in the whole cabinet sends configuration parameters to each child node; each child node configures itself automatically according to the configuration parameters and then reports its own identity information to the resource switching node; the first management controller then establishes the topological relation of the whole cabinet according to the received identity information. The present embodiment therefore provides a practical management method for a resource-pooled whole cabinet, which uses automatic configuration of the child nodes and identity reporting to achieve topology discovery between the resource switching node and the child nodes, so that the topological relation between them is identified quickly and accurately.
As shown in fig. 2, according to another aspect of the present application, there is provided a complete rack management method applied to a second management controller, where the second management controller is deployed on a child node, the method including:
step S201: receiving configuration parameters sent by a first management controller;
step S202: configuring according to the received configuration parameters;
step S203: reporting its own identity information to the resource switching node after the configuration is finished, so that the first management controller establishes the topological relation of the whole cabinet according to the identity information.
In this embodiment, a second management controller is deployed on each child node. The second management controller manages the operation and maintenance of the child node and exchanges data with the first management controller on the resource switching node. Receiving the configuration parameters sent by the first management controller means that each child node receives the configuration parameters that the first management controller sends to that child node individually. Because the configuration parameters include identification information of the port through which the resource switching node connects to the child node, the configuration parameters of each child node are different.
After receiving its configuration parameters, each second management controller performs automatic configuration accordingly and, once the configuration is complete, sends the identity information of its child node to the first management controller. The identity information includes the main capability, management address, device identifier, interface identifier, and the like of the child node. The first management controller establishes the topological relation of the whole cabinet according to the identity information received from each child node. Automatic configuration and identity information reporting by the child nodes thus implement topology discovery between the resource switching node and the child nodes, so that the topological relation between them is identified quickly and accurately.
In an alternative embodiment, the second management controller establishes a communication connection with a storage device deployed on the child node through a communication link; the receiving the configuration parameters sent by the first management controller includes:
acquiring the control authority of the storage device;
and reading the configuration parameters which are written into the storage device by the first management controller in advance from the storage device.
In this embodiment, after the first management controller writes the configuration parameters into the corresponding area of the child node's storage device, it releases the control authority of that storage device and notifies the second management controller to read the configuration information in the storage device. After receiving the notification, the second management controller acquires the control authority of the storage device and reads the configuration parameters from it. The first management controller may notify the second management controller by means of a network broadcast.
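As an illustration of how configuration parameters might be placed in, and later read back from, a reserved area of the storage device, the following Python sketch stores them as a length-prefixed JSON record. The record layout, the function names, and the parameter keys (`switch_port`, `lan`, `vlan`) are assumptions made for this example; the application does not specify an encoding.

```python
import json
import struct

CONFIG_OFFSET = 0  # hypothetical start of the configuration area


def write_config(eeprom: bytearray, params: dict, offset: int = CONFIG_OFFSET) -> None:
    """First management controller side: store params as <2-byte length><JSON>."""
    body = json.dumps(params, sort_keys=True).encode("utf-8")
    record = struct.pack(">H", len(body)) + body
    if offset + len(record) > len(eeprom):
        raise ValueError("configuration record does not fit in the storage device")
    eeprom[offset:offset + len(record)] = record


def read_config(eeprom: bytearray, offset: int = CONFIG_OFFSET) -> dict:
    """Second management controller side: read the record written above."""
    (length,) = struct.unpack_from(">H", eeprom, offset)
    if length == 0:
        raise ValueError("no configuration has been written yet")
    body = bytes(eeprom[offset + 2:offset + 2 + length])
    return json.loads(body.decode("utf-8"))
```

The acquisition and release of the control authority are deliberately left out here; in the flow described above they would bracket each call.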
In an optional embodiment, reporting its own identity information to the resource switching node after completing the configuration includes:
publishing its own identity information in an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is the local area network established by respectively connecting the first management controller, the N second management controllers and the TOR network switch in the whole cabinet through network links.
In this embodiment, the second management controller on the child node and the first management controller on the resource switching node can exchange network data through the internal local area network of the whole cabinet. After completing the configuration, the second management controller encapsulates the identity information of the child node (its main capability, management address, device identifier, interface identifier, and the like) into an LLDPDU (Link Layer Discovery Protocol data unit) and publishes the LLDPDU in the internal local area network of the whole cabinet, where the first management controller receives it.
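To make the encapsulation step concrete, the following Python sketch packs identity information into LLDP TLVs, each of which carries a 7-bit type and a 9-bit length followed by the value. The mapping of the device identifier to the Chassis ID TLV, the interface identifier to the Port ID TLV, the main capability to the System Capabilities TLV, and the management address to the Management Address TLV is an assumption made for illustration, as are the function names; the application does not state which TLVs are used. Only the LLDPDU payload is built, not the surrounding Ethernet frame.

```python
import ipaddress
import struct


def tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack one LLDP TLV: 7-bit type and 9-bit length, then the value."""
    if not 0 <= tlv_type < 128 or len(value) > 511:
        raise ValueError("TLV type or length out of range")
    return struct.pack(">H", (tlv_type << 9) | len(value)) + value


def build_lldpdu(device_id: str, interface_id: str, mgmt_ip: str,
                 capabilities: int, ttl: int = 120) -> bytes:
    """Build an LLDPDU carrying the child node's identity (assumed mapping)."""
    addr = ipaddress.IPv4Address(mgmt_ip).packed
    mgmt = (bytes([1 + len(addr), 1]) + addr       # string length, subtype 1 (IPv4), address
            + bytes([1]) + struct.pack(">I", 0)    # interface numbering subtype 1 (unknown), number 0
            + bytes([0]))                          # OID string length 0
    return b"".join([
        tlv(1, bytes([7]) + device_id.encode("utf-8")),     # Chassis ID, locally assigned
        tlv(2, bytes([7]) + interface_id.encode("utf-8")),  # Port ID, locally assigned
        tlv(3, struct.pack(">H", ttl)),                     # Time To Live in seconds
        tlv(7, struct.pack(">HH", capabilities, capabilities)),  # supported, enabled
        tlv(8, mgmt),                                       # Management Address
        tlv(0, b""),                                        # End of LLDPDU
    ])


def parse_tlvs(lldpdu: bytes):
    """Split an LLDPDU back into (type, value) pairs, stopping at the End TLV."""
    out, pos = [], 0
    while pos + 2 <= len(lldpdu):
        (header,) = struct.unpack_from(">H", lldpdu, pos)
        tlv_type, length = header >> 9, header & 0x1FF
        out.append((tlv_type, lldpdu[pos + 2:pos + 2 + length]))
        pos += 2 + length
        if tlv_type == 0:
            break
    return out
```

The receiving side, here the first management controller, could use a parser such as `parse_tlvs` to recover the identity fields before building the topology.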
In an optional embodiment, the second management controller establishes a communication connection with a storage device deployed on the child node through a communication link, and the method further includes:
when a child node fails in a configuration process or an identity identification information reporting process, writing failure information into storage equipment of the child node so that the first management controller reads the failure information from the storage equipment;
and sending fault broadcast to the local area network inside the whole cabinet, so that the first management controller reads fault information in the storage equipment of the failed child node after receiving the fault broadcast.
In this embodiment, after the second management controller on the failed child node writes the failure information into the storage device, it releases the control authority of the storage device. When the first management controller on the resource switching node receives the failure broadcast, or has not received the identity information reported by the child node within a preset time length, it acquires the control authority of the failed child node's storage device again and reads the failure information from it.
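The two triggers for reading fault information (a received fault broadcast, or an identity report that is overdue) can be combined into one decision function. The Python sketch below is only an illustration; the function name, its parameters, and the use of plain timestamps in seconds are assumptions rather than details from the application.

```python
def nodes_needing_fault_read(expected_nodes, reported, fault_broadcasts,
                             started_at, now, timeout_s):
    """Return the child nodes whose storage device should be read for fault data.

    expected_nodes   -- identifiers of all child nodes that were configured
    reported         -- identifiers whose identity information has arrived
    fault_broadcasts -- identifiers named in received fault broadcasts
    started_at, now  -- timestamps in seconds
    timeout_s        -- preset time length to wait for identity reports
    """
    overdue = set()
    if now - started_at >= timeout_s:
        # The waiting period has elapsed: every silent node is suspect.
        overdue = {n for n in expected_nodes if n not in reported}
    # A broadcast triggers a read immediately, but only for known nodes.
    broadcast = set(fault_broadcasts) & set(expected_nodes)
    return sorted(overdue | broadcast)
```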
As shown in fig. 3, according to yet another aspect of the present application, there is provided a whole cabinet management method, applied to a management client, where the management client is connected to a TOR network switch in a whole cabinet through an external management network, and the method includes:
step S301: accessing the first management controller through the external management network to obtain a topological relation of the whole cabinet, where the topological relation is generated according to the method in the embodiment of the first aspect;
step S302: calling communication interfaces on the N child nodes in the topological relation by using a local area network in the whole cabinet, to acquire the device information of the N child nodes and manage the N child nodes.
In this embodiment, the management client is connected to the TOR network switch through an external management network and can therefore exchange data with the first management controller and the second management controllers. The management client accesses the first management controller through the external management network and obtains the topological relation of the whole cabinet. It then calls the communication interface of each child node in the topological relation through the internal local area network of the whole cabinet. The communication interface may be a Redfish interface, through which the management client obtains the device information of the child node and then manages the child node devices.
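A management client that walks the topological relation and queries each child node's Redfish interface could look roughly like the Python sketch below. The shape of the `topology` dictionary, the `management_address` key, and the function names are hypothetical. The sketch queries only the Redfish service root (`/redfish/v1/`) and omits the authentication and TLS certificate handling that a real deployment would normally need.

```python
import json
from urllib.request import urlopen


def http_get_json(url, timeout=5):
    """Fetch a URL and decode the JSON body (no authentication in this sketch)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect_device_info(topology, fetch=http_get_json):
    """Query the Redfish service root of every child node in the topology.

    topology -- mapping of node key to a dict holding 'management_address'
    fetch    -- callable that takes a URL and returns decoded JSON; it is
                injectable so the function can be exercised without a network
    """
    info = {}
    for node_key, entry in topology.items():
        url = f"https://{entry['management_address']}/redfish/v1/"
        info[node_key] = fetch(url)
    return info
```

Passing a stub as `fetch` allows the traversal logic to be checked offline before it is pointed at real child nodes.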
In an optional embodiment, the method further comprises:
and acquiring the fault information reported by the first management controller through the external management network, and processing the fault information.
In this embodiment, the first management controller on the resource switching node reports the fault information read from the storage device of a failed child node to the management client, and the management client processes the fault information after receiving it. The fault information includes parameter configuration errors, network failures, and the like. As a result, the whole cabinet management method provides a fault diagnosis service when a child node encounters a fault.
In this embodiment, communication links and storage devices establish the hardware connections used to manage the device topology in the resource-pooling whole cabinet. A first management controller on the resource switching node in the whole cabinet sends configuration parameters to each child node; each child node completes automatic configuration according to the configuration parameters and then reports its own identity information to the resource switching node; and the first management controller establishes the topological relation of the whole cabinet according to the received identity information. Automatic configuration and identity information reporting by the child nodes thus implement topology discovery between the resource switching node and the child nodes, so that the topological relation between them is identified quickly and accurately. The storage devices are inexpensive, and the whole cabinet management system has high reliability and scalability. In addition, the storage devices, the first management controller, and the second management controllers are used to report the fault information of the child nodes, so that the whole cabinet management method provides a fault diagnosis service when a child node encounters a fault.
Fig. 4 shows a hardware topology structure diagram of a whole cabinet management system provided in an embodiment of the present application. As shown in fig. 4, the whole cabinet includes a resource switching node, that is, a Switch node, on which a first management controller is deployed; the first management controller manages the operation and maintenance of the resource switching node. The whole cabinet is also provided with a plurality of child nodes, each of which is equivalent to a pooled resource. A second management controller is deployed on each child node to manage the operation and maintenance of that child node. An EEPROM (the storage device), a Mux, and Devices (that is, other devices used for business operations) are also deployed on each child node. A TOR network switch is arranged at the top of the whole cabinet, and a management client is deployed outside the whole cabinet.
The first management controller deployed on the resource switching node establishes a communication connection with the EEPROM of each child node through the Mux and through an SMBus link (the communication link) carried in the cable that connects a port of the resource switching node to that child node. The second management controller deployed on each child node is likewise connected to the EEPROM through an SMBus link and the Mux. The first management controller and the second management controller switch control rights through the Mux placed in front of the EEPROM, so that each of them can perform read-write operations on the EEPROM. In addition, the first management controller and the second management controllers are each connected through a network link (LAN) to the TOR network switch at the top of the whole cabinet, which establishes a local area network over which they can exchange network data such as identity information and network broadcasts. The TOR network switch is also connected to an external management network, so that the management client manages the whole cabinet through the external management network.
In practical application, after the first management controller on the resource switching node starts, it actively acquires the control authority of the EEPROM of each child node and performs read-write operations on the EEPROMs of the child nodes one by one. It writes configuration information, such as the management parameters of the internal local area network of the whole cabinet and the identifier of the corresponding connection port on the resource switching node, into the corresponding area of each child node's EEPROM, and releases the control authority of that EEPROM after the write operation is finished. The second management controller on each child node then acquires the control authority of its EEPROM, reads the configuration parameters from it, and performs automatic configuration. It encapsulates identity information such as the main capability, management address, device identifier, and interface identifier of the child node in an LLDPDU, publishes the LLDPDU in the internal local area network of the whole cabinet, and releases the control authority of the EEPROM. Next, the first management controller on the resource switching node receives the LLDPDU messages in the internal local area network of the whole cabinet and establishes the topological relation of the whole cabinet according to them; the topological relation can be displayed to the management client or used to manage the nodes inside the whole cabinet. Finally, the management client accesses the first management controller through the external management network and calls the Redfish interface on each child node in the topological relation through the internal local area network of the whole cabinet, to acquire detailed information about the devices on the child nodes and manage the child node devices.
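The sequence in the preceding paragraph (write configuration, configure and report, build topology) can be traced end to end with a small in-memory simulation. The Python sketch below is illustrative only: each EEPROM is modeled as a dictionary, control-authority handling and LLDP transport are omitted, and the field names, the address scheme (`prefix` plus a port-derived host number), and the assumption that a child node reports the switch port it was configured with as its interface identifier are all hypothetical.

```python
def first_mc_write_configs(eeproms, lan_params):
    """First management controller: write per-port configuration to each EEPROM."""
    for port, eeprom in eeproms.items():
        eeprom["config"] = {"switch_port": port, "lan": dict(lan_params)}


def second_mc_configure(node_id, eeprom):
    """Second management controller: read the configuration and build an identity report."""
    cfg = eeprom.get("config")
    if cfg is None:
        raise RuntimeError(f"{node_id}: no configuration found in storage")
    port = cfg["switch_port"]
    return {
        "device_id": node_id,
        "interface_id": port,  # assumed to echo the configured switch port
        "management_address": f"{cfg['lan']['prefix']}.{10 + port}",
        "capability": "pooled-resource",
    }


def build_topology(reports):
    """First management controller: map each switch port to the node behind it."""
    topology = {}
    for report in reports:
        port = report["interface_id"]
        if port in topology:
            raise ValueError(f"port {port} was reported by more than one node")
        topology[port] = report
    return topology
```

For example, calling `first_mc_write_configs` on two empty dictionaries keyed by ports 1 and 2, running `second_mc_configure` for each node, and passing the resulting reports to `build_topology` yields a mapping from switch port to child node identity.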
In addition, when an error occurs in the automatic configuration process on a child node or in sending the LLDPDU message, the second management controller on the child node writes part of the fault information into the EEPROM so that the first management controller can use it for fault reporting, and releases the control right of the EEPROM after the write operation is finished. The first management controller then accesses the EEPROM of the failed child node, reads and records the fault information, and reports it to the management client through the external management network. The management client processes the fault information after receiving it.
Fig. 5 shows a whole cabinet management apparatus provided in an embodiment of the present application, which is applied to a first management controller, where the first management controller is deployed on a resource switching node, and the apparatus includes:
a sending module 51, configured to send configuration parameters to the N child nodes respectively, so that the N child nodes perform configuration according to the received configuration parameters and report their own identity information to the resource switching node after the configuration is completed;
and the identifying module 52 is configured to establish a topological relation of the entire cabinet according to the identity information of each of the N child nodes.
In an optional embodiment, the first management controller establishes communication connections with storage devices deployed on N child nodes respectively, where the communication connections are implemented by communication links in connection cables of the N child nodes through N ports of the resource switching node, and the sending module includes:
the first authority acquisition module is used for acquiring the control authority of the storage device of each of the N child nodes;
a parameter writing module, configured to write the configuration parameters of the N child nodes into respective storage devices of the N child nodes, respectively, so that the N child nodes read their own configuration parameters from the respective deployed storage devices.
In an alternative embodiment, the apparatus further comprises:
the identity acquisition module is used for acquiring identity identification information reported by N sub-nodes from an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is a local area network established by respectively connecting the first management controller, the N second management controllers and a TOR network switch in the whole cabinet through network links, and each second management controller is arranged on one sub-node.
In an alternative embodiment, the apparatus further comprises:
and the topology reporting module is used for reporting the topology relation to a management client through an external management network so that the management client can manage the N child nodes.
In an optional embodiment, the first management controller establishes communication connections with storage devices deployed on N child nodes respectively, where the communication connections are implemented by communication links in connection cables of the N child nodes through N ports of the resource switching node, and the apparatus further includes:
the first fault reading module is used for reading the fault information in the storage equipment of the child node when the identity identification information reported by the child node is not received within a preset time length;
the second fault reading module is used for reading fault information in the storage equipment of the failed child node when receiving the fault broadcast of the local area network in the whole cabinet;
and the fault reporting module is used for reporting the acquired fault information to the management client through an external management network so that the management client processes the fault.
In an optional embodiment, a Mux is deployed on the N child nodes, the first management controller establishes communication connections with storage devices deployed on the N child nodes through the Mux, and the second management controller establishes communication connections with corresponding storage devices through the Mux; the device further comprises:
and the authority switching module is used for switching the control authority between the first management controller and the second management controller deployed on each child node through the Mux, so that the first management controller and the second management controller can respectively perform read-write operation on the storage device.
Fig. 6 shows a whole cabinet management apparatus provided in an embodiment of the present application, which is applied to a second management controller, where the second management controller is deployed on a child node, and the apparatus includes:
a receiving module 61, configured to receive the configuration parameters sent by the first management controller;
a configuration module 62, configured to perform configuration according to the received configuration parameters;
a reporting module 63, configured to report the child node's own identity information to the resource switching node after the configuration is completed, so that the first management controller establishes a topological relationship of the whole cabinet according to the identity information.
In an alternative embodiment, the second management controller establishes a communication connection with a storage device deployed on the child node through a communication link, and the receiving module includes:
the second authority acquisition module is used for acquiring the control authority of the storage device;
and the parameter reading module is used for reading the configuration parameters which are written into the storage equipment by the first management controller in advance from the storage equipment.
In an optional embodiment, the reporting module includes:
and the identity reporting module is used for publishing the child node's own identity information in an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is a local area network established by respectively connecting the first management controller, the N second management controllers and the TOR network switch in the whole cabinet through network links.
In an optional embodiment, the second management controller establishes a communication connection with a storage device deployed on the child node through a communication link, and the apparatus further includes:
a failure writing module, configured to, when a child node fails in a configuration process or an identity information reporting process, write failure information into a storage device of the child node, so that the first management controller reads the failure information from the storage device;
and the fault notification module is used for sending a fault broadcast to the local area network in the whole cabinet, so that the first management controller reads fault information in the storage equipment of the failed child node after receiving the fault broadcast.
Fig. 7 shows a whole cabinet management apparatus provided in an embodiment of the present application, which is applied to a management client, where the management client is connected to a TOR network switch in a whole cabinet through an external management network, and the apparatus includes:
an accessing module 71, configured to access the first management controller through the external management network to obtain a topological relation of the entire cabinet, where the topological relation is generated according to the method described in the foregoing first embodiment;
and the management module 72 is configured to call, by using a local area network inside the entire cabinet, communication interfaces on the N child nodes in the topological relation, to obtain the device information of the N child nodes, and manage the N child nodes.
In an alternative embodiment, the apparatus further comprises:
and the fault processing module is used for acquiring the fault information reported by the first management controller through the external management network and processing the fault information.
The embodiment of the application also provides electronic equipment, which comprises a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the whole cabinet management method according to any one of the above embodiments when executing the computer program.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program/instruction is stored, and when the computer program/instruction is executed by a processor, the whole cabinet management method according to any of the above embodiments is implemented.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or terminal device that comprises the element.
The above detailed description is given to a method, an apparatus, a device, and a medium for managing a whole cabinet, and a specific example is applied in the detailed description to explain the principle and the implementation of the present application, and the description of the above embodiment is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, the specific implementation manner and the application scope may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (17)

1. A whole cabinet management method is applied to a first management controller, wherein the first management controller is deployed on a resource switching node, and the method comprises the following steps:
respectively sending configuration parameters to N sub-nodes so that the N sub-nodes are respectively configured according to the received configuration parameters, and reporting the identity identification information of the sub-nodes to the resource switching node after the configuration is finished;
and establishing a topological relation of the whole cabinet according to the identity identification information of the N sub-nodes.
2. The method according to claim 1, wherein the first management controller establishes communication connections with storage devices deployed on N child nodes respectively, and the communication connections are implemented through communication links in connection cables of the N child nodes and N ports of the resource switching node; and respectively sending the configuration parameters to the N sub-nodes comprises:
acquiring control authority of each storage device of the N child nodes;
respectively writing the configuration parameters of the N sub-nodes into respective storage devices of the N sub-nodes, so that the N sub-nodes read the configuration parameters of the N sub-nodes from the respective deployed storage devices.
3. The method according to claim 1, before establishing the topological relationship of the entire cabinet according to the identity information of each of the N child nodes, further comprising:
obtaining the identity identification information reported by the N sub-nodes from an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is a local area network established by respectively connecting the first management controller, the N second management controllers and a TOR network switch in the whole cabinet through network links, and each second management controller is deployed on one sub-node.
4. The method of claim 1, after establishing the topological relationship of the complete cabinet, further comprising:
and reporting the topological relation to a management client through an external management network so that the management client can manage the N child nodes.
5. The method according to claim 1, wherein the first management controller establishes communication connections with storage devices deployed on N child nodes respectively, and the communication connections are implemented through communication links in connection cables of the N child nodes and N ports of the resource switching node; the method further comprises the following steps:
when the identity identification information reported by the child node is not received within a preset time length, reading fault information in the storage equipment of the child node;
when receiving fault broadcast of a local area network in the whole cabinet, reading fault information in storage equipment of a sub-node with a fault;
and reporting the acquired fault information to a management client through an external management network so that the management client processes the fault.
6. The method according to claim 1, wherein a Mux is deployed on the N child nodes, the first management controller establishes communication connections with the storage devices deployed on the N child nodes through the Mux, and the second management controller establishes communication connections with the corresponding storage devices through the Mux; the method further comprises the following steps:
and the first management controller and a second management controller deployed on each child node perform control right switching through the Mux, so that the first management controller and the second management controller can perform read-write operation on the storage device respectively.
7. A whole cabinet management method is applied to a second management controller, the second management controller is deployed on a child node, and the method comprises the following steps:
receiving configuration parameters sent by a first management controller;
configuring according to the received configuration parameters;
and reporting its own identity identification information to the resource switching node after the configuration is finished, so that the first management controller establishes the topological relation of the whole cabinet according to the identity identification information.
8. The method of claim 7, wherein the second management controller establishes a communication connection with a storage device deployed on the child node via a communication link; the receiving the configuration parameters sent by the first management controller includes:
acquiring the control authority of the storage device;
and reading the configuration parameters which are written into the storage device by the first management controller in advance from the storage device.
9. The method of claim 7, wherein reporting its own identity identification information to the resource switching node after completing the configuration comprises:
and issuing the self identity identification information in an internal local area network of the whole cabinet, wherein the internal local area network of the whole cabinet is the local area network established by respectively connecting the first management controller, the N second management controllers and the TOR network switch in the whole cabinet through network links.
10. The method of claim 7, wherein the second management controller establishes a communication connection with a storage device deployed on the child node via a communication link, the method further comprising:
when the child node fails during the configuration process or the identity information reporting process, writing fault information into the storage device of the child node so that the first management controller can read the fault information from the storage device;
and sending a fault broadcast on the internal local area network of the whole cabinet, so that the first management controller reads the fault information in the storage device of the failed child node after receiving the fault broadcast.
11. A whole cabinet management method, applied to a management client, wherein the management client is connected with a TOR network switch in a whole cabinet through an external management network, the method comprising:
accessing a first management controller through the external management network to obtain a topological relation of the entire cabinet, the topological relation being generated according to the method of any one of claims 1 to 6;
and calling, over the internal local area network of the whole cabinet, the communication interfaces on the N sub-nodes in the topological relation, so as to acquire the equipment information of the N sub-nodes and manage the N sub-nodes.
12. The method of claim 11, further comprising:
acquiring, through the external management network, the fault information reported by the first management controller, and processing the fault information.
13. A whole cabinet management device, applied to a first management controller, wherein the first management controller is deployed on a resource switching node, the device comprising:
a sending module, configured to send configuration parameters to the N child nodes, so that the N child nodes perform configuration according to the received configuration parameters, respectively, and report their own identity identification information to the resource switching node after the configuration is completed;
and an identification module, configured to establish the topological relation of the whole cabinet according to the respective identity identification information of the N child nodes.
14. A whole cabinet management device, applied to a second management controller, wherein the second management controller is deployed on a child node, the device comprising:
a receiving module, configured to receive the configuration parameters sent by the first management controller;
a configuration module, configured to perform configuration according to the received configuration parameters;
and a reporting module, configured to report its own identity identification information to the resource switching node after the configuration is finished, so that the first management controller establishes the topological relation of the whole cabinet according to the identity identification information.
15. A whole cabinet management device, applied to a management client, wherein the management client is connected with a TOR network switch in a whole cabinet through an external management network, the device comprising:
an access module, configured to access a first management controller through the external management network to obtain a topological relation of the whole cabinet, where the topological relation is generated according to the method of any one of claims 1 to 6;
and a management module, configured to call, over the internal local area network of the whole cabinet, the communication interfaces on the N sub-nodes in the topological relation, so as to acquire the equipment information of the N sub-nodes and manage the N sub-nodes.
16. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the whole cabinet management method according to any one of claims 1 to 6, the whole cabinet management method according to any one of claims 7 to 10, or the whole cabinet management method according to any one of claims 11 to 12.
17. A computer readable storage medium, on which a computer program/instructions are stored, wherein the computer program/instructions, when executed by a processor, implement the whole cabinet management method according to any one of claims 1 to 6, the whole cabinet management method according to any one of claims 7 to 10, or the whole cabinet management method according to any one of claims 11 to 12.
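For illustration only, the interaction recited in claims 7 to 10 and 13 can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions, not the implementation disclosed in the application: the names `SharedStorage`, `InternalLan`, `ChildNode`, `FirstController` and `demo`, the `owner` field used to model control authority, and the `address` configuration key are all hypothetical. The storage device and the internal local area network are modeled as plain Python objects rather than real hardware or sockets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SharedStorage:
    """Hypothetical stand-in for the storage device deployed on a child node."""
    owner: Optional[str] = None    # who currently holds control authority
    config: Optional[dict] = None  # parameters written by the first controller
    fault: Optional[str] = None    # fault information written by the child node


@dataclass
class InternalLan:
    """Hypothetical in-memory stand-in for the cabinet's internal LAN."""
    identities: List[dict] = field(default_factory=list)
    fault_broadcasts: List[str] = field(default_factory=list)


class ChildNode:
    """Plays the role of a second management controller on one child node."""

    def __init__(self, node_id: str, storage: SharedStorage, lan: InternalLan):
        self.node_id, self.storage, self.lan = node_id, storage, lan
        self.address: Optional[str] = None

    def configure_and_report(self) -> None:
        try:
            self.storage.owner = self.node_id  # acquire control authority
            params = self.storage.config       # read pre-written parameters
            if params is None or "address" not in params:
                raise ValueError("missing configuration parameters")
            self.address = params["address"]   # apply the configuration
            # Publish this node's own identity on the internal LAN.
            self.lan.identities.append(
                {"id": self.node_id, "address": self.address})
        except ValueError as exc:
            # Failure path: record the fault locally, then broadcast it.
            self.storage.fault = str(exc)
            self.lan.fault_broadcasts.append(self.node_id)


class FirstController:
    """Plays the role of the first management controller."""

    def __init__(self, storages: Dict[str, SharedStorage], lan: InternalLan):
        self.storages, self.lan = storages, lan

    def send_config(self, params_by_node: Dict[str, dict]) -> None:
        # Write each node's parameters into that node's storage device.
        for node_id, params in params_by_node.items():
            self.storages[node_id].owner = "first-controller"
            self.storages[node_id].config = params

    def build_topology(self) -> Dict[str, str]:
        # Topology here is simply node identity mapped to its address.
        return {i["id"]: i["address"] for i in self.lan.identities}

    def collect_faults(self) -> Dict[str, Optional[str]]:
        # After a fault broadcast, read the fault from the node's storage.
        return {n: self.storages[n].fault for n in self.lan.fault_broadcasts}


def demo() -> Tuple[Dict[str, str], Dict[str, Optional[str]]]:
    lan = InternalLan()
    storages = {"node-1": SharedStorage(), "node-2": SharedStorage()}
    controller = FirstController(storages, lan)
    # node-2 deliberately receives no parameters, to exercise the fault path.
    controller.send_config({"node-1": {"address": "10.0.0.1"}})
    for node_id, storage in storages.items():
        ChildNode(node_id, storage, lan).configure_and_report()
    return controller.build_topology(), controller.collect_faults()
```

Calling `demo()` returns the topology built from the reported identities together with the faults read back after a broadcast. A real system would replace the in-memory objects with the communication link to the storage device and with traffic on the cabinet's internal network.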
CN202211198798.6A 2022-09-29 2022-09-29 Whole cabinet management method, device, equipment and medium Pending CN115567400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211198798.6A CN115567400A (en) 2022-09-29 2022-09-29 Whole cabinet management method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211198798.6A CN115567400A (en) 2022-09-29 2022-09-29 Whole cabinet management method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115567400A true CN115567400A (en) 2023-01-03

Family

ID=84743608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211198798.6A Pending CN115567400A (en) 2022-09-29 2022-09-29 Whole cabinet management method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115567400A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028232A (en) * 2023-02-27 2023-04-28 浪潮电子信息产业股份有限公司 Cross-cabinet server memory pooling method, device, equipment, server and medium

Similar Documents

Publication Publication Date Title
TW201318308A (en) Distributed battery management system and method for distributing identifications thereof
CN106685733A (en) FC-AE-1553 network rapid configuration and automatic testing method
CN108199944B (en) Onboard cabin core system of dynamic daisy chain ring network and dynamic positioning method
CN102263651A (en) Method for detecting connection state of local end equipment in SNMP (simple network management protocol) network management system (NMS)
CN106230622B (en) Cluster implementation method and device
CN110032334A (en) Support the system and method based on manageability between NVMe-oF system chassis
CN115567400A (en) Whole cabinet management method, device, equipment and medium
CN102664755B (en) Control channel fault determining method and device
CN114401250A (en) Address allocation method and device
US10554497B2 (en) Method for the exchange of data between nodes of a server cluster, and server cluster implementing said method
CN104753707A (en) System maintenance method and network switching equipment
CN112019378A (en) Troubleshooting method and device
CN113949649B (en) Fault detection protocol deployment method and device, electronic equipment and storage medium
CN109379239B (en) Method and device for configuring access switch in OpenStack environment
CN109547274A (en) A kind of enclosure board switching method, device and first network equipment
CN104125079A (en) Method and device for determining double-device hot-backup configuration information
CN113360386A (en) Switching chip drive test method, device, electronic equipment and storage medium
CN114124803B (en) Device management method and device, electronic device and storage medium
CN113342456A (en) Connection method, device, equipment and storage medium
CN115550427A (en) Equipment upgrading method, device, equipment and storage medium
CN116137603A (en) Link fault detection method and device, storage medium and electronic device
CN114201439B (en) Server signal identification optimization method, system and storage medium
CN107248935B (en) System and method for network management to discover and monitor network elements
CN114221882A (en) Method, device, equipment and storage medium for detecting fault link
CN112087348A (en) Digital processor enumeration method and state monitoring method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination