CN115543872A - Equipment management method and device and computer storage medium - Google Patents

Info

Publication number
CN115543872A
Authority
CN
China
Prior art keywords
expansion device
reset control
peripheral controller
target expansion
reset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110731449.5A
Other languages
Chinese (zh)
Inventor
李小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110731449.5A
Publication of CN115543872A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10: Program control for peripheral devices
    • G06F13/102: Program control for peripheral devices where the programme performs an interfacing function, e.g. device driver

Abstract

The embodiments of the present application provide a device management method and apparatus, and a computer storage medium. The method includes: sending a heartbeat command to the peripheral controller of each expansion device through a first communication link; receiving device state data sent by the peripheral controller of each expansion device in response to the heartbeat command; and, if the device state data sent by the peripheral controller of a target expansion device among the one or more expansion devices is not received within a preset time period, sending a first reset control instruction to the reset control circuit of the target expansion device. The first reset control instruction instructs the reset control circuit of the target expansion device to perform a reset operation on the peripheral controller of the target expansion device. In this way, faults in the peripheral controller of an expansion device can be detected conveniently and effectively and handled in time, which shortens service interruption time.

Description

Equipment management method and device and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a device management method and apparatus, and a computer storage medium.
Background
A heterogeneous server is a data center or cluster composed of electronic devices with different processing capabilities or hardware configurations. It generally includes a motherboard and at least one expansion device, where the expansion device may be, for example, a storage server, a GPU (Graphics Processing Unit) server, or an intelligent network card. The motherboard includes a main controller, a processor, and the like, and the expansion device includes a peripheral controller and a service processing circuit. The main controller of the motherboard is generally responsible for external communication and for receiving operation instructions issued by the operation and maintenance management platform, while the peripheral controller of the expansion device is mainly responsible for managing the service processing circuits included in the expansion device.
As heterogeneous servers are deployed in large numbers, device management becomes more complex. For example, the main controller of the motherboard is usually connected to the peripheral controller of an expansion device only through an I2C bus, over which it obtains asset information (e.g., field replaceable unit information), real-time data (e.g., temperature), and the like collected by the peripheral controller. When the peripheral controller of the expansion device is abnormal, the main controller cannot obtain this information, and the peripheral controller cannot recover automatically. This may cause a fan to run at high speed for a long time and may create a risk of overheating or power failure. At present, someone usually has to go on site to handle the fault manually, which is inefficient and leads to long service interruptions. It is therefore necessary to monitor the running state of the peripheral controller of the expansion device in real time and to handle its faults promptly.
Disclosure of Invention
The embodiment of the application provides an equipment management method, an equipment management device and a computer storage medium, which can conveniently and effectively implement fault detection on a peripheral controller of an expansion device, timely process faults and effectively shorten service interruption time.
In one aspect, an embodiment of the present application provides a device management method, which is applied to a computer device, where the computer device includes a main controller and one or more expansion devices, each expansion device includes a reset control circuit and a peripheral controller, and the main controller is connected to the peripheral controller and the reset control circuit through a first communication link, where the method includes:
sending a heartbeat command to a peripheral controller of each expansion device over the first communication link;
receiving equipment state data sent by the peripheral controller of each expansion equipment in response to the heartbeat command;
if the device state data sent by the peripheral controller of the target expansion device in the one or more expansion devices is not received within a preset time length, sending a first reset control instruction to a reset control circuit of the target expansion device, where the first reset control instruction is used to instruct the reset control circuit of the target expansion device to execute a reset operation on the peripheral controller of the target expansion device.
In one aspect, an embodiment of the present application provides a device management apparatus, which is applied to a computer device, where the computer device includes a main controller and one or more expansion devices, each expansion device includes a reset control circuit and a peripheral controller, the main controller is connected to the peripheral controller through a first communication link, and the apparatus includes:
a sending module, configured to send a heartbeat command to a peripheral controller of each expansion device through the first communication link;
a receiving module, configured to receive device state data sent by the peripheral controller of each expansion device in response to the heartbeat command;
the sending module is further configured to send a first reset control instruction to a reset control circuit of the target expansion device if the device state data sent by the peripheral controller of the target expansion device in the one or more expansion devices is not received within a preset time period, where the first reset control instruction is used to instruct the reset control circuit of the target expansion device to perform a reset operation on the peripheral controller of the target expansion device.
In one aspect, an embodiment of the present application provides a computer device, including a main controller, one or more expansion devices, a memory, and a communication interface, where each expansion device includes a reset control circuit and a peripheral controller, and the main controller is connected to the peripheral controllers and the reset control circuits through a first communication link;
the main controller is suitable for executing a computer program;
the memory stores a computer program which, when executed by the main controller, implements the device management method described above.
In one aspect, the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is adapted to be loaded by a main controller and execute the above-mentioned device management method.
In one aspect, embodiments of the present application provide a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. The main controller of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the device management method described above.
In the embodiments of the present application, the main controller sends a heartbeat command to the peripheral controller of each expansion device through the first communication link and receives the device state data that each peripheral controller sends in response. If the device state data from the peripheral controller of a target expansion device among the one or more expansion devices is not received within a preset time period, the main controller sends a first reset control instruction to the reset control circuit of the target expansion device, and that reset control circuit, in response, performs a reset operation on the peripheral controller of the target expansion device. The method thus monitors the running state of each peripheral controller by checking whether it returns device state data in response to the heartbeat command, and resets a peripheral controller that fails to respond. As a result, faults in the peripheral controller of an expansion device can be detected conveniently and effectively and handled in time, which shortens service interruption time.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic block diagram of a computer device according to an exemplary embodiment of the present application;
FIG. 2 is a schematic flowchart of a device management method according to an exemplary embodiment of the present application;
FIG. 3 is a schematic flowchart of a device management method according to another exemplary embodiment of the present application;
FIG. 4 is a schematic block diagram of a computer device according to another exemplary embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device management apparatus according to an exemplary embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device according to another exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the descriptions of "first", "second", etc. referred to in the embodiments of the present application are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
To detect faults in the peripheral controller of an expansion device conveniently and effectively, handle them in time, and shorten service interruption time, the embodiments of the present application provide a device management method based on cloud technology.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like that are applied on the basis of the cloud computing business model. These technologies can form a resource pool that is used on demand, which is flexible and convenient. Cloud computing technology will become an important support for them. The background services of technical network systems, such as video websites, picture websites, and web portals, require large amounts of computing and storage resources. As the internet industry develops, each item may come to have its own identification mark that must be transmitted to a background system for logical processing, data of different levels will be processed separately, and all kinds of industry data will need strong back-end system support, which can only be provided through cloud computing.
Cloud computing is a computing model that distributes computing tasks over a large pool of computers, so that application systems can obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear to be infinitely expandable: they can be obtained at any time, used on demand, expanded at any time, and paid for according to use.
With research and development of cloud technologies, research and application of cloud technologies are developed in multiple fields, and the device management method in the embodiment of the present application relates to technologies such as cloud computing in the cloud technologies, and is specifically described by the following embodiments.
In order to better understand the device management method, apparatus, and computer storage medium provided in the embodiments of the present application, a description is first given below of a structure of a computer device to which the embodiments of the present application are applicable. Referring to fig. 1, fig. 1 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application. As shown in fig. 1, the computer device includes a main controller 101, one or more expansion devices 102, and the expansion devices 102 include a reset control circuit 103, a peripheral controller 104, and one or more service processing circuits 105. The main controller 101 is connected with the reset control circuit 103 and the peripheral controller 104 through a first communication link, the main controller 101 is connected with the peripheral controller 104 through a second communication link, and the peripheral controller 104 is connected with the service processing circuit 105. The reset control circuit 103 of the expansion device 102 may be connected to a reset control terminal of the peripheral controller 104 of the expansion device 102, or may be connected to a reset control terminal of the service processing circuit 105 of the expansion device 102. It should be noted that, in a specific implementation, the computer device further includes hardware resources such as a memory and a hard disk, and software resources such as an operating system and an application program. The computer device may specifically be a heterogeneous server, and the heterogeneous server may specifically be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services.
The first communication link may be used to transmit an operation instruction (e.g., a reset control instruction, a serial port switching instruction, a heartbeat command, etc.) issued by the main controller 101 to the expansion device 102, and may also be used to transmit data (e.g., the peripheral controller 104 of the expansion device 102 sends the power state, the temperature, etc. of the expansion device 102 to the main controller 101). The second communication link may be used to transmit data (e.g., the peripheral controller 104 of the expansion device 102 sends the power state, temperature, hardware configuration of the expansion device 102, log data of the peripheral controller 104, etc.) to the main controller 101.
The first communication link may specifically be, but is not limited to, a data transmission bus such as an I2C bus (Inter-Integrated Circuit, a serial bus) or an SMBus (System Management Bus). The second communication link may specifically be, but is not limited to, a USB (Universal Serial Bus), an NCSI (Network Controller Sideband Interface), or a UART (Universal Asynchronous Receiver/Transmitter).
The main controller 101 may issue operation instructions to the expansion device 102 and monitor the operation state of the expansion device 102. It may be, for example, a Baseboard Management Controller (BMC) or the like.
The reset control circuit 103 may be configured to execute an operation instruction issued by the main controller 101, and may be a circuit such as a PCA9555.
The peripheral controller 104 may be used for device management of the connected service processing circuit 105, and may be, for example, a Satellite Management Controller (SMC) or the like.
The service processing circuit 105 is a circuit that processes a service, and may be, for example, an FPGA (Field Programmable Gate Array), an SoC (System on Chip), a storage server, a GPU server, a temperature sensor (e.g., LM76), or the like.
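As an illustration of how a reset control circuit of the PCA9555 type could generate a reset pulse, the following minimal sketch drives one I/O pin low and then high again through register writes. The register numbers (0x02 for output port 0, 0x06 for configuration port 0) follow the PCA9555 register map. Everything else is an assumption made for illustration: the `FakeI2CBus` class stands in for a real bus driver, and the device address, the choice of pin 0, and the active-low polarity of the reset line are not specified in the text above.

```python
from typing import Callable, Dict, List, Tuple

REG_OUTPUT_PORT0 = 0x02  # PCA9555 output register, port 0
REG_CONFIG_PORT0 = 0x06  # PCA9555 configuration register, port 0 (1 = input, 0 = output)


class FakeI2CBus:
    """In-memory stand-in for an I2C bus driver (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.regs: Dict[Tuple[int, int], int] = {}
        self.log: List[Tuple[int, int, int]] = []

    def read_byte_data(self, addr: int, reg: int) -> int:
        # Unwritten registers read as 0xFF, the PCA9555 power-on default
        # for both the output and the configuration registers.
        return self.regs.get((addr, reg), 0xFF)

    def write_byte_data(self, addr: int, reg: int, value: int) -> None:
        self.regs[(addr, reg)] = value & 0xFF
        self.log.append((addr, reg, value & 0xFF))


def pulse_reset(bus: FakeI2CBus, addr: int, pin: int,
                hold: Callable[[], None] = lambda: None) -> None:
    """Pulse an (assumed) active-low reset line wired to `pin` of port 0."""
    mask = 1 << pin
    out = bus.read_byte_data(addr, REG_OUTPUT_PORT0)
    cfg = bus.read_byte_data(addr, REG_CONFIG_PORT0)
    low = out & ~mask
    bus.write_byte_data(addr, REG_OUTPUT_PORT0, low)          # prepare a low level
    bus.write_byte_data(addr, REG_CONFIG_PORT0, cfg & ~mask)  # switch the pin to output
    hold()                                                    # keep reset asserted
    bus.write_byte_data(addr, REG_OUTPUT_PORT0, low | mask)   # release reset


bus = FakeI2CBus()
pulse_reset(bus, addr=0x20, pin=0)
```

On real hardware, `hold` would wait for the minimum reset pulse width required by the peripheral controller, which the text does not state.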
In an embodiment, the main controller 101 may send a heartbeat command to the peripheral controller 104 of each expansion device 102 through the first communication link, and receive the device state data sent by the peripheral controller 104 of each expansion device 102 in response to the heartbeat command. If the peripheral controller 104 of the target expansion device 102 does not return device state data within a preset time period, it may be abnormal. In that case, the main controller 101 may send a first reset control instruction to the reset control circuit 103 of the target expansion device 102, so that the reset control circuit 103 outputs reset trigger information to the reset control terminal of the peripheral controller 104 of the target expansion device 102 and the peripheral controller 104 performs a reset operation. In this way, faults in the peripheral controller 104 of the expansion device 102 can be detected conveniently and effectively and handled in time, which shortens service interruption time.
In one embodiment, the main controller 101 may be connected to a management device through a wired or wireless connection, and may receive a serial port switching instruction sent by the management device.
When the operation type indication information included in the serial port switching instruction indicates a serial port switching operation, the main controller 101 sends the serial port switching instruction to the reset control circuit 103 of the target expansion device 102 to establish a second communication link between the main controller 101 and the peripheral controller 104 of the target expansion device 102. The main controller 101 may then obtain, through the second communication link, one or more of the hardware configuration data, device state data, and operation record data of the peripheral controller 104 of the target expansion device 102, and send the obtained data to the management device. By viewing this data on the management device, a user can analyze faults of the target expansion device 102 and of its peripheral controller 104.
When the operation type indication information included in the serial port switching instruction indicates a reset operation, the main controller 101 sends a second reset control instruction to the reset control circuit 103 of the target expansion device 102 to perform a reset operation on a target reset object. The target reset object is determined, according to the reset object indication information included in the serial port switching instruction, from among the one or more service processing circuits of the target expansion device and the peripheral controller of the target expansion device.
In this way, fault processing can be performed on the target expansion device 102 through the instruction sent by the management device, and the hardware configuration data, device state data, and operation record data of the peripheral controller 104 of the target expansion device 102 can be collected so that the cause of a fault can be analyzed in depth offline, which saves time and improves the efficiency of problem resolution.
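The two branches described above, selected by the operation type indication information, can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the names `OperationType`, `ManagementInstruction`, and `handle_instruction`, the field layout, and the string labels for reset objects are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class OperationType(Enum):
    SERIAL_PORT_SWITCH = "serial_port_switch"
    RESET = "reset"


@dataclass
class ManagementInstruction:
    """Hypothetical shape of an instruction received from the management device."""
    target_device: int                   # which expansion device is targeted
    operation_type: OperationType        # operation type indication information
    reset_object: Optional[str] = None   # reset object indication information


def handle_instruction(instr: ManagementInstruction) -> Tuple:
    """Return the action the main controller would take for this instruction."""
    if instr.operation_type is OperationType.SERIAL_PORT_SWITCH:
        # Forward to the reset control circuit to set up the second link,
        # after which configuration, state, and log data can be collected.
        return ("establish_second_link", instr.target_device)
    if instr.operation_type is OperationType.RESET:
        if instr.reset_object is None:
            raise ValueError("a reset instruction must name the reset object")
        # The target is either the peripheral controller or a service circuit.
        return ("second_reset_control", instr.target_device, instr.reset_object)
    raise ValueError(f"unsupported operation type: {instr.operation_type}")


action = handle_instruction(
    ManagementInstruction(1, OperationType.RESET, "peripheral_controller")
)
```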
It can be understood that the schematic structural diagram of the computer device described in the embodiments of the present application is intended to illustrate the technical solutions more clearly and does not limit them. As a person having ordinary skill in the art will recognize, as the structure evolves and new service scenarios appear, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
A device management method of the present application is described in detail below.
FIG. 2 is a schematic flowchart of a device management method according to an exemplary embodiment of the present application. Taking application of the method to the main controller 101 in FIG. 1 as an example, the method may include the following steps:
S201, sending a heartbeat command to a peripheral controller of each expansion device through a first communication link.
The first communication link is the main interaction channel between the main controller and the peripheral controller of the expansion device. It may be used to transmit operation instructions (e.g., a reset control instruction) and to transmit data, and may be a data transmission bus such as I2C or SMBus. The heartbeat command instructs the peripheral controller of the expansion device to send device state data to the main controller.
The main controller can periodically send a heartbeat command to the peripheral controller of each expansion device through the first communication link, so that the peripheral controller of each expansion device periodically sends corresponding device state data to the main controller, thereby monitoring the running state of the peripheral controller of each expansion device and detecting whether the peripheral controller of each expansion device is alive or not.
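The periodic polling described above can be sketched as follows. This is a minimal illustration, not the application's implementation: `send_heartbeat` is a hypothetical callback that issues one heartbeat over the first communication link and returns the device state data, or `None` if nothing came back, and the period and number of rounds are arbitrary.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

StateData = Optional[bytes]


def poll_once(device_ids: Iterable[int],
              send_heartbeat: Callable[[int], StateData]) -> Dict[int, StateData]:
    """Send one heartbeat to every peripheral controller and collect the replies."""
    return {dev: send_heartbeat(dev) for dev in device_ids}


def run_heartbeat(device_ids: List[int],
                  send_heartbeat: Callable[[int], StateData],
                  period_s: float,
                  rounds: int,
                  sleep: Callable[[float], None] = time.sleep) -> List[Dict[int, StateData]]:
    """Poll all devices `rounds` times, waiting `period_s` between rounds."""
    history = []
    for i in range(rounds):
        history.append(poll_once(device_ids, send_heartbeat))
        if i < rounds - 1:
            sleep(period_s)
    return history


# Usage with a fake link: device 0 answers, device 1 stays silent.
def fake_link(dev: int) -> StateData:
    return b"\x03" if dev == 0 else None


history = run_heartbeat([0, 1], fake_link, period_s=1.0, rounds=2,
                        sleep=lambda s: None)
```

Injecting `sleep` keeps the loop testable without real delays; a device whose entry stays `None` across rounds is a candidate target expansion device.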
Before periodically sending the heartbeat command to the peripheral controller of each expansion device through the first communication link, the main controller needs to detect the expansion devices mounted on the first communication link. One or more slots may be provided on the motherboard on which the main controller is located for inserting expansion devices. When an expansion device is inserted into a slot, it is connected to the motherboard, that is, mounted on the first communication link.
In one embodiment, the main controller and the reset control circuit of the expansion device are connected through a first communication link, the reset control circuit may include one or more device identification terminals, level values of the one or more device identification terminals may serve as device identifications of the expansion device, the main controller reads the level values of the one or more device identification terminals in the reset control circuit of the expansion device through the first communication link, and the expansion device and the device type may be uniquely determined according to the level values, so as to detect the expansion device mounted on the first communication link.
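A minimal sketch of this identification step: the level values read from the device identification terminals are combined into an integer identifier and looked up in a table. The three-pin width, the bit order, the mapping table, and the convention that an all-high reading means an empty slot are assumptions made for illustration only.

```python
from typing import Dict, Optional, Sequence

# Hypothetical identifier-to-type mapping; the real assignment is board-specific.
DEVICE_TYPES: Dict[int, str] = {
    0b001: "GPU board",
    0b010: "smart NIC",
    0b011: "storage board",
}
NOT_PRESENT = 0b111  # assumption: pulled-up pins read all-high when the slot is empty


def decode_device_id(levels: Sequence[int]) -> int:
    """Combine pin levels (most significant bit first) into an integer identifier."""
    value = 0
    for level in levels:
        if level not in (0, 1):
            raise ValueError(f"invalid level value: {level!r}")
        value = (value << 1) | level
    return value


def detect_device(levels: Sequence[int]) -> Optional[str]:
    """Return the device type for the given pin levels, or None for an empty slot."""
    dev_id = decode_device_id(levels)
    if dev_id == NOT_PRESENT:
        return None
    return DEVICE_TYPES.get(dev_id, "unknown")


detected = detect_device([0, 1, 0])
```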
S202, receiving the device state data sent by the peripheral controller of each expansion device in response to the heartbeat command.
The device state data is data related to the expansion device, and may include, for example, the power state and health state of the expansion device and the hardware configuration of the one or more service processing circuits connected to the peripheral controller of the expansion device. A service processing circuit is a circuit for processing specific services, such as a temperature sensor, an SoC, or an FPGA.
After the main controller sends the heartbeat command to the peripheral controller of each expansion device, it receives the device state data sent by the peripheral controller of each expansion device in response to the heartbeat command.
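To make the contents of the device state data concrete, the sketch below parses a reply into a small record holding the power state, the health state, and the attached service processing circuits. The byte layout (one flag byte followed by one type code per circuit) and the code table are invented for illustration; the application does not define a wire format here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical type codes for service processing circuits.
CIRCUIT_NAMES: Dict[int, str] = {0x01: "FPGA", 0x02: "SoC", 0x03: "temperature sensor"}


@dataclass
class DeviceState:
    device_id: int
    power_on: bool
    healthy: bool
    service_circuits: List[str] = field(default_factory=list)


def parse_state(device_id: int, payload: bytes) -> DeviceState:
    """Parse an assumed layout: byte 0 = flags, remaining bytes = circuit type codes."""
    if len(payload) < 1:
        raise ValueError("empty device state payload")
    flags = payload[0]
    circuits = [CIRCUIT_NAMES.get(code, f"unknown(0x{code:02x})") for code in payload[1:]]
    return DeviceState(
        device_id=device_id,
        power_on=bool(flags & 0x01),  # bit 0: power state (assumed)
        healthy=bool(flags & 0x02),   # bit 1: health state (assumed)
        service_circuits=circuits,
    )


state = parse_state(0, bytes([0x03, 0x01, 0x02]))
```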
S203, if the device state data sent by the peripheral controller of the target expansion device in the one or more expansion devices is not received within a preset time length, sending a first reset control instruction to a reset control circuit of the target expansion device, wherein the first reset control instruction is used for instructing the reset control circuit of the target expansion device to execute reset operation on the peripheral controller of the target expansion device.
The preset time period may be configured according to an actual application scenario, for example, 5 minutes. The reset control circuit of the target expansion equipment is connected with the reset control end of the peripheral controller of the target expansion equipment. The target expansion device is an expansion device corresponding to the peripheral controller which does not send device state data to the main controller.
When the main controller does not receive, within the preset time period, the device state data sent by the peripheral controller of an expansion device, that expansion device is taken as the target expansion device among the one or more expansion devices. The main controller then generates a first reset control instruction and sends it to the reset control circuit of the target expansion device. In response to the first reset control instruction, the reset control circuit of the target expansion device generates a reset trigger signal and sends it to the reset control terminal of the peripheral controller of the target expansion device, so that the peripheral controller of the target expansion device performs a reset operation.
In the embodiments of the present application, the main controller sends a heartbeat command to the peripheral controller of each expansion device through the first communication link and receives the device state data that each peripheral controller sends in response. If the device state data from the peripheral controller of a target expansion device among the one or more expansion devices is not received within a preset time period, the main controller sends a first reset control instruction to the reset control circuit of the target expansion device, so that the reset control circuit performs a reset operation on the peripheral controller of the target expansion device. The method thus monitors the running state of each peripheral controller by checking whether it returns device state data in response to the heartbeat command, and resets a peripheral controller that fails to respond. As a result, faults in the peripheral controller of an expansion device can be detected conveniently and effectively and handled in time, which shortens service interruption time.
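The timeout check in S203 can be sketched as a small monitor that tracks when each peripheral controller last returned device state data and requests a reset once the preset time period is exceeded. The class name `HeartbeatMonitor`, the `send_reset` callback (standing in for sending the first reset control instruction), and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Iterable, List


class HeartbeatMonitor:
    """Track the last reply from each peripheral controller and reset silent ones."""

    def __init__(self, device_ids: Iterable[int], timeout_s: float,
                 send_reset: Callable[[int], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._send_reset = send_reset
        self._clock = clock
        now = clock()
        self._last_seen: Dict[int, float] = {dev: now for dev in device_ids}

    def record_state(self, device_id: int) -> None:
        """Call whenever device state data arrives from a peripheral controller."""
        self._last_seen[device_id] = self._clock()

    def check(self) -> List[int]:
        """Reset every device whose last reply is older than the timeout."""
        now = self._clock()
        targets = []
        for dev, seen in self._last_seen.items():
            if now - seen > self._timeout_s:
                self._send_reset(dev)       # first reset control instruction
                self._last_seen[dev] = now  # start a new window after the reset
                targets.append(dev)
        return targets


# Usage with a fake clock and a 5-minute preset time period.
fake_time = [0.0]
resets: List[int] = []
monitor = HeartbeatMonitor([0, 1], timeout_s=300.0, send_reset=resets.append,
                           clock=lambda: fake_time[0])
fake_time[0] = 200.0
monitor.record_state(0)  # device 0 answered; device 1 stayed silent
fake_time[0] = 301.0
targets = monitor.check()
```

Restarting the window after a reset gives the peripheral controller time to boot before it is judged again; whether the application does this is not stated.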
In some cases, for example when the peripheral controller has hung, the peripheral controller may still not respond to the heartbeat command sent by the main controller after the reset operation has been performed on it, and manual intervention is then required to handle the fault. For this situation, FIG. 3 shows a schematic flowchart of a device management method according to another exemplary embodiment of the present application. Taking application of the method to the main controller 101 in FIG. 1 as an example, the method may include the following steps:
S301, sending a heartbeat command to a peripheral controller of the target expansion device through the first communication link.
The target expansion device is an expansion device corresponding to the peripheral controller which does not send device state data to the main controller within a preset time length after receiving the heartbeat command.
In one embodiment, the main controller may periodically send a heartbeat command to the peripheral controller of the target expansion device through the first communication link based on an IPMI (Intelligent Platform Management Interface) protocol.
The IPMI protocol is an industry standard for managing peripheral devices used in an enterprise system based on an Intel architecture, and may be used to monitor physical health characteristics of a server, such as temperature, voltage, fan operating state, power state, etc.
In an embodiment, the heartbeat command that the main controller sends to the peripheral controller of the target expansion device through the first communication link may specifically be GET Sensor Reading. Its detailed information, which includes the network function code (NetFn), the command identifier (Command), the request data (Request Data), and the response data (Response Data) returned in response to the heartbeat command GET Sensor Reading, is shown in Table 1 below as an example. The network function code may be used to identify a network protocol (e.g., 0x3A for the IPMI protocol), the command identifier may be used to identify the command to execute (e.g., 0x12 for the heartbeat command), and the request data may be used to set a speed for a sectored fan.
TABLE 1
Figure BDA0003138464760000101
Figure BDA0003138464760000111
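As a rough illustration of how the fields listed for Table 1 fit together, the sketch below packs a network function code, a command identifier, and request data into one message. The flat `[netfn][cmd][data...]` layout and the names `Request`, `encode`, and `decode` are assumptions made for this example; real IPMI transports add addressing and checksum fields that are omitted here. The values 0x3A and 0x12 are the examples given in the text.

```python
from dataclasses import dataclass

NETFN_EXAMPLE = 0x3A   # example network function code given in the text
CMD_HEARTBEAT = 0x12   # example heartbeat command identifier given in the text


@dataclass(frozen=True)
class Request:
    netfn: int
    cmd: int
    data: bytes = b""

    def encode(self) -> bytes:
        """Hypothetical flat layout: [netfn][cmd][data...]; not real IPMB framing."""
        if not 0 <= self.netfn <= 0xFF or not 0 <= self.cmd <= 0xFF:
            raise ValueError("netfn and cmd must each fit in one byte")
        return bytes([self.netfn, self.cmd]) + self.data


def decode(raw: bytes) -> Request:
    """Inverse of Request.encode for the same hypothetical layout."""
    if len(raw) < 2:
        raise ValueError("need at least netfn and cmd")
    return Request(raw[0], raw[1], raw[2:])


if __name__ == "__main__":
    heartbeat = Request(NETFN_EXAMPLE, CMD_HEARTBEAT)
    print(heartbeat.encode().hex())
```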
S302, if the equipment state data sent by the peripheral controller of the target expansion equipment responding to the heartbeat command is not received within the preset time length, sending fault indication information of the target expansion equipment to management equipment.
The device state data is response data returned by the peripheral controller of the target expansion device in response to the heartbeat command, and is data related to the target expansion device, which may include, for example, a power state and a health state of the target expansion device.
When the main controller does not receive, within the preset time length, the device state data sent by the peripheral controller of the target expansion device in response to the heartbeat command, it can send the fault indication information of the target expansion device to the management device. A user can view the received fault indication information on the management device and identify the faulty expansion device (namely, the target expansion device), so that the user can conveniently intervene manually to debug the target expansion device, locate the fault, analyze the fault, and so on.
In one embodiment, the main controller may send the fault indication information of the target expansion device to the management device in either of two cases. In the first case, after the first reset control instruction has been sent for the peripheral controller of the target expansion device, the heartbeat command is sent to the peripheral controller again, and the device state data sent by the peripheral controller in response to the heartbeat command is still not received within the preset time length. In the second case, the device state data sent by the peripheral controller of the target expansion device in response to the heartbeat command is not received within the preset time length, without a preceding reset. The present application does not limit this.
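The two reporting variants described above differ only in whether a reset and a second heartbeat are attempted before the fault is reported. A minimal sketch, assuming hypothetical callables `heartbeat`, `reset`, and `report_fault` and a hypothetical `reset_before_report` switch that selects the variant:

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    HEALTHY = "healthy"
    RECOVERED_AFTER_RESET = "recovered_after_reset"
    FAULT_REPORTED = "fault_reported"


def check_device(heartbeat: Callable[[], Optional[dict]],
                 reset: Callable[[], None],
                 report_fault: Callable[[], None],
                 reset_before_report: bool = True) -> Outcome:
    """Apply one of the two fault-reporting variants to a single device."""
    if heartbeat() is not None:
        return Outcome.HEALTHY
    if reset_before_report:
        # Variant 1: issue the first reset, then send the heartbeat again.
        reset()
        if heartbeat() is not None:
            return Outcome.RECOVERED_AFTER_RESET
    # Variant 2, or variant 1 after a failed reset: notify the management device.
    report_fault()
    return Outcome.FAULT_REPORTED


if __name__ == "__main__":
    events = []
    result = check_device(lambda: None,
                          lambda: events.append("reset"),
                          lambda: events.append("fault"))
    print(result, events)
```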
S303, receiving a serial port switching instruction sent by the management equipment, wherein the serial port switching instruction comprises operation type indication information.
The management device may send a serial port switching instruction to the main controller through wired connection or wireless connection, where the serial port switching instruction includes operation type indication information used to indicate an operation that needs to be performed by a reset control circuit of the expansion device, for example, a communication link between a serial port of a peripheral controller of the target expansion device and a serial port of the main controller may be established, or a reset operation may be performed on the peripheral controller.
In one embodiment, a user may send a serial port switching instruction to the main controller on the management device (for example, through the operation and maintenance management platform), and the main controller receives the serial port switching instruction sent by the management device.
In one embodiment, taking Force Hardware Operation as an example of the serial port switching instruction, the detailed information of Force Hardware Operation is shown in Table 2 below. It includes a network function code (NetFn), a command identifier (Command), request data (Request Data), and the response data (Response Data) returned in response to Force Hardware Operation. The network function code may be used to identify a network protocol (for example, 0x3A for the IPMI protocol), the command identifier may be used to identify the executed command (for example, 0x11 for the serial port switching instruction), the request data may be used to determine the operation logic of the serial port switching instruction (including the operation type indication information, etc.), and the response data may be used to indicate the execution result of the serial port switching instruction.
TABLE 2
Figure BDA0003138464760000131
When the type of the expansion device (Device Type) in Table 3 is a GPU server (JBOG) or a storage server (JBOD), the peripheral controller of the expansion device is connected to the GPU server or the storage server; that is, the GPU server or the storage server is the service processing circuit. When the type of the expansion device is an intelligent network card, the service processing circuit (for example, an FPGA or an SoC) needs to be determined according to the device connected to the peripheral controller.
S304, if the operation type indication information is serial port switching operation, sending the serial port switching instruction to a reset control circuit of the target expansion device, where the serial port switching instruction is used to instruct the reset control circuit of the target expansion device to establish a second communication link between the main controller and a peripheral controller of the target expansion device, and the second communication link is a communication link between a serial port of the peripheral controller of the target expansion device and a serial port of the main controller.
The second communication link is a channel for remote maintenance and diagnosis interaction, is used for data transmission, is a communication link between a serial port of the peripheral controller of the target expansion device and a serial port of the main controller, and may be, for example, a USB, an NCSI, a UART, or the like.
The reset control circuit of the target expansion device is connected to the serial port switching control terminal of the peripheral controller of the target expansion device. If the operation type indication information included in the serial port switching instruction indicates a serial port switching operation (for example, request data byte [2] of the serial port switching instruction Force Hardware Operation is 1), the main controller sends the serial port switching instruction to the reset control circuit of the target expansion device. In response to the serial port switching instruction, the reset control circuit generates a switching trigger signal for the serial port switching control terminal, thereby establishing the second communication link between the peripheral controller of the target expansion device and the main controller.
In one embodiment, the reset control circuit may be a PCA9555 whose pins are defined as shown in Table 3 below:
TABLE 3
Figure BDA0003138464760000141
Figure BDA0003138464760000151
Here, the IO entries are pins of the PCA9555. IO0_0, IO0_1, IO0_2, IO0_3, IO0_4, IO0_5, IO0_6 and IO0_7 are the device identification terminals of the reset control circuit and indicate the device identifier of the expansion device; IO1_2 is the reset control terminal of the peripheral controller; IO1_3 may be connected to the first communication link and takes the P12V voltage as input; and IO1_6 is the serial port switching control terminal.
In one embodiment, in response to the serial port switching instruction, the reset control circuit may generate the switching trigger signal by pulling the level of the serial port switching control terminal (IO1_6) down or up, so as to establish the second communication link between the main controller and the peripheral controller of the target expansion device.
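Driving a single PCA9555 pin such as IO1_6 amounts to a read-modify-write of the port-1 output and configuration registers. The sketch below assumes the standard PCA9555 register map (output port 1 at 0x03, configuration port 1 at 0x07, where a configuration bit of 1 means input) and uses a hypothetical in-memory `FakePca9555` class in place of a real I2C driver. Whether a low or a high level triggers the switch is not fixed by the text, so the level is left as a parameter.

```python
OUTPUT_PORT1 = 0x03    # PCA9555 output register for IO1_0..IO1_7
CONFIG_PORT1 = 0x07    # PCA9555 configuration register for port 1 (1 = input, 0 = output)
SERIAL_SWITCH_BIT = 6  # IO1_6, the serial port switching control terminal


class FakePca9555:
    """In-memory stand-in for the expander; a real system would use an I2C driver."""

    def __init__(self) -> None:
        # Power-on defaults: output latches high, all pins configured as inputs.
        self.regs = {OUTPUT_PORT1: 0xFF, CONFIG_PORT1: 0xFF}

    def read(self, reg: int) -> int:
        return self.regs[reg]

    def write(self, reg: int, value: int) -> None:
        self.regs[reg] = value & 0xFF


def drive_pin(dev: FakePca9555, bit: int, high: bool) -> None:
    """Set one port-1 pin to a level and make it an output, leaving other pins alone."""
    mask = 1 << bit
    out = dev.read(OUTPUT_PORT1)
    out = (out | mask) if high else (out & ~mask)
    dev.write(OUTPUT_PORT1, out)          # latch the level first to avoid a glitch
    cfg = dev.read(CONFIG_PORT1)
    dev.write(CONFIG_PORT1, cfg & ~mask)  # then switch the pin to output mode


if __name__ == "__main__":
    expander = FakePca9555()
    drive_pin(expander, SERIAL_SWITCH_BIT, high=False)
    print(hex(expander.read(OUTPUT_PORT1)), hex(expander.read(CONFIG_PORT1)))
```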
In one embodiment, the main controller may receive a login request sent by the management device and, in response to the login request, establish a network connection between the management device and the serial port of the peripheral controller of the target expansion device through a data transparent transmission port (e.g., SOL (Serial Over LAN)). The main controller then uses the network connection to send operation record data of the peripheral controller of the target expansion device, including register data and log data, to the management device, where a user can view the operation record data to perform failure analysis on the peripheral controller of the target expansion device. With this embodiment, regardless of whether the peripheral controller is in the running stage, in the boot stage, or not alive (for example, hung), the management device can obtain the operation record data of the peripheral controller of the target expansion device through remote login and perform remote diagnosis. Compared with on-site diagnosis, this can save a large amount of time and improve operation and maintenance efficiency.
S305, if the operation type indication information is reset operation, acquiring reset object indication information included in the serial port switching instruction, and determining a target reset object from the one or more service processing circuits and the peripheral controller of the target expansion device according to the reset object indication information.
The reset object indication information is used for determining a target reset object from one or more service processing circuits of the target expansion device and a peripheral controller of the target expansion device, and the target reset object may be one or more.
In one embodiment, the reset control circuit of the target expansion device may be connected to a reset control terminal of each service processing circuit of the expansion device, in addition to the peripheral controller of the target expansion device.
If the operation type indication information included in the serial port switching instruction indicates a reset operation (for example, request data byte [2] of the serial port switching instruction Force Hardware Operation is 0), the main controller acquires the reset object indication information included in the serial port switching instruction and determines a target reset object according to the reset object indication information. The reset control terminal of the target reset object is connected to the reset control circuit of the target expansion device.
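The branch between S304 and S305 can be sketched as a small parser over the request data. The values of byte [2] (1 for serial port switching, 0 for reset) come from the text. Everything else is an assumption made for illustration: zero-based indexing of the request data, a byte [3] bitmask carrying the reset object indication information, and the bit-to-object mapping in `RESET_OBJECT_BITS`.

```python
from dataclasses import dataclass
from typing import Tuple

OP_RESET = 0          # request data byte [2] == 0, per the text
OP_SERIAL_SWITCH = 1  # request data byte [2] == 1, per the text

# ASSUMPTION: byte [3] is a bitmask naming the reset objects. The text does not
# specify this layout; the names and bit positions below are hypothetical.
RESET_OBJECT_BITS = {0: "SMC", 1: "FPGA", 2: "SoC"}


@dataclass(frozen=True)
class Operation:
    kind: str
    reset_targets: Tuple[str, ...] = ()


def parse_request(data: bytes) -> Operation:
    """Decide between serial port switching and reset, and list reset targets."""
    if len(data) < 3:
        raise ValueError("request data too short to contain byte [2]")
    op = data[2]
    if op == OP_SERIAL_SWITCH:
        return Operation("serial_switch")
    if op == OP_RESET:
        mask = data[3] if len(data) > 3 else 0
        targets = tuple(name for bit, name in sorted(RESET_OBJECT_BITS.items())
                        if mask & (1 << bit))
        if not targets:
            raise ValueError("reset requested but no reset object indicated")
        return Operation("reset", targets)
    raise ValueError(f"unknown operation type {op}")


if __name__ == "__main__":
    print(parse_request(bytes([0, 0, 1])))
    print(parse_request(bytes([0, 0, 0, 0b101])))
```

Rejecting a reset request that names no object keeps the main controller from sending a second reset control instruction with an empty target list.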
S306, sending a second reset control instruction to the reset control circuit of the target expansion device, wherein the second reset control instruction is used for instructing the reset control circuit of the target expansion device to execute a reset operation on the target reset object.
After determining the target reset object, the main controller sends the second reset control instruction to the reset control circuit of the target expansion device. In response to the second reset control instruction, the reset control circuit generates a reset trigger signal for the target reset object and sends the reset trigger signal to the reset control terminal of the target reset object, so that the target reset object performs the reset operation.
In one embodiment, the reset control circuit may be a PCA9555 as shown in Table 3 above. IO1_0, IO1_1, IO1_4, IO1_5 and IO1_7 of the PCA9555 may be connected to the reset control terminals of the service processing circuits of the expansion device. In response to the second reset control instruction, the PCA9555 may generate the reset trigger signal by pulling the level of IO1_0, IO1_1, IO1_2, IO1_4, IO1_5 or IO1_7 down or up, so as to perform a reset operation on the target reset object connected to that pin.
In one embodiment, if the main controller receives the device state data sent by the peripheral controller of the target expansion device within the preset time length, that is, the target expansion device responds normally to the heartbeat command, the main controller sends a data acquisition command to the peripheral controller of the target expansion device through the first communication link. In response to the data acquisition command, the peripheral controller sends the hardware configuration data and device state data of the target expansion device and the operation record data of the peripheral controller to the main controller through the second communication link. The main controller then forwards the hardware configuration data, the device state data and the operation record data to the management device, where a user can view them to perform fault analysis on the target expansion device.
In one embodiment, the data acquisition command may be sent in either of two cases: after the first reset control instruction has been sent for the peripheral controller of the target expansion device, the heartbeat command is sent to the peripheral controller again and the device state data is received within the preset time length; or, after the second reset control instruction has been sent for the peripheral controller of the target expansion device, the heartbeat command is sent to the peripheral controller again and the device state data is received within the preset time length.
In an embodiment, the second communication link may be a USB, and the main controller may obtain the hardware configuration data, the device state data, and the operation record data from a peripheral controller of the target expansion device through the following steps:
(1) The power save mode of the peripheral controller is turned off (virtual device enabled).
(2) A virtual disk is created at the peripheral controller side and mounted to a virtual directory (e.g., /var/usb0).
(3) The peripheral controller copies the hardware configuration data, the device state data and the operation record data to the virtual directory.
(4) The virtual disk is mounted to the main controller as a USB device.
(5) The main controller copies the hardware configuration data, the device state data and the operation record data from the virtual directory to the relevant path.
(6) The virtual disk is unmounted from the main controller.
(7) The virtual disk is unmounted at the peripheral controller side and deleted to release the memory.
(8) The power save mode of the peripheral controller is turned on (virtual device disabled).
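Steps (1) to (8) pair each setup action with a teardown action, which maps naturally onto a context manager so that teardown still runs if copying fails. The sketch below is a simulation under stated assumptions: a temporary directory stands in for the virtual disk, the mount and power save actions are only recorded in a log list, and the names `virtual_disk` and `transfer` are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Dict, Iterator, List


@contextmanager
def virtual_disk(log: List[str]) -> Iterator[Path]:
    """Steps (1)-(2) on entry; steps (6)-(8) on exit, even if copying fails."""
    log.append("power_save_off")                   # (1)
    disk = Path(tempfile.mkdtemp(prefix="usb0_"))  # (2) stand-in for the virtual directory
    log.append("disk_created")
    try:
        yield disk
    finally:
        log.append("disk_unmounted_from_main")     # (6) harmless if never mounted
        shutil.rmtree(disk, ignore_errors=True)    # (7) delete the disk to free memory
        log.append("disk_deleted")
        log.append("power_save_on")                # (8)


def transfer(files: Dict[str, str], dest: Path, log: List[str]) -> List[str]:
    """Run the whole sequence and return the names of the files copied to dest."""
    with virtual_disk(log) as disk:
        for name, content in files.items():        # (3) peripheral controller fills the disk
            (disk / name).write_text(content)
        log.append("disk_mounted_on_main")         # (4)
        dest.mkdir(parents=True, exist_ok=True)
        for src in sorted(disk.iterdir()):         # (5) main controller copies the data out
            shutil.copy(src, dest / src.name)
    return sorted(p.name for p in dest.iterdir())


if __name__ == "__main__":
    steps: List[str] = []
    out_dir = Path(tempfile.mkdtemp()) / "collected"
    print(transfer({"hw_config.txt": "cfg", "oplog.txt": "boot ok"}, out_dir, steps))
    print(steps)
```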
According to the method, the main controller sends the heartbeat command to the peripheral controller of the target expansion device through the first communication link. If the device state data sent by the peripheral controller of the target expansion device in response to the heartbeat command is not received within the preset time length, the main controller can send the fault indication information to the management device, receive the serial port switching instruction sent by the management device, and perform a reset operation on one or more service processing circuits and the peripheral controller of the target expansion device according to the operation type indication information included in the serial port switching instruction. The method can therefore reset the target expansion device remotely. Compared with resetting the target expansion device manually on site, this saves a large amount of time, reduces service interruption time, improves operation and maintenance efficiency, and lowers operation and maintenance cost. The method can also establish the second communication link, acquire the hardware configuration data and device state data of the target expansion device and the operation record data of the peripheral controller of the target expansion device, and send them to the management device, so that in-depth offline analysis can be performed and the cause of the fault can be found more conveniently.
As a specific example of the present application, fig. 4 provides a schematic structural diagram of another computer device. The computer device includes a main controller (BMC) and an expansion device (intelligent network card), and the intelligent network card includes a peripheral controller (SMC), a reset control circuit (PCA9555), and one or more service processing circuits (FPGA, SoC, FPGA FLASH). The BMC, the SMC and the PCA9555 are connected through a first communication link (I2C); the BMC and the SMC are connected through a second communication link (USB/NCSI/UART); the FPGA, the FPGA FLASH and the SMC are connected through a Serial Peripheral Interface (SPI); and the SMC and the SoC are connected through a data interface (Low Pin Count bus, LPC). The PCA9555 includes a plurality of pins: for example, RST FPGA can be connected to the reset control terminal of the FPGA, RST SMC can be connected to the reset control terminal of the SMC, and SoC RST can be connected to the reset control terminal of the SoC. The operation and maintenance management platform may be installed on the management device and may communicate with the BMC through a wired connection or a wireless connection; the operation and maintenance management platform may be a Tencent out band Control (TOC).
In one embodiment, the BMC may periodically send a heartbeat command to the SMC and receive the device state data sent by the SMC in response to the heartbeat command. If the device state data sent by the SMC is not received within the preset time length, the BMC sends a first reset control instruction to the PCA9555, so that the RST SMC pin outputs a reset trigger signal to the reset control terminal of the SMC and the SMC is reset automatically.
In one embodiment, if the SMC does not respond to the heartbeat command within the preset time length, either after being reset or without a preceding reset, the management device may send a serial port switching instruction to the BMC through the operation and maintenance management platform. When the serial port switching instruction indicates a reset operation, a second reset control instruction may be generated to reset one or more of the SMC and the service processing circuits. When the serial port switching instruction indicates a serial port switching operation, the connection between the BMC and the SMC can be forcibly switched from the first communication link to the second communication link, and the device state data (such as temperature and power state), the hardware configuration data (such as FPGA, SoC, FPGA FLASH and SMC) and the operation state data (such as register data and log data) of the intelligent network card are acquired through the second communication link, which facilitates offline fault analysis by the user.
In one embodiment, the user may further send a login request to the BMC through a management device (e.g., an operation and maintenance management platform) to establish a connection between the management device and the serial port of the SMC, so as to obtain the operation status data of the SMC, so that the user may remotely diagnose the cause of the failure of the SMC by looking up the operation status data.
In one embodiment, when the BMC can receive the device status data returned by the SMC within the preset time period, the device status data of the intelligent network card, the hardware configuration data, and the operation status data (e.g., register data, log data, etc.) of the SMC may be acquired through the second communication link, so as to provide the user with convenience for performing offline fault analysis on the expansion device (the intelligent network card).
With the embodiment of the application, the running state of the peripheral controller (SMC) of the expansion device (intelligent network card) can be monitored and the peripheral controller (SMC) can be reset automatically, which reduces service interruption time. When the automatic reset fails, the expansion device can be reset through manual intervention. In addition, the main controller (BMC) can automatically acquire the device state data, the hardware configuration data and the operation state data, and a user can also log in remotely to view the operation state data, which makes it convenient for the user to analyze faults of the expansion device.
While the method of the embodiments of the present application has been described in detail above, to facilitate better implementation of the above-described aspects of the embodiments of the present application, the apparatus of the embodiments of the present application is provided below accordingly. Referring to fig. 5, fig. 5 is a schematic structural diagram of a device management apparatus according to an exemplary embodiment of the present application, where the apparatus 50 may include:
a sending module 501, configured to send a heartbeat command to a peripheral controller of each expansion device through the first communication link;
a receiving module 502, configured to receive device status data sent by the peripheral controller of each expansion device in response to the heartbeat command;
the sending module 501 is further configured to send a first reset control instruction to a reset control circuit of the target expansion device if the device state data sent by the peripheral controller of the target expansion device in the one or more expansion devices is not received within a preset time period, where the first reset control instruction is used to instruct the reset control circuit of the target expansion device to perform a reset operation on the peripheral controller of the target expansion device.
In an embodiment, the sending module 501 is further configured to:
generating a first reset control instruction;
and sending the first reset control instruction to a reset control circuit of the target expansion equipment.
In an embodiment, the sending module 501 is further configured to:
if the equipment state data sent by the peripheral controller of the target expansion equipment is not received within the preset time length, sending fault indication information of the target expansion equipment to management equipment;
the receiving module 502 is further configured to:
receiving a serial port switching instruction sent by the management equipment;
the sending module 501 is further configured to:
and sending the serial port switching instruction to a reset control circuit of the target expansion device, wherein the serial port switching instruction is used for indicating the reset control circuit of the target expansion device to establish a second communication link between the main controller and a peripheral controller of the target expansion device, and the second communication link is a communication link between a serial port of the peripheral controller of the target expansion device and a serial port of the main controller.
In an embodiment, the receiving module 502 is further configured to:
receiving a login request sent by the management equipment;
the sending module 501 is further configured to:
responding to the login request, and establishing network connection between the management equipment and a serial port of a peripheral controller of the target expansion equipment through a data transparent transmission port;
and sending operation record data of the peripheral controller of the target expansion equipment to the management equipment through the network connection, wherein the operation record data comprises register data and log data, and the operation record data is used for carrying out fault analysis on the peripheral controller of the target expansion equipment.
In an embodiment, the sending module 501 is further configured to:
if the operation type indication information is serial port switching operation, executing the step of sending the serial port switching instruction to a reset control circuit of the target expansion equipment;
the device management apparatus further includes a processing module 503, where the processing module 503 is configured to:
if the operation type indication information is reset operation, acquiring reset object indication information included in the serial port switching instruction, and determining a target reset object from the one or more service processing circuits and the peripheral controller of the target expansion device according to the reset object indication information;
the sending module 501 is further configured to:
and sending a second reset control instruction to the reset control circuit of the target expansion equipment, wherein the second reset control instruction is used for instructing the reset control circuit of the target expansion equipment to execute reset operation on the target reset object.
In an embodiment, the sending module 501 is further configured to:
if the device state data sent by the peripheral controller of the target expansion device is received within the preset time length, sending a data acquisition command to the peripheral controller of the target expansion device through the first communication link, wherein the data acquisition command is used for indicating the peripheral controller of the target expansion device to acquire hardware configuration data and device state data of the target expansion device and operation record data of the peripheral controller of the target expansion device;
the receiving module 502 is further configured to:
receiving the hardware configuration data, the equipment state data and the operation record data which are sent by the peripheral controller of the target expansion equipment through the second communication link;
the sending module 501 is further configured to:
and sending the hardware configuration data, the equipment state data and the operation record data to the management equipment, wherein the hardware configuration data, the equipment state data and the operation record data are used for carrying out fault analysis on the target expansion equipment.
In the embodiment of the application, the main controller can send a heartbeat command to the peripheral controller of each expansion device through the first communication link and receive the device state data that each peripheral controller sends in response to the heartbeat command. If the device state data from the peripheral controller of a target expansion device among the one or more expansion devices is not received within a preset time period, the main controller sends a first reset control instruction to the reset control circuit of the target expansion device, so that the reset control circuit performs a reset operation on the peripheral controller of the target expansion device. The device management method thus monitors the running state of the peripheral controller of each expansion device by checking whether it returns device state data in response to the heartbeat command, and has the peripheral controller reset when it does not. Faults in the peripheral controller of an expansion device can therefore be detected conveniently and effectively and handled in time, which effectively shortens service interruption time.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present application, where the computer device 60 at least includes a main controller 601, one or more expansion devices 602, a memory 603, and a communication interface 604, and the expansion device 602 at least includes a peripheral controller 6021 and a reset control circuit 6022. The main controller 601, the expansion device 602, the memory 603, and the communication interface 604 may be connected by a bus or other means; the main controller 601 and the expansion device 602 may also be connected by a first communication link (e.g., I2C), and the main controller 601, the peripheral controller 6021, and the reset control circuit 6022 may be connected by the first communication link. The communication interface 604 may be used to receive or transmit data. A computer program comprising computer instructions is stored in the memory 603. The main controller 601 is used to execute the computer instructions. The main controller 601 is the computing core and control core of the computer device 60, and the computer instructions are adapted to be loaded and executed by the main controller 601 to implement the corresponding steps in the device management method embodiments described above. In a specific implementation, the computer instructions in the memory 603 are loaded and executed by the main controller 601 to implement the following steps:
sending a heartbeat command to the peripheral controller 6021 of each expansion device 602 over the first communication link;
receiving device status data sent by the peripheral controller 6021 of each expansion device 602 in response to the heartbeat command;
if the device status data sent by the peripheral controller 6021 of the target expansion device 602 in the one or more expansion devices 602 is not received within a preset time period, a first reset control instruction is sent to the reset control circuit 6022 of the target expansion device 602, and the first reset control instruction is used for instructing the reset control circuit 6022 of the target expansion device 602 to perform a reset operation on the peripheral controller 6021 of the target expansion device 602.
In an embodiment, the main controller 601 is further configured to:
generating a first reset control instruction;
the first reset control instruction is sent to the reset control circuit 6022 of the target expansion device 602.
In an embodiment, the main controller 601 is further configured to:
if the device state data sent by the peripheral controller 6021 of the target expansion device 602 is not received within the preset time length, sending fault indication information of the target expansion device 602 to a management device;
receiving a serial port switching instruction sent by the management equipment;
sending the serial port switching instruction to the reset control circuit 6022 of the target expansion device 602, where the serial port switching instruction is used to instruct the reset control circuit 6022 of the target expansion device 602 to establish a second communication link between the main controller and the peripheral controller 6021 of the target expansion device 602, and the second communication link is a communication link between the serial port of the peripheral controller 6021 of the target expansion device 602 and the serial port of the main controller.
In an embodiment, the main controller 601 is further configured to:
receiving a login request sent by the management device;
in response to the login request, establishing a network connection between the management device and the serial port of the peripheral controller 6021 of the target expansion device 602 through a data transparent transmission port;
and sending operation record data of the peripheral controller 6021 of the target expansion device 602 to the management device through the network connection, wherein the operation record data comprises register data and log data, and the operation record data is used for performing fault analysis on the peripheral controller 6021 of the target expansion device 602.
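In an embodiment, the target expansion device 602 further includes one or more service processing circuits, the serial port switching instruction includes operation type indication information, and the main controller 601 is further configured to:
if the operation type indication information is a serial port switching operation, performing the step of sending the serial port switching instruction to the reset control circuit 6022 of the target expansion device 602;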
if the operation type indication information is a reset operation, acquiring reset object indication information included in the serial port switching instruction, and determining a target reset object from the one or more service processing circuits and the peripheral controller 6021 of the target expansion device 602 according to the reset object indication information;
sending a second reset control instruction to the reset control circuit 6022 of the target expansion device 602, the second reset control instruction being used to instruct the reset control circuit 6022 of the target expansion device 602 to perform a reset operation on the target reset object.
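The branching on the operation type indication information can be sketched as below. The enum values, the `ManagementInstruction` type, the object names such as `"service_circuit_0"`, and the returned tuples are hypothetical; the description does not define an encoding for the instruction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set, Tuple


class OperationType(Enum):
    SERIAL_SWITCH = "serial_switch"
    RESET = "reset"


@dataclass(frozen=True)
class ManagementInstruction:
    """Assumed shape of the instruction received from the management device."""
    operation_type: OperationType
    reset_object: Optional[str] = None  # reset object indication information


def dispatch(
    instr: ManagementInstruction, valid_reset_objects: Set[str]
) -> Tuple[str, Optional[str]]:
    """Decide what the main controller sends to the reset control circuit."""
    if instr.operation_type is OperationType.SERIAL_SWITCH:
        # Forward the serial port switching instruction unchanged.
        return ("serial_switch", None)
    if instr.operation_type is OperationType.RESET:
        # The target must be the peripheral controller or a service processing circuit.
        if instr.reset_object not in valid_reset_objects:
            raise ValueError(f"unknown reset object: {instr.reset_object!r}")
        return ("second_reset", instr.reset_object)
    raise ValueError(f"unsupported operation type: {instr.operation_type!r}")
```

Validating the reset object against a known set before issuing the second reset control instruction is a design choice of this sketch, not a requirement stated in the description.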
In an embodiment, the main controller 601 is further configured to:
if the device state data sent by the peripheral controller 6021 of the target expansion device 602 is received within the preset time period, sending a data acquisition command to the peripheral controller 6021 of the target expansion device 602 through the first communication link, where the data acquisition command is used to instruct the peripheral controller 6021 of the target expansion device 602 to acquire the hardware configuration data and the device state data of the target expansion device 602 and the operation record data of the peripheral controller 6021 of the target expansion device 602;
receiving the hardware configuration data, the device state data and the operation record data sent by the peripheral controller 6021 of the target expansion device 602 through the second communication link;
and sending the hardware configuration data, the device state data and the operation record data to the management device, where the hardware configuration data, the device state data and the operation record data are used to perform fault analysis on the target expansion device 602.
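A minimal sketch of this collection step follows. The field names in `REQUIRED_FIELDS` and the three callbacks are assumptions: `send_acquire` stands for the data acquisition command on the first communication link, `read_reply` for the reply arriving over the second communication link, and `forward_to_manager` for delivery to the management device.

```python
from typing import Callable, Dict

# Assumed keys for the three kinds of data named in the embodiment.
REQUIRED_FIELDS = ("hardware_config", "device_state", "operation_records")


def collect_and_forward(
    dev: int,
    send_acquire: Callable[[int], None],         # hypothetical: command over the first link
    read_reply: Callable[[int], Dict],           # hypothetical: reply over the second link
    forward_to_manager: Callable[[Dict], None],  # hypothetical: send to the management device
) -> Dict:
    """Request diagnostic data from one device and pass it on for fault analysis."""
    send_acquire(dev)
    reply = read_reply(dev)

    # Refuse to forward a partial report, since the analysis needs all three parts.
    missing = [f for f in REQUIRED_FIELDS if f not in reply]
    if missing:
        raise ValueError(f"device {dev} reply is missing {missing}")

    report = {"device": dev}
    report.update({f: reply[f] for f in REQUIRED_FIELDS})
    forward_to_manager(report)
    return report
```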
In the embodiments of the present application, the main controller sends a heartbeat command to the peripheral controller of each expansion device through the first communication link and receives the device state data that each peripheral controller sends in response to the heartbeat command. If the device state data from the peripheral controller of a target expansion device among the one or more expansion devices is not received within a preset time period, the main controller sends a first reset control instruction to the reset control circuit of the target expansion device, so that the reset control circuit performs a reset operation on the peripheral controller of the target expansion device. The device management method thus monitors the running state of the peripheral controller of each expansion device by checking whether it returns device state data in response to the heartbeat command, and has a peripheral controller that does not respond reset through the reset control circuit. Faults in the peripheral controller of an expansion device can therefore be detected conveniently and effectively and handled in time, which shortens service interruption time.
Embodiments of the present application also provide a computer-readable storage medium (Memory), which is a memory device in the computer device 60 and is used for storing programs and data. It is understood that the computer-readable storage medium here may include a built-in storage medium of the computer device 60 and may also include an extended storage medium supported by the computer device 60. The computer-readable storage medium provides a storage space that stores an operating system of the computer device 60. One or more computer instructions, which may be one or more computer programs (including program code), are also stored in the storage space and are adapted to be loaded and executed by the aforementioned main controller 601. It should be noted that the computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, at least one computer-readable storage medium may be located remotely from the aforementioned main controller 601.
One or more embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The main controller of the computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the method embodiments described above.
The above examples express only several embodiments of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A device management method, applied to a computer device, wherein the computer device comprises a main controller and one or more expansion devices, each expansion device comprises a reset control circuit and a peripheral controller, and the main controller is connected with the peripheral controller and the reset control circuit through a first communication link, the method comprising:
sending a heartbeat command to a peripheral controller of each expansion device over the first communication link;
receiving device state data sent by the peripheral controller of each expansion device in response to the heartbeat command;
if the device state data sent by the peripheral controller of the target expansion device in the one or more expansion devices is not received within a preset time length, sending a first reset control instruction to a reset control circuit of the target expansion device, where the first reset control instruction is used to instruct the reset control circuit of the target expansion device to execute a reset operation on the peripheral controller of the target expansion device.
2. The method of claim 1, wherein sending a first reset control instruction to a reset control circuit of the target expansion device comprises:
generating a first reset control instruction;
and sending the first reset control instruction to the reset control circuit of the target expansion device.
3. The method according to claim 1 or 2, characterized in that the reset control circuit of the target expansion device is connected with the reset control terminal of the peripheral controller of the target expansion device; and the reset control circuit of the target expansion equipment is used for responding to the first reset control instruction to generate a reset trigger signal aiming at the reset control end, and the reset trigger signal is used for triggering the reset of a peripheral controller of the target expansion equipment.
4. The method of claim 1, wherein after sending the first reset control instruction to the reset control circuit of the target expansion device, the method further comprises:
if the device state data sent by the peripheral controller of the target expansion device is not received within the preset time length, sending fault indication information of the target expansion device to a management device;
receiving a serial port switching instruction sent by the management device;
and sending the serial port switching instruction to a reset control circuit of the target expansion device, wherein the serial port switching instruction is used for indicating the reset control circuit of the target expansion device to establish a second communication link between the main controller and a peripheral controller of the target expansion device, and the second communication link is a communication link between a serial port of the peripheral controller of the target expansion device and a serial port of the main controller.
5. The method of claim 4, further comprising:
receiving a login request sent by the management device;
in response to the login request, establishing a network connection between the management device and the serial port of the peripheral controller of the target expansion device through a data transparent transmission port;
and sending operation record data of the peripheral controller of the target expansion device to the management device through the network connection, wherein the operation record data comprises register data and log data, and the operation record data is used for performing fault analysis on the peripheral controller of the target expansion device.
6. The method according to claim 4, wherein the reset control circuit of the target expansion device is connected with a serial port switching control end of a peripheral controller of the target expansion device; the reset control circuit of the target expansion device is used for responding to the serial port switching instruction to generate a switching trigger signal aiming at the serial port switching control end, and the switching trigger signal is used for triggering a serial port of a peripheral controller of the target expansion device and a serial port of the main controller to establish a second communication link.
7. The method according to claim 4, wherein the target expansion device further includes one or more service processing circuits, the reset control circuit of the target expansion device is connected to the reset control terminal of each service processing circuit, and the serial port switching instruction includes operation type indication information; after receiving the serial port switching instruction sent by the management device and before sending the serial port switching instruction to the reset control circuit of the target expansion device, the method further includes:
if the operation type indication information is a serial port switching operation, executing the step of sending the serial port switching instruction to the reset control circuit of the target expansion device;
if the operation type indication information is a reset operation, acquiring reset object indication information included in the serial port switching instruction, and determining a target reset object from the one or more service processing circuits and the peripheral controller of the target expansion device according to the reset object indication information;
and sending a second reset control instruction to the reset control circuit of the target expansion device, wherein the second reset control instruction is used for instructing the reset control circuit of the target expansion device to execute a reset operation on the target reset object.
8. The method according to any one of claims 4 to 7, wherein after the serial port switching instruction is sent to the reset control circuit of the target expansion device, the method further comprises:
if the device state data sent by the peripheral controller of the target expansion device is received within the preset time length, sending a data acquisition command to the peripheral controller of the target expansion device through the first communication link, wherein the data acquisition command is used for indicating the peripheral controller of the target expansion device to acquire hardware configuration data and device state data of the target expansion device and operation record data of the peripheral controller of the target expansion device;
receiving the hardware configuration data, the device state data and the operation record data which are sent by the peripheral controller of the target expansion device through the second communication link;
and sending the hardware configuration data, the device state data and the operation record data to the management device, wherein the hardware configuration data, the device state data and the operation record data are used for performing fault analysis on the target expansion device.
9. A device management apparatus, applied to a computer device, wherein the computer device comprises a main controller and one or more expansion devices, each expansion device comprises a reset control circuit and a peripheral controller, and the main controller is connected with the peripheral controller and the reset control circuit through a first communication link, the apparatus comprising:
a sending module, configured to send a heartbeat command to a peripheral controller of each expansion device through the first communication link;
a receiving module, configured to receive device state data sent by the peripheral controller of each expansion device in response to the heartbeat command;
the sending module is further configured to send a first reset control instruction to a reset control circuit of the target expansion device if the device state data sent by the peripheral controller of the target expansion device in the one or more expansion devices is not received within a preset time period, where the first reset control instruction is used to instruct the reset control circuit of the target expansion device to perform a reset operation on the peripheral controller of the target expansion device.
10. A computer-readable storage medium, characterized in that it stores a computer program adapted to be loaded by a main controller to perform the device management method of any one of claims 1 to 8.
CN202110731449.5A 2021-06-29 2021-06-29 Equipment management method and device and computer storage medium Pending CN115543872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110731449.5A CN115543872A (en) 2021-06-29 2021-06-29 Equipment management method and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110731449.5A CN115543872A (en) 2021-06-29 2021-06-29 Equipment management method and device and computer storage medium

Publications (1)

Publication Number Publication Date
CN115543872A true CN115543872A (en) 2022-12-30

Family

ID=84717185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110731449.5A Pending CN115543872A (en) 2021-06-29 2021-06-29 Equipment management method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN115543872A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743883A (en) * 2023-08-15 2023-09-12 中移(苏州)软件技术有限公司 Intelligent network card, data processing system and working method thereof
CN116932274A (en) * 2023-09-19 2023-10-24 苏州元脑智能科技有限公司 Heterogeneous computing system and server system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743883A (en) * 2023-08-15 2023-09-12 中移(苏州)软件技术有限公司 Intelligent network card, data processing system and working method thereof
CN116743883B (en) * 2023-08-15 2023-11-03 中移(苏州)软件技术有限公司 Intelligent network card, data processing system and working method thereof
CN116932274A (en) * 2023-09-19 2023-10-24 苏州元脑智能科技有限公司 Heterogeneous computing system and server system
CN116932274B (en) * 2023-09-19 2024-01-09 苏州元脑智能科技有限公司 Heterogeneous computing system and server system

Similar Documents

Publication Publication Date Title
TWI618380B (en) Management methods, service controller devices and non-stransitory, computer-readable media
EP3575975B1 (en) Method and apparatus for operating smart network interface card
US20040228063A1 (en) IPMI dual-domain controller
EP2472402A1 (en) Remote management systems and methods for mapping operating system and management controller located in a server
CN115543872A (en) Equipment management method and device and computer storage medium
CN109189627B (en) Hard disk fault monitoring and detecting method, device, terminal and storage medium
US10691562B2 (en) Management node failover for high reliability systems
CN116719700B (en) Method and device for monitoring hardware partition of server host system
CN115599617B (en) Bus detection method and device, server and electronic equipment
CN109542198B (en) Method and equipment for controlling power-on of PCIE card
CN115098342A (en) System log collection method, system, terminal and storage medium
CN106649002A (en) Server and method for automatically overhauling baseboard management controller
JP6897145B2 (en) Information processing device, information processing system and information processing device control method
CN115509333A (en) Server collaborative power-on and power-off device, method, system and medium
CN115543746A (en) Graphics processor monitoring method, system and device and electronic equipment
CN115168146A (en) Anomaly detection method and device
CN103326897A (en) Distributed computing environment general monitoring device and failure detection method
CN112003727A (en) Multi-node server power supply testing method, system, terminal and storage medium
CN111694587A (en) Server PNOR firmware upgrading method, device, equipment and storage medium
JP2007094470A (en) Method of hotplugging information processing apparatus
CN116483613B (en) Processing method and device of fault memory bank, electronic equipment and storage medium
CN117093465B (en) Server log collection method, device, communication equipment and storage medium
US9639438B2 (en) Methods and systems of managing an interconnection
CN114328044B (en) AIC+box topology testing method, device and system
CN117555760B (en) Server monitoring method and device, substrate controller and embedded system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40079490; Country of ref document: HK)
Country of ref document: HK