CN117251039A - Equipment resetting method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN117251039A
Authority
CN
China
Prior art keywords
reset
resource pool
equipment
host
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311152696.5A
Other languages
Chinese (zh)
Inventor
郭洁
汪浩
王兴隆
郭平
马晓宇
于明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202311152696.5A
Publication of CN117251039A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/24Resetting means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a device reset method and apparatus, a storage medium, and an electronic device. The method is applied to the baseboard management controller of any resource pool in a server. The server comprises a plurality of resource pools, each resource pool comprises a complex programmable logic device and a baseboard management controller, and the complex programmable logic devices are used for transmitting a reset signal sent by the baseboard management controller of any resource pool to the devices to be reset in each resource pool. The method comprises the following steps: receiving a reset command and analyzing the reset command to obtain the reset scene corresponding to the reset command; determining the reset method corresponding to the reset scene; and executing the reset method to determine a target complex programmable logic device and a target device to be reset, and sending a reset signal to the target complex programmable logic device to reset the target device to be reset. Because the devices are reset under software control, reset efficiency is improved.

Description

Equipment resetting method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a device reset method, device, storage medium, and electronic device.
Background
As application scenarios such as high-performance computing, artificial intelligence, and storage grow more complex, the server resource management architecture needs to be redesigned on top of the traditional server hardware architecture in order to improve resource utilization. When such an architecture is in use, the devices in the server need to be reset in more scenarios. In the related art, a device is reset by turning off the main power supply, which takes a long time and is inefficient.
Therefore, how to improve the efficiency of resetting the device is a technical problem to be solved in the industry.
Disclosure of Invention
The application provides a device reset method and apparatus, a storage medium, and an electronic device, which are used to solve the technical problem in the prior art of how to improve the efficiency of resetting devices.
In a first aspect, the present application provides a device reset method, which is applied to a baseboard management controller of any resource pool in a server; the server comprises a plurality of resource pools, each resource pool comprises a complex programmable logic device and a baseboard management controller, and the complex programmable logic device is used for transmitting a reset signal sent by the baseboard management controller of any resource pool to equipment to be reset in each resource pool; the method comprises the following steps:
Receiving a reset command and analyzing the reset command to obtain a reset scene corresponding to the reset command;
determining a reset method corresponding to the reset scene based on the reset scene;
and executing the reset method to determine a target complex programmable logic device and target equipment to be reset, and sending the reset signal to the target complex programmable logic device to reset the target equipment to be reset.
In some embodiments, the resource pools include a computing resource pool, an exchange resource pool, a storage resource pool, and a heterogeneous acceleration resource pool; the computing resource pool is connected with the storage resource pool and the heterogeneous acceleration resource pool through the exchange resource pool; the computing resource pool comprises a host, wherein the host comprises at least one first device; before receiving the reset command, the method further comprises:
when the server is powered on, sending an energizing signal to the baseboard management controllers of the resource pools, and receiving an energizing completion signal sent by each baseboard management controller;
transmitting a reset signal to the storage resource pool, the heterogeneous acceleration resource pool and the computing resource pool connected with the exchange resource pool;
resetting the first device and the second devices of the storage resource pool and the heterogeneous acceleration resource pool based on the reset signal;
and establishing connection between the switching equipment in the switching resource pool and the first equipment and the second equipment after reset.
In some embodiments, after establishing the connection between the switching device in the switching resource pool and the reset first device and second device, the method further includes:
acquiring connection relations between each switching device in the switching resource pool and the first device and the second device, and acquiring corresponding relations between the host and the second device;
controlling the host to start up, and resetting the second equipment corresponding to the host based on the corresponding relation;
storing the connection relation and the corresponding relation in a baseboard management controller of any resource pool;
sending a limiting instruction to a first baseboard management controller, wherein the limiting instruction is used for restricting the first baseboard management controller from controlling the equipment in its own resource pool;
wherein the first baseboard management controller is a baseboard management controller of a resource pool other than any one of the resource pools.
In some embodiments, the controlling the host to start up and resetting the second device corresponding to the host based on the correspondence relationship includes:
starting the host based on a starting signal;
transmitting a reset signal to the complex programmable logic device of the exchange resource pool under the condition that the reset signal transmitted by the host is monitored;
and based on the complex programmable logic device, sending the reset signal to the second equipment corresponding to the host, controlling the second equipment corresponding to the host to reset, and establishing connection between the host and the second equipment corresponding to the host.
In some embodiments, in a case that the reset scenario is resetting the second device corresponding to any host when that host restarts, the executing the reset method includes:
determining a host to be restarted, and restarting the host to be restarted;
determining a connection port of a switching device connected with a second device corresponding to the host to be restarted based on the connection relationship and the corresponding relationship;
determining the target complex programmable logic device based on the connection port;
And sending the reset signal to the target complex programmable logic device so as to reset the target device to be reset, wherein the target device to be reset is the second device corresponding to the host to be restarted.
In some embodiments, in a case that the reset scenario is reassigning the second device corresponding to the host, the executing the reset method includes:
determining equipment to be distributed, and taking the equipment to be distributed as the target equipment to be reset;
sending a disconnection command to the switching device, and disconnecting the device to be distributed from the host based on the disconnection command;
sending the reset signal to the target complex programmable logic device so as to reset the equipment to be distributed after disconnection;
establishing connection between the disconnected equipment to be distributed and a new host;
and updating the connection relation and the corresponding relation based on the connection relation between the equipment to be distributed and the new host and the switching equipment and the corresponding relation between the equipment to be distributed and the new host.
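The reassignment steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the function name `reassign_device`, the action strings, and the representation of the correspondence as a dictionary from host name to device list are all hypothetical.

```python
from typing import Dict, List


def reassign_device(device: str, new_host: str,
                    correspondence: Dict[str, List[str]]) -> List[str]:
    """Move `device` to `new_host` and return the ordered actions taken.

    `correspondence` maps each host to its second devices (a hypothetical
    layout) and is updated in place once the new connection exists.
    """
    actions: List[str] = []
    # Find the host that currently owns the device, if any.
    old_host = next(
        (h for h, devs in correspondence.items() if device in devs), None
    )
    if old_host is not None:
        actions.append(f"disconnect {device} from {old_host}")
        correspondence[old_host].remove(device)
    # The device is reset only after it has been disconnected.
    actions.append(f"reset {device}")
    actions.append(f"connect {device} to {new_host}")
    correspondence.setdefault(new_host, []).append(device)
    return actions
```

The point of the sketch is the ordering: disconnect, reset, reconnect, and only then update the stored relations.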
In some embodiments, in a case that the reset scenario is resetting the devices in the resource pools connected to any switching device when that switching device is abnormal, the executing the reset method includes:
taking the host and the second equipment connected with the abnormal switching device as the target equipment to be reset;
restarting the host connected with any switching device;
sending the reset signal to the target complex programmable logic device so as to reset the target device to be reset;
connecting the target equipment to be reset with other switching equipment except any switching equipment in the switching resource pool;
and updating the connection relation based on the connection result.
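The handling of an abnormal switching device can be sketched in the same spirit. The application does not say how a replacement switching device is chosen, so the least-loaded rule below is purely an assumption, as are the function name `fail_over` and the dictionary layout of the connection relation.

```python
from typing import Dict, List


def fail_over(bad_switch: str, connection: Dict[str, List[str]]) -> List[str]:
    """Move every device on `bad_switch` to another switching device.

    `connection` maps a switching device to the hosts and second devices
    attached to it (hypothetical layout) and is updated in place.
    Returns the target devices to be reset.
    """
    healthy = [s for s in connection if s != bad_switch]
    if not healthy:
        raise RuntimeError("no healthy switching device available")
    targets = list(connection[bad_switch])
    for dev in targets:
        # Assumption: pick the healthy switch with the fewest devices.
        dest = min(healthy, key=lambda s: len(connection[s]))
        connection[dest].append(dev)
    connection[bad_switch] = []
    return targets
```

The returned list corresponds to the target equipment to be reset; the in-place update corresponds to updating the connection relation based on the connection result.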
In a second aspect, the present application provides a device reset apparatus, which is applied to a baseboard management controller of any resource pool in a server; the server comprises a plurality of resource pools, each resource pool comprises a complex programmable logic device and a baseboard management controller, and the complex programmable logic device is used for transmitting a reset signal sent by the baseboard management controller of any resource pool to equipment to be reset in each resource pool; the baseboard management controller includes:
the receiving module is used for receiving the reset command and analyzing the reset command to obtain a reset scene corresponding to the reset command;
the determining module is used for determining a resetting method corresponding to the resetting scene based on the resetting scene;
And the reset module is used for executing the reset method to determine a target complex programmable logic device and target equipment to be reset, and sending the reset signal to the target complex programmable logic device to reset the target equipment to be reset.
In a third aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method described above.
In a fourth aspect, the present application provides an electronic device comprising a memory having a computer program stored therein and a processor arranged to implement the above-described method when executing the computer program.
According to the device reset method and apparatus, the storage medium, and the electronic device provided by the application, reset signals are transmitted between the resource pools through the complex programmable logic devices, so that the baseboard management controller of any resource pool can transmit a reset signal to the devices to be reset in each resource pool. Unified reset and management of all resource pools can therefore be realized through the baseboard management controller of any resource pool, without manually turning off the power supply. Because the reset method is determined by the reset scene corresponding to the reset command, the target device to be reset in a resource pool can be reset quickly using the reset method that matches each reset scene, which improves reset efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the technical solutions of the present application or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a device reset method provided in an embodiment of the present application;
fig. 2 is a first schematic structural diagram of a server according to an embodiment of the present application;
fig. 3 is a second schematic structural diagram of a server according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a device reset apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like herein are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules that are expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flow chart of a device reset method according to an embodiment of the present application, as shown in fig. 1, the method includes step 110, step 120, and step 130. The method flow steps are only one possible implementation of the present application.
Step 110, receiving a reset command and analyzing the reset command to obtain a reset scene corresponding to the reset command.
Specifically, the execution body of the device reset method provided in the embodiment of the present application is a device reset apparatus, and the apparatus may be a baseboard management controller (Baseboard Management Controller, BMC) of any resource pool in a server; the server comprises a plurality of resource pools, wherein each resource pool comprises a complex programmable logic device (Complex Programmable Logic Device, CPLD) and a baseboard management controller; and the complex programmable logic device is used for transmitting the reset signal sent by the baseboard management controller of any resource pool to the equipment to be reset in each resource pool. The device to be reset is a device ready for resetting.
The BMC can be connected to the CPLD through an integrated circuit bus (Inter-Integrated Circuit, I2C), and can send a reset signal to the CPLD through an I2C instruction to realize the transmission of the reset signal.
The devices in each resource pool are connected with the BMC through the CPLD, and the BMC controls a device to reset by having the CPLD transmit the reset signal. The CPLD can transmit a reset signal to the devices in each resource pool through a CDFP (400 Gb/s form-factor pluggable) port; each device starts its reset operation after receiving the reset signal and confirming that it is valid. In this way, the BMC in any resource pool can uniformly control the reset of devices in the other resource pools.
The reset command is a command received by the BMC and used for indicating the BMC control device to reset.
The reset scene is the application scenario in which a device is reset. Examples include: the overall power-on reset of all devices after the server is powered on; the reset after a device is replaced; the device reset after PCIe resources are released and reallocated while the server is in use; the reset after a Host is restarted; and the reset of the affected devices and nodes when a switching device is abnormal.
By analyzing the reset command, a reset scene corresponding to the reset command can be identified.
Step 120, determining a reset method corresponding to the reset scene based on the reset scene.
Specifically, the reset methods in different reset scenes have differences, the reset methods corresponding to each reset scene can be packaged and stored in the BMC, and when the BMC receives a reset command, the corresponding reset method is called according to the reset scene corresponding to the reset command.
The processing interfaces for the different reset scenes can be exposed through IPMI commands or a Redfish interface, so that a different reset method is called in each reset scene.
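The lookup from reset scene to reset method described in steps 110 and 120 can be sketched as a dispatch table. This is only an illustration under stated assumptions: the scene names, the command format (a dictionary with a `scene` key), and the handler functions are hypothetical and are not taken from the application.

```python
from enum import Enum
from typing import Callable, Dict


class ResetScene(Enum):
    # Hypothetical scene identifiers; the application does not name them.
    POWER_ON = "power_on"
    HOST_RESTART = "host_restart"
    REASSIGN_DEVICE = "reassign_device"
    SWITCH_ABNORMAL = "switch_abnormal"


def reset_on_power_on(cmd: dict) -> str:
    return "reset all devices after power-on"


def reset_on_host_restart(cmd: dict) -> str:
    return f"reset second devices of host {cmd['host']}"


# Reset methods are packaged once and looked up for each command.
RESET_METHODS: Dict[ResetScene, Callable[[dict], str]] = {
    ResetScene.POWER_ON: reset_on_power_on,
    ResetScene.HOST_RESTART: reset_on_host_restart,
}


def parse_scene(cmd: dict) -> ResetScene:
    """Step 110: analyze the reset command to obtain its reset scene."""
    return ResetScene(cmd["scene"])


def handle_reset_command(cmd: dict) -> str:
    """Steps 110 and 120: resolve the scene, then call the matching method."""
    scene = parse_scene(cmd)
    try:
        method = RESET_METHODS[scene]
    except KeyError:
        raise ValueError(f"no reset method registered for {scene.value}")
    return method(cmd)
```

Keeping the methods in a table means a new reset scene can be supported by registering one more entry, without touching the command-handling path.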
Step 130, executing the reset method to determine the target complex programmable logic device and the target device to be reset, and sending a reset signal to the target complex programmable logic device to reset the target device to be reset.
Specifically, the reset signal may be a PCIe device reset (Peripheral Component Interconnect Express Reset, PERST) signal, which is also a global reset signal; or a Platform Reset (PLTRST) signal. The kind of the reset signal can be determined according to different reset scenarios.
After the BMC invokes the reset method, the reset method can be analyzed, so that the target CPLD and the target equipment to be reset are determined, and the transmission path of the reset signal can be determined according to the target CPLD and the target equipment to be reset.
The target device to be reset is a device which is ready to be reset in the current reset scene.
The target CPLD is a CPLD directly connected with the target device to be reset.
When the reset method is executed, the BMC sends a reset signal to the CPLD of the resource pool where the BMC is located, if the CPLD is not the target CPLD, the CPLD can send the reset signal to the target CPLD through a CDFP port, the target CPLD transmits the reset signal to the target equipment to be reset, and the target equipment to be reset starts to reset after receiving the reset signal.
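The transmission path just described (BMC, local CPLD, optionally a CDFP hop to the target CPLD, then the device) can be modeled in a few lines. The `Cpld` class, its fields, and the pool and device names are hypothetical; the sketch only shows how the path depends on whether the local CPLD is the target CPLD.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cpld:
    pool: str
    devices: List[str]
    # Hypothetical model of CDFP links to the CPLDs of other pools.
    cdfp_links: Dict[str, "Cpld"] = field(default_factory=dict)


def reset_path(local: Cpld, target_pool: str, device: str) -> List[str]:
    """Return the hops a reset signal takes from the BMC to the device."""
    path = ["BMC", f"CPLD({local.pool})"]
    target = local
    if local.pool != target_pool:
        # The local CPLD is not the target CPLD: forward over a CDFP port.
        target = local.cdfp_links[target_pool]
        path.append(f"CPLD({target.pool})")
    if device not in target.devices:
        raise ValueError(f"{device} is not attached to CPLD({target.pool})")
    path.append(device)
    return path
```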
According to the device reset method provided by the embodiment of the application, reset signals are transmitted between the resource pools through the complex programmable logic devices, so that the baseboard management controller of any resource pool can transmit a reset signal to the devices to be reset in each resource pool. Unified reset and management of all resource pools can therefore be realized through the baseboard management controller of any resource pool, without manually turning off the power supply. Because the reset method is determined by the reset scene corresponding to the reset command, the target device to be reset in a resource pool can be reset quickly using the reset method that matches each reset scene, which improves reset efficiency.
It should be noted that the embodiments of the present application may be freely combined, reordered, or executed separately, and do not depend on a fixed execution order.
In some embodiments, the resource pools include a computing resource pool, an exchange resource pool, a storage resource pool, and a heterogeneous acceleration resource pool; the computing resource pool is connected with the storage resource pool and the heterogeneous acceleration resource pool through the exchange resource pool; the computing resource pool comprises a host, wherein the host comprises at least one first device; before receiving the reset command, the method further comprises:
when the server is powered on, sending an energizing signal to the baseboard management controllers of the resource pools, and receiving an energizing completion signal sent by each baseboard management controller;
transmitting a reset signal to a storage resource pool, a heterogeneous acceleration resource pool and a computing resource pool which are connected with the exchange resource pool;
resetting the first device and the second devices of the storage resource pool and the heterogeneous acceleration resource pool based on the reset signal;
establishing connection between the exchange equipment in the exchange resource pool and the first equipment and the second equipment after reset;
obtaining connection relations between each exchange device in the exchange resource pool and the first device and the second device, and obtaining corresponding relations between a host and the second device;
controlling the host to start up, and resetting the second device corresponding to the host based on the correspondence;
storing the connection relation and the corresponding relation in a baseboard management controller of any resource pool;
sending a limiting instruction to the first baseboard management controller, wherein the limiting instruction is used for restricting the first baseboard management controller from controlling the equipment in its own resource pool;
wherein the first baseboard management controller is a baseboard management controller of a resource pool except any resource pool;
the controlling the host to start up and resetting the second device corresponding to the host based on the correspondence comprises:
starting the host based on the starting signal;
when a reset signal sent by the host is detected, sending the reset signal to the complex programmable logic device of the exchange resource pool;
and based on the complex programmable logic device, sending a reset signal to the second equipment corresponding to the host, controlling the second equipment corresponding to the host to reset, and establishing connection between the host and the second equipment corresponding to the host.
Specifically, fig. 2 is one of schematic structural diagrams of a server according to an embodiment of the present application; as shown in fig. 2, the server of the embodiment of the present application includes a computing resource pool, a switching resource pool, a storage resource pool, and a heterogeneous acceleration resource pool. The computing resource pool is connected with the storage resource pool and the heterogeneous acceleration resource pool through the exchange resource pool.
The computing resource pool includes hosts. The device in the host is referred to as a first device, which may include a central processing unit (Central Processing Unit, CPU). The Switch resource pool includes a Switch device, which may be a Switch Board.
Devices in the storage resource pool and the heterogeneous acceleration resource pool are referred to as second devices. The devices in the storage resource pool may include solid state disks (Solid State Disk, SSD) that use the Non-Volatile Memory Express (NVMe) protocol; the devices in the heterogeneous acceleration resource pool may include graphics processors (Graphics Processing Unit, GPU).
After the server is powered on, the BMC and the CPLD in each resource pool are in a power-on state, and the devices in each resource pool are in a power-off state, so that the devices in each resource pool need to be powered on according to the connection information of the BMC and the CPLD in each resource pool.
Take the case where the execution body is the BMC of the exchange resource pool as an example. The BMC of the exchange resource pool issues a Power on signal and, using the network and the acquired connection information, sends it to the BMCs of the computing resource pool, the storage resource pool, and the heterogeneous acceleration resource pool. Those BMCs power on the devices in their respective resource pools, so that the GPU, the CPU, and the SSD are in the Power on state, and then send a Power Good signal to the BMC of the exchange resource pool.
After the BMC of the exchange resource pool receives the Power Good signals of the other resource pools, it powers on the switching device. Because the first device and the second device are already powered on, the switching device can then identify the devices in the other resource pools, namely the first device and the second device.
Although the switching device may identify devices of other resource pools, the first device and the second device need to be in an initial state before the switching device establishes a connection (Link) with the first device and before the switching device establishes a Link with the second device, so that the first device and the second device need to be reset.
The switching device may send a PERST signal to the CPLD. After receiving it, the CPLD sends the PERST signal to all CDFP ports of the exchange resource pool, performing the first PERST on the devices in all resource pools connected to the exchange resource pool, which ensures that the switching device can Link to the first device and the second device.
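The ordering constraints of this power-on flow can be made explicit with a small simulation. It is a sketch under stated assumptions: the function name, the log strings, and the idea of modeling each pool's BMC as a callable that returns `True` on Power Good are hypothetical, and the pools are handled one after another only for simplicity.

```python
from typing import Callable, Dict, List


def power_on_sequence(pool_bmcs: Dict[str, Callable[[], bool]],
                      cdfp_ports: List[int]) -> List[str]:
    """Return the ordered actions taken by the exchange-pool BMC.

    `pool_bmcs` maps a pool name to a callable that powers on that pool's
    devices and returns True once Power Good is reported (hypothetical).
    """
    log: List[str] = []
    for pool, power_on in pool_bmcs.items():
        log.append(f"power_on -> {pool}")
        if not power_on():
            raise RuntimeError(f"{pool} did not report Power Good")
        log.append(f"power_good <- {pool}")
    # The switching device is powered on only after every pool is good.
    log.append("power_on switch")
    # First PERST: sent to every CDFP port of the exchange resource pool.
    for port in cdfp_ports:
        log.append(f"PERST -> CDFP{port}")
    log.append("link switch <-> devices")
    return log
```

The two invariants the sketch encodes are that the switching device is powered on last and that the first PERST precedes link establishment.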
The BMC may obtain a connection relationship between each switching device and the first device in the switching resource pool, and a connection relationship between each switching device and the second device. The connection relationship includes a connection relationship between a CDFP Port corresponding to the first device and a Port (Port) number of the switch device, and a connection relationship between a CDFP Port corresponding to the second device and a Port (Port) number of the switch device.
The BMC may then obtain the correspondence between each Host and its second devices through a universal asynchronous receiver/transmitter command (Universal Asynchronous Receiver/Transmitter Command, UART CMD). That is, the BMC obtains which Port numbers of which switching devices carry the second devices allocated to each Host, thereby determining the second devices managed by each Host, and stores the correspondence in the BMC.
The connection relationship and the correspondence may be saved in json format. For example, the file of the connection relationship may be CDFP-port.json; the file of the correspondence may be downStream-upstream.
Taking Port numbers of the switching devices as Si, i=0-40 as an example, the specific contents stored are as follows, wherein characters after Si are numbers identified by the BMC:
Fig. 3 is the second schematic structural diagram of a server according to an embodiment of the present application. To make the internal structure of each device clearer, some components are omitted in fig. 3; for example, because the BMC of any resource pool can perform reset control on the devices of every resource pool, only the BMC of the exchange resource pool is shown. Fig. 3 thus takes the BMC of the exchange resource pool as the execution body.
After the BMC of the exchange resource pool obtains the connection relationship and the correspondence, it notifies all Hosts to start up. After a Host starts up, the Host CPU sends a PLTRST signal to the CPLD, and the CPLD sends a PERST signal to the CPLD of the exchange resource pool through the CDFP port.
Optionally, when there are m Hosts, the BMC sends m I2C commands to the CPLD, and each command resets all second devices corresponding to a single Host.
The BMC of the exchange resource pool monitors in real time the PERST signals that the CPLD receives from the Host CPUs, and controls the CPLD to send the PERST signal to the second devices corresponding to the Host.
For example, after Host0 sends a PERST signal, the BMC monitors the PERST signal and notifies the CPLD to reset the second device corresponding to Host0 through the I2C command. The CDFP ports of the storage resource pool and the heterogeneous acceleration resource pool receive the PERST signals transmitted by the CDFP ports of the exchange resource pool, and send the PERST signals to the CPLD, and the CPLD sends the PERST signals to the GPU or the SSD.
The I2C commands through which the BMC interacts with the CPLD may be defined so as to implement signal monitoring. For example, the exchange resource pool has 40 CDFP ports, so 40 corresponding bits, i.e., 5 bytes, are defined. A change of the bit value of an upstream port from 0 to 1 indicates that the CDFP port connected to the Host has sent PERST to the CPLD.
After the BMC detects the change in the bit value of the CDFP port connected to the Host, it drives the PERST signal of the second devices corresponding to that Host: the BMC sends an I2C command that sets the bits of the CDFP ports of the corresponding second devices to 1. The CPLD receives the PERST signal sent by the BMC over I2C in real time and passes it to its output, so that the second devices are reset. Through this reset, a connection can be established between the Host and its corresponding second devices.
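The 40-bit status scheme can be illustrated as follows. The text fixes only the size (40 ports, 5 bytes) and the meaning of a 0 to 1 change; the bit order within each byte (least significant bit first), the function names, and the mapping from a Host port to the ports of its second devices are assumptions made for this sketch, and the actual I2C transfer is left out.

```python
from typing import Dict, List

NUM_PORTS = 40              # CDFP ports in the example from the text
NUM_BYTES = NUM_PORTS // 8  # 40 bits are carried in 5 bytes


def rising_ports(prev: bytes, curr: bytes) -> List[int]:
    """Ports whose bit changed from 0 to 1 between two status reads."""
    ports = []
    for port in range(NUM_PORTS):
        byte, bit = divmod(port, 8)  # assumed LSB-first layout
        was = (prev[byte] >> bit) & 1
        now = (curr[byte] >> bit) & 1
        if was == 0 and now == 1:
            ports.append(port)
    return ports


def perst_mask(ports: List[int]) -> bytes:
    """Build the 5-byte mask with the bit of each given port set to 1."""
    mask = bytearray(NUM_BYTES)
    for port in ports:
        byte, bit = divmod(port, 8)
        mask[byte] |= 1 << bit
    return bytes(mask)


def on_status_change(prev: bytes, curr: bytes,
                     host_to_devices: Dict[int, List[int]]) -> bytes:
    """Mask the BMC would write so that the second devices of every
    Host that has just sent PERST are reset."""
    targets: List[int] = []
    for host_port in rising_ports(prev, curr):
        targets.extend(host_to_devices.get(host_port, []))
    return perst_mask(targets)
```

For instance, if the Host on port 0 owns the second devices on ports 8 and 39, a 0 to 1 change on bit 0 yields a mask with bit 0 of byte 1 and bit 7 of byte 4 set.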
After all devices in the resource pools are connected, the BMC of the exchange resource pool needs to send a limiting instruction to the BMCs of the other resource pools, restricting them from independently powering their devices on and off. This avoids problems such as the corresponding Host going down because a misoperation of the BMC of the heterogeneous acceleration resource pool powers off a GPU during use.
Optionally, the embodiment of the application can be matched with a method for independently controlling the power on and power off of the BMC of each resource pool, and the power on and power off and reset control of each resource can be realized.
According to the device resetting method, the devices in the resource pools are reset twice, so that connection can be established between the devices in the resource pools, the reset of the devices can be controlled through the BMC of any resource pool, and the resetting efficiency is improved.
In some embodiments, step 130 comprises:
in the case that the reset scene is that any host restarts and the second device corresponding to that host is to be reset, executing the reset method comprises:
determining a host to be restarted, and restarting the host to be restarted;
determining, based on the connection relation and the corresponding relation, the connection port of the switching device connected to the second device corresponding to the host to be restarted;
determining a target complex programmable logic device based on the connection port;
and sending a reset signal to the target complex programmable logic device so as to reset the target device to be reset, wherein the target device to be reset is the second device corresponding to the host to be restarted.
Specifically, when the reset scene is a reset following the restart of a Host in the computing resource pool, the reset method is as follows:
Take the case in which the execution body is the BMC of the exchange resource pool as an example. The BMC of the exchange resource pool determines the Host to be restarted and sends a restart command to the BMC of that Host, notifying that BMC to perform the restart;
the BMC of the exchange resource pool determines, from the stored connection relation and corresponding relation, the Port number of the switching device connected to the second device allocated to the Host to be restarted; the second device and the target CPLD share the same Port number, so the target CPLD can be determined from the Port number.
The BMC of the exchange resource pool sends a PERST signal to that Port through the CPLD of the exchange resource pool, and the PERST signal is transmitted to the target CPLD through the CDFP port, which resets the second device corresponding to the Host to be restarted. The second device corresponding to the Host to be restarted is the target device to be reset.
According to the device reset method provided by the embodiments of the present application, when a host needs to be restarted and reset, the BMC of any resource pool sends the reset signal to reset the second device corresponding to that host, so that the connection between the restarted host and its second device is re-established without manual restarting or resetting, which improves reset efficiency.
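The lookup chain used in the host-restart scene (host, then second devices, then Port number, then target CPLD) can be sketched as below. The table contents, the CPLD names, and the function name are hypothetical; the application states only that the BMC stores a connection relation and a corresponding relation and that the Port number identifies the target CPLD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    switch: str   # switching device the second device hangs off
    port: int     # Port number on that switching device


# Hypothetical stored tables kept by the BMC.
CORRESPONDENCE = {"Host0": ["GPU0", "SSD0"], "Host1": ["GPU1"]}   # host -> second devices
CONNECTION = {                                                    # device -> switch/port
    "GPU0": Link("SW0", 8),
    "SSD0": Link("SW0", 16),
    "GPU1": Link("SW1", 9),
}
# Assumed: each Port number maps to the CPLD that sits behind it.
PORT_TO_CPLD = {8: "CPLD-accel", 9: "CPLD-accel", 16: "CPLD-storage"}


def plan_host_restart_reset(host: str) -> list[tuple[str, int, str]]:
    """Return (device, port, target CPLD) for every second device of the host."""
    plan = []
    for device in CORRESPONDENCE.get(host, []):
        link = CONNECTION[device]
        plan.append((device, link.port, PORT_TO_CPLD[link.port]))
    return plan


plan = plan_host_restart_reset("Host0")
```

Each tuple in `plan` names one PERST delivery: the Port to drive and the CPLD that will forward the signal to the device.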
In some embodiments, step 130 comprises:
in the case that the reset scene is reassigning the second device corresponding to a host, executing the reset method comprises:
determining a device to be allocated, and taking the device to be allocated as the target device to be reset;
sending a disconnection command to the switching device, and disconnecting the device to be allocated from the host based on the disconnection command;
sending a reset signal to the target complex programmable logic device so as to reset the disconnected device to be allocated;
establishing a connection between the disconnected device to be allocated and a new host;
and updating the connection relation and the corresponding relation based on the connection relation between the device to be allocated, the new host, and the switching device, and on the corresponding relation between the device to be allocated and the new host.
Specifically, the device to be allocated may be a second device that needs to be reassigned. A second device needs to be reassigned when, during use, some GPUs or SSDs must be moved to another Host. For example, if GPU1 on Host1 is to be reassigned to Host0, the device to be allocated is GPU1.
The reset method is as follows:
GPU1 is set as the target device to be reset.
So that GPU1 can be reset smoothly, all processes on Host1 need to be ended before the reset.
The BMC sends a disconnection command to the switching device, which disconnects GPU1 from Host1, and the BMC updates the stored corresponding relation and connection relation. The disconnection command may be a UART CMD command. After disconnection, GPU1 is reset.
For example, the BMC of the exchange resource pool hot-removes GPU1 from Host1 through a UART CMD command; that is, Secondary Bus Reset in the Bridge Control Register is set to 1, and GPU1 is then reset so that its running state returns to the default values.
If the second device cannot be reset by sending the UART CMD command through the switching device, the BMC of the exchange resource pool may, through an I2C command, notify the target CPLD to pull the PERST pin of the corresponding CDFP port, so as to complete the reset.
After GPU1 is reset, the BMC of the exchange resource pool reassigns GPU1 to Host0 through a UART CMD command and GPU1 is connected to Host0; that is, GPU1 is hot-added to Host0.
The BMC of the exchange resource pool updates the stored connection relation and corresponding relation, adding GPU1 to the second devices corresponding to Host0.
For example, the state of the newly established link (downstream to upstream) is confirmed through a UART CMD command.
According to the device reset method provided by the embodiments of the present application, when a second device needs to be reassigned, the device is reset by sending commands, which improves allocation efficiency.
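The reassignment flow above, including the fallback from the switch command to the CPLD PERST pin, can be sketched as follows. The two reset paths are passed in as callables because the application does not define the actual UART or I2C interfaces; all names here are hypothetical, and only the corresponding relation (host to devices) is modelled.

```python
from typing import Callable


def reassign_device(
    device: str,
    old_host: str,
    new_host: str,
    correspondence: dict[str, list[str]],
    switch_reset: Callable[[str], bool],
    perst_reset: Callable[[str], bool],
) -> str:
    """Move `device` from old_host to new_host and return the reset path used."""
    # 1. Disconnect: remove the device from the old host's entry.
    correspondence[old_host].remove(device)
    # 2. Reset: try the switch command first, then fall back to the PERST pin.
    if switch_reset(device):
        path = "switch-command"
    elif perst_reset(device):
        path = "cpld-perst"
    else:
        # Put the record back so a failed reset does not lose the device.
        correspondence[old_host].append(device)
        raise RuntimeError(f"could not reset {device}")
    # 3. Connect to the new host and record the new correspondence.
    correspondence.setdefault(new_host, []).append(device)
    return path


table = {"Host0": ["GPU0"], "Host1": ["GPU1"]}
used = reassign_device(
    "GPU1", "Host1", "Host0", table,
    switch_reset=lambda d: False,   # simulate the switch command failing
    perst_reset=lambda d: True,     # the CPLD PERST fallback succeeds
)
```

Updating the table only after a successful reset keeps the stored relation consistent with what the hardware actually did.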
In some embodiments, step 130 comprises:
in the case that the reset scene is that any switching device is abnormal and the devices in the resource pools connected to that switching device are to be reset, executing the reset method comprises:
taking the host and the second device connected to the abnormal switching device as target devices to be reset;
restarting the host connected to the abnormal switching device;
sending a reset signal to the target complex programmable logic device so as to reset the target devices to be reset;
connecting the target devices to be reset to switching devices in the exchange resource pool other than the abnormal switching device;
and updating the connection relation based on the connection result.
Specifically, when a switching device in the exchange resource pool is abnormal, the abnormal switching device may be reset individually by the BMC. Taking the BMC of the exchange resource pool as the execution body, for example, the reset method comprises the following steps:
the BMC of the exchange resource pool determines the abnormal switching device and, from the stored connection relation and corresponding relation, determines the Host and the second device connected to it. The Host and the second device connected to that switching device are taken as the target devices to be reset. The CPLD directly connected to a target device to be reset is the target CPLD.
The Host is restarted. After the Host is restarted, the BMC of the exchange resource pool sends a PERST signal to the CPLD of the exchange resource pool, the CPLD sends the PERST signal to the CDFP ports of the Host and the second device connected to that switching device, and the CDFP ports transmit the PERST signal to the target CPLD so that the target devices to be reset are reset.
After the target devices to be reset have been reset, they are reallocated and can be connected to switching devices in the exchange resource pool other than the abnormal one.
According to the device reset method provided by the embodiments of the present application, when any switching device is abnormal, the host and the second device connected to it can be reset and reallocated on their own, without affecting the normal use of the other switching devices, hosts, and second devices, which improves reset efficiency.
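The selection of target devices and their reallocation away from an abnormal switching device can be sketched as follows. The round-robin placement policy is an assumption made for the example; the application says only that the targets are connected to other switching devices and that the connection relation is updated.

```python
def fail_over(connection: dict[str, str], failed: str, healthy: list[str]):
    """Return (targets, updated connection) after moving devices off `failed`.

    `connection` maps a device name (host or second device) to its switch.
    """
    if not healthy:
        raise ValueError("no healthy switching device available")
    # Everything attached to the abnormal switch is a target to be reset.
    targets = sorted(d for d, sw in connection.items() if sw == failed)
    updated = dict(connection)      # leave the stored table untouched until done
    for i, device in enumerate(targets):
        # Round-robin placement is an assumption; any policy could be used.
        updated[device] = healthy[i % len(healthy)]
    return targets, updated


conn = {"Host0": "SW0", "GPU0": "SW0", "Host1": "SW1", "GPU1": "SW1"}
targets, new_conn = fail_over(conn, failed="SW0", healthy=["SW1"])
```

Devices on healthy switching devices never appear in `targets`, which mirrors the point that they are not affected by the reset.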
The device reset apparatus provided in the embodiments of the present application is described below; the device reset apparatus described below and the device reset method described above may be referred to in correspondence with each other.
Fig. 4 is a schematic structural diagram of a device reset apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes a receiving module 410, a determining module 420, and a reset module 430.
The receiving module is used for receiving the reset command and analyzing the reset command to obtain a reset scene corresponding to the reset command;
the determining module is used for determining a resetting method corresponding to the resetting scene based on the resetting scene;
and the reset module is used for executing a reset method to determine the target complex programmable logic device and the target equipment to be reset, and sending a reset signal to the target complex programmable logic device to reset the target equipment to be reset.
The device is applied to a baseboard management controller of any resource pool in the server; the server comprises a plurality of resource pools, each resource pool comprises a complex programmable logic device and a baseboard management controller, and the complex programmable logic device is used for transmitting a reset signal sent by the baseboard management controller of any resource pool to equipment to be reset in each resource pool.
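The division of labour among the receiving, determining, and reset modules can be sketched as a small dispatcher. The command format (a dictionary with a `scene` key), the scene identifiers, and the handler names are hypothetical; the handlers return a description string instead of driving hardware.

```python
from enum import Enum
from typing import Callable


class Scene(Enum):
    HOST_RESTART = "host_restart"
    REASSIGN = "reassign"
    SWITCH_ABNORMAL = "switch_abnormal"


def parse_command(command: dict) -> Scene:
    """Receiving module: extract the reset scene from the reset command."""
    return Scene(command["scene"])   # raises ValueError for an unknown scene


def on_host_restart(command: dict) -> str:
    return f"reset second devices of {command['host']}"


def on_reassign(command: dict) -> str:
    return f"move {command['device']} to {command['new_host']}"


def on_switch_abnormal(command: dict) -> str:
    return f"fail over from {command['switch']}"


# Determining module: one reset method per reset scene.
METHODS: dict[Scene, Callable[[dict], str]] = {
    Scene.HOST_RESTART: on_host_restart,
    Scene.REASSIGN: on_reassign,
    Scene.SWITCH_ABNORMAL: on_switch_abnormal,
}


def handle(command: dict) -> str:
    """Reset module: run the method selected for the parsed scene."""
    scene = parse_command(command)
    method = METHODS[scene]
    return method(command)


result = handle({"scene": "reassign", "device": "GPU1", "new_host": "Host0"})
```

Keeping the scene-to-method table separate from the handlers means a further reset scene could be added without touching the parsing step.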
Specifically, according to an embodiment of the present application, any of the receiving module, the determining module, and the resetting module may be combined and implemented in one module, or any of the modules may be split into a plurality of modules.
Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module.
According to embodiments of the present application, at least one of the receiving module, the determining module, and the reset module may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC); or in hardware or firmware by any other reasonable way of integrating or packaging circuits; or in any one of software, hardware, and firmware, or a suitable combination of the three.
Alternatively, at least one of the receiving module, the determining module and the resetting module may be at least partially implemented as a computer program module which, when executed, may perform the respective functions.
According to the device reset apparatus provided by the embodiments of the present application, reset signals are transmitted within each resource pool through the complex programmable logic devices, so that the baseboard management controller of any resource pool can transmit a reset signal to the device to be reset in each resource pool; unified reset and management of the resource pools can thus be realized through the baseboard management controller of any resource pool, without manually switching off the power supply to perform a reset. Because the reset method is determined from the reset scene corresponding to the reset command, the target device to be reset in a resource pool can be reset quickly using the reset method that matches each reset scene, which improves reset efficiency.
In some embodiments, the resource pools include a computing resource pool, an exchange resource pool, a storage resource pool, and a heterogeneous acceleration resource pool; the computing resource pool is connected with the storage resource pool and the heterogeneous acceleration resource pool through the exchange resource pool; the computing resource pool comprises a host, wherein the host comprises at least one first device; the device reset apparatus further comprises a first connection module, wherein the first connection module is used for:
when the server is powered on, sending a power-on signal to the baseboard management controllers of the resource pools, and receiving a power-on completion signal sent by each baseboard management controller;
transmitting a reset signal to the storage resource pool, the heterogeneous acceleration resource pool, and the computing resource pool, which are connected with the exchange resource pool;
resetting, based on the reset signal, the first device and the second devices of the storage resource pool and the heterogeneous acceleration resource pool;
and establishing connections between the switching devices in the exchange resource pool and the reset first device and second devices.
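The ordering of the first connection module's steps can be sketched as follows: every pool must report power-on completion before any reset is issued, and links are established only after the resets. The pool names, the event tuples, and the `power_on` callable are hypothetical stand-ins for the real signals.

```python
from typing import Callable

POOLS = ["compute", "exchange", "storage", "accel"]


def power_on_sequence(power_on: Callable[[str], bool]) -> list[tuple[str, str]]:
    """Return the ordered (action, pool) events of the first connection.

    `power_on(pool)` models sending the power-on signal to that pool's BMC
    and returns True when the power-on completion signal comes back.
    """
    done = {pool: power_on(pool) for pool in POOLS}
    missing = [pool for pool, ok in done.items() if not ok]
    if missing:
        raise RuntimeError(f"power-on not completed for: {missing}")
    events = []
    # Reset every pool that is connected to the exchange resource pool.
    for pool in POOLS:
        if pool != "exchange":
            events.append(("reset", pool))
    # Only after the resets does the exchange pool establish its links.
    events.append(("link", "exchange"))
    return events


events = power_on_sequence(lambda pool: True)
```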
In some embodiments, the device reset apparatus further comprises a second connection module for:
obtaining connection relations between each exchange device in the exchange resource pool and the first device and the second device, and obtaining corresponding relations between a host and the second device;
controlling the host to start up, and resetting the second device corresponding to the host based on the corresponding relation;
storing the connection relation and the corresponding relation in a baseboard management controller of any resource pool;
sending a restriction instruction to a first baseboard management controller, wherein the restriction instruction is used for restricting the first baseboard management controller from controlling the devices in its own resource pool;
the first baseboard management controller is a baseboard management controller of a resource pool except any resource pool.
In some embodiments, the second connection module is specifically configured to:
starting the host based on the starting signal;
under the condition that a reset signal sent by a host is monitored, sending the reset signal to a complex programmable logic device exchanging a resource pool;
and based on the complex programmable logic device, sending a reset signal to the second equipment corresponding to the host, controlling the second equipment corresponding to the host to reset, and establishing connection between the host and the second equipment corresponding to the host.
In some embodiments, in the case that the reset scenario is that any host restarts and the second device corresponding to that host is to be reset, the reset module is specifically configured to:
determining a host to be restarted, and restarting the host to be restarted;
determining, based on the connection relation and the corresponding relation, the connection port of the switching device connected to the second device corresponding to the host to be restarted;
determining a target complex programmable logic device based on the connection port;
and sending a reset signal to the target complex programmable logic device so as to reset the target device to be reset, wherein the target device to be reset is the second device corresponding to the host to be restarted.
In some embodiments, in the case that the reset scenario is reassigning the second device corresponding to the host, the reset module is specifically configured to:
determining a device to be allocated, and taking the device to be allocated as the target device to be reset;
sending a disconnection command to the switching device, and disconnecting the device to be allocated from the host based on the disconnection command;
sending a reset signal to the target complex programmable logic device so as to reset the disconnected device to be allocated;
establishing a connection between the disconnected device to be allocated and a new host;
and updating the connection relation and the corresponding relation based on the connection relation between the device to be allocated, the new host, and the switching device, and on the corresponding relation between the device to be allocated and the new host.
In some embodiments, in the case that the reset scenario is that any switching device is abnormal and the devices in the resource pools connected to that switching device are to be reset, the reset module is specifically configured to:
taking the host and the second device connected to the abnormal switching device as target devices to be reset;
restarting the host connected to the abnormal switching device;
sending a reset signal to the target complex programmable logic device so as to reset the target devices to be reset;
connecting the target devices to be reset to switching devices in the exchange resource pool other than the abnormal switching device;
and updating the connection relation based on the connection result.
It should be noted that the device reset apparatus provided in the embodiments of the present application can implement all the method steps of the device reset method embodiments and achieve the same technical effects; the parts and beneficial effects that are the same as in the method embodiments are not described again here.
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application, and as shown in fig. 5, the electronic device may include: processor (Processor) 510, communication interface (Communications Interface) 520, memory (Memory) 530, and communication bus (Communications Bus) 540, wherein Processor 510, communication interface 520, memory 530 complete communication with each other via communication bus 540. Processor 510 may invoke the logic commands in memory 530 to perform the method described above as applied to baseboard management controllers of any one of the resource pools in the server; the server comprises a plurality of resource pools, each resource pool comprises a complex programmable logic device and a baseboard management controller, and the complex programmable logic device is used for transmitting a reset signal sent by the baseboard management controller of any resource pool to equipment to be reset in each resource pool; the method comprises the following steps:
Receiving a reset command and analyzing the reset command to obtain a reset scene corresponding to the reset command;
determining a reset method corresponding to the reset scene based on the reset scene;
and executing a reset method to determine the target complex programmable logic device and the target equipment to be reset, and sending a reset signal to the target complex programmable logic device to reset the target equipment to be reset.
In addition, the logic commands in the memory described above may be implemented in the form of software functional modules and, when sold or used as a stand-alone product, stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several commands for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The processor in the electronic device provided by the embodiment of the present application may call the logic instruction in the memory to implement the above method, and the specific implementation manner of the processor is consistent with the implementation manner of the foregoing method, and may achieve the same beneficial effects, which are not described herein again.
The present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the methods provided by the above embodiments.
The specific embodiment is consistent with the foregoing method embodiment, and the same beneficial effects can be achieved, and will not be described herein.
Embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. The equipment resetting method is characterized by being applied to a baseboard management controller of any resource pool in a server; the server comprises a plurality of resource pools, each resource pool comprises a complex programmable logic device and a baseboard management controller, and the complex programmable logic device is used for transmitting a reset signal sent by the baseboard management controller of any resource pool to equipment to be reset in each resource pool; the method comprises the following steps:
receiving a reset command and analyzing the reset command to obtain a reset scene corresponding to the reset command;
determining a reset method corresponding to the reset scene based on the reset scene;
and executing the reset method to determine a target complex programmable logic device and target equipment to be reset, and sending the reset signal to the target complex programmable logic device to reset the target equipment to be reset.
2. The device reset method of claim 1, wherein each of the resource pools comprises a computing resource pool, a swap resource pool, a storage resource pool, and a heterogeneous acceleration resource pool; the computing resource pool is connected with the storage resource pool and the heterogeneous acceleration resource pool through the exchange resource pool; the computing resource pool comprises a host, wherein the host comprises at least one first device; before receiving the reset command, the method further comprises:
When the server is powered on, sending an energizing signal to the baseboard management controllers of the resource pools, and receiving an energizing completion signal sent by each baseboard management controller;
transmitting a reset signal to the storage resource pool, the heterogeneous acceleration resource pool and the computing resource pool connected with the exchange resource pool;
resetting, based on the reset signal, the first device and the second devices of the storage resource pool and the heterogeneous acceleration resource pool;
and establishing connection between the switching equipment in the switching resource pool and the first equipment and the second equipment after reset.
3. The device resetting method according to claim 2, wherein after the connection between the switching device in the switching resource pool and the first device and the second device after the resetting, the method further comprises:
acquiring connection relations between each switching device in the switching resource pool and the first device and the second device, and acquiring corresponding relations between the host and the second device;
controlling the host to start up, and resetting the second equipment corresponding to the host based on the corresponding relation;
Storing the connection relation and the corresponding relation in a baseboard management controller of any resource pool;
sending a restriction instruction to a first baseboard management controller, wherein the restriction instruction is used for restricting the first baseboard management controller from controlling the devices in its own resource pool;
wherein the first baseboard management controller is a baseboard management controller of a resource pool other than any one of the resource pools.
4. The device resetting method as claimed in claim 3, wherein said controlling the host to start up and resetting the second device corresponding to the host based on the correspondence relationship comprises:
starting the host based on a starting signal;
transmitting a reset signal to the complex programmable logic device of the exchange resource pool under the condition that the reset signal transmitted by the host is monitored;
and based on the complex programmable logic device, sending the reset signal to the second equipment corresponding to the host, controlling the second equipment corresponding to the host to reset, and establishing connection between the host and the second equipment corresponding to the host.
5. The device reset method according to claim 3, wherein, in the case that the reset scenario is any host restart, and the second device corresponding to the any host is reset, the executing the reset method includes:
Determining a host to be restarted, and restarting the host to be restarted;
determining a connection port of a switching device connected with a second device corresponding to the host to be restarted based on the connection relationship and the corresponding relationship;
determining the target complex programmable logic device based on the connection port;
and sending the reset signal to the target complex programmable logic device so as to reset the target device to be reset, wherein the target device to be reset is the second device corresponding to the host to be restarted.
6. The device reset method of claim 3, wherein said executing the reset method in the case that the reset scenario is to reassign the second device corresponding to the host comprises:
determining equipment to be distributed, and taking the equipment to be distributed as the target equipment to be reset;
sending a disconnection command to the switching device, and disconnecting the device to be distributed from the host based on the disconnection command;
sending the reset signal to the target complex programmable logic device so as to reset the equipment to be distributed after disconnection;
establishing connection between the disconnected equipment to be distributed and a new host;
And updating the connection relation and the corresponding relation based on the connection relation between the equipment to be distributed and the new host and the switching equipment and the corresponding relation between the equipment to be distributed and the new host.
7. A device reset method according to claim 3, wherein in the case where the reset scenario is that there is an abnormality in any switching device, resetting a device in a resource pool connected to the any switching device, the executing the reset method includes:
taking the host and the second equipment connected with any switching equipment as target equipment to be reset;
restarting the host connected with any switching device;
sending the reset signal to the target complex programmable logic device so as to reset the target device to be reset;
connecting the target equipment to be reset with other switching equipment except any switching equipment in the switching resource pool;
and updating the connection relation based on the connection result.
8. A device reset apparatus, characterized by a baseboard management controller applied to any resource pool in a server; the server comprises a plurality of resource pools, each resource pool comprises a complex programmable logic device and a baseboard management controller, and the complex programmable logic device is used for transmitting a reset signal sent by the baseboard management controller of any resource pool to equipment to be reset in each resource pool; the baseboard management controller includes:
The receiving module is used for receiving the reset command and analyzing the reset command to obtain a reset scene corresponding to the reset command;
the determining module is used for determining a resetting method corresponding to the resetting scene based on the resetting scene;
and the reset module is used for executing the reset method to determine a target complex programmable logic device and target equipment to be reset, and sending the reset signal to the target complex programmable logic device to reset the target equipment to be reset.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the device reset method according to any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the device reset method of any of claims 1 to 7 by means of the computer program.
CN202311152696.5A 2023-09-07 2023-09-07 Equipment resetting method and device, storage medium and electronic equipment Pending CN117251039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311152696.5A CN117251039A (en) 2023-09-07 2023-09-07 Equipment resetting method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311152696.5A CN117251039A (en) 2023-09-07 2023-09-07 Equipment resetting method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117251039A true CN117251039A (en) 2023-12-19

Family

ID=89132248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311152696.5A Pending CN117251039A (en) 2023-09-07 2023-09-07 Equipment resetting method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117251039A (en)

Similar Documents

Publication Publication Date Title
US9872205B2 (en) Method and system for sideband communication architecture for supporting manageability over wireless LAN (WLAN)
US10693813B1 (en) Enabling and disabling links of a networking switch responsive to compute node fitness
CN106603265B (en) Management method, network device, and non-transitory computer-readable medium
US9934187B2 (en) Hot-pluggable computing system
US10515043B2 (en) Smart interface card control method and apparatus through a virtualized management interface
WO2016037503A1 (en) Configuration method and device of pcie topology
US11392417B2 (en) Ultraconverged systems having multiple availability zones
JP2006072591A (en) Virtual computer control method
JP2006201881A (en) Information processing device and system bus control method
CN104899170A (en) Distributed intelligent platform management bus (IPMB) connection method and ATCA (Advanced Telecom Computing Architecture) machine frame
US9779037B2 (en) Establishing connectivity of modular nodes in a pre-boot environment
CN115905094A (en) Electronic equipment and PCIe topology configuration method and device thereof
US10261937B2 (en) Method and system for communication of device information
CN117251039A (en) Equipment resetting method and device, storage medium and electronic equipment
CN112015690A (en) Intelligent device management method and device, network device and readable storage medium
CN115509333A (en) Server collaborative power-on and power-off device, method, system and medium
CN112615739B (en) Method and system for adapting OCP3.0 network card in multi-host application environment
CN109308234B (en) Method for controlling multiple controllers on board card to carry out active/standby switching
US8732331B2 (en) Managing latencies in a multiprocessor interconnect
CN108701117B (en) Interconnection system, interconnection control method and device
US20200209947A1 (en) Information processing system with a plurality of platforms
JP2007094470A (en) Method of hotplugging information processing apparatus
CN117807003A (en) Electronic equipment, processor, data transmission method and device
JP6841876B2 (en) Flexible connection of processor modules
US20230176986A1 (en) USB Terminal Server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination