CN117112296A

CN117112296A - Fault processing method and device for redundant system, electronic equipment and storage medium

Info

Publication number: CN117112296A
Application number: CN202311013917.0A
Authority: CN
Inventors: 张顺顺; 王晓松; 刘振; 徐通
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2023-08-11
Filing date: 2023-08-11
Publication date: 2023-11-24

Abstract

The invention provides a fault processing method and device for a redundant system, electronic equipment, and a storage medium, applied to a system management device. The method comprises: monitoring all processors in the redundant system according to the monitoring link; in response to monitoring a fault processor, determining whether a working server that is currently in operation is mounted on a first switch chip corresponding to the fault processor; if so, sending a first unloading instruction to the first switch chip and a first mounting instruction to a second switch chip corresponding to a target processor; in response to the second switch chip completing the mounting of the working server, sending a restart instruction to the fault processor; and in response to the fault processor restarting successfully, sending a second unloading instruction to the second switch chip and a second mounting instruction to the first switch chip. When a CPU fails, system services are switched seamlessly to a normal CPU, meeting the high-reliability requirement of a redundant system.

Description

Fault processing method and device for redundant system, electronic equipment and storage medium

Technical Field

The present invention relates to the field of fault processing technologies, and in particular, to a fault processing method and apparatus for a redundant system, an electronic device, and a storage medium.

Background

As more and more business moves online, servers must carry ever larger volumes of data, and the larger the volume, the greater the risk borne. Large amounts of data undergo continuous interactive computation every day, and data can be lost for many reasons. When a production system fails, it is essential that data recovery and service takeover be carried out effectively and quickly so that the system does not stop and business continuity is ensured; this is a problem every enterprise must face. When a server suffers a network attack, intrusion, power failure, or operational error, the data the enterprise has deployed on the server may be lost, with significant business impact. The purpose of system redundancy is therefore that, when an accident occurs, the original system can be recovered quickly and safely and normal operation of the service is ensured within a certain range.

Existing dual-socket or multi-socket servers are not truly redundant designs. They only guarantee that the system is not powered off when the main CPU fails, by switching key control rights of the system to the secondary CPU. However, devices attached under the PCIE data link of the main CPU go offline when the main CPU fails, and the user's related processing operations cannot be completed. In one existing scheme, a third-party CPLD monitors each CPU module for abnormalities and controls an electronic switch that switches the management signal link of the intelligent cabinet's management system to the master management module or the slave management module. However, once the master management module fails, the devices attached under that module stay offline until the module fault is handled, and user requests cannot be served. Moreover, while the master device has no fault, the slave device is always idle, which works against the need for higher device density and wastes computing resources.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a failure processing method, apparatus, electronic device, and storage medium that can realize a redundant system with high concurrency and high reliability.

In a first aspect, a fault handling method of a redundant system is provided, and the fault handling method is applied to a system management device, and the method includes:

monitoring all processors in the redundant system according to the monitoring link;

in response to monitoring a fault processor, determining whether a working server which is working is mounted on a first switch chip corresponding to the fault processor;

if yes, a first unloading instruction is sent to the first switch chip, and a first mounting instruction is sent to a second switch chip corresponding to a target processor in the redundant system;

in response to the second switch chip completing the mounting of the working server, sending a restart instruction to the fault processor;

and in response to the fault processor restarting successfully, sending a second unloading instruction to the second switch chip and a second mounting instruction to the first switch chip, so as to re-mount the working server.

In one embodiment, the monitoring all processors in the redundant system according to the monitoring link includes:

the monitoring link comprises a heartbeat monitoring link, an interrupt alarm link, and an abnormal information routing link;

in response to one monitoring link corresponding to an alarm processor raising an alarm, continuing to monitor the alarm processor over the other two monitoring links;

in response to two monitoring links corresponding to the alarm processor raising alarms, continuing to monitor the alarm processor over the remaining monitoring link;

and in response to determining that all three monitoring links corresponding to the alarm processor raise alarms, determining that the alarm processor has failed.
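The escalation rule in this embodiment can be illustrated with a minimal sketch. It is not taken from the application: the class `LinkAlarms`, the function `classify`, and the three state strings are hypothetical names chosen for illustration, and each link is assumed to report a simple boolean alarm flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkAlarms:
    """Alarm flags for the three monitoring links of one processor (hypothetical model)."""
    heartbeat: bool   # heartbeat monitoring link
    interrupt: bool   # interrupt alarm link
    routing: bool     # abnormal information routing link


def classify(alarms: LinkAlarms) -> str:
    """Return 'faulty' only when all three links alarm; otherwise keep monitoring."""
    count = sum((alarms.heartbeat, alarms.interrupt, alarms.routing))
    if count == 3:
        return "faulty"
    if count == 0:
        return "healthy"
    # One or two links alarming: the processor stays under observation
    # on the remaining link(s) and is not yet declared failed.
    return "suspect"
```

Requiring all three links to agree trades detection latency for robustness against interference on a single link.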

In one embodiment, the sending the first offload instruction to the first switch chip and sending the first mount instruction to the second switch chip corresponding to the target processor in the redundant system includes:

determining the number of working servers corresponding to the fault processor;

determining a target processor from all the processors according to the number of the working servers;

sending the first unloading instruction to the first switch chip;

in response to the first switch chip completing the unloading of the working server, modifying a register configuration of the first switch chip and sending the first mounting instruction to the second switch chip;

and in response to the second switch chip completing the mounting of the working server, sending reset information to the working server.

In one embodiment, the determining the target processor from the all processors according to the number of the working servers includes:

determining the number of idle servers corresponding to each processor;

and determining a processor whose number of idle servers is not smaller than the number of working servers as the target processor.
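The eligibility condition in this embodiment can be sketched as a simple filter. This is an illustration under stated assumptions, not the claimed implementation: the function name `eligible_targets` and the dictionary layout are hypothetical, and excluding the fault processor from the candidates is an assumption the text does not state explicitly.

```python
from typing import Dict, List


def eligible_targets(idle_servers: Dict[str, int], working_count: int,
                     faulty: str) -> List[str]:
    """Return processors whose idle-server count is at least working_count.

    idle_servers maps a processor name to the number of idle servers mounted
    on its switch chip. The fault processor itself is skipped (assumption).
    """
    return [
        cpu
        for cpu, idle in idle_servers.items()
        if cpu != faulty and idle >= working_count
    ]
```

For example, with three working servers to move, a processor with two idle servers is rejected and one with three or more is accepted.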

In one embodiment, the sending the second offload instruction to the second switch chip and the sending the second mount instruction to the first switch chip include:

sending the second unloading instruction to the second switch chip;

in response to the second switch chip completing the unloading of the working server, sending the second mounting instruction to the first switch chip;

and in response to the first switch chip completing the re-mounting of the working server, modifying the register configuration and sending the reset information to the working server.
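The ordering of the instructions in the switch-over and switch-back embodiments can be written out as two command lists. This is only a sketch: the action labels (`unload`, `modify_register`, `mount`, `restore_register`, `reset`) and both function names are hypothetical, and the sketch captures order only. In the described method each step waits for the previous one to complete, and the restart of the fault processor happens between the two sequences.

```python
from typing import List, Sequence, Tuple

Command = Tuple[str, str]  # (recipient, action); both are illustrative labels


def failover_commands(first_switch: str, second_switch: str,
                      servers: Sequence[str]) -> List[Command]:
    """Command order that moves the working servers to the target processor."""
    commands: List[Command] = [
        (first_switch, "unload"),           # first unloading instruction
        (first_switch, "modify_register"),  # after unloading completes
        (second_switch, "mount"),           # first mounting instruction
    ]
    commands += [(server, "reset") for server in servers]  # reset information
    return commands


def failback_commands(first_switch: str, second_switch: str,
                      servers: Sequence[str]) -> List[Command]:
    """Command order that returns the working servers after a successful restart."""
    commands: List[Command] = [
        (second_switch, "unload"),           # second unloading instruction
        (first_switch, "mount"),             # second mounting instruction
        (first_switch, "restore_register"),  # after re-mounting completes
    ]
    commands += [(server, "reset") for server in servers]  # reset information
    return commands
```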

In one embodiment, there is also provided a fault handling method of a redundancy system, applied to a first switch chip, the method including:

releasing work port resources in the fault processor in response to receiving a first unloading instruction sent by the system management device;

and in response to receiving a second mounting instruction sent by the system management device, reallocating the work port resources to the working server over the first high-speed serial computer expansion bus (PCIE) between the first switch chip and the fault processor.

In one embodiment, there is also provided a fault handling method of a redundant system, applied to a second switch chip, the method including:

in response to receiving a first mounting instruction sent by the system management device, allocating work port resources to the working server over the second high-speed serial computer expansion bus (PCIE) between the second switch chip and the target processor;

and in response to receiving a second unloading instruction sent by the system management device, releasing the work port resources in the target processor.
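The switch-chip side of the method, releasing work port resources on an unloading instruction and allocating them on a mounting instruction, can be modelled with a toy class. This is a sketch under assumptions: `SwitchChip`, `mount`, `unload`, and `move` are hypothetical names, a work port resource is modelled as a plain port label, and real register and PCIE operations are not represented.

```python
from typing import Dict, List, Sequence


class SwitchChip:
    """Toy model of a switch chip that hands out work ports to servers."""

    def __init__(self, name: str, ports: Sequence[str]) -> None:
        self.name = name
        self.free_ports: List[str] = list(ports)
        self.allocated: Dict[str, str] = {}  # server -> port

    def mount(self, servers: Sequence[str]) -> None:
        """Allocate one free work port to each server (mounting instruction)."""
        if len(servers) > len(self.free_ports):
            raise RuntimeError(f"{self.name}: not enough free work ports")
        for server in servers:
            self.allocated[server] = self.free_ports.pop(0)

    def unload(self, servers: Sequence[str]) -> None:
        """Release the work port held by each server (unloading instruction)."""
        for server in servers:
            self.free_ports.append(self.allocated.pop(server))


def move(servers: Sequence[str], source: SwitchChip, target: SwitchChip) -> None:
    """Unload from the source chip first, then mount on the target chip."""
    source.unload(servers)
    target.mount(servers)
```

Calling `move(devices, first, second)` corresponds to the switch-over, and `move(devices, second, first)` to the switch-back after the fault processor has restarted.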

In another aspect, there is provided a fault handling apparatus for a redundant system, for use in a system management apparatus, the apparatus comprising:

the monitoring module monitors all processors in the redundant system according to the monitoring link;

the determining module is used for determining whether a working server which works is mounted on a first switch chip corresponding to the fault processor or not in response to the monitoring of the fault processor;

a first sending module, configured to, if so, send a first unloading instruction to the first switch chip and send a first mounting instruction to a second switch chip corresponding to a target processor in the redundant system;

a second sending module, configured to send a restart instruction to the fault processor in response to the second switch chip completing the mounting of the working server;

and a third sending module, configured to, in response to the fault processor restarting successfully, send a second unloading instruction to the second switch chip and a second mounting instruction to the first switch chip, so as to re-mount the working server.

In one embodiment, the monitoring module monitors all processors in the redundant system according to the monitoring link, including:

In one embodiment, the sending, by the first sending module, the first offload instruction to the first switch chip and the sending, by the first sending module, the first mount instruction to the second switch chip corresponding to the target processor in the redundant system includes:

sending the first unloading instruction to the first switch chip;

In one embodiment, the determining, by the first sending module, the target processor from the all processors according to the number of working servers includes:

determining the number of idle servers corresponding to each processor;

In one embodiment, the sending, by the second sending module, a second offload instruction to the second switch chip and sending, by the first switch chip, a second mount instruction includes:

sending the second unloading instruction to the second switch chip;

In one embodiment, there is also provided a fault handling apparatus of a redundancy system, applied to a first switch chip, the apparatus including:

a first releasing module, configured to release the work port resources in the fault processor in response to receiving the first unloading instruction sent by the system management device;

and a first allocation module, configured to, in response to receiving the second mounting instruction sent by the system management device, reallocate the work port resources to the working server over the first high-speed serial computer expansion bus (PCIE) between the first switch chip and the fault processor.

In one embodiment, there is also provided a fault handling apparatus of a redundancy system, applied to a second switch chip, the apparatus including:

a second allocation module, configured to, in response to receiving the first mounting instruction sent by the system management device, allocate work port resources to the working server over the second high-speed serial computer expansion bus (PCIE) between the second switch chip and the target processor;

and a second releasing module, configured to release the work port resources in the target processor in response to receiving the second unloading instruction sent by the system management device.

In yet another aspect, an electronic device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

In one embodiment, the processor, when executing the computer program, performs the steps of:

the monitoring of all processors in the redundant system according to the monitoring link comprises:

The sending the first unloading instruction to the first switch chip and sending the first mounting instruction to the second switch chip corresponding to the target processor in the redundant system includes:

sending the first unloading instruction to the first switch chip;

said determining a target processor from said all processors according to said number of working servers comprises:

determining the number of idle servers corresponding to each processor;

the sending the second unloading instruction to the second switch chip and the sending the second mounting instruction to the first switch chip include:

sending the second unloading instruction to the second switch chip;

In yet another aspect, a computer readable storage medium is provided, having stored thereon a computer program which when executed by a processor performs the steps of:

In one embodiment, the computer program when executed by a processor performs the steps of:

Sending the first unloading instruction to the first switch chip;

determining the number of idle servers corresponding to each processor;

sending the second unloading instruction to the second switch chip;

All processors in the redundant system are monitored according to the monitoring link. In response to monitoring a fault processor, it is determined whether a working server that is currently in operation is mounted on the first switch chip corresponding to the fault processor. If so, a first unloading instruction is sent to the first switch chip and a first mounting instruction is sent to a second switch chip corresponding to a target processor in the redundant system, so that the working server is mounted to the target processor. In response to the second switch chip completing the mounting of the working server, a restart instruction is sent to the fault processor; and in response to the fault processor restarting successfully, a second unloading instruction is sent to the second switch chip and a second mounting instruction is sent to the first switch chip, so that the working server is re-mounted to the recovered processor. The CPUs work simultaneously to meet high-concurrency computing requirements, and when a CPU goes down, system services can be switched seamlessly to another CPU, so that the server system achieves high-concurrency data computation while meeting the high-reliability requirement.

Drawings

FIG. 1 is a system topology of a fault handling method for a redundant system;

FIG. 2 is a schematic diagram illustrating steps of a fault handling method for a redundant system of a system management device;

FIG. 3 is a system topology of a multiple switch chip interconnect system;

FIG. 4 is a schematic diagram of a failure handling device of a redundant system applied to a system management device;

FIG. 5 is an internal structural diagram of a computer device in an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The system management device may be composed of a BMC, an mCPU (Management CPU, which acts as the management center in the server), and a CPLD. The BMC monitors the working state of each CPU module and notifies the mCPU of a fault state through an LPC/IIC (inter-integrated circuit bus) signal. The mCPU can communicate with the PCIE FabricSwitch through a UART (universal asynchronous receiver/transmitter, a serial bus for asynchronous communication that supports full-duplex transmission and reception), and the CPU configuration is modified through PCIE links. Resets are issued through the CPLD, and the fault state is notified to the user after the reset is completed. Whenever a Device unload-mount operation needs to be completed, the mCPU communicates with the PCIE FabricSwitch through the UART and modifies the register configuration; this is not described again below.

In one embodiment, as shown in fig. 2, the present invention provides a fault handling method of a redundant system, applied to a system management device, the method comprising:

S201, monitoring all processors in the redundant system according to the monitoring link;

S202, in response to monitoring a fault processor, determining whether a working server which is working is mounted on a first switch chip corresponding to the fault processor;

S203, if yes, sending a first unloading instruction to the first switch chip and sending a first mounting instruction to a second switch chip corresponding to a target processor in the redundant system;

S204, in response to the second switch chip completing the mounting of the working server, sending a restart instruction to the fault processor;

S205, in response to the fault processor restarting successfully, sending a second unloading instruction to the second switch chip and a second mounting instruction to the first switch chip, so as to re-mount the working server.

Specifically, the redundant system may include a plurality of CPUs and a PCIE FabricSwitch corresponding to each CPU. In the normal working state, each CPU is interconnected with devices through its PCIE link and PCIE FabricSwitch and performs tasks such as data processing, control management, and high-performance computing. Because the PCIE FabricSwitch uses and accesses the computing resources of its CPU, a device mounted on a PCIE FabricSwitch is equivalent to a device mounted on the corresponding CPU. Each CPU has one heartbeat monitoring signal (mcpu_heartError), one abnormal interrupt alarm signal (SMI_GPIO), and one abnormal information routing signal (MDI) connected to the processing device, and the processing device monitors the working state of each CPU in real time through the heartbeat monitoring link, the interrupt alarm link, and the abnormal information routing link. When the processing device detects that a CPU has failed, it first determines whether a working device that is currently in operation is mounted under the failed CPU. If all mounted devices are idle, the failed CPU is itself idle; a restart instruction is sent directly to the failed CPU, and restarting it does not affect the mounted devices. If a working server device is mounted, device switching must be completed first, and the server device is switched back after the failed CPU has restarted and resumed normal operation. Thus, when there is no fault, the multiple CPUs work simultaneously to provide high computing power for high concurrency, and when one CPU fails, switching is performed without the user perceiving it, ensuring that the downstream devices keep working normally.
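The branch between the two cases described above (only idle devices mounted versus working devices mounted) can be sketched as a small planning function. The function name `plan_recovery`, the input layout, and the step strings are hypothetical and serve only to illustrate the decision.

```python
from typing import Dict, List


def plan_recovery(mounted: Dict[str, bool]) -> List[str]:
    """Map 'device name -> is working' to an ordered list of recovery steps."""
    working = sorted(name for name, busy in mounted.items() if busy)
    if not working:
        # Only idle devices are mounted: restart directly, nothing to switch.
        return ["restart failed CPU"]
    devices = ", ".join(working)
    return [
        f"switch {devices} to target CPU",
        "restart failed CPU",
        f"switch {devices} back to restarted CPU",
    ]
```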

Specifically, each CPU has one heartbeat monitoring signal (mcpu_heartError), one abnormal interrupt alarm signal (SMI_GPIO), and one abnormal information routing signal (MDI) connected to the processing device, and the processing device monitors the working state of each CPU module in real time through these three links. To prevent a misjudgment caused by interference on a single link, the processing device waits for the feedback of all three monitoring signals of a CPU, and determines that the CPU is in a fault state only when all three (MDI, mcpu_heartError, SMI_GPIO) raise alarms.

sending the first unloading instruction to the first switch chip;

Specifically, assuming that CPU0 fails, the processing device sends an unloading instruction to PCIE FabricSwitch0 (the first switch chip) and then changes the internal register configuration of PCIE FabricSwitch0, so that PCIE FabricSwitch0 allows the three server devices mounted under it to be mounted under PCIE FabricSwitch1 (the second switch chip) through the interconnected ports. The three devices originally mounted under CPU0 are then mounted under CPU1. After the unload-mount operation is completed, the processing device sends a PERST signal to each corresponding working server device; after the devices are reset, they are all mounted under CPU1 and work normally.

determining the number of idle servers corresponding to each processor;

Specifically, because the redundant system includes a plurality of CPUs and a PCIE FabricSwitch (that is, a switch chip) corresponding to each CPU, a target CPU must be determined when CPU0 fails. For example, suppose four devices are mounted on PCIE FabricSwitch0 corresponding to the failed CPU0 and only three of them are working, so the number of working servers is 3. The number of idle servers mounted on the PCIE FabricSwitch of the target CPU must then be not less than 3. The more idle servers there are, the more resources the CPU corresponding to that PCIE FabricSwitch has available to allocate to the three working servers of the failed CPU0. Therefore, when selecting the target CPU, the CPU whose PCIE FabricSwitch can mount the largest number of servers and currently has the largest number of idle servers mounted can be selected. A larger number of mountable servers indicates stronger CPU performance, and a larger number of currently mounted idle servers indicates that more processor resources can be allocated.
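The preference described above can be sketched as a filter followed by a ranking. This is an illustration, not the application's algorithm: `Candidate` and `choose_target` are hypothetical names, and ranking by mount capacity first and idle count second is an assumption, since the text names both criteria without stating how to break ties between them.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    cpu: str
    capacity: int  # number of servers its switch chip can mount
    idle: int      # number of idle servers currently mounted


def choose_target(candidates: Sequence[Candidate],
                  working_count: int) -> Optional[str]:
    """Pick a target CPU with at least working_count idle servers.

    Among eligible CPUs, prefer the largest capacity, then the largest
    idle count (the tie-break order is an assumption).
    """
    eligible = [c for c in candidates if c.idle >= working_count]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: (c.capacity, c.idle))
    return best.cpu
```

With three working servers to move, a candidate with only two idle servers is excluded even if its capacity is the largest.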

sending the second unloading instruction to the second switch chip;

Specifically, after the mounting of the working server devices is completed, the processing device restarts the failed CPU0, and when CPU0 has restarted successfully, the upper-layer user device is notified that the restart is complete. While CPU0 is restarting, all devices work normally through CPU1 and business processing is not affected. Once the failed module has restarted successfully and PCIE FabricSwitch0 and CPU0 are reconnected, an unloading instruction is sent to PCIE FabricSwitch1, and PCIE FabricSwitch1 releases the port resources, so that the three devices mounted under CPU1 are unloaded. A mounting instruction is then sent to PCIE FabricSwitch0, and PCIE FabricSwitch0 allocates task resources in CPU0 to the currently working devices through its PCIE connection with CPU0. Next, the register configuration of FabricSwitch0 is changed back to its original value, that is, servers attached under CPU0 are no longer allowed to be mounted under CPU1 through the port interconnection between FabricSwitch0 and FabricSwitch1. Finally, reset information is sent to the devices, and the devices formally start working after the reset.

Specifically, on receiving the first unloading instruction sent by the processing device, PCIE FabricSwitch0 releases the work port resources corresponding to the three ports S4, S5, and S6, thereby unloading the three devices mounted under CPU0. After the processing device modifies the register configuration of PCIE FabricSwitch0, the devices originally mounted on CPU0 are mounted on CPU1 through port communication with PCIE FabricSwitch1. Later, on receiving the second mounting instruction sent by the processing device, PCIE FabricSwitch0 again allocates the work port resources in CPU0 to the working devices through the PCIE link that has been re-established with CPU0.

Specifically, as described above, on receiving the first mounting instruction sent by the processing device, PCIE FabricSwitch1 mounts the devices originally mounted on CPU0 onto CPU1 through port communication with PCIE FabricSwitch0 and the PCIE link between itself and CPU1, so that the work port resources in CPU1 can be allocated to the three working devices. On receiving the second unloading instruction sent by the processing device, PCIE FabricSwitch1 releases the work port resources that were allocated to the three working devices, thereby unloading them.

FIG. 3 is a topology diagram of the interconnection of multiple switch chips. SW0, SW2, SW4, and SW6 in column A are regarded as the uplink SWs in a 2×4 topology, and SW1, SW3, SW5, and SW7 in column B are regarded as the downlink SWs. Through the interconnection of multiple switch chips, more upstream hosts and more downstream devices can be connected, and redundant backup and switching can be performed between hosts and devices. The method can be applied to a cluster server or a data center to improve its stability and efficiency.

The scheme of the application has the following beneficial effects:

1) Devices are not divided into master and slave devices in the present redundant system, so they can work simultaneously to meet high-concurrency computing requirements; meanwhile, when a CPU goes down, system services can be switched seamlessly to another CPU, so that the server system achieves high-concurrency data computation while also meeting the high-reliability requirement;

2) The problem that a slave device remains idle whenever the master device, such as a CPU, has no fault, which works against the need for higher device density and wastes computing resources, is effectively solved.

It should be understood that, although the steps in the flowchart of FIG. 2 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of the steps is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 4, a fault handling apparatus of a redundant system is applied to a system management apparatus, the apparatus includes:

the monitoring module 401 monitors all processors in the redundant system according to the monitoring link;

a determining module 402, configured to determine, in response to monitoring a fault processor, whether a working server that is working is mounted on a first switch chip corresponding to the fault processor;

the first sending module 403 is configured to, if so, send a first unloading instruction to the first switch chip and send a first mounting instruction to a second switch chip corresponding to the target processor in the redundant system;

a second sending module 404, configured to send a restart instruction to the failure processor in response to the second switch chip mounting the working server being completed;

and the third sending module 405 is configured to send a second unloading instruction to the second switch chip and send a second mount instruction to the first switch chip in response to the restart success of the fault processor, so as to mount the working server.

sending the first unloading instruction to the first switch chip;

determining the number of idle servers corresponding to each processor;

sending the second unloading instruction to the second switch chip;

For specific limitations on the fault handling means of the redundant system, reference may be made to the above limitation on the fault handling method of the redundant system, and no further description is given here. The respective modules in the fault handling apparatus of the redundant system described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements an alert information processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, an electronic device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of:

if so, sending a first unloading instruction to the first switch chip, and sending a first mounting instruction to a second switch chip corresponding to a target processor in the redundant system;

in response to the second switch chip completing the mounting of the working server, sending a restart instruction to the fault processor; and

in response to the fault processor restarting successfully, sending a second unloading instruction to the second switch chip and sending a second mounting instruction to the first switch chip.

sending the first unloading instruction to the first switch chip;

determining the number of idle servers corresponding to each processor;

sending the second unloading instruction to the second switch chip;
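
The sequence of steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `SwitchChip` and `Processor` classes, their `unload`, `mount`, and `restart` methods, and the `handle_fault` function are hypothetical names introduced here. Mounting and restarting are modelled as in-memory operations that always succeed, and every server mounted on the first switch chip is treated as a working server.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchChip:
    """Hypothetical stand-in for a switch chip; tracks the servers mounted on it."""
    name: str
    mounted: set = field(default_factory=set)

    def unload(self, servers):
        self.mounted -= set(servers)

    def mount(self, servers):
        self.mounted |= set(servers)


@dataclass
class Processor:
    """Hypothetical processor paired with its corresponding switch chip."""
    name: str
    chip: SwitchChip
    healthy: bool = True

    def restart(self):
        # Assumption: the restart always succeeds in this sketch.
        self.healthy = True
        return True


def handle_fault(failed, target, log):
    """Migrate working servers to the target, restart the fault processor, migrate back."""
    # Assumption: all servers mounted on the first switch chip are working servers.
    working = set(failed.chip.mounted)
    if not working:
        # The handling of this branch is not shown in this section; the sketch stops here.
        log.append("no working server mounted")
        return False

    failed.chip.unload(working)
    log.append("first unloading instruction -> " + failed.chip.name)
    target.chip.mount(working)
    log.append("first mounting instruction -> " + target.chip.name)

    # Only restart once the second switch chip has finished mounting.
    if not working <= target.chip.mounted:
        return False
    log.append("restart instruction -> " + failed.name)
    if not failed.restart():
        return False

    # After a successful restart, move the working servers back.
    target.chip.unload(working)
    log.append("second unloading instruction -> " + target.chip.name)
    failed.chip.mount(working)
    log.append("second mounting instruction -> " + failed.chip.name)
    return True
```

In this sketch the restart instruction is issued only after the second switch chip holds every working server, which mirrors the ordering of the steps: the services keep running on the target processor while the fault processor restarts.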

In one embodiment, a computer readable storage medium is provided having stored thereon a computer program which when executed by a processor performs the steps of:

sending the first unloading instruction to the first switch chip;

determining the number of idle servers corresponding to each processor;

sending the second unloading instruction to the second switch chip;
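
One way the idle-server count could feed into choosing a target processor is sketched below. The steps listed here only state that the number of idle servers corresponding to each processor is determined; the rule of picking the processor with the most idle servers, the tie-break by name, and the function name `pick_target` are assumptions made for illustration and are not taken from the source.

```python
def pick_target(idle_counts, failed):
    """Return the name of the candidate target processor, or None if there is none.

    idle_counts maps a processor name to its number of idle servers.
    Assumption (not from the source): the processor with the most idle
    servers is preferred, and ties go to the alphabetically first name.
    """
    # The fault processor itself can never be the target.
    candidates = {cpu: idle for cpu, idle in idle_counts.items() if cpu != failed}
    if not candidates:
        return None
    # max() keeps the first maximum it sees, so sorting the names first
    # makes the tie-break deterministic.
    return max(sorted(candidates), key=lambda cpu: candidates[cpu])
```

For example, with idle counts `{"cpu0": 5, "cpu1": 2, "cpu2": 4}` and `cpu0` as the fault processor, this sketch selects `cpu2`.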

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above examples illustrate only a few embodiments of the application. They are described specifically and in detail, but are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims

1. A fault handling method for a redundant system, applied to a system management device, the method comprising:

2. The method of claim 1, wherein monitoring all processors in the redundant system based on the monitoring link comprises:

3. The method of claim 1, wherein sending the first unloading instruction to the first switch chip and sending the first mounting instruction to the second switch chip corresponding to the target processor in the redundant system comprises:

sending the first unloading instruction to the first switch chip;

4. The method of claim 3, wherein determining the target processor from all the processors based on the number of working servers comprises:

determining the number of idle servers corresponding to each processor;

5. The method of claim 3, wherein sending the second unloading instruction to the second switch chip and sending the second mounting instruction to the first switch chip comprises:

sending the second unloading instruction to the second switch chip;

6. A fault handling method for a redundant system, applied to a first switch chip, the method comprising:

7. A fault handling method for a redundant system, applied to a second switch chip, the method comprising:

8. A fault handling device for a redundant system, for use in a system management device, the device comprising:

a first sending module, configured to, if so, send a first unloading instruction to the first switch chip and send a first mounting instruction to a second switch chip corresponding to a target processor in the redundant system;

a second sending module, configured to send a restart instruction to the fault processor in response to the second switch chip completing the mounting of the working server;

9. An electronic device, comprising:

one or more processors; and a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the method of any of claims 1-7.

10. A computer storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method of any of claims 1-7.